Wednesday, July 1, 2020

Pandas DataFrame Introduction

Introduction To The Pandas DataFrame 

A Pandas DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labelled axes (rows and columns).

A DataFrame can be thought of as a dict-like container of Series objects, where each column is a Series and all columns share the same row index.
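To make the dict-of-Series picture concrete, here is a minimal sketch. The names `ages` and `marks` and their values are illustrative only.

```python
import pandas as pd

# Two Series sharing the same index labels (illustrative data)
ages = pd.Series([16, 17, 15], index=["STEVE", "KYLE", "MAX"])
marks = pd.Series([34, 44, 32], index=["STEVE", "KYLE", "MAX"])

# Each dict key becomes a column label; each Series becomes a column
df = pd.DataFrame({"AGE": ages, "MARKS": marks})

# Looking up a key returns that column as a Series, much like a dict lookup
col = df["AGE"]
print(type(col).__name__)
print(list(df.keys()))
```

Here `df.keys()` returns the column labels, which is one reason the dict comparison is a useful mental model.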

The data is arranged in a tabular format of rows and columns (like SQL tables or Excel spreadsheets). Arithmetic operations align on both row and column labels.
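As a small sketch of what label alignment means in practice, the example below adds two made-up DataFrames (`a` and `b` are hypothetical) whose row and column labels only partly overlap. Values are added only where both the row label and the column label match; every other cell becomes NaN.

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2], "y": [10, 20]}, index=["r1", "r2"])
b = pd.DataFrame({"x": [100, 200], "z": [5, 6]}, index=["r2", "r3"])

# The result covers the union of row labels and the union of column labels
total = a + b
print(total)

# Only the cell (r2, x) exists in both inputs: 2 + 100
print(total.loc["r2", "x"])
```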

A Pandas DataFrame has three basic components: the data, the rows, and the columns.
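These three components can each be inspected directly. The following is a minimal sketch on a small illustrative DataFrame, using the `index`, `columns`, and `to_numpy()` accessors.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Steve", "Alex"], "Marks": [78, 47]})

rows = df.index          # row labels (a default integer index here)
cols = df.columns        # column labels
values = df.to_numpy()   # the underlying data as a NumPy array

print(rows)
print(cols)
print(values)
```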


A DataFrame can be created using the constructor:
pandas.DataFrame(data=None, index: Optional[Collection] = None,
                 columns: Optional[Collection] = None,
                 dtype: Optional[Union[str, numpy.dtype, ExtensionDtype]] = None,
                 copy: bool = False)


data: ndarray, Iterable, dict, or DataFrame

The input data. A dict can contain Series, arrays, constants, or list-like objects.

index: Index or array-like

Row labels to use for the resulting DataFrame.

columns: Index or array-like

Column labels to use for the resulting DataFrame.

dtype: dtype, default None

Data type to force on the resulting DataFrame. Only a single dtype is allowed.

copy: bool, default False

Whether to copy the data from the inputs.
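The parameters above can be combined in a single call. This is a hedged sketch with made-up values; the variable `raw` and the labels are assumptions for illustration.

```python
import pandas as pd

# Illustrative 2-D input: one inner list per row
raw = [[78, 16], [47, 17], [79, 15]]

df = pd.DataFrame(
    data=raw,
    index=["Steve", "Alex", "Jhon"],   # row labels
    columns=["Marks", "Age"],          # column labels
    dtype="float64",                   # one dtype applied to every column
)

print(df)
print(df.dtypes)
```

Because `dtype` accepts only a single type, the integer inputs in both columns are stored as floats here.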

Creating a Pandas DataFrame 

Generally, a DataFrame is created by loading a dataset from external storage, such as a SQL database, a CSV file, or an Excel spreadsheet. However, a DataFrame can also be created from a Python list, a dict, a list of dictionaries, and so on.

In the example below, a simple DataFrame is created from a list:

import pandas as pd

# list of strings
mylist = ['Green', 'Orange', 'Red', 'Purple',
          'Black', 'Cyan', 'Magenta']

# Using the DataFrame constructor
dframe = pd.DataFrame(mylist)
print(dframe)

         0
0    Green
1   Orange
2      Red
3   Purple
4    Black
5     Cyan
6  Magenta

Similarly, a DataFrame can be created from a dictionary:

import pandas as pd

# Initialise a dict of lists.
somedata = {'Name': ['Steve', 'Alex', 'JHON', 'KYLE', 'MAX'],
            'Marks': [78, 47, 79, 84, 32]}

# Create the DataFrame.
dframe = pd.DataFrame(somedata)

# Print the output.
print(dframe)

    Name  Marks
0  Steve     78
1   Alex     47
2   JHON     79
3   KYLE     84
4    MAX     32
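A list of dictionaries, mentioned earlier as another possible input, works in a similar way: each dictionary is treated as one row, and its keys become column labels. The sketch below uses made-up records, and the third one deliberately omits a key to show that the missing value is filled with NaN.

```python
import pandas as pd

# One dict per row; keys become column labels (illustrative data)
records = [
    {"Name": "Steve", "Marks": 78},
    {"Name": "Alex", "Marks": 47},
    {"Name": "Max"},  # no "Marks" key, so this cell becomes NaN
]

dframe = pd.DataFrame(records)
print(dframe)
```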

Operations on a DataFrame

Because the data is arranged in a tabular format (row- and column-wise), we can perform operations such as select, add, delete, and rename on a DataFrame.
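As a quick, self-contained sketch of these four operations, the example below works on a small in-memory DataFrame. The pass mark of 40 and the column names `Passed` and `Score` are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Steve", "Alex", "Max"], "Marks": [78, 47, 32]})

# Select: keep only the rows where Marks is at least 40
passed = df[df["Marks"] >= 40]

# Add: create a new column computed from an existing one
df["Passed"] = df["Marks"] >= 40

# Rename: returns a new DataFrame and leaves df unchanged
renamed = df.rename(columns={"Marks": "Score"})

# Delete: drop a column, again returning a new DataFrame
trimmed = renamed.drop(columns=["Passed"])
print(trimmed)
```

Both `rename` and `drop` return new objects by default, so the intermediate results are assigned to new names rather than modified in place.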

Selecting the row data

# importing pandas 
import pandas as pd

# creating data frame from .csv file
data = pd.read_csv("C:/Users/my/Desktop/record.csv", index_col="NAME")

# Accessing a row by loc method
steve = data.loc["STEVE"]
kyle = data.loc["KYLE"]

print(steve, "\n\n\n", kyle)

Unnamed: 0     2
AGE           16
GRADE          C
MARKS         34
Name: STEVE, dtype: object 

 Unnamed: 0     7
AGE           17
GRADE          C
MARKS         44
Name: KYLE, dtype: object
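For readers without the CSV file at hand, the same kind of row lookup can be sketched on an in-memory DataFrame. The frame below is hand-built from the two rows printed above and is only a stand-in for the real file. It also shows `iloc`, which selects by position rather than by label.

```python
import pandas as pd

# Stand-in for the CSV data, indexed by NAME
data = pd.DataFrame(
    {"AGE": [16, 17], "GRADE": ["C", "C"], "MARKS": [34, 44]},
    index=pd.Index(["STEVE", "KYLE"], name="NAME"),
)

steve = data.loc["STEVE"]            # select one row by label
first = data.iloc[0]                 # select one row by position
both = data.loc[["STEVE", "KYLE"]]   # a list of labels returns a DataFrame

print(steve)
print(both)
```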

Selecting a Column

To select a single column, place its name in square brackets. For example:

# importing pandas
import pandas as pd

# creating data frame from .csv file
data = pd.read_csv("C:/Users/my/Desktop/record.csv")

# Selecting a single column by its name
res = data["NAME"]
print(res)


0     ALEX
1    STEVE
2     JHON
3    WILEY
4    SMITH
5     DAVE
6     KYLE
7      SAM
8      MAX
9      RON
Name: NAME, dtype: object
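Several columns can be selected at once by passing a list of names inside the brackets. The sketch below uses a small hand-built DataFrame as a stand-in for the CSV file. Note the difference in return type: a single name gives a Series, while a list of names gives a DataFrame.

```python
import pandas as pd

# Stand-in for the CSV data (two rows only)
data = pd.DataFrame(
    {"NAME": ["STEVE", "KYLE"], "AGE": [16, 17], "MARKS": [34, 44]}
)

single = data["NAME"]             # one name returns a Series
subset = data[["NAME", "MARKS"]]  # a list of names returns a DataFrame

print(type(single).__name__)
print(subset)
```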