Introduction To The Pandas DataFrame
A Pandas DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labelled axes (rows and columns).
A DataFrame can be thought of as a dict-like container of Series objects, one Series per column.
The data is arranged in rows and columns, like a SQL table or an Excel spreadsheet. Arithmetic operations align on both row and column labels.
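Label alignment is easiest to see with two frames whose labels only partly overlap. The following is a minimal sketch; the frame names and all values are made up for illustration.

```python
import pandas as pd

# Two small frames whose row and column labels only partly overlap
# (all values here are illustrative).
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])
right = pd.DataFrame({"b": [10, 20], "c": [30, 40]}, index=["y", "z"])

# Addition matches cells by row and column label, not by position.
total = left + right

# Only the cell at row "y", column "b" exists in both inputs (4 + 10).
print(total.loc["y", "b"])

# Every other cell has no partner in the other frame and becomes NaN.
print(total)
```

The result covers the union of the row labels and the union of the column labels of both inputs.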
A Pandas DataFrame has three basic components: the data, the rows, and the columns.
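Each of the three components can be inspected directly. The sketch below assumes a small hand-made frame; the row labels r1 and r2 are hypothetical.

```python
import pandas as pd

# Illustrative values only; "r1" and "r2" are made-up row labels.
df = pd.DataFrame(
    {"Name": ["Steve", "Alex"], "Marks": [78, 47]},
    index=["r1", "r2"],
)

print(df.index)       # row labels
print(df.columns)     # column labels
print(df.to_numpy())  # the underlying data as a NumPy array

# Each column is itself a Series that shares the frame's row index.
marks = df["Marks"]
print(type(marks).__name__)
```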
A DataFrame can be created using the constructor:
pandas.DataFrame(data=None, index: Optional[Collection] = None, columns: Optional[Collection] = None, dtype: Optional[Union[str, numpy.dtype, ExtensionDtype]] = None, copy: bool = False)
data: ndarray, Iterable, dict, or DataFrame
    A dict can contain Series, arrays, constants, or list-like objects.
index: Index or array-like
    Index to use for the resulting DataFrame.
columns: Index or array-like
    Column labels to use for the resulting DataFrame.
dtype: dtype, default None
    The data type of the resulting DataFrame. Only a single dtype is allowed.
copy: bool, default False
    Whether to copy data from the inputs.
Creating a Pandas DataFrame
Usually, a DataFrame is created by loading a dataset from storage such as a SQL database, a CSV file, or an Excel spreadsheet. A DataFrame can also be created from a Python list, a dict, or a list of dictionaries.
The example below creates a simple DataFrame from a list:
    import pandas as pd

    # List of strings
    mylist = ['Green', 'Orange', 'Red', 'Purple', 'Black', 'Cyan', 'Magenta']

    # Using the DataFrame constructor
    dframe = pd.DataFrame(mylist)
    print(dframe)

    # Output
    #          0
    # 0    Green
    # 1   Orange
    # 2      Red
    # 3   Purple
    # 4    Black
    # 5     Cyan
    # 6  Magenta
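In the list example, pandas assigns the default labels 0, 1, 2, ... to both rows and columns. The index, columns, and dtype parameters of the constructor can override this. The sketch below uses hypothetical labels ("Colour", "c1" to "c3", "n") that do not appear in the original.

```python
import pandas as pd

colours = ["Green", "Orange", "Red"]

# "Colour" and the labels c1..c3 are hypothetical names chosen for illustration.
dframe = pd.DataFrame(colours, index=["c1", "c2", "c3"], columns=["Colour"])
print(dframe)

# dtype forces a single type on the data; here the integers are stored as floats.
numbers = pd.DataFrame([1, 2, 3], columns=["n"], dtype="float64")
print(numbers["n"].dtype)
```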
Similarly, a DataFrame can be created from a dictionary:
    import pandas as pd

    # Initialise data as a dict of lists
    somedata = {'Name': ['Steve', 'Alex', 'JHON', 'KYLE', 'MAX'],
                'Marks': [78, 47, 79, 84, 32]}

    # Create the DataFrame
    dframe = pd.DataFrame(somedata)

    # Print the output
    print(dframe)

    # Output
    #     Name  Marks
    # 0  Steve     78
    # 1   Alex     47
    # 2   JHON     79
    # 3   KYLE     84
    # 4    MAX     32
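The list-of-dictionaries case mentioned earlier works along the same lines. A minimal sketch, reusing a few of the names and marks from the dictionary example, with one key deliberately left out to show how missing values are handled:

```python
import pandas as pd

# Each dictionary is one row; the keys become column labels.
records = [
    {"Name": "Steve", "Marks": 78},
    {"Name": "Alex", "Marks": 47},
    {"Name": "JHON"},  # 'Marks' deliberately omitted
]

dframe = pd.DataFrame(records)

# The missing 'Marks' entry is filled with NaN.
print(dframe)
```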
Operations on a DataFrame
Selecting the row data
    # Importing pandas
    import pandas as pd

    # Creating a DataFrame from a .csv file
    data = pd.read_csv("C:/Users/my/Desktop/record.csv", index_col="NAME")

    # Accessing rows with the loc method
    steve = data.loc["STEVE"]
    kyle = data.loc["KYLE"]
    print(steve, "\n\n\n", kyle)

    # Output
    # Unnamed: 0     2
    # AGE           16
    # GRADE          C
    # MARKS         34
    # Name: STEVE, dtype: object
    #
    #
    # Unnamed: 0     7
    # AGE           17
    # GRADE          C
    # MARKS         44
    # Name: KYLE, dtype: object
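Since record.csv is not included here, the same row selection can be tried on an in-memory stand-in. The sketch below is built only from the two rows visible in the output above (the "Unnamed: 0" column is left out) and is not the original file. It also uses iloc, the position-based counterpart of loc, which the example above does not use.

```python
import io
import pandas as pd

# Stand-in for record.csv, using only the two rows shown in the output above.
csv_text = """NAME,AGE,GRADE,MARKS
STEVE,16,C,34
KYLE,17,C,44
"""

data = pd.read_csv(io.StringIO(csv_text), index_col="NAME")

# Label-based selection: a single label returns a Series.
steve = data.loc["STEVE"]
print(steve["MARKS"])

# Position-based selection: iloc[1] is the second row, here KYLE.
second = data.iloc[1]
print(second.name)

# A list of labels returns a DataFrame instead of a Series.
both = data.loc[["STEVE", "KYLE"]]
print(both.shape)
```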
Selecting a Column
To select a single column, place its name in square brackets. For example:
    # Importing pandas
    import pandas as pd

    # Creating a DataFrame from a .csv file
    data = pd.read_csv("C:/Users/my/Desktop/record.csv")

    # Selecting a column by its name
    res = data["NAME"]
    print(res)

    # Output
    # 0     ALEX
    # 1    STEVE
    # 2     JHON
    # 3    WILEY
    # 4    SMITH
    # 5     DAVE
    # 6     KYLE
    # 7      SAM
    # 8      MAX
    # 9      RON
    # Name: NAME, dtype: object
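Bracket selection also accepts a list of column names, in which case the result is a DataFrame rather than a Series. A minimal sketch on an in-memory frame: the names come from the output above, while the AGE and MARKS values for ALEX and JHON are invented for illustration.

```python
import pandas as pd

# Illustrative stand-in for the CSV data; AGE and MARKS for ALEX and JHON are made up.
data = pd.DataFrame({
    "NAME": ["ALEX", "STEVE", "JHON"],
    "AGE": [15, 16, 17],
    "MARKS": [50, 34, 60],
})

# Single brackets with one name return a Series.
names = data["NAME"]

# A list of names inside the brackets returns a DataFrame.
subset = data[["NAME", "MARKS"]]

print(type(names).__name__, type(subset).__name__)
print(list(subset.columns))
```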