How to read Excel Data using pandas?
pandas can load two-dimensional (tabular) data into a DataFrame from sources such as CSV files, Excel workbooks, and SQL databases. To read Excel data, use the read_excel() method.
read_excel() Method
This method reads data from an Excel file into a pandas DataFrame. It supports the xls, xlsx, xlsm, xlsb, odf, ods, and odt file extensions, read from a local filesystem or a URL.
The syntax of the read_excel() method is:
pandas.read_excel(io, sheet_name=0, header=0, names=None, index_col=None, usecols=None, squeeze=False, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skiprows=None, nrows=None, na_values=None, keep_default_na=True, verbose=False, parse_dates=False, date_parser=None, thousands=None, comment=None, skipfooter=0, convert_float=True, mangle_dupe_cols=True, **kwds)
This signature comes from an older pandas release. The squeeze, convert_float, and mangle_dupe_cols parameters were removed in pandas 2.0, so check the documentation for the version you have installed.
For example,
# importing pandas
import pandas as pd
excel_data = pd.read_excel('C:/Users/my/Desktop/record.xlsx', sheet_name='record')
# read the first 5 rows
print(excel_data.head())
# Output
   Unnamed: 0   NAME  AGE GRADE  MARKS
0           1   ALEX   16     A     88
1           2  STEVE   16     C     34
2           3   JHON   17     B     66
3           4  WILEY   16     B     75
4           5  SMITH   18     A     82
The object returned by read_excel() is a pandas DataFrame, which type() confirms.
# importing pandas
import pandas as pd
excel_data = pd.read_excel('C:/Users/my/Desktop/record.xlsx', sheet_name='record')
# type of the returned object
print(type(excel_data))
# Output
<class 'pandas.core.frame.DataFrame'>
To get the column names from the Excel data, use the columns attribute of the DataFrame.
# importing pandas
import pandas as pd
excel_data = pd.read_excel('C:/Users/my/Desktop/record.xlsx', sheet_name='record')
# to_numpy() returns the column labels as a NumPy array
# (it replaces columns.ravel(), whose return type changes in newer pandas releases)
print(excel_data.columns.to_numpy())
# Output
['Unnamed: 0' 'NAME' 'AGE' 'GRADE' 'MARKS']
A single column can be converted to a list of its values with to_list().
# importing pandas
import pandas as pd
excel_data = pd.read_excel('C:/Users/my/Desktop/record.xlsx', sheet_name='record')
# convert the NAME column to a Python list
print(excel_data['NAME'].to_list())
# Output
['ALEX', 'STEVE', 'JHON', 'WILEY', 'SMITH', 'DAVE', 'KYLE', 'SAM', 'MAX', 'RON']
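The 'Unnamed: 0' column in the outputs above typically appears when the first column of the sheet has no header cell, for instance when a DataFrame was saved together with its index. The sketch below illustrates this under a few assumptions: the workbook is a hypothetical stand-in for record.xlsx built from the five rows shown earlier, it is written to a temporary directory, and an Excel engine such as openpyxl is installed. It is only one way to reproduce the effect, not a description of how the original file was made.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for record.xlsx, built from the first five rows shown above.
records = pd.DataFrame(
    {
        "NAME": ["ALEX", "STEVE", "JHON", "WILEY", "SMITH"],
        "AGE": [16, 16, 17, 16, 18],
        "GRADE": ["A", "C", "B", "B", "A"],
        "MARKS": [88, 34, 66, 75, 82],
    },
    index=[1, 2, 3, 4, 5],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "record.xlsx"
    # Writing the (unnamed) index produces a first column with an empty header cell.
    records.to_excel(path, sheet_name="record")

    # Default read: the header-less first column is labelled 'Unnamed: 0'.
    raw = pd.read_excel(path, sheet_name="record")

    # index_col=0 uses that first column as the row labels instead.
    indexed = pd.read_excel(path, sheet_name="record", index_col=0)

print(raw.columns.tolist())      # ['Unnamed: 0', 'NAME', 'AGE', 'GRADE', 'MARKS']
print(indexed.columns.tolist())  # ['NAME', 'AGE', 'GRADE', 'MARKS']
print(indexed.head())
```

Passing index_col=0 is usually enough when the unnamed column really is a row index. If it holds ordinary data, giving it a proper name with the names parameter may be the better choice.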
Parameters of read_excel() method
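Important parameters of the read_excel() method are:
- io: The source to read. This can be a str (path or URL), bytes, ExcelFile, xlrd.Book, path object, or file-like object.
- sheet_name: The sheet or sheets to read. A str selects a sheet by name, an int selects it by zero-indexed position, a list selects several sheets, and None reads all sheets. The default is 0 (the first sheet).
- header: The row (0-indexed) to use for the column labels of the DataFrame. It can be an int or a list of ints; the default is 0. Use None if the sheet has no header row.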
- names: List of column names to use. This can be array-like, default None.
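- index_col: The column (0-indexed) to use as the row labels of the DataFrame. It can be an int or a list of ints; the default is None.
- usecols: The columns to parse. It can be a str (Excel column letters or ranges such as "A:E"), a list-like of ints or strs, or a callable. The default, None, parses all columns.
- dtype: The data type for the whole data set or for individual columns, for example a dict that maps column names to types.
- engine: The library used to read the file. If io is not a buffer or path, this must be set to identify io. The valid values include None, "xlrd", "openpyxl", and "odf"; "pyxlsb" is also accepted for xlsb files.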
- nrows: The number of rows to parse.
- na_values: Additional values to treat as NaN.
- squeeze: If True and the parsed data has only one column, return a pandas Series instead of a DataFrame. The default is False.
- true_values: A list of values to treat as True. The default is None.
- false_values: A list of values to treat as False. The default is None.
- comment: A string that marks the rest of a line as a comment, so everything after it on that line is ignored.
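Several of the parameters above can be tried without a file on disk, since io also accepts a file-like object. The following is a minimal sketch under stated assumptions: the two-sheet workbook, its sheet names ("record" and "scale"), its columns, and the open_workbook() helper are all hypothetical, and an Excel engine such as openpyxl must be installed.

```python
import io

import pandas as pd

# Hypothetical two-sheet workbook held in memory, so no file on disk is needed.
buffer = io.BytesIO()
with pd.ExcelWriter(buffer) as writer:
    pd.DataFrame(
        {
            "ID": [1, 2, 3, 4],
            "NAME": ["ALEX", "STEVE", "JHON", "WILEY"],
            "GRADE": ["A", "C", "missing", "B"],
            "MARKS": [88, 34, 66, 75],
        }
    ).to_excel(writer, sheet_name="record", index=False)
    pd.DataFrame({"GRADE": ["A", "B", "C"], "MIN_MARKS": [80, 60, 0]}).to_excel(
        writer, sheet_name="scale", index=False
    )
data = buffer.getvalue()


def open_workbook():
    """Return a fresh file-like object over the workbook bytes (illustrative helper)."""
    return io.BytesIO(data)


# sheet_name=None loads every sheet into a dict keyed by sheet name.
all_sheets = pd.read_excel(open_workbook(), sheet_name=None)
print(list(all_sheets))

# usecols keeps only the listed columns; nrows stops after two data rows.
subset = pd.read_excel(
    open_workbook(), sheet_name="record", usecols=["NAME", "MARKS"], nrows=2
)
print(subset)

# index_col picks the row labels, dtype forces a column type, and
# na_values lists extra strings that should be read as NaN.
typed = pd.read_excel(
    open_workbook(),
    sheet_name="record",
    index_col=0,
    dtype={"MARKS": "float64"},
    na_values=["missing"],
)
print(typed)
```

Each call gets a fresh BytesIO so that no read depends on the stream position left by an earlier one. With a real file, the same keyword arguments apply and the path string simply takes the place of open_workbook().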