Saturday, July 4, 2020

How to read Excel Data using pandas



pandas can read two-dimensional (tabular) data from many sources, such as CSV files, Excel workbooks, and SQL databases, and store it in a DataFrame. To read Excel data, use the read_excel() function.

read_excel() Method


This function reads data from an Excel file into a pandas DataFrame. It supports the xls, xlsx, xlsm, xlsb, odf, ods, and odt file extensions, and the file can be read from a local filesystem or a URL.


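The signature of read_excel() at the time of writing is shown below. pandas 2.0 removed the squeeze, convert_float, and mangle_dupe_cols parameters, so check the documentation for your installed version.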

pandas.read_excel(io, sheet_name=0, header=0, names=None, index_col=None, 
                usecols=None, squeeze=False, dtype=None, 
                engine=None, converters=None, true_values=None, 
                false_values=None, skiprows=None, nrows=None, 
                na_values=None, keep_default_na=True, verbose=False, 
                parse_dates=False, date_parser=None, thousands=None, 
                comment=None, skipfooter=0, convert_float=True, 
                mangle_dupe_cols=True, **kwds)

For example,

# importing pandas
import pandas as pd
excel_data = pd.read_excel('C:/Users/my/Desktop/record.xlsx', sheet_name='record')
# read the first 5 rows
print(excel_data.head())

#Output
   Unnamed: 0   NAME  AGE GRADE  MARKS
0           1   ALEX   16     A     88
1           2  STEVE   16     C     34
2           3   JHON   17     B     66
3           4  WILEY   16     B     75
4           5  SMITH   18     A     82
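
The "Unnamed: 0" column in the output above most likely appears because the first column of the sheet has no header cell, so pandas generates a name for it. A minimal sketch of how index_col=0 can turn that column into the row labels instead is shown here. The in-memory workbook is a hypothetical stand-in for record.xlsx, and openpyxl is assumed to be installed for writing and reading .xlsx data.

```python
import io
import pandas as pd

# Hypothetical stand-in for record.xlsx, built in memory.
# Assumption: openpyxl is installed to write and read .xlsx data.
records = pd.DataFrame(
    {"NAME": ["ALEX", "STEVE", "JHON"], "AGE": [16, 16, 17],
     "GRADE": ["A", "C", "B"], "MARKS": [88, 34, 66]},
    index=[1, 2, 3],
)
buffer = io.BytesIO()
# The unnamed index is written as a first column with an empty header cell.
records.to_excel(buffer, sheet_name="record", engine="openpyxl")
workbook_bytes = buffer.getvalue()

# Default read: the headerless first column is named "Unnamed: 0".
default_read = pd.read_excel(io.BytesIO(workbook_bytes), sheet_name="record")
print(default_read.columns.to_list())

# index_col=0 uses that first column as the row labels instead.
indexed_read = pd.read_excel(
    io.BytesIO(workbook_bytes), sheet_name="record", index_col=0
)
print(indexed_read.columns.to_list())
```

With a real file, the same index_col=0 argument can be passed alongside the file path.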


The object returned by read_excel() is a pandas DataFrame.

# importing pandas
import pandas as pd
excel_data = pd.read_excel('C:/Users/my/Desktop/record.xlsx', sheet_name='record')
# Type
print(type(excel_data))

#Output
<class 'pandas.core.frame.DataFrame'>
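
The return type depends on sheet_name: a single sheet name or position gives one DataFrame, while a list or None gives a dict of DataFrames keyed by sheet. The sketch below illustrates this with a hypothetical two-sheet workbook built in memory. The sheet name "grades" and its contents are made up for illustration, and openpyxl is assumed to be installed.

```python
import io
import pandas as pd

# Hypothetical two-sheet workbook built in memory.
# Assumption: openpyxl is installed to write and read .xlsx data.
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"NAME": ["ALEX", "STEVE"], "MARKS": [88, 34]}).to_excel(
        writer, sheet_name="record", index=False
    )
    pd.DataFrame({"GRADE": ["A", "B", "C"]}).to_excel(
        writer, sheet_name="grades", index=False
    )
workbook_bytes = buffer.getvalue()

# A single sheet name (or position) returns one DataFrame.
one_sheet = pd.read_excel(io.BytesIO(workbook_bytes), sheet_name="record")
print(type(one_sheet))

# sheet_name=None loads every sheet into a dict keyed by sheet name.
all_sheets = pd.read_excel(io.BytesIO(workbook_bytes), sheet_name=None)
print(list(all_sheets))

# A list selects specific sheets and also returns a dict.
some_sheets = pd.read_excel(io.BytesIO(workbook_bytes), sheet_name=["grades"])
print(some_sheets["grades"].shape)
```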


To get the column names from the Excel data, use the columns attribute of the DataFrame.

# importing pandas
import pandas as pd
excel_data = pd.read_excel('C:/Users/my/Desktop/record.xlsx', sheet_name='record')
print(excel_data.columns.to_numpy())

#Output
['Unnamed: 0' 'NAME' 'AGE' 'GRADE' 'MARKS']

The data in a single column can be converted to a list of values.

# importing pandas
import pandas as pd
excel_data = pd.read_excel('C:/Users/my/Desktop/record.xlsx', sheet_name='record')
print(excel_data['NAME'].to_list())

#Output
['ALEX', 'STEVE', 'JHON', 'WILEY', 'SMITH', 'DAVE', 'KYLE', 'SAM', 'MAX', 'RON']


Parameters of read_excel() method


The most important parameters of read_excel() are:


  • io: The source to read. This can be a str, bytes, ExcelFile, xlrd.Book, path object, or file-like object.
  • sheet_name: The sheet or sheets to read. It can be a str (sheet name), an int (zero-indexed sheet position), a list of these, or None for all sheets; the default is 0. A list or None returns a dict of DataFrames.
  • header: The row (0-indexed) to use for the column labels of the DataFrame. It can be an int or a list of int; the default is 0.
  • names: List of column names to use. This can be array-like; the default is None.
  • index_col: The column (0-indexed) to use as the row labels of the DataFrame. It can be an int or a list of int; the default is None.
  • usecols: The columns to parse. It can be an int (deprecated), str, list-like, or callable; the default None parses all columns.
  • dtype: The data type for the data or for individual columns.
  • engine: The library used to parse the file. If io is not a buffer or path, this must be set to identify io. The valid values are None, “xlrd”, “openpyxl”, “odf”, or “pyxlsb”.
  • nrows: The number of rows to parse.
  • na_values: Additional values to treat as NaN.
  • squeeze: If True and the parsed data has only one column, return a pandas Series instead of a DataFrame. The default value is False.
  • true_values: A list of values to treat as True. The default value is None.
  • false_values: A list of values to treat as False. The default value is None.
  • comment: A string that marks the start of a comment; the rest of the line after it is ignored.
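
Several of the parameters above can be combined in one call. The following is a minimal sketch, not a definitive recipe. The sample data and the "missing" placeholder are hypothetical, the workbook is built in memory rather than read from record.xlsx, and openpyxl is assumed to be installed.

```python
import io
import pandas as pd

# Hypothetical sample data written to an in-memory workbook.
# Assumption: openpyxl is installed to write and read .xlsx data.
source = pd.DataFrame({
    "NAME": ["ALEX", "STEVE", "JHON", "WILEY"],
    "AGE": [16, 16, 17, 16],
    "GRADE": ["A", "C", "missing", "B"],
    "MARKS": [88, 34, 66, 75],
})
buffer = io.BytesIO()
source.to_excel(buffer, sheet_name="record", index=False, engine="openpyxl")

subset = pd.read_excel(
    io.BytesIO(buffer.getvalue()),
    sheet_name="record",
    usecols=["NAME", "GRADE", "MARKS"],  # keep only these columns
    nrows=3,                             # parse only the first 3 data rows
    dtype={"MARKS": float},              # read MARKS as float instead of int
    na_values=["missing"],               # treat this placeholder as NaN
)
print(subset)
print(subset.dtypes)
```

Restricting columns and rows at read time can be cheaper than loading the whole sheet and filtering afterwards, especially for large workbooks.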