Monday, June 29, 2020

Python pandas introduction

pandas is one of the most popular open-source Python packages for data analysis and manipulation. Wes McKinney began developing it in 2008. The name is derived from "panel data", an econometrics term for multidimensional structured data sets, and is also a play on "Python data analysis". pandas organizes data in two-dimensional tabular form and is built on top of NumPy.

pandas is an efficient, flexible, and easy-to-use data analysis and manipulation tool. It can read data from a wide variety of sources, such as .csv and .tsv files, Excel sheets, and web pages.
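
As a minimal sketch of loading delimited data, the example below reads the same small table once as CSV and once as TSV. The in-memory strings and the column names (name, score) are made up for illustration and stand in for real files on disk.

```python
import io

import pandas as pd

# Hypothetical in-memory text standing in for .csv and .tsv files.
csv_text = "name,score\nAda,90\nLinus,85\n"
tsv_text = "name\tscore\nAda\t90\nLinus\t85\n"

# read_csv handles both formats; sep selects the delimiter.
csv_df = pd.read_csv(io.StringIO(csv_text))
tsv_df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(csv_df)
print(csv_df.equals(tsv_df))  # both sources yield the same table
```

With real files, a path can be passed in place of the StringIO object. Excel sheets and HTML tables have their own readers (pd.read_excel and pd.read_html), which rely on optional dependencies being installed.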

pandas is well suited to data scientists because it can be used for:
  • Managing data
  • Cleaning data
  • Analyzing data
  • Modeling data
  • Organizing the data in the desired form to plot the results or display the data in tabular form.

pandas is an appropriate choice for many kinds of data, such as:
  • Tabular data with heterogeneously typed columns (Excel spreadsheets, SQL tables, etc.)
  • Ordered and unordered data, such as time series that need not be fixed-frequency
  • Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
  • Any other form of observational or statistical data set. The data does not need to be labeled at all to be placed in a pandas data structure.
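
To illustrate the labeled and unlabeled cases from the list above, here is a small sketch. The item and price values are invented for the example.

```python
import numpy as np
import pandas as pd

# Heterogeneous columns (text and floats) with explicit row labels.
labeled = pd.DataFrame(
    {"item": ["pen", "book"], "price": [1.5, 12.0]},
    index=["a", "b"],
)

# A homogeneous matrix with no labels at all:
# pandas assigns default integer labels to rows and columns.
unlabeled = pd.DataFrame(np.arange(6).reshape(2, 3))

print(labeled.dtypes)
print(unlabeled)
```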

pandas Installation

pandas is part of the Anaconda distribution and can be installed with Anaconda or Miniconda:

conda install pandas
Or it can be installed using pip:

pip install pandas
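
After either command finishes, one quick way to check the installation is to import the package and print its version. The pd alias is a common convention, not a requirement.

```python
import pandas as pd

# If the import succeeds, pandas is installed; the version string
# tells you which release you are running.
print(pd.__version__)
```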

Key features of pandas

A few key features of pandas are:

  • Missing data in both floating-point and non-floating-point columns is easy to handle. Missing values are typically represented as NaN.
  • A DataFrame is size-mutable: columns can be added or removed.
  • Automatic and explicit data alignment
  • pandas provides a strong and versatile set of functions for split-apply-combine operations that aggregate and transform data sets
  • Intelligent label-based slicing, fancy indexing, and subsetting of large data sets
  • Intuitive merging and joining of data sets
  • Flexible reshaping and pivoting of data sets
  • Hierarchical labeling of axes
  • Robust IO tools for loading data
  • Time series-specific functionality
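
Several of these features can be sketched in a few lines. The data below (team, points, and the lookup names) is invented for illustration, and the example covers only a handful of the features listed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"team": ["a", "a", "b", "b"], "points": [1.0, np.nan, 3.0, 4.0]}
)

# Missing data: isna() flags NaN values; sum() skips them by default.
missing_count = int(df["points"].isna().sum())

# Size mutability: add a column, then remove it again.
df["bonus"] = df["points"] * 2
df = df.drop(columns="bonus")

# Split-apply-combine: total points per team.
totals = df.groupby("team")["points"].sum()

# Label-based slicing: rows labeled 1 through 2 (inclusive), one column.
subset = df.loc[1:2, "points"]

# Automatic alignment: values are matched by index label, not position.
s1 = pd.Series([1, 2], index=["x", "y"])
s2 = pd.Series([10, 20], index=["y", "z"])
aligned = s1 + s2  # labels present on only one side become NaN

# Merging: join the per-team totals onto a small lookup table.
names = pd.DataFrame({"team": ["a", "b"], "name": ["Alpha", "Beta"]})
merged = names.merge(totals.reset_index(), on="team")

print(totals)
print(aligned)
print(merged)
```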

Main components of pandas

Series and DataFrame are the two main components of pandas. A Series is a one-dimensional labeled array that represents a single column of data, and a DataFrame is a two-dimensional table made up of multiple Series that share an index.
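
The relationship between the two can be seen in a short sketch. The names and values here are placeholders chosen for the example.

```python
import pandas as pd

# A Series: one-dimensional values with an index of labels.
ages = pd.Series([31, 25], index=["ann", "bob"], name="age")

# A DataFrame: several columns that share the same row index.
people = pd.DataFrame({"age": ages, "city": ["Paris", "Rome"]})

# Selecting a single column gives back a Series.
age_column = people["age"]

print(type(age_column).__name__)
print(people.shape)
```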