Sunday, July 12, 2020

Operating on Sets in Python

Operating on Sets in Python




A Set in Python is an unordered, unindexed collection of elements written within curly brackets. Sets are iterable and mutable, and a set can never contain duplicate values; repeated items are stored only once.
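
For instance, a quick sketch of the no-duplicates rule (since sets are unordered, the printed order may differ on your machine):

s = {10, 20, 20, 'sum', 'sum'}
print(s)  # duplicates are stored only once

#Output
{10, 20, 'sum'}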

Various operations can be applied to Python's sets, using either operators or methods. Some operations are available through both an operator and a method, while a few are available only through a method.


For example, given two sets s1 and s2, the union of s1 and s2 results in a new set containing all the items of both s1 and s2.

s1 = {10, 20, 'sum', True}
s2 = {20, 30, 'sub', False}
s3 = s1.union(s2)
print('Method', s3)
print('Operator', s1 | s2)

#Output
Method {False, True, 'sub', 'sum', 10, 20, 30}
Operator {False, True, 'sub', 'sum', 10, 20, 30}


Union of Sets


As shown in the example above, we can perform the union operation with the union() method or the union operator ( | ). The subtle difference between the two is that the operands of the ( | ) operator must both be sets, whereas the union() method accepts any iterable (see iterators and iterables in Python) as its argument; the iterable is converted to a set automatically before the operation is performed.

s1 = {10, 20, 'sum', True}
s2 = (20, 30, 'sub', False)
s3 = s1.union(s2)
print('Method', s3)        # Works fine
print('Operator', s1 | s2) # ERROR

We can also perform a union of multiple sets at once. For example,

a = {10, 20, 30, 40}
b = {20, 30, 40, 50}
c = {30, 40, 50, 60}
d = {40, 50, 60, 70}

print("Method", a.union(b, c, d)) #Method
 
print("Operator", a | b | c | d)  #Operator

#Output
Method {70, 40, 10, 50, 20, 60, 30}
Operator {70, 40, 10, 50, 20, 60, 30}


Intersection of Sets


We can find the intersection of two sets using the method intersection() or the ( & ) operator. The intersection of sets results in a set containing the common values in the input sets.
 
For example,

s1 = {10, 20, 5, 'sum', True}
s2 = {20, 30, 5, 'sub', False}
s3 = s1.intersection(s2)
print('Method', s3)         # intersection
print('Operator', s1 & s2)  # intersection

#Output
Method {20, 5}
Operator {20, 5}

Similar to the union operation, the subtle difference between the intersection() method and the ( & ) operator is that both operands must be sets when using the ( & ) operator, but the argument of the intersection() method can be any iterable.

s1 = {10, 20, 5, 'sum', True}
s2 = (20, 30, 5, 'sub', False)
s3 = s1.intersection(s2)
print('Method', s3)        # Works fine
print('Operator', s1 & s2) # ERROR

We can apply the intersection operation to multiple sets using both techniques. For example,

a = {10, 20, 30, 40}
b = {20, 30, 40, 50}
c = {30, 40, 50, 60}
d = {40, 50, 60, 70}

print("Method", a.intersection(b, c, d)) #Method
 
print("Operator", a & b & c & d)  #Operator

#Output

Method {40}
Operator {40}

Difference of Sets


The difference of sets can be computed using the difference() method or the ( - ) operator in Python. Given two sets s1 and s2, s1 - s2 returns the set of all elements that are in s1 but not in s2.

For example,

s1 = {10, 20, 5, 'sum', True}
s2 = {20, 30, 5, 'sub', False}
s3 = s1.difference(s2)
print('Method', s3)         # difference
print('Operator', s1 - s2)  # difference

#Output
Method {True, 10, 'sum'}
Operator {True, 10, 'sum'}

Similar to the union operation, the subtle difference between the difference() method and the ( - ) operator is that both operands must be sets when using the ( - ) operator, but the argument of the difference() method can be any iterable.

s1 = {10, 20, 5, 'sum', True}
s2 = (20, 30, 5, 'sub', False)
s3 = s1.difference(s2)
print('Method', s3)        # Works fine
print('Operator', s1 - s2) # ERROR

We can apply the difference operation to multiple sets using both techniques. For example,

a = {10, 20, 30, 40}
b = {20, 30, 40, 50}
c = {30, 40, 50, 60}
d = {40, 50, 60, 70}

print("Method", a.difference(b, c, d)) #Method
 
print("Operator", a - b - c - d)  #Operator

#Output
Method {10}
Operator {10}
 

Checking Disjoint Operation


The isdisjoint() method can be used to check whether two given sets (say, s1 and s2) are disjoint. Two sets are said to be disjoint if they have no common elements. There is no corresponding operator for this method. For example,

a = {10, 20, 30, 40}
b = {20, 30, 40, 50}
c = {70, 80, 90, 100}

print(a.isdisjoint(b))
print(a.isdisjoint(c))

#Output
False
True

Checking Subset Operation


The issubset() method can be used to check whether a set is a subset of another set. The same check can be performed with the ( <= ) operator.

a = {10, 20, 30, 40, 50, 60 , 70}
b = {100, 110, 40}
c = {30, 40, 60}

print("METHOD")
print(b.issubset(a))
print(c.issubset(a))

print("OPERATOR")
print(b <= a)
print(c <= a)

#Output
METHOD
False
True
OPERATOR
False
True


A set is always a subset of itself; that is, s1 <= s1 always returns True.

We can use s1 < s2 to check whether the set s1 is a proper subset of s2, so s1 < s1 returns False. The ( < ) operator is the only way to check for a proper subset; there is no corresponding method for this operator.
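
For instance, a minimal sketch of the subset and proper-subset checks:

a = {10, 20, 30}
b = {10, 20, 30, 40}

print(a <= b)  # True, every element of a is also in b
print(a < b)   # True, a is a proper subset of b
print(a < a)   # False, a set is never a proper subset of itself
print(a <= a)  # True, a set is always a subset of itself

#Output
True
True
False
True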


Checking Superset Operation


The issuperset() method can be used to check whether a set is a superset of another set. A set s1 is considered a superset of a set s2 if all the elements of s2 are contained in s1. We can determine the same using the ( >= ) operator.

a = {10, 20, 30, 40, 50, 60 , 70}
b = {100, 110, 40}
c = {30, 40, 60}

print("METHOD")
print(a.issuperset(b))
print(a.issuperset(c))

print("OPERATOR")
print(a >= b)
print(a >= c)

#Output
METHOD
False
True
OPERATOR
False
True


Updating Sets


Although the elements of a set must be of immutable (hashable) types, the set itself is mutable, so we can update its contents. There are various operations for modifying the contents of a set.
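
For instance, a quick sketch of why set elements must be hashable (immutable) types:

a = {10, 'text', (1, 2)}  # int, str and tuple elements are hashable, so this is fine
print(a)

a.add([1, 2])  # ERROR: a list is mutable (unhashable), so this raises a TypeError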

The easiest way to update a set is the update() method. For example,

a = {10, 20, True, 'Some text'}

a.update((15,25))

print(a)


#Output
{True, 10, 15, 20, 'Some text', 25}

Each of the union, intersection, difference, and symmetric difference operators can be combined with an assignment operator ( |=, &=, -=, ^= ) to modify a set in place. For example,

a = {10, 20, True, 'Some text'}
b = {12.3, 19.5, 10}
a |= b

print(a)

a = {10, 20, True, 'Some text'}
b = {12.3, 19.5, 10}

a &= b
print(a)

#Output
{True, 19.5, 20, 'Some text', 10, 12.3}
{10}

The corresponding methods are update() for the ( |= ) operator, intersection_update() for ( &= ), difference_update() for ( -= ), and symmetric_difference_update() for ( ^= ).
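
For instance, a minimal sketch of the in-place difference and symmetric difference updates:

a = {10, 20, 30, 40}
b = {30, 40, 50}

a.difference_update(b)            # same effect as a -= b
print(a)

a = {10, 20, 30, 40}
a.symmetric_difference_update(b)  # same effect as a ^= b
print(a)

#Output (element order may vary)
{10, 20}
{10, 50, 20}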


We can also use the add(<element>), remove(<element>), and discard(<element>) methods to add or remove individual elements of a set. The difference between the remove() and discard() methods is that remove() raises an exception (KeyError) if the element is not found in the set, whereas discard() does nothing in that case.

a = {10, 20, True, 'Some text', 17.5}
a.add('new item')
print(a)
a.remove(10)
print(a)
a.discard(30)
print(a)


#Output
{True, 'Some text', 10, 17.5, 20, 'new item'}
{True, 'Some text', 17.5, 20, 'new item'}
{True, 'Some text', 17.5, 20, 'new item'}

The pop() method removes and returns an arbitrary item from the set.
The clear() method removes all elements from the set.

a = {10, 20, True, 'Some text', 17.5}

a.pop()
print(a)
a.clear()
print('Cleared the Set')
print(a)


#Output
{'Some text', 10, 17.5, 20}
Cleared the Set
set()
 



Thursday, July 9, 2020

Reading and Writing Data to CSV Files with Node.js

Reading and Writing Data to CSV Files with Node.js


CSV Data


CSV stands for comma-separated values. A CSV file is a plain text file containing data written in the particular format defined for CSV files: within each line, the fields are separated by commas. This makes it very easy to store tabular data. RFC 4180 describes the most commonly used CSV format.

For example,

,NAME,AGE,GRADE,MARKS
1,ALEX,16,A,88
2,STEVE,16,C,34
3,JHON,17,B,66
4,WILEY,16,B,75
5,SMITH,18,A,82
6,DAVE,16,A,90
7,KYLE,17,C,44
8,SAM,18,A,85
9,MAX,16,B,77
10,RON,17,A,93


Reading CSV Files in Node.js


The Node.js fs module can be used to read CSV data. There are also many modules available for working with CSV data in Node.js, such as 'csv-parser', which makes parsing very easy.

We can install the csv-parser module using the following command,

npm install csv-parser

For example,

const csv = require('csv-parser');
const fs = require('fs');

fs.createReadStream('record.csv')
  .pipe(csv())
  .on('data', (line) => {
    console.log(line);
  })
  .on('end', () => {
    console.log('CSV data displayed successfully');
  });


The data from the read stream is piped into the csv() parser. The 'data' event is fired every time a new row has been parsed from the CSV data, and the 'end' event is fired once the entire CSV file has been consumed.

The output will appear as,

{ '': '1', NAME: 'ALEX', AGE: '16', GRADE: 'A', MARKS: '88' }
{ '': '2', NAME: 'STEVE', AGE: '16', GRADE: 'C', MARKS: '34' }
{ '': '3', NAME: 'JHON', AGE: '17', GRADE: 'B', MARKS: '66' }
{ '': '4', NAME: 'WILEY', AGE: '16', GRADE: 'B', MARKS: '75' }
{ '': '5', NAME: 'SMITH', AGE: '18', GRADE: 'A', MARKS: '82' }
{ '': '6', NAME: 'DAVE', AGE: '16', GRADE: 'A', MARKS: '90' }
{ '': '7', NAME: 'KYLE', AGE: '17', GRADE: 'C', MARKS: '44' }
{ '': '8', NAME: 'SAM', AGE: '18', GRADE: 'A', MARKS: '85' }
{ '': '9', NAME: 'MAX', AGE: '16', GRADE: 'B', MARKS: '77' }
{ '': '10', NAME: 'RON', AGE: '17', GRADE: 'A', MARKS: '93' }


Writing Data to a CSV File


Because CSV data is just plain text, we could write it to a file directly with the fs module. However, the 'csv-writer' module makes this task much easier.

To install this module use the following command,

npm install csv-writer

We can write a simple code to write data to CSV file as follows,

const createCsvWriter = require('csv-writer').createObjectCsvWriter;
const csvWriter = createCsvWriter({
  path: 'record.csv',
  header: [
    {id: 'StudentName', title: 'StudentName'},
    {id: 'Roll', title: 'Roll'},
    {id: 'Age', title: 'Age'},
    {id: 'Dept', title: 'Dept'},
  ]
});

const record = [
  {
    StudentName: 'Alex',
    Roll: 15,
    Age: 18,
    Dept: 'ME'
  }, {
    StudentName: 'Smith',
    Roll: 55,
    Age: 19,
    Dept: 'CSE',
  }, {
    StudentName: 'Rob',
    Roll: 48,
    Age: 18,
    Dept: 'ECE'
  }
];

csvWriter
  .writeRecords(record)
  .then(()=> console.log('Data written successfully'));

After running the above code with the node command, a new file named record.csv will be created and the following data will be written to it.

StudentName,Roll,Age,Dept
Alex,15,18,ME
Smith,55,19,CSE
Rob,48,18,ECE

Reading CSV data and converting to JSON format


We can use the 'csvtojson' module to read a CSV file and convert its data into JSON format. The 'csvtojson' module can be installed using the npm install command.

const csv=require('csvtojson');
const converter=csv()
.fromFile('./record.csv')
.then((json)=>{
    console.log(json);
});

The output will display the JSON representation of the data stored in the record.csv file.







Saturday, July 4, 2020

How to read Excel Data using pandas

How to read Excel Data using pandas?


We can read two-dimensional data and store it in a DataFrame using pandas. The data can come in various forms such as CSV, Excel, SQL, etc. We can use the read_excel() method to read Excel data.

read_excel() Method


This method reads data from an Excel file into a pandas DataFrame. Excel files with the xls, xlsx, xlsm, xlsb, and odf extensions can be read from a local filesystem or a URL.


The syntax of read_excel() method is,

pandas.read_excel(io, sheet_name=0, header=0, names=None, index_col=None, 
                usecols=None, squeeze=False, dtype=None, 
                engine=None, converters=None, true_values=None, 
                false_values=None, skiprows=None, nrows=None, 
                na_values=None, keep_default_na=True, verbose=False, 
                parse_dates=False, date_parser=None, thousands=None, 
                comment=None, skipfooter=0, convert_float=True, 
                mangle_dupe_cols=True, **kwds)

For example,

# importing pandas
import pandas as pd
excel_data = pd.read_excel('C:/Users/my/Desktop/record.xlsx', sheet_name='record')
#read First 5 rows
print(excel_data.head())

#Output
   Unnamed: 0   NAME  AGE GRADE  MARKS
0           1   ALEX   16     A     88
1           2  STEVE   16     C     34
2           3   JHON   17     B     66
3           4  WILEY   16     B     75
4           5  SMITH   18     A     82


The type of the returned data is a pandas DataFrame.

# importing pandas
import pandas as pd
excel_data = pd.read_excel('C:/Users/my/Desktop/record.xlsx', sheet_name='record')
# Type
print(type(excel_data))

#Output
<class 'pandas.core.frame.DataFrame'>


To get the list of column names from the Excel data, the columns property can be used.

# importing pandas
import pandas as pd
excel_data = pd.read_excel('C:/Users/my/Desktop/record.xlsx', sheet_name='record')
print(excel_data.columns.ravel())

#Output
['Unnamed: 0' 'NAME' 'AGE' 'GRADE' 'MARKS']

A column of the data can be converted into a list of values.

# importing pandas
import pandas as pd
excel_data = pd.read_excel('C:/Users/my/Desktop/record.xlsx', sheet_name='record')
print(excel_data['NAME'].to_list())

#Output
['ALEX', 'STEVE', 'JHON', 'WILEY', 'SMITH', 'DAVE', 'KYLE', 'SAM', 'MAX', 'RON']


Parameters of read_excel() method


Important parameters of the read_excel() method are listed below; a short example combining a few of them follows the list.


  • io: This can be str, bytes, ExcelFile, xlrd.Book, path object, or file-like object.
  • sheet_name: The name or list of input excel sheets. It can be str, int, list, or None, default 0. 
  • header: Represents the header of the DataFrame. It can be int, list of int, default 0. 
  • names: List of column names to use. This can be array-like, default None. 
  • index_col: Represents index column (0-indexed) to use as the row labels of the DataFrame. It can be int, list of int, default None. 
  • usecols: Only the specified columns are read. It can be int, str, list-like, or callable; default None.
  • dtype: Represents the data type for the data or columns.
  • engine: If io is not a buffer or path, this must be set to identify io. The valid values are None, “xlrd”, “openpyxl” or “odf”.
  • nrows: The number of rows to parse. It is an int value; default None.
  • na_values: Additional strings to recognize as NaN values.
  • squeeze: If the parsed data contains only one column, return a pandas Series instead of a DataFrame. The default value is False.
  • true_values: (list) Values to consider as True. The default value is None.
  • false_values: (list) Values to consider as False. The default value is None.
  • comment: (str) A character that marks the rest of a line as a comment, so it is not parsed. The default value is None.
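
As a minimal sketch (assuming the same record.xlsx file used in the examples above), a few of these parameters can be combined as follows:

# importing pandas
import pandas as pd

# read only the NAME and MARKS columns of the first 5 rows
excel_data = pd.read_excel('C:/Users/my/Desktop/record.xlsx',
                           sheet_name='record',
                           usecols=['NAME', 'MARKS'],
                           nrows=5)
print(excel_data)

The result should contain only the NAME and MARKS columns for the first five students (ALEX through SMITH).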




 

Friday, July 3, 2020

How to read CSV files using pandas

How to read CSV files using pandas?


The DataFrame in pandas is used to handle two-dimensional data arranged in a tabular structure. Pandas is a tool for analyzing and manipulating data, and large datasets can be handled with it easily. It is flexible, efficient, and high-performance, and it is well suited for both homogeneous and heterogeneous datasets.

Pandas gives us the power to work with data from a wide range of sources such as .csv and .tsv files, Excel sheets, and web pages.

Importing CSV file


The easiest way to import a CSV file is the read_csv() method. For example,

# importing pandas
import pandas as pd

# creating data frame from .csv file
dataframe = pd.read_csv("C:/Users/my/Desktop/record.csv")
#Read first 5 rows of dataframe
print(dataframe.head())

#Output
   Unnamed: 0   NAME  AGE GRADE  MARKS
0           1   ALEX   16     A     88
1           2  STEVE   16     C     34
2           3   JHON   17     B     66
3           4  WILEY   16     B     75
4           5  SMITH   18     A     82

read_csv() method


This method reads a comma-separated values (.csv) file into a DataFrame. The read_csv() method also optionally supports iterating over the file or breaking it into chunks. The syntax of this method is,

pandas.read_csv(filepath_or_buffer: Union[str, pathlib.Path, IO[~ AnyStr]],
                  sep=',', delimiter=None, header='infer', names=None, 
                  index_col=None, usecols=None, squeeze=False, prefix=None, 
                  mangle_dupe_cols=True, dtype=None, engine=None, converters=None, 
                  true_values=None, false_values=None, skipinitialspace=False, 
                  skiprows=None, skipfooter=0, nrows=None, na_values=None, 
                  keep_default_na=True, na_filter=True, verbose=False, 
                  skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, 
                  keep_date_col=False, date_parser=None, dayfirst=False, 
                  cache_dates=True, iterator=False, chunksize=None, compression='infer', 
                  thousands=None, decimal: str = '.', lineterminator=None, quotechar='"',
                  quoting=0, doublequote=True, escapechar=None, comment=None, encoding=None, 
                  dialect=None, error_bad_lines=True, warn_bad_lines=True, 
                  delim_whitespace=False, low_memory=True, memory_map=False, 
                  float_precision=None)
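
Before looking at the individual options, here is a minimal sketch of the chunked reading mentioned above (assuming the same record.csv file used in the other examples, which has 10 data rows and 5 columns):

# importing pandas
import pandas as pd

# read the file in chunks of 4 rows at a time
for chunk in pd.read_csv("C:/Users/my/Desktop/record.csv", chunksize=4):
    print(chunk.shape)  # each chunk is itself a DataFrame

#Output
(4, 5)
(4, 5)
(2, 5)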


Important options available in the read_csv() method are,

filepath_or_buffer


This is the filename or the path to the file. For example,

dataframe = pd.read_csv("C:/Users/my/Desktop/record.csv") #filepath


sep


We can also set the sep field; by default, the separator is ','. We can customize it as,

dataframe = pd.read_csv("C:/Users/my/Desktop/record.csv", sep='\t')

usecols


We can also import only selected columns of the dataset using the usecols field of the read_csv() method.

# importing pandas
import pandas as pd

# creating data frame from .csv file, ONLY selected columns
dataframe = pd.read_csv("C:/Users/my/Desktop/record.csv", usecols=['NAME','AGE'])
#Read first 5 rows of dataframe
print(dataframe.head())

#Output
    NAME  AGE
0   ALEX   16
1  STEVE   16
2   JHON   17
3  WILEY   16
4  SMITH   18

header 


We can specify the row that should be used as the column names of the produced DataFrame. By default, the header field is set to 0, so the first row of the CSV file supplies the header. A DataFrame without a header can be formed by simply setting header=None.

dataframe = pd.read_csv("C:/Users/my/Desktop/record.csv", header=None) #Without header
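
A minimal sketch of what header=None does (again assuming the same record.csv file): the columns are numbered automatically and the original header row becomes an ordinary data row.

# importing pandas
import pandas as pd

# read the file without treating the first row as a header
dataframe = pd.read_csv("C:/Users/my/Desktop/record.csv", header=None)
# columns are now labelled 0, 1, 2, ... and the former header row
# ('NAME', 'AGE', ...) appears as the first row of data
print(dataframe.head())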

index_col


This field is used to specify the index column of the DataFrame. By default, it is set to None. It can be set as a column name or column index.

dataframe = pd.read_csv("C:/Users/my/Desktop/record.csv", index_col='NAME')
# Use the 'NAME' column as the index

nrows


The number of rows, starting from the beginning of the file, that should be read. It is an int value.

# importing pandas
import pandas as pd
dataframe = pd.read_csv("C:/Users/my/Desktop/record.csv", nrows=5)
print(dataframe)

#Output
   Unnamed: 0   NAME  AGE GRADE  MARKS
0           1   ALEX   16     A     88
1           2  STEVE   16     C     34
2           3   JHON   17     B     66
3           4  WILEY   16     B     75
4           5  SMITH   18     A     82

na_values 


Missing values are represented as NaN. If other strings should also be treated as NaN, they can be listed here; the expected input is a list of strings.

dataframe = pd.read_csv("C:/Users/my/Desktop/record.csv", na_values=['x','y']) 
# x and y values will be represented as NaN after importing into dataframe