Tuesday, June 9, 2020

NumPy Statistical Functions



NumPy provides many useful statistical functions for computing the minimum, maximum, percentiles, averages, standard deviation, variance, correlation, and more from a NumPy ndarray.

Order statistics


The following methods are available for order statistics:

  • amin(a[, axis, out, keepdims, initial, where])
  • amax(a[, axis, out, keepdims, initial, where])
  • nanmin(a[, axis, out, keepdims])
  • nanmax(a[, axis, out, keepdims])
  • ptp(a[, axis, out, keepdims])
  • percentile(a, q[, axis, out, …])
  • nanpercentile(a, q[, axis, out, …])
  • quantile(a, q[, axis, out, overwrite_input, …])
  • nanquantile(a, q[, axis, out, …])

numpy.amin() and numpy.amax()


numpy.amin() returns the minimum of an array or the minimum along an axis, and numpy.amax() returns the corresponding maximum. The syntax of these methods is,

numpy.amin(a, axis=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
and,

numpy.amax(a, axis=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)

For example,

import numpy as np 
arr = np.array([[3,7,5],[8,4,3],[2,4,9]]) 

print('Array->') 
print(arr)  

print('amin() along axis=1 (minimum of each row):')
print(np.amin(arr, axis=1))
print('amin() along axis=0 (minimum of each column):')
print(np.amin(arr, axis=0))
print('amax() along axis=1 (maximum of each row):')
print(np.amax(arr, axis=1))

#Output
Array->
[[3 7 5]
 [8 4 3]
 [2 4 9]]
amin() along axis=1 (minimum of each row):
[3 3 2]
amin() along axis=0 (minimum of each column):
[2 4 3]
amax() along axis=1 (maximum of each row):
[7 8 9]
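The main practical difference between amin()/amax() and nanmin()/nanmax() from the list above is how they treat NaN. A minimal sketch with a made-up array; the where= and initial= arguments at the end need NumPy 1.17 or later:

```python
import numpy as np

data = np.array([3.0, np.nan, 1.0, 8.0])

# amin/amax propagate NaN: one missing value makes the result NaN
print(np.amin(data), np.amax(data))      # nan nan

# nanmin/nanmax skip NaN entries
print(np.nanmin(data), np.nanmax(data))  # 1.0 8.0

arr = np.array([[3, 7, 5], [8, 4, 3], [2, 4, 9]])

# where= limits which elements take part; maximum has no identity value,
# so initial= must be given whenever where= is used
print(np.amax(arr, axis=1, where=arr < 8, initial=0))  # [7 4 4]
```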

numpy.ptp()


The numpy.ptp() function returns the range (maximum minus minimum) of values along an axis. The name is an abbreviation of "peak to peak". The syntax of the method is,

numpy.ptp(a, axis=None, out=None, keepdims=<no value>)
For example,

import numpy as np 
arr = np.array([[3,7,5],[8,4,3],[2,4,9]]) 

print('Original array')
print(arr)

print('ptp() function:') 
print(np.ptp(arr)) 

print('ptp() function along axis 1:') 
print(np.ptp(arr, axis = 1)) 

print('ptp() function along axis 0:')
print(np.ptp(arr, axis = 0))

#Output
Original array
[[3 7 5]
 [8 4 3]
 [2 4 9]]
ptp() function:
7
ptp() function along axis 1:
[4 5 7]
ptp() function along axis 0:
[6 3 6]
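Since ptp() is just "maximum minus minimum", it can be checked against amax() and amin(). A small sketch; the int8 array is a made-up input chosen to show that ptp() keeps the input dtype, so the subtraction can overflow for small integer types:

```python
import numpy as np

arr = np.array([[3, 7, 5], [8, 4, 3], [2, 4, 9]])

# ptp along axis 1 is the row maximum minus the row minimum
manual = np.amax(arr, axis=1) - np.amin(arr, axis=1)
print(manual)                                        # [4 5 7]
print(np.array_equal(manual, np.ptp(arr, axis=1)))   # True

# ptp keeps the input dtype: 127 - (-2) = 129 does not fit in int8 and wraps
small = np.array([[-2, 127]], dtype=np.int8)
print(np.ptp(small, axis=1))                    # [-127]
print(np.ptp(small.astype(np.int16), axis=1))   # [129] after widening the dtype
```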

numpy.percentile()


numpy.percentile() computes the q-th percentile(s) of the array entries, where q is a percentage between 0 and 100, either over the whole array or along the given axis. The syntax of the method is,

numpy.percentile(a, q, axis=None, out=None, overwrite_input=False, interpolation='linear', keepdims=False)
(In NumPy 1.22 and later, the interpolation parameter is deprecated and replaced by method.)
For example,

import numpy as np
arr = np.array([[10, 7, 4], [3, 2, 1]])
print(np.percentile(arr, 50))

print(np.percentile(arr, 50, axis=0))

print(np.percentile(arr, 50, axis=1, keepdims=True))

#Output
3.5
[6.5 4.5 2.5]
[[7.]
 [2.]]
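q can also be a sequence, and quantile() from the list above performs the same computation with q written as a fraction in [0, 1] instead of a percentage. A short sketch reusing the array from the example:

```python
import numpy as np

arr = np.array([[10, 7, 4], [3, 2, 1]])

# Several percentiles in one call; the flattened, sorted data is [1, 2, 3, 4, 7, 10]
print(np.percentile(arr, [25, 50, 75]))     # 2.25, 3.5 and 6.25

# quantile() takes fractions instead of percentages and gives the same values
print(np.quantile(arr, [0.25, 0.5, 0.75]))
```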
 


Averages and variances


The following methods are available for averages and variances:

  • median(a[, axis, out, overwrite_input, keepdims])
  • average(a[, axis, weights, returned])
  • mean(a[, axis, dtype, out, keepdims])
  • std(a[, axis, dtype, out, ddof, keepdims])
  • var(a[, axis, dtype, out, ddof, keepdims])
  • nanmedian(a[, axis, out, overwrite_input, …])
  • nanmean(a[, axis, dtype, out, keepdims])
  • nanstd(a[, axis, dtype, out, ddof, keepdims])
  • nanvar(a[, axis, dtype, out, ddof, keepdims])

numpy.median()


numpy.median() computes the median along the given axis. The syntax of the function is,

numpy.median(a, axis=None, out=None, overwrite_input=False, keepdims=False)

For example,

import numpy as np
arr = np.array([[4, 7, 11], [3, 5, 4]])
print(np.median(arr))
print(np.median(arr, axis=0))
# preallocate an output array with the result's shape (3,);
# passing it as out= makes median() write its result into m
m = np.zeros(3)
print(np.median(arr, axis=0, out=m))
print(m)

#Output
4.5
[3.5 6.  7.5]
[3.5 6.  7.5]
[3.5 6.  7.5]
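One reason to reach for the median instead of the mean is that a single extreme value barely moves it. A minimal sketch with made-up numbers, plus nanmedian() from the list above, which ignores NaN entries:

```python
import numpy as np

values = np.array([2, 3, 4, 5, 100])   # 100 is an outlier

print(np.mean(values))     # 22.8: pulled up by the outlier
print(np.median(values))   # 4.0: the middle value is unaffected

# nanmedian skips NaN; plain median would return nan here
print(np.nanmedian(np.array([1.0, np.nan, 3.0])))   # 2.0
```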

numpy.average()


This method can be used to get the weighted average along the specified axis. The syntax of the method is,
numpy.average(a, axis=None, weights=None, returned=False)
For example,

import numpy as np
arr = np.arange(1,10).reshape(3,3)
print('Array:\n',arr)

print('Average->',np.average(arr))

print('Average->',
np.average(arr, axis=0, weights=np.arange(9, 0, -1).reshape(3,3)))

#Output
Array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
Average-> 5.0
Average-> [3.  3.8 4.5]
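The weighted average is sum(weights * values) / sum(weights). A small sketch with made-up values and weights that computes it by hand and compares it with average(); returned=True also gives back the sum of the weights:

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])
weights = np.array([4.0, 3.0, 2.0, 1.0])

# (1*4 + 2*3 + 3*2 + 4*1) / (4 + 3 + 2 + 1) = 20 / 10
manual = np.sum(weights * values) / np.sum(weights)
print(manual)   # 2.0

# returned=True returns a tuple (average, sum_of_weights)
avg, total = np.average(values, weights=weights, returned=True)
print(avg, total)   # 2.0 10.0
```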

numpy.mean()


It is used to find the arithmetic mean along the given axis. The syntax of the method is,

numpy.mean(a, axis=None, dtype=None, out=None, keepdims=<no value>)
For example,

import numpy as np
arr = np.arange(1,10).reshape(3,3)
print('Array:\n',arr)

print('mean->',np.mean(arr))

print('mean->',
np.mean(arr, axis=0))

#Output
Array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
mean-> 5.0
mean-> [4. 5. 6.]
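The keepdims argument is handy when the mean is used in further array arithmetic. In this sketch, keepdims=True keeps the reduced axis with length 1, so the row means broadcast back against the original array to center each row:

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)

# shape (3, 1) instead of (3,), so it broadcasts across each row
row_means = np.mean(arr, axis=1, keepdims=True)
print(row_means.shape)   # (3, 1)

# subtract each row's mean from that row; every row becomes [-1, 0, 1]
centered = arr - row_means
print(centered)
```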

numpy.std()


This method is used to get the standard deviation along the specified axis. The syntax of this method is,

numpy.std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>)
For example,

import numpy as np
arr = np.arange(1,10).reshape(3,3)
print('Array:\n',arr)

print('std->',np.std(arr))

print('std->',
np.std(arr, axis=0))

#Output
Array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
std-> 2.581988897471611
std-> [2.44948974 2.44948974 2.44948974] 
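With the default ddof=0, the standard deviation is the square root of the mean squared deviation from the mean. A minimal sketch that reproduces both results above by hand:

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)

# whole array: squared deviations from 5 sum to 60, and 60 / 9 ≈ 6.667
manual_std = np.sqrt(np.mean((arr - arr.mean()) ** 2))
print(np.isclose(manual_std, np.std(arr)))   # True

# per column (axis=0): subtract each column's mean before squaring
manual_col_std = np.sqrt(np.mean((arr - arr.mean(axis=0)) ** 2, axis=0))
print(np.allclose(manual_col_std, np.std(arr, axis=0)))   # True
```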


numpy.var()


This method is used to get the variance along the specified axis. The syntax of this method is,

numpy.var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>)
For example,

import numpy as np
arr = np.arange(1,10).reshape(3,3)
print('Array:\n',arr)

print('variance->',np.var(arr))

print('variance->',
np.var(arr, axis=0))

#Output
Array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
variance-> 6.666666666666667
variance-> [6. 6. 6.]
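The ddof ("delta degrees of freedom") argument changes the divisor from N to N - ddof. The default ddof=0 gives the population variance; ddof=1 gives the unbiased sample variance. A short sketch on the same array:

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)

# the squared deviations from the mean (5) sum to 60 over N = 9 elements
print(np.var(arr))           # 60 / 9 ≈ 6.667
print(np.var(arr, ddof=1))   # 60 / 8 = 7.5

# each column, e.g. [1, 4, 7], has squared deviations summing to 18; 18 / 2 = 9
print(np.var(arr, axis=0, ddof=1))   # [9. 9. 9.]
```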

Correlating


The following methods are available for correlating:

  • corrcoef(x[, y, rowvar, bias, ddof])
  • correlate(a, v[, mode])
  • cov(m[, y, rowvar, bias, ddof, fweights, …])
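corrcoef() and cov() work on paired samples rather than on signals. A minimal sketch with made-up data: corrcoef() returns the Pearson correlation matrix, and cov() returns the covariance matrix, dividing by N - 1 by default:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])   # y = 2x: perfect positive linear relation
z = np.array([8.0, 6.0, 4.0, 2.0])   # falls as x rises: perfect negative relation

# element [0, 1] of the 2x2 matrix is corr(x, y)
print(np.corrcoef(x, y)[0, 1])   # 1.0, up to floating-point rounding
print(np.corrcoef(x, z)[0, 1])   # -1.0, up to floating-point rounding

# layout: [[var(x), cov(x, y)], [cov(x, y), var(y)]]
print(np.cov(x, y))
```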

numpy.correlate()


This method is used to compute the cross-correlation of two 1-dimensional sequences. This function computes the correlation as generally defined in signal processing texts: 

c_{av}[k] = sum_n a[n+k] * conj(v[n])

where a and v are zero-padded where necessary and conj denotes the complex conjugate. The syntax of the method is,

numpy.correlate(a, v, mode='valid')
For example,

import numpy as np
print(np.correlate([1, 2, 3], [0, 1, 0.5]))
print(np.correlate([1, 2, 3], [0, 1, 0.5], "same"))
print(np.correlate([1, 2, 3], [0, 1, 0.5], "full"))

#Output
[3.5]
[2.  3.5 3. ]
[0.5 2.  3.5 3.  0. ]
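The formula can be checked directly. For real inputs, cross-correlation is the same as convolution with the second sequence reversed, and the single "valid" value is the sum at shift k = 0. A sketch using the same inputs as the example:

```python
import numpy as np

a = np.array([1, 2, 3])
v = np.array([0, 1, 0.5])

# real inputs: correlate(a, v) equals convolve(a, reversed v)
print(np.allclose(np.correlate(a, v, "full"),
                  np.convolve(a, v[::-1], "full")))   # True

# shift k = 0: sum_n a[n] * conj(v[n]) = 1*0 + 2*1 + 3*0.5
print(np.sum(a * np.conj(v)))   # 3.5
```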


Histogram


The following methods are available for histograms:

  • histogram(a[, bins, range, normed, weights, …])
  • histogram2d(x, y[, bins, range, normed, …])
  • histogramdd(sample[, bins, range, normed, …])
  • bincount(x[, weights, minlength])
  • histogram_bin_edges(a[, bins, range, weights])
  • digitize(x, bins[, right])
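Three of these are easy to compare on one small made-up dataset. In this sketch, histogram() returns counts plus bin edges (the last bin includes its right edge), bincount() counts occurrences of each non-negative integer, and digitize() reports which bin each value falls into:

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])

# 3 equal-width bins over [1, 4]: [1, 2), [2, 3), [3, 4]
counts, edges = np.histogram(data, bins=3)
print(counts)   # [1 2 7]
print(edges)    # [1. 2. 3. 4.]

# occurrences of 0, 1, 2, 3, 4
print(np.bincount(data))   # [0 1 2 3 4]

# bin index i such that bins[i-1] <= x < bins[i]
print(np.digitize(data, bins=[2, 3]))   # [0 1 1 2 2 2 2 2 2 2]
```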