NumPy Statistical Functions
NumPy provides many useful statistical functions for computing the minimum, maximum, percentiles, averages, standard deviation, and variance of a NumPy ndarray, as well as correlations, histograms, and more.
Order statistics
The following methods are available for order statistics:
- amin(a[, axis, out, keepdims, initial, where])
- amax(a[, axis, out, keepdims, initial, where])
- nanmin(a[, axis, out, keepdims])
- nanmax(a[, axis, out, keepdims])
- ptp(a[, axis, out, keepdims])
- percentile(a, q[, axis, out, …])
- nanpercentile(a, q[, axis, out, …])
- quantile(a, q[, axis, out, overwrite_input, …])
- nanquantile(a, q[, axis, out, …])
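The list above pairs each function with a nan-prefixed variant. The sketch below, using a small made-up array that contains one NaN, illustrates the difference: the plain functions propagate NaN, while the nan-variants skip missing values.

```python
import numpy as np

# Illustrative data with one missing value (NaN).
data = np.array([[1.0, np.nan, 3.0],
                 [4.0, 5.0, 6.0]])

# amin/amax propagate NaN: a NaN anywhere in the reduced slice gives NaN.
print(np.amin(data))               # nan
print(np.amax(data, axis=0))       # middle column is nan

# nanmin/nanmax ignore NaN entries.
print(np.nanmin(data))             # 1.0
print(np.nanmax(data, axis=0))     # [4. 5. 6.]

# nanpercentile also skips NaN; the remaining values are 1, 3, 4, 5, 6.
print(np.nanpercentile(data, 50))  # 4.0
```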
numpy.amin() and numpy.amax()
numpy.amin() returns the minimum of an array, or the minimum along an axis; numpy.amax() likewise returns the maximum. Their syntax is,
numpy.amin(a, axis=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
and,
numpy.amax(a, axis=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
import numpy as np
arr = np.array([[3, 7, 5], [8, 4, 3], [2, 4, 9]])
print('Array->')
print(arr)
print('amin() along axis=1:')
print(np.amin(arr, axis=1))
print('amin() along axis=0:')
print(np.amin(arr, axis=0))
print('amax() along axis=1:')
print(np.amax(arr, axis=1))
# Output:
# Array->
# [[3 7 5]
#  [8 4 3]
#  [2 4 9]]
# amin() along axis=1:
# [3 3 2]
# amin() along axis=0:
# [2 4 3]
# amax() along axis=1:
# [7 8 9]
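The syntax above lists the initial, where and keepdims parameters, but the example does not use them. A minimal sketch, assuming NumPy 1.17 or later (where `where` was added to these reductions) and reusing the same array:

```python
import numpy as np

arr = np.array([[3, 7, 5], [8, 4, 3], [2, 4, 9]])

# initial: an extra value included in the reduction (here smaller than every element).
print(np.amin(arr, initial=1))                        # 1

# where: only elements where the mask is True take part in the reduction.
# amin/amax have no identity value, so `initial` is required alongside `where`.
mask = arr > 3
print(np.amin(arr, axis=1, where=mask, initial=100))  # [5 4 4]

# keepdims=True keeps the reduced axis as length 1, which helps broadcasting.
print(np.amax(arr, axis=1, keepdims=True).shape)      # (3, 1)
```

The value 100 for `initial` is an arbitrary choice larger than every element, so it only shows up in the result if a row selects nothing.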
numpy.ptp()
The numpy.ptp() function returns the range (maximum minus minimum) of values, either over the whole array or along an axis. The name is an abbreviation of "peak to peak". The syntax of the method is,
numpy.ptp(a, axis=None, out=None, keepdims=<no value>)
For example,
import numpy as np
arr = np.array([[3, 7, 5], [8, 4, 3], [2, 4, 9]])
print('Original array')
print(arr)
print('ptp() function:')
print(np.ptp(arr))
print('ptp() function along axis 1:')
print(np.ptp(arr, axis=1))
print('ptp() function along axis 0:')
print(np.ptp(arr, axis=0))
# Output:
# Original array
# [[3 7 5]
#  [8 4 3]
#  [2 4 9]]
# ptp() function:
# 7
# ptp() function along axis 1:
# [4 5 7]
# ptp() function along axis 0:
# [6 3 6]
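Since ptp() is defined as maximum minus minimum, it can be checked directly against amax() and amin(). A small sketch with the same array:

```python
import numpy as np

arr = np.array([[3, 7, 5], [8, 4, 3], [2, 4, 9]])

# ptp ("peak to peak") along each row ...
by_row = np.ptp(arr, axis=1)
# ... equals the row maximum minus the row minimum.
manual = np.amax(arr, axis=1) - np.amin(arr, axis=1)
print(by_row, manual)   # [4 5 7] [4 5 7]
```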
numpy.percentile()
This method computes the q-th percentile of the data, either over the whole array or along the given axis, and returns the q-th percentile(s) of the array entries. The syntax of the method is,
numpy.percentile(a, q, axis=None, out=None, overwrite_input=False, interpolation='linear', keepdims=False)
In NumPy 1.22 and later, the interpolation parameter is deprecated in favour of the equivalent method parameter.
For example,
import numpy as np
arr = np.array([[10, 7, 4], [3, 2, 1]])
print(np.percentile(arr, 50))
print(np.percentile(arr, 50, axis=0))
print(np.percentile(arr, 50, axis=1, keepdims=True))
# Output:
# 3.5
# [6.5 4.5 2.5]
# [[7.]
#  [2.]]
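quantile() is the same computation as percentile() with q expressed as a fraction in [0, 1] instead of [0, 100]. The sketch below also spells out the default linear rule by hand; `linear_percentile` is a hypothetical helper for illustration, not part of NumPy.

```python
import numpy as np

arr = np.array([[10, 7, 4], [3, 2, 1]])

# quantile takes q in [0, 1]; percentile takes q in [0, 100].
print(np.quantile(arr, 0.5))            # 3.5
print(np.quantile(arr, [0.25, 0.75]))   # [2.25 6.25]

def linear_percentile(values, q):
    """Hypothetical helper: the 'linear' rule written out by hand."""
    s = np.sort(np.ravel(values))
    pos = q / 100 * (s.size - 1)        # fractional index into the sorted data
    lo = int(np.floor(pos))
    hi = min(lo + 1, s.size - 1)
    frac = pos - lo
    # Interpolate between the two neighbouring sorted values.
    return s[lo] + frac * (s[hi] - s[lo])

print(linear_percentile(arr, 25))       # 2.25
```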
Averages and variances
- median(a[, axis, out, overwrite_input, keepdims])
- average(a[, axis, weights, returned])
- mean(a[, axis, dtype, out, keepdims])
- std(a[, axis, dtype, out, ddof, keepdims])
- var(a[, axis, dtype, out, ddof, keepdims])
- nanmedian(a[, axis, out, overwrite_input, …])
- nanmean(a[, axis, dtype, out, keepdims])
- nanstd(a[, axis, dtype, out, ddof, keepdims])
- nanvar(a[, axis, dtype, out, ddof, keepdims])
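As with the order statistics, each averaging function has a nan-prefixed variant. A minimal sketch on a made-up 1-D array with one missing value:

```python
import numpy as np

scores = np.array([2.0, 4.0, np.nan, 6.0])

print(np.mean(scores))            # nan: a single NaN poisons the ordinary mean
print(np.nanmean(scores))         # 4.0: mean of 2, 4 and 6
print(np.nanmedian(scores))       # 4.0
print(np.nanvar(scores))          # population variance of [2, 4, 6], i.e. 8/3
print(np.nanstd(scores, ddof=1))  # 2.0: sample standard deviation (divide by N - 1)
```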
numpy.median()
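This method computes the median, either over the whole array or along the specified axis. The syntax of the method is,
numpy.median(a, axis=None, out=None, overwrite_input=False, keepdims=False)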
For example,
import numpy as np
arr = np.array([[4, 7, 11], [3, 5, 4]])
print(np.median(arr))
print(np.median(arr, axis=0))
# Reuse m as the output buffer: out=m writes the result into m in place,
# and median() also returns that same array.
m = np.median(arr, axis=0)
print(np.median(arr, axis=0, out=m))
print(m)
# Output:
# 4.5
# [3.5 6.  7.5]
# [3.5 6.  7.5]
# [3.5 6.  7.5]
numpy.average()
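This method computes the weighted average along the specified axis. If no weights are given, every element has weight one and the result equals the arithmetic mean. The syntax of the method is,
numpy.average(a, axis=None, weights=None, returned=False)
For example,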
import numpy as np
arr = np.arange(1, 10).reshape(3, 3)
print('Array:\n', arr)
print('Average->', np.average(arr))
print('Average->', np.average(arr, axis=0, weights=np.arange(9, 0, -1).reshape(3, 3)))
# Output:
# Array:
#  [[1 2 3]
#  [4 5 6]
#  [7 8 9]]
# Average-> 5.0
# Average-> [3.  3.8 4.5]
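The weighted average is sum(a * w) / sum(w) along the chosen axis. The sketch below checks that against numpy.average() and also shows returned=True, which additionally returns the sum of the weights:

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)
w = np.arange(9, 0, -1).reshape(3, 3)

# returned=True gives (average, sum of weights) along the axis.
avg, wsum = np.average(arr, axis=0, weights=w, returned=True)
print(avg)    # [3.  3.8 4.5]
print(wsum)   # [18. 15. 12.]

# The same weighted average computed from its definition.
manual = (arr * w).sum(axis=0) / w.sum(axis=0)
print(np.allclose(avg, manual))  # True
```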
numpy.mean()
It is used to find the arithmetic mean along the given axis. The syntax of the method is,
numpy.mean(a, axis=None, dtype=None, out=None, keepdims=<no value>)
For example,
import numpy as np
arr = np.arange(1, 10).reshape(3, 3)
print('Array:\n', arr)
print('mean->', np.mean(arr))
print('mean->', np.mean(arr, axis=0))
# Output:
# Array:
#  [[1 2 3]
#  [4 5 6]
#  [7 8 9]]
# mean-> 5.0
# mean-> [4. 5. 6.]
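A common use of the keepdims parameter from the syntax above is centering data: keeping the reduced axis lets the means broadcast back against the original array. A short sketch:

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)

# keepdims=True leaves the reduced axis in place, so the shape is (3, 1)
# and the row means broadcast against the (3, 3) array.
row_means = np.mean(arr, axis=1, keepdims=True)
centered = arr - row_means
print(row_means.ravel())   # [2. 5. 8.]
print(centered)            # every row becomes [-1. 0. 1.]

# Integer input is averaged in float64 by default.
print(np.mean(arr).dtype)  # float64
```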
numpy.std()
This method is used to get the standard deviation along the specified axis. The syntax of this method is,
numpy.std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>)
For example,
import numpy as np
arr = np.arange(1, 10).reshape(3, 3)
print('Array:\n', arr)
print('std->', np.std(arr))
print('std->', np.std(arr, axis=0))
# Output:
# Array:
#  [[1 2 3]
#  [4 5 6]
#  [7 8 9]]
# std-> 2.581988897471611
# std-> [2.44948974 2.44948974 2.44948974]
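The ddof parameter ("delta degrees of freedom") sets the divisor to N - ddof. The default ddof=0 gives the population standard deviation; ddof=1 gives the usual sample estimate. A sketch comparing both with the definition:

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)

pop = np.std(arr)              # divide by N
sample = np.std(arr, ddof=1)   # divide by N - 1

# The same values written out from the definition.
n = arr.size
dev2 = (arr - arr.mean()) ** 2
print(pop, np.sqrt(dev2.sum() / n))
print(sample, np.sqrt(dev2.sum() / (n - 1)))
```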
numpy.var()
This method is used to get the variance along the specified axis. The syntax of this method is,
numpy.var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>)
For example,
import numpy as np
arr = np.arange(1, 10).reshape(3, 3)
print('Array:\n', arr)
print('variance->', np.var(arr))
print('variance->', np.var(arr, axis=0))
# Output:
# Array:
#  [[1 2 3]
#  [4 5 6]
#  [7 8 9]]
# variance-> 6.666666666666667
# variance-> [6. 6. 6.]
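The variance is the mean squared deviation from the mean, and the standard deviation is its square root. A quick check of both relationships on the same array:

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)

# Mean of squared deviations from each column's mean.
manual = np.mean((arr - arr.mean(axis=0)) ** 2, axis=0)
print(manual)                                                      # [6. 6. 6.]

# var equals std squared.
print(np.allclose(np.var(arr, axis=0), np.std(arr, axis=0) ** 2))  # True
```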
Correlating
The following methods are available for correlation:
- corrcoef(x[, y, rowvar, bias, ddof])
- correlate(a, v[, mode])
- cov(m[, y, rowvar, bias, ddof, fweights, …])
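corrcoef() and cov() are listed above without an example. A minimal sketch with made-up data: y rises exactly with x and z falls exactly as x rises, so the Pearson coefficients should be +1 and -1.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])   # y = 2x
z = np.array([4.0, 3.0, 2.0, 1.0])   # z decreases as x increases

# With two 1-D inputs, cov returns the 2x2 covariance matrix,
# normalised by N - 1 by default; here about [[5/3, 10/3], [10/3, 20/3]].
print(np.cov(x, y))

# corrcoef normalises the covariance to Pearson coefficients in [-1, 1].
print(np.corrcoef(x, y)[0, 1])   # 1.0 up to rounding
print(np.corrcoef(x, z)[0, 1])   # -1.0 up to rounding
```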
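numpy.correlate()
This method computes the cross-correlation of two 1-dimensional sequences, as generally defined in signal-processing texts:
$$c_{av}[k] = \sum_n a[n+k] \cdot \overline{v[n]}$$
where the sequences a and v are zero-padded where necessary and the bar denotes the complex conjugate. The mode argument selects how much of the result is returned: 'valid' (the default, only positions where the sequences fully overlap), 'same' (output as long as the longer input), or 'full' (every partial overlap). The syntax of the method is,
numpy.correlate(a, v, mode='valid')
For example,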
import numpy as np
print(np.correlate([1, 2, 3], [0, 1, 0.5]))
print(np.correlate([1, 2, 3], [0, 1, 0.5], "same"))
print(np.correlate([1, 2, 3], [0, 1, 0.5], "full"))
# Output:
# [3.5]
# [2.  3.5 3. ]
# [0.5 2.  3.5 3.  0. ]
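To connect the formula above with the 'valid' output, the sketch below evaluates the sum directly for every offset where v fits entirely inside a. `correlate_valid` is a hypothetical helper for illustration and assumes len(a) >= len(v).

```python
import numpy as np

def correlate_valid(a, v):
    """Hypothetical helper: 'valid' mode computed from the sum over a[n+k] * conj(v[n]).

    Assumes len(a) >= len(v).
    """
    a = np.asarray(a)
    v = np.asarray(v)
    n_out = len(a) - len(v) + 1
    # For each offset k, multiply the overlapping window by conj(v) and sum.
    return np.array([np.sum(a[k:k + len(v)] * np.conj(v)) for k in range(n_out)])

print(correlate_valid([1, 2, 3], [0, 1, 0.5]))  # [3.5]
print(np.correlate([1, 2, 3], [0, 1, 0.5]))     # [3.5]
```

The conjugate matters only for complex input; for real sequences it leaves v unchanged.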
Histogram
- histogram(a[, bins, range, density, weights])
- histogram2d(x, y[, bins, range, density, …])
- histogramdd(sample[, bins, range, density, …])
- bincount(x[, weights, minlength])
- histogram_bin_edges(a[, bins, range, weights])
- digitize(x, bins[, right])
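The histogram functions are listed without an example. A minimal sketch on made-up integer data, assuming NumPy 1.15 or later for histogram_bin_edges():

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3, 7])

# histogram: counts per bin plus the bin edges (len(edges) == len(counts) + 1).
# Bins are [0, 3), [3, 6) and [6, 9]; only the last bin includes its right edge.
counts, edges = np.histogram(data, bins=3, range=(0, 9))
print(counts)   # [3 3 1]
print(edges)    # [0. 3. 6. 9.]

# histogram_bin_edges returns just the edges, without counting.
print(np.histogram_bin_edges(data, bins=3, range=(0, 9)))

# bincount: occurrences of each non-negative integer 0..max(data).
per_value = np.bincount(data)
print(per_value)   # [0 1 2 3 0 0 0 1]

# digitize: index i such that edges[i-1] <= x < edges[i].
bin_index = np.digitize(data, edges)
print(bin_index)   # [1 1 1 2 2 2 3]
```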