Tuesday, June 9, 2020

Numpy Histogram routines


A histogram is a graphical representation of data using bars of different heights. Histograms are much like bar charts, except that a histogram groups numbers into ranges (bins). The height of each bar shows how many values fall into the corresponding range.

The histogram routines belong to Numpy's statistical routines.

The following histogram functions are available:

  • histogram(a[, bins, range, normed, weights, …])
  • histogram2d(x, y[, bins, range, normed, …])
  • histogramdd(sample[, bins, range, normed, …])
  • bincount(x[, weights, minlength])
  • histogram_bin_edges(a[, bins, range, weights])
  • digitize(x, bins[, right])

numpy.histogram()


This method can be used to compute the histogram of a given set of data. The syntax of this method is,

numpy.histogram(arr, bins=10, range=None, normed=None, weights=None, density=None)

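where,

arr is array_like; the histogram is computed over the flattened array.
bins (optional) is an int, a sequence of scalars, or a str. An int gives the number of equal-width bins in the given range (10 by default), a sequence gives the bin edges, and a str names a method for choosing the bins automatically (for example 'auto').
range (optional) is a (float, float) pair giving the lower and upper limits of the bins. If it is not given, (arr.min(), arr.max()) is used. Values outside the range are ignored.
normed (optional, boolean) is deprecated and should not be used; use the density argument instead.
weights (optional) is array_like with the same shape as arr. Each value in arr contributes its associated weight to the bin count instead of 1.
density (optional, boolean): if True, the result is the value of the probability density function at each bin, normalized so that the integral over the range is 1.
The function returns two arrays: the values of the histogram, and the bin edges (of length len(hist) + 1).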

for example,

import numpy as np
h=np.histogram([4,14,7,9],[0,1,2,3])
print(h)

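#Output (every value lies outside the bin edges, so all counts are 0)
(array([0, 0, 0]), array([0, 1, 2, 3]))

import numpy as np
#with explicit bin edges and density=True
print(np.histogram(np.arange(4), bins=np.arange(5), density=True))
#with a 2-D input, which is flattened before counting
print('\n')
print(np.histogram([[1, 2, 1], [1, 0, 1]], bins=[0,1,2,3]))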

#Output
(array([0.25, 0.25, 0.25, 0.25]), array([0, 1, 2, 3, 4]))


(array([1, 4, 1]), array([0, 1, 2, 3]))
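
The examples above do not exercise the range and weights arguments. The following is a minimal sketch of both, using small made-up data; the variable names (data, counts, weighted and so on) are illustrative and not part of the original examples.

```python
import numpy as np

data = np.array([1, 2, 2, 3, 8])

# Restrict the histogram to [0, 4] with 4 equal-width bins.
# The value 8 lies outside the range and is ignored.
counts, edges = np.histogram(data, bins=4, range=(0, 4))
print(counts)  # counts per bin: 0, 1, 2, 1
print(edges)   # bin edges: 0, 1, 2, 3, 4

# With weights, each entry adds its weight (instead of 1) to its bin.
weights = np.array([0.5, 1.0, 1.0, 2.0, 4.0])
weighted, _ = np.histogram(data, bins=4, range=(0, 4), weights=weights)
print(weighted)  # weighted sums per bin: 0, 0.5, 2, 2
```

Note that every bin is half-open except the last one, which also includes its right edge.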

numpy.histogram2d()


This method is used to get the two-dimensional histogram of two data samples. The syntax of the method is,

numpy.histogram2d(x, y, bins=10, range=None, normed=None, weights=None, density=None)


Here x is an array containing the x-coordinates of the points to be histogrammed and y is an array containing the y-coordinates of the points to be histogrammed. 

bins (optional) is an int, an array_like, [int, int] or [array, array]. A single value applies to both dimensions; a pair gives the x and y bins separately. density (optional) is a boolean.
normed (optional, boolean) behaves like the density argument and should be avoided.
weights (optional) is array_like with the same shape as x and y, giving the weight of each (x, y) sample.
The function returns three values: H, the two-dimensional histogram (x is binned along the first dimension and y along the second); xedges, the bin edges along the first dimension; and yedges, the bin edges along the second dimension.

For example,

import numpy as np

xedges = [0, 1, 3, 5]
yedges = [0, 2, 3, 4, 6]
x = np.random.normal(2, 1, 100)
y = np.random.normal(1, 1, 100)
H, xedges, yedges = np.histogram2d(x, y, bins=(xedges, yedges))
print(H, xedges,yedges)

#Output (the counts vary between runs because the data are random)
[[11.  2.  0.  0.]
 [51.  6.  1.  0.]
 [10.  4.  0.  0.]] [0 1 3 5] [0 2 3 4 6]
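
Because the example above uses random data, its counts cannot be checked by hand. The following is a small deterministic sketch, assuming four made-up points and a 2 x 2 grid, that shows how the rows and columns of H map to x and y.

```python
import numpy as np

# Four points given as separate x and y coordinate arrays.
x = np.array([0.5, 0.5, 1.5, 1.5])
y = np.array([0.5, 1.5, 1.5, 1.5])

# bins=2 with an explicit range gives a 2 x 2 grid over [0, 2] x [0, 2].
H, xedges, yedges = np.histogram2d(x, y, bins=2, range=[[0, 2], [0, 2]])

# H[i, j] counts the points whose x falls in bin i and whose y falls in bin j.
print(H)       # rows: [1, 1] and [0, 2]
print(xedges)  # edges 0, 1, 2
print(yedges)  # edges 0, 1, 2
```

The first index of H follows x and the second follows y, which is the transpose of the usual image layout where rows run along y.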


numpy.histogramdd() 


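We can get the multidimensional histogram of some data with this method. The sample is an (N, D) array holding N points in D dimensions, and the method returns the D-dimensional histogram together with a list of D arrays of bin edges. The syntax of the method is,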

numpy.histogramdd(sample, bins=10, range=None, normed=None, weights=None, density=None)

For example,

import numpy as np
r = np.random.randn(50, 3)
H, edges = np.histogramdd(r, bins = (5, 8, 4))
print(H.shape, edges[0].size, edges[2].size)

#Output
(5, 8, 4) 6 5
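
As a complement to the random example above, here is a minimal deterministic sketch with three made-up points in three dimensions, so that each count can be verified by hand. The explicit range is an assumption chosen to make the bin edges fall on whole numbers.

```python
import numpy as np

# Three points in 3-D space, one point per row: shape (N, D) = (3, 3).
sample = np.array([[0.5, 0.5, 0.5],
                   [1.5, 0.5, 0.5],
                   [1.5, 1.5, 1.5]])

# Two bins per dimension over [0, 2] in every dimension.
H, edges = np.histogramdd(sample, bins=(2, 2, 2),
                          range=[(0, 2), (0, 2), (0, 2)])

print(H.shape)     # one axis per dimension: (2, 2, 2)
print(len(edges))  # one edge array per dimension: 3
print(H[0, 0, 0], H[1, 0, 0], H[1, 1, 1])  # each of these cells holds one point
```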
 

numpy.bincount()


This method counts the number of occurrences of each value in an array of non-negative ints. The syntax of the method is,

numpy.bincount(x, weights=None, minlength=0)

for example,

import numpy as np
print(np.bincount(np.arange(6)))
print(np.bincount(np.array([0, 4 , 1, 3, 2, 1, 7])))


#Output
[1 1 1 1 1 1]
[1 2 1 1 1 0 0 1]

Or, with weights,

import numpy as np
wt = np.array([0.2, 0.35, 0.2, 0.65, 1., -0.6]) # weights
arr = np.array([0, 1, 1, 2, 2, 2])
print(np.bincount(arr,  weights=wt))

#Output
[0.2  0.55 1.05]
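
The minlength argument in the signature is not covered by the examples above. A minimal sketch, using a made-up array, of how it pads the result:

```python
import numpy as np

arr = np.array([0, 1, 1, 3])

# Without minlength the result has arr.max() + 1 = 4 entries.
default_counts = np.bincount(arr)
print(default_counts)  # counts for the values 0..3: 1, 2, 0, 1

# minlength pads the result with zero counts up to the requested length.
padded_counts = np.bincount(arr, minlength=6)
print(padded_counts)   # counts for the values 0..5: 1, 2, 0, 1, 0, 0
```

This is handy when several arrays must produce count vectors of the same length, for example one per class label.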

numpy.histogram_bin_edges()


This function computes only the bin edges used by numpy.histogram(), without computing the histogram itself. The syntax of the method is,

numpy.histogram_bin_edges(a, bins=10, range=None, weights=None)

for example,

import numpy as np
arr = np.array([0, 0, 0, 1, 2, 3, 3, 4, 5])
print(np.histogram_bin_edges(arr, bins='auto', range=(0, 1)))

#Output
[0.   0.25 0.5  0.75 1.  ]
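
To illustrate that these are the same edges numpy.histogram() would use, the following sketch (assuming the same array and an integer bin count of 5) compares the two results.

```python
import numpy as np

arr = np.array([0, 0, 0, 1, 2, 3, 3, 4, 5])

# Edges only: 5 equal-width bins between arr.min() and arr.max().
edges = np.histogram_bin_edges(arr, bins=5)

# Full histogram with the same settings.
counts, hist_edges = np.histogram(arr, bins=5)

print(edges)                              # edges 0, 1, 2, 3, 4, 5
print(np.array_equal(edges, hist_edges))  # True: both calls agree on the edges
print(counts)                             # counts per bin: 3, 1, 1, 2, 2
```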

numpy.digitize()


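This function returns the indices of the bins to which each value in the input array belongs. For monotonically increasing bins, if right is False (the default), index i means bins[i-1] <= x < bins[i]; if right is True, it means bins[i-1] < x <= bins[i]. The syntax of the method is,

numpy.digitize(x, bins, right=False)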

for example,

import numpy as np
arr = np.array([1.2, 10.0, 12.4, 15.5, 20.])
bins = np.array([0, 5, 10, 15, 20])
print(np.digitize(arr, bins, right=True))

#Output
[1 2 3 4 4]
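
For comparison, here is a sketch of the same data with the default right=False, which changes the result only for values that sit exactly on a bin edge.

```python
import numpy as np

arr = np.array([1.2, 10.0, 12.4, 15.5, 20.])
bins = np.array([0, 5, 10, 15, 20])

# right=False (default): a value equal to an edge goes to the bin on its right.
left_closed = np.digitize(arr, bins)
print(left_closed)   # indices 1, 3, 3, 4, 5

# right=True: a value equal to an edge goes to the bin on its left.
right_closed = np.digitize(arr, bins, right=True)
print(right_closed)  # indices 1, 2, 3, 4, 4
```

With the default setting, 20.0 gets index 5 (equal to len(bins)) because it is not below any edge, so it counts as lying beyond the last bin.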