Often you may be interested in placing the values of a variable into “bins” in Python.
Fortunately this is easy to do using the numpy.digitize() function, which uses the following syntax:
numpy.digitize(x, bins, right=False)
where:
- x: Array to be binned.
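- bins: Array of bin edges. The edges must be monotonically increasing or decreasing.
- right: Boolean indicating whether each interval includes its right edge or its left edge. The default (False) means each interval includes the left edge and excludes the right edge.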
This tutorial shows several examples of how to use this function in practice.
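Before the examples, it may help to see the general pattern: with k bin edges, the returned indices run from 0 to k. The short sketch below uses hypothetical edges and values (not taken from the examples in this tutorial) and the default right=False setting.

```python
import numpy as np

# Hypothetical edges and values, chosen only for illustration
edges = [10, 20, 30]
values = [-5, 10, 29.9, 30, 1000]

# Values below the first edge get 0; values at or above the last edge get len(edges)
idx = np.digitize(values, bins=edges)
print(idx)  # [0 1 2 3 3]
```

Three edges give four possible bin indices, which is why the examples that follow always have one more bin than they have edges.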
Example 1: Place All Values into Two Bins
The following code shows how to place the values of an array into two bins:
- 0 if x < 20
- 1 if x ≥ 20
import numpy as np

#create data
data = [2, 4, 4, 7, 12, 14, 19, 20, 24, 31, 34]

#place values into bins
np.digitize(data, bins=[20])

array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
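Because the result is an array of integer bin indices, one possible follow-up is to use it to look up a readable label for each value. This is a minimal sketch; the label text is hypothetical and is not part of numpy.digitize() itself.

```python
import numpy as np

data = [2, 4, 4, 7, 12, 14, 19, 20, 24, 31, 34]
idx = np.digitize(data, bins=[20])

# Hypothetical labels: position 0 names bin 0, position 1 names bin 1
labels = np.array(["under 20", "20 or more"])

# Index the label array with the bin indices to get one label per value
named = labels[idx]

print(named[6], "|", named[7])  # under 20 | 20 or more
```

The seventh value (19) lands in bin 0 and the eighth value (20) lands in bin 1, matching the output above.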
Example 2: Place All Values into Three Bins
The following code shows how to place the values of an array into three bins:
- 0 if x < 10
- 1 if 10 ≤ x < 20
- 2 if x ≥ 20
import numpy as np

#create data
data = [2, 4, 4, 7, 12, 14, 20, 22, 24, 31, 34]

#place values into bins
np.digitize(data, bins=[10, 20])

array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2])
Note that if we specify right=True then the values would be placed into the following bins:
- 0 if x ≤ 10
- 1 if 10 < x ≤ 20
- 2 if x > 20
Each interval would include the right bin edge. Here’s what that looks like:
import numpy as np

#create data
data = [2, 4, 4, 7, 12, 14, 20, 22, 24, 31, 34]

#place values into bins
np.digitize(data, bins=[10, 20], right=True)

array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
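To see exactly which values are affected by the right argument, one option is to run both settings and compare the results. The sketch below reuses the data from this example; the variable names are illustrative only.

```python
import numpy as np

data = np.array([2, 4, 4, 7, 12, 14, 20, 22, 24, 31, 34])
edges = [10, 20]

# Default behavior: each interval includes its left edge
left_closed = np.digitize(data, bins=edges)

# right=True: each interval includes its right edge
right_closed = np.digitize(data, bins=edges, right=True)

# Values whose bin index differs between the two settings
moved = data[left_closed != right_closed]
print(moved)  # [20]
```

Only the value 20, which sits exactly on a bin edge, changes bins. Every other value is assigned the same bin under both settings.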
Example 3: Place All Values into Four Bins
The following code shows how to place the values of an array into four bins:
- 0 if x < 10
- 1 if 10 ≤ x < 20
- 2 if 20 ≤ x < 30
- 3 if x ≥ 30
import numpy as np

#create data
data = [2, 4, 4, 7, 12, 14, 20, 22, 24, 31, 34]

#place values into bins
np.digitize(data, bins=[10, 20, 30])

array([0, 0, 0, 0, 1, 1, 2, 2, 2, 3, 3])
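The bin indices can also be used to pull out the values that landed in each bin. The following is a rough sketch using boolean masks on the same data; the dictionary layout is just one possible way to organize the result.

```python
import numpy as np

data = np.array([2, 4, 4, 7, 12, 14, 20, 22, 24, 31, 34])
edges = [10, 20, 30]
idx = np.digitize(data, bins=edges)

# Three edges give len(edges) + 1 = 4 possible bins: 0 through 3
groups = {i: data[idx == i].tolist() for i in range(len(edges) + 1)}

print(groups)
# {0: [2, 4, 4, 7], 1: [12, 14], 2: [20, 22, 24], 3: [31, 34]}
```

Note that data is converted to a NumPy array here so that it can be indexed with a boolean mask, which a plain Python list does not support.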
Example 4: Count the Frequency of Each Bin
A useful companion to the numpy.digitize() function is the numpy.bincount() function, which counts how many values fall into each bin.
The following code shows how to place the values of an array into three bins and then count the frequency of each bin:
import numpy as np

#create data
data = [2, 4, 4, 7, 12, 14, 20, 22, 24, 31, 34]

#place values into bins
bin_data = np.digitize(data, bins=[10, 20])

#view binned data
bin_data

array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2])

#count frequency of each bin
np.bincount(bin_data)

array([4, 2, 5])
The output tells us that:
- Bin “0” contains 4 data values.
- Bin “1” contains 2 data values.
- Bin “2” contains 5 data values.
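One caveat worth illustrating: numpy.bincount() only counts up to the largest index it sees, so an empty bin at the top would be missing from the output. A possible workaround is the minlength argument, sketched below with hypothetical data that has no values at or above 20.

```python
import numpy as np

# Hypothetical data with nothing at or above 20, so bin 2 is empty
data = [2, 4, 4, 7, 12, 14]
edges = [10, 20]
bin_data = np.digitize(data, bins=edges)

# Without minlength, the empty top bin is not reported
print(np.bincount(bin_data))  # [4 2]

# minlength pads the result so every possible bin gets a count
counts = np.bincount(bin_data, minlength=len(edges) + 1)
print(counts)  # [4 2 0]
```

Setting minlength to len(edges) + 1 keeps the output length equal to the number of possible bins, which makes counts from different datasets easier to compare.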