The nth percentile of a dataset is the value that cuts off the first n percent of the data values when all of the values are sorted from least to greatest.
For example, the 90th percentile of a dataset is the value that separates the bottom 90% of the data values from the top 10%.
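When the cut-off point falls between two data values, different tools resolve it in slightly different ways. NumPy's default method ("linear") works as follows. It places the nth percentile at position (N - 1) * n / 100 in the sorted data, counting from 0. It then interpolates between the two neighbouring values. The sketch below reproduces that rule in plain Python. `percentile_linear` is a hypothetical helper written for illustration and is not part of any library.

```python
import math

def percentile_linear(values, q):
    """Hypothetical helper: the q-th percentile using linear interpolation."""
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100 inclusive")
    ordered = sorted(values)
    # Fractional (0-based) position of the percentile within the sorted data.
    pos = (len(ordered) - 1) * q / 100
    lower = math.floor(pos)
    upper = math.ceil(pos)
    # Move part of the way from the lower neighbour toward the upper one.
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)

var1 = [25, 12, 15, 14, 19, 23, 25, 29, 33, 35]
print(round(percentile_linear(var1, 95), 2))
```

For the var1 values used in the DataFrame examples below, this gives 34.1. That matches np.percentile(df.var1, 95), up to floating-point rounding.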
We can quickly calculate percentiles in Python by using the numpy.percentile() function, which uses the following syntax:
numpy.percentile(a, q)
where:
- a: Array of values
- q: Percentile or sequence of percentiles to compute, which must be between 0 and 100 inclusive.
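To illustrate the q parameter, here is a small sketch on a toy array. A single number returns one value, and a list returns an array with one percentile per entry. A value outside 0 to 100 is rejected with a ValueError.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])

# A scalar q returns a single value.
median = np.percentile(values, 50)

# A sequence of q values returns an array with one percentile per entry.
quartiles = np.percentile(values, [25, 50, 75])

# q outside the range 0-100 raises a ValueError.
raised = False
try:
    np.percentile(values, 150)
except ValueError as err:
    raised = True
    print("ValueError:", err)

print(median, quartiles)
```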
This tutorial explains how to use this function to calculate percentiles in Python.
How to Find Percentiles of an Array
The following code illustrates how to find various percentiles for a given array in Python:
import numpy as np
#make this example reproducible
np.random.seed(0)
#create array of 100 random integers from 0 to 499 (randint excludes the upper bound)
data = np.random.randint(0, 500, 100)
#find the 37th percentile of the array
np.percentile(data, 37)
173.26
#find the quartiles (25th, 50th, and 75th percentiles) of the array
np.percentile(data, [25, 50, 75])
array([116.5, 243.5, 371.5])
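NumPy also provides np.quantile(), which performs the same computation with q written as a fraction between 0 and 1 instead of a percentage. The sketch below regenerates the same seeded array and checks that the 37th percentile agrees with the 0.37 quantile.

```python
import numpy as np

# Same seeded setup as above, so the data are reproducible.
np.random.seed(0)
data = np.random.randint(0, 500, 100)

# np.percentile takes q on a 0-100 scale...
p37 = np.percentile(data, 37)
# ...while np.quantile takes the same cut point on a 0-1 scale.
q37 = np.quantile(data, 0.37)

print(p37, q37, np.isclose(p37, q37))
```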
How to Find Percentiles of a DataFrame Column
The following code shows how to find the 95th percentile value for a single pandas DataFrame column:
import numpy as np
import pandas as pd
#create DataFrame
df = pd.DataFrame({'var1': [25, 12, 15, 14, 19, 23, 25, 29, 33, 35],
'var2': [5, 7, 7, 9, 12, 9, 9, 4, 14, 15],
'var3': [11, 8, 10, 6, 6, 5, 9, 12, 13, 16]})
#find 95th percentile of var1 column
np.percentile(df.var1, 95)
34.1
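The same value can be computed with the pandas Series.quantile() method, which takes q as a fraction. Both functions use linear interpolation by default, so a short sketch comparing them on var1 should give matching results.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': [25, 12, 15, 14, 19, 23, 25, 29, 33, 35]})

# NumPy: q on a 0-100 scale
numpy_p95 = np.percentile(df['var1'], 95)
# pandas: the same cut point on a 0-1 scale
pandas_p95 = df['var1'].quantile(0.95)

print(numpy_p95, pandas_p95)
```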
How to Find Percentiles of Several DataFrame Columns
The following code shows how to find the 95th percentile of several columns in a pandas DataFrame:
import pandas as pd
#create DataFrame
df = pd.DataFrame({'var1': [25, 12, 15, 14, 19, 23, 25, 29, 33, 35],
'var2': [5, 7, 7, 9, 12, 9, 9, 4, 14, 15],
'var3': [11, 8, 10, 6, 6, 5, 9, 12, 13, 16]})
#find 95th percentile of each column
df.quantile(.95)
var1 34.10
var2 14.55
var3 14.65
#find 95th percentile of just columns var1 and var2
df[['var1', 'var2']].quantile(.95)
var1 34.10
var2 14.55
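The quantile() method also accepts a list of values. In that case it returns a DataFrame with one row per requested quantile and one column per variable. The sketch below computes the quartiles of every column in the same DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'var1': [25, 12, 15, 14, 19, 23, 25, 29, 33, 35],
                   'var2': [5, 7, 7, 9, 12, 9, 9, 4, 14, 15],
                   'var3': [11, 8, 10, 6, 6, 5, 9, 12, 13, 16]})

# Rows are indexed by the quantiles 0.25, 0.5 and 0.75; columns keep their names.
quartiles = df.quantile([0.25, 0.5, 0.75])
print(quartiles)
```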
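Note that the examples with several columns use the pandas quantile() function rather than numpy.percentile(). quantile() takes q as a fraction between 0 and 1 rather than a percentage, so .95 corresponds to the 95th percentile.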
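Because only the scale of q differs, np.percentile() can also produce the per-column result if you pass axis=0, which computes one percentile for each column. The sketch below checks that both approaches agree on the DataFrame used above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': [25, 12, 15, 14, 19, 23, 25, 29, 33, 35],
                   'var2': [5, 7, 7, 9, 12, 9, 9, 4, 14, 15],
                   'var3': [11, 8, 10, 6, 6, 5, 9, 12, 13, 16]})

# pandas: q as a fraction; the result is a Series indexed by column name
pandas_result = df.quantile(0.95)
# NumPy: q as a percentage; axis=0 computes one value per column
numpy_result = np.percentile(df, 95, axis=0)

print(pandas_result.to_numpy(), numpy_result)
```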