You can use the describe() function to generate descriptive statistics for variables in a pandas DataFrame.
To calculate these statistics separately for each group, you can combine describe() with the groupby() function using the following basic syntax:
df.groupby('group_var')['values_var'].describe()
The following example shows how to use this syntax in practice.
Example: Use describe() by Group in Pandas
Suppose we have the following pandas DataFrame that contains information about basketball players on two different teams:
import pandas as pd
#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [8, 12, 14, 14, 15, 22, 27, 24],
                   'assists': [2, 2, 3, 5, 7, 6, 8, 12]})
#view DataFrame
print(df)
  team  points  assists
0    A       8        2
1    A      12        2
2    A      14        3
3    A      14        5
4    B      15        7
5    B      22        6
6    B      27        8
7    B      24       12
We can use the describe() function along with the groupby() function to summarize the values in the points column for each team:
#summarize points by team
df.groupby('team')['points'].describe()
      count  mean       std   min    25%   50%    75%   max
team
A       4.0  12.0  2.828427   8.0  11.00  13.0  14.00  14.0
B       4.0  22.0  5.099020  15.0  20.25  23.0  24.75  27.0
From the output, we can see the following values for the points variable for each team:
- count (number of observations)
- mean (mean points value)
- std (standard deviation of points values)
- min (minimum points value)
- 25% (25th percentile of points)
- 50% (50th percentile (i.e. median) of points)
- 75% (75th percentile of points)
- max (maximum points value)
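As a cross-check on what each of these columns means, the same statistics can be reproduced one at a time with agg(), quantile() and median(). The following is a minimal sketch that rebuilds the example DataFrame so it runs on its own; the variable names grouped, described and manual are illustrative and not part of the original example. Note that std here is the sample standard deviation, which is the pandas default.

```python
import pandas as pd

# Rebuild the example DataFrame so this block is self-contained
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [8, 12, 14, 14, 15, 22, 27, 24],
                   'assists': [2, 2, 3, 5, 7, 6, 8, 12]})

grouped = df.groupby('team')['points']

# Output of describe(), kept for comparison
described = grouped.describe()

# Compute the same statistics individually (illustrative names)
manual = grouped.agg(['count', 'mean', 'std', 'min', 'max'])
manual['25%'] = grouped.quantile(0.25)
manual['50%'] = grouped.median()
manual['75%'] = grouped.quantile(0.75)

# Reorder the columns so they match the order used by describe()
manual = manual[described.columns]

print(manual)
```

The only visible difference should be that count is an integer in the manual version, while describe() reports it as a float.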
By default, the group labels (the team values) are used as the index of the result. If you'd like team to appear as a regular column instead, you can chain the reset_index() function:
#summarize points by team
df.groupby('team')['points'].describe().reset_index()
  team  count  mean       std   min    25%   50%    75%   max
0    A    4.0  12.0  2.828427   8.0  11.00  13.0  14.00  14.0
1    B    4.0  22.0  5.099020  15.0  20.25  23.0  24.75  27.0
The variable team is now a column in the DataFrame and the index values are 0 and 1.
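To make this difference concrete, the sketch below compares the result before and after calling reset_index(). It rebuilds the example DataFrame so it can be run on its own, and the variable names summary and flat are illustrative rather than part of the original example.

```python
import pandas as pd

# Rebuild the example DataFrame so this block is self-contained
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [8, 12, 14, 14, 15, 22, 27, 24],
                   'assists': [2, 2, 3, 5, 7, 6, 8, 12]})

summary = df.groupby('team')['points'].describe()
flat = summary.reset_index()

# Before: team is the index, so it is not among the columns
print(summary.index.tolist())      # ['A', 'B']
print('team' in summary.columns)   # False

# After: team is an ordinary column and the index is 0, 1
print(flat.index.tolist())         # [0, 1]

# With team as a column, rows can be filtered like any other column
print(flat.loc[flat['team'] == 'B', ['team', 'mean', 'max']])
```

Having team as a column is often more convenient when the summary is going to be merged with another DataFrame or written to a file.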