Often you may be interested in finding the max value by group in a pandas DataFrame.
Fortunately this is easy to do using the groupby() and max() functions with the following syntax:
df.groupby('column_name').max()
This tutorial explains several examples of how to use this function in practice using the following pandas DataFrame:
import pandas as pd #create pandas DataFrame df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B', 'C', 'C'], 'points':[24, 23, 27, 11, 14, 8, 13], 'rebounds': [11, 8, 7, 6, 6, 5, 12]}) #display DataFrame print(df) team points rebounds 0 A 24 11 1 A 23 8 2 B 27 7 3 B 11 6 4 B 14 6 5 C 8 5 6 C 13 12
Example 1: Max Value of Multiple Columns Grouped by One Variable
The following code shows how to find the max value of multiple columns, grouped by one variable in a DataFrame:
#find max values of points and rebounds, grouped by team df.groupby('team').max().reset_index() team points rebounds 0 A 24 11 1 B 27 7 2 C 13 12
From the output we can see that:
- Team A has a max points value of 24 and a max rebounds value of 11.
- Team B has a max points value of 27 and a max rebounds value of 7.
- Team C has a max points value of 13 and a max rebounds value of 12.
Note that we used the reset_index() function to ensure that the index matches the index in the original DataFrame.
Example 2: Max Value of a Single Column Grouped by One Variable
The following code shows how to find the max value of just one column, grouped on a single variable:
#find max value of points, grouped by team df.groupby('team')['points'].max().reset_index() team points 0 A 24 1 B 27 2 C 13
Example 3: Sort by Max Values
We can also use the sort_values() function to sort the max values.
We can specify ascending=False to sort from largest to smallest:
#find max value by team, sort descending df.groupby('team')['points'].max().reset_index().sort_values(['points'], ascending=False) team points 1 B 27 0 A 24 2 C 13
Or we can specify ascending=True to sort from smallest to largest:
#find max value by team, sort ascending df.groupby('team')['points'].max().reset_index().sort_values(['points'], ascending=True) team points 2 C 13 0 A 24 1 B 27
Additional Resources
How to Calculate the Sum of Columns in Pandas
How to Calculate the Mean of Columns in Pandas
How to Find the Max Value of Columns in Pandas