Home » Pandas: A Simple Formula for “Group By Having”

Pandas: A Simple Formula for “Group By Having”

by Tutor Aspire

You can use the following basic syntax to perform the equivalent of a SQL “GROUP BY HAVING” statement in pandas:

df.groupby('some_column').filter(lambda x: some condition)

The following examples show how to use this syntax in practice with the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
                   'position': ['G', 'F', 'F', 'G', 'F', 'F', 'G', 'G'],
                   'points': [30, 22, 19, 14, 14, 11, 20, 28]})

#view DataFrame
print(df)

  team position  points
0    A        G      30
1    A        F      22
2    A        F      19
3    B        G      14
4    B        F      14
5    B        F      11
6    C        G      20
7    C        G      28

Example 1: Pandas Group By Having with Count

The following code shows how to group the rows by the value in the team column, then filter for only the teams that have a count greater than 2:

#group by team and filter for teams with count > 2
df.groupby('team').filter(lambda x: len(x) > 2)


        team	position points
0	A	G	 30
1	A	F	 22
2	A	F	 19
3	B	G	 14
4	B	F	 14
5	B	F	 11

Notice that only the rows with a team value of ‘A’ or ‘B’ are returned since these are the two teams that have a count greater than 2.

Example 2: Pandas Group By Having with Mean

The following code shows how to group the rows by the value in the team column, then filter for only the teams that have a mean points value greater than 20:

#group by team and filter for teams with mean points > 20
df.groupby('team').filter(lambda x: x['points'].mean() > 20)

        team	position points
0	A	G	 30
1	A	F	 22
2	A	F	 19
6	C	G	 20
7	C	G	 28

Notice that only the rows with a team value of ‘A’ or ‘C’ are returned since these are the two teams that have a mean points value greater than 20.

Example 3: Pandas Group By Having with Sum

The following code shows how to group the rows by the value in the team column, then filter for only the teams that have a sum of points equal to exactly 48:

#group by team and filter for teams with sum of points equal to 48
df.groupby('team').filter(lambda x: x['points'].sum() == 48)

        team	position points
6	C	G	 20
7	C	G	 28

Notice that only the rows with a team value of ‘C’ are returned since this is the one team that has a sum of points equal to 48.

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

Pandas: Get Index of Rows Whose Column Matches Value
Pandas: How to Select Columns Containing a Specific String
Pandas: How to Check if Column Contains String

You may also like