You can use the following methods to combine the groupby() and transform() functions in a pandas DataFrame:
Method 1: Use groupby() and transform() with built-in function
df['new'] = df.groupby('group_var')['value_var'].transform('mean')
Method 2: Use groupby() and transform() with custom function
#some_function is a placeholder for any calculation on each group's values
df['new'] = df.groupby('group_var')['value_var'].transform(lambda x: some_function(x))
The following examples show how to use each method in practice with the following pandas DataFrame:
import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [30, 22, 19, 14, 14, 11, 20, 28]})

#view DataFrame
print(df)

  team  points
0    A      30
1    A      22
2    A      19
3    A      14
4    B      14
5    B      11
6    B      20
7    B      28
Example 1: Use groupby() and transform() with built-in function
The following code shows how to use the groupby() and transform() functions to add a new column to the DataFrame called mean_points:
#create new column called mean_points
df['mean_points'] = df.groupby('team')['points'].transform('mean')
#view updated DataFrame
print(df)
team points mean_points
0 A 30 21.25
1 A 22 21.25
2 A 19 21.25
3 A 14 21.25
4 B 14 18.25
5 B 11 18.25
6 B 20 18.25
7 B 28 18.25
The mean points value for players on team A was 21.25 and the mean points value for players on team B was 18.25, so these values were assigned accordingly to each player in a new column.
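To make the "assigned to each player" behavior concrete, the following minimal sketch contrasts a plain group aggregation with transform(). It rebuilds the same DataFrame so it can run on its own; the variable names agg_result and transform_result are illustrative only.

```python
import pandas as pd

#recreate the example DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [30, 22, 19, 14, 14, 11, 20, 28]})

#a plain aggregation returns one value per group (indexed by team)
agg_result = df.groupby('team')['points'].mean()

#transform returns one value per original row (indexed like df)
transform_result = df.groupby('team')['points'].transform('mean')

print(len(agg_result))        #2 rows, one per team
print(len(transform_result))  #8 rows, one per player
```

Because the transformed result shares the index of the original DataFrame, it can be assigned directly as a new column without a merge.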
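Note that we could also use another built-in function such as 'sum' to create a new column that shows the total points scored by each team (the output below assumes we start again from the original DataFrame, without the mean_points column):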
#create new column called sum_points
df['sum_points'] = df.groupby('team')['points'].transform('sum')
#view updated DataFrame
print(df)
team points sum_points
0 A 30 85
1 A 22 85
2 A 19 85
3 A 14 85
4 B 14 73
5 B 11 73
6 B 20 73
7 B 28 73
The sum of points for players on team A was 85 and the sum of points for players on team B was 73, so these values were assigned accordingly to each player in a new column.
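The same pattern should work for other built-in function names. As a rough sketch, the loop below adds one column each for 'min', 'max', and 'count'; the column names (min_points and so on) are chosen here for illustration and are not part of the original example.

```python
import pandas as pd

#recreate the example DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [30, 22, 19, 14, 14, 11, 20, 28]})

#group once and reuse the grouped object
grouped = df.groupby('team')['points']

#add one new column per built-in function name
for func in ['min', 'max', 'count']:
    df[f'{func}_points'] = grouped.transform(func)

#view updated DataFrame
print(df)
```

Grouping once and reusing the grouped object avoids repeating the groupby() call for each new column.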
Example 2: Use groupby() and transform() with custom function
The following code shows how to use the groupby() and transform() functions with a custom lambda function to calculate the proportion of their team's total points scored by each player (again starting from the original DataFrame):
#create new column called percent_of_points
df['percent_of_points'] = df.groupby('team')['points'].transform(lambda x: x/x.sum())
#view updated DataFrame
print(df)
team points percent_of_points
0 A 30 0.352941
1 A 22 0.258824
2 A 19 0.223529
3 A 14 0.164706
4 B 14 0.191781
5 B 11 0.150685
6 B 20 0.273973
7 B 28 0.383562
Here’s how to interpret the output:
- The first player on team A scored 30 of the 85 total points scored by team A players, so their share of the team's total was 30/85 = 0.352941.
- The second player on team A scored 22 of the 85 total points scored by team A players, so their share of the team's total was 22/85 = 0.258824.
And so on.
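One quick way to sanity-check this interpretation is to confirm that the shares within each team add up to 1. The sketch below recomputes the column and sums it per team; the variable name team_totals is illustrative.

```python
import pandas as pd

#recreate the example DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [30, 22, 19, 14, 14, 11, 20, 28]})

#each player's share of their team's total points
df['percent_of_points'] = df.groupby('team')['points'].transform(lambda x: x / x.sum())

#the shares within each team should sum to 1 (up to floating-point rounding)
team_totals = df.groupby('team')['percent_of_points'].sum()
print(team_totals)
```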
Note that we can pass any lambda function to transform() to perform a custom calculation, as long as it returns either a single value per group or a result with the same length as the group.
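A named function can be passed to transform() in the same way as a lambda, which may be easier to read for longer calculations. The sketch below uses a hypothetical helper, center_on_mean, and a hypothetical column name, points_vs_team_mean, to show how far each player's points are from their team's mean.

```python
import pandas as pd

#recreate the example DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [30, 22, 19, 14, 14, 11, 20, 28]})

#hypothetical helper: subtract the group's mean from each value in the group
def center_on_mean(x):
    return x - x.mean()

#create new column showing each player's difference from their team's mean
df['points_vs_team_mean'] = df.groupby('team')['points'].transform(center_on_mean)

#view updated DataFrame
print(df)
```

Here the first player on team A scored 30 points against a team mean of 21.25, so the new column holds 8.75 for that row.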