You can use the following syntax to add a new column to a pandas DataFrame that shows if each row exists in another DataFrame:
#merge two DataFrames on specific columns all_df = pd.merge(df1, df2, on=['column1', 'column2'], how='left', indicator='exists') #drop unwanted columns all_df = all_df.drop('column3', axis=1) #add column that shows if each row in one DataFrame exists in another all_df['exists'] = np.where(all_df.exists == 'both', True, False)
The following example shows how to use this syntax in practice.
Example: Check if Row in One Pandas DataFrame Exist in Another
Suppose we have the following two pandas DataFrames:
import pandas as pd #create first DataFrame df1 = pd.DataFrame({'team' : ['A', 'B', 'C', 'D', 'E'], 'points' : [12, 15, 22, 29, 24]}) print(df1) team points 0 A 12 1 B 15 2 C 22 3 D 29 4 E 24 #create second DataFrame df2 = pd.DataFrame({'team' : ['A', 'D', 'F', 'G', 'H'], 'points' : [12, 29, 15, 19, 10], 'assists' : [4, 7, 7, 10, 12]}) print(df2) team points assists 0 A 12 4 1 D 29 7 2 F 15 7 3 G 19 10 4 H 10 12
We can use the following syntax to add a column called exists to the first DataFrame that shows if each value in the team and points column of each row exists in the second DataFrame:
import numpy as np
#merge two dataFrames and add indicator column
all_df = pd.merge(df1, df2, on=['team', 'points'], how='left', indicator='exists')
#drop assists columns
all_df = all_df.drop('assists', axis=1)
#add column to show if each row in first DataFrame exists in second
all_df['exists'] = np.where(all_df.exists == 'both', True, False)
#view updated DataFrame
print (all_df)
team points exists
0 A 12 True
1 B 15 False
2 C 22 False
3 D 29 True
4 E 24 False
The new exists column shows if each value in the team and points column of each row exists in the second DataFrame.
From the output we can see:
- A Team value of A and points value of 12 does exist in the second DataFrame.
- A Team value of B and points value of 15 does not exist in the second DataFrame.
- A Team value of C and points value of 22 does not exist in the second DataFrame.
- A Team value of D and points value of 29 does exist in the second DataFrame.
- A Team value of E and points value of 24 does not exist in the second DataFrame.
Also note that you can specify values other than True and False in the exists column by changing the values in the NumPy where() function.
For example, you could instead use ‘exists’ and ‘not exists’ as follows:
#add column to show if each row in first DataFrame exists in second
all_df['exists'] = np.where(all_df.exists == 'both', 'exists', 'not exists')
#view updated DataFrame
print (all_df)
team points exists
0 A 12 exists
1 B 15 not exists
2 C 22 not exists
3 D 29 exists
4 E 24 not exists
Notice that the values in the exists column have been changed.
Additional Resources
The following tutorials explain how to perform other common tasks in pandas:
Pandas: Add Column from One DataFrame to Another
Pandas: Get Rows Which Are Not in Another DataFrame
Pandas: How to Check if Multiple Columns are Equal