You can use the following basic syntax to get the rows in one pandas DataFrame that are not in another DataFrame:
#merge two DataFrames and create indicator column
df_all = df1.merge(df2.drop_duplicates(), on=['col1','col2'],
                   how='left', indicator=True)

#create DataFrame with rows that exist in first DataFrame only
df1_only = df_all[df_all['_merge'] == 'left_only']
The following example shows how to use this syntax in practice.
Example: Get Rows in Pandas DataFrame Which Are Not in Another DataFrame
Suppose we have the following two pandas DataFrames:
import pandas as pd

#create first DataFrame
df1 = pd.DataFrame({'team' : ['A', 'B', 'C', 'D', 'E'],
                    'points' : [12, 15, 22, 29, 24]})

print(df1)

  team  points
0    A      12
1    B      15
2    C      22
3    D      29
4    E      24

#create second DataFrame
df2 = pd.DataFrame({'team' : ['A', 'D', 'F', 'G', 'H'],
                    'points' : [12, 29, 15, 19, 10]})

print(df2)

  team  points
0    A      12
1    D      29
2    F      15
3    G      19
4    H      10
We can use the following syntax to merge the two DataFrames and add an indicator column named _merge, which shows whether each row of the first DataFrame also appears in the second DataFrame (both) or only in the first (left_only):
#merge two DataFrames and create indicator column
df_all = df1.merge(df2.drop_duplicates(), on=['team','points'],
                   how='left', indicator=True)

#view result
print(df_all)
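The drop_duplicates() call matters when the second DataFrame contains repeated rows. The sketch below illustrates this with a hypothetical DataFrame, df2_dup, which is not part of the example above and was made up so that the row ('A', 12) appears twice:

```python
import pandas as pd

df1 = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E'],
                    'points': [12, 15, 22, 29, 24]})

# hypothetical variant of the second DataFrame with a repeated row ('A', 12)
df2_dup = pd.DataFrame({'team': ['A', 'A', 'D', 'F'],
                        'points': [12, 12, 29, 15]})

# without drop_duplicates(), the matching row from df1 is repeated in the result
merged_raw = df1.merge(df2_dup, on=['team', 'points'],
                       how='left', indicator=True)

# with drop_duplicates(), each row of df1 appears exactly once
merged_dedup = df1.merge(df2_dup.drop_duplicates(), on=['team', 'points'],
                         how='left', indicator=True)

print(len(merged_raw))    # 6
print(len(merged_dedup))  # 5
```

The repeated rows are always labeled both, so they do not change which rows end up as left_only. Removing duplicates mainly keeps the merged DataFrame the same length as the first DataFrame.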
We can then use the following syntax to only get the rows in the first DataFrame that are not in the second DataFrame:
#create DataFrame with rows that exist in first DataFrame only
df1_only = df_all[df_all['_merge'] == 'left_only']

#view DataFrame
print(df1_only)

  team  points     _merge
1    B      15  left_only
2    C      22  left_only
4    E      24  left_only
Lastly, we can drop the _merge column if we’d like:
#drop '_merge' column
df1_only = df1_only.drop('_merge', axis=1)
#view DataFrame
print(df1_only)
  team  points
1    B      15
2    C      22
4    E      24
The result is a DataFrame in which all of the rows exist in the first DataFrame but not in the second DataFrame.
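If this operation is needed in several places, the steps can be wrapped in a small helper function. The following is a minimal sketch. The function name rows_not_in and its parameters are assumptions chosen for illustration and are not part of pandas:

```python
import pandas as pd

def rows_not_in(left, right, on):
    """Return the rows of `left` whose values in the `on` columns
    have no match in `right` (hypothetical helper, not a pandas function)."""
    # keep only the key columns of `right` and remove repeated rows
    keys = right[on].drop_duplicates()
    # left merge with an indicator column, then keep rows found only in `left`
    merged = left.merge(keys, on=on, how='left', indicator=True)
    return merged[merged['_merge'] == 'left_only'].drop('_merge', axis=1)

df1 = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E'],
                    'points': [12, 15, 22, 29, 24]})
df2 = pd.DataFrame({'team': ['A', 'D', 'F', 'G', 'H'],
                    'points': [12, 29, 15, 19, 10]})

result = rows_not_in(df1, df2, on=['team', 'points'])
print(result)
```

For the two DataFrames in this example, the helper returns the same three rows as above (teams B, C and E). Note that the merge builds a fresh index, so the row labels reflect positions in the first DataFrame rather than any custom index it may have had.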