
Pandas: Get Rows Which Are Not in Another DataFrame

by Tutor Aspire

You can use the following basic syntax to get the rows in one pandas DataFrame that are not in another DataFrame:

#merge two DataFrames and create indicator column
df_all = df1.merge(df2.drop_duplicates(), on=['col1','col2'],
                   how='left', indicator=True)

#create DataFrame with rows that exist in first DataFrame only
df1_only = df_all[df_all['_merge'] == 'left_only']
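
If this pattern is needed in several places, it could be wrapped in a small helper. The sketch below is one possible way to do that; the function name `rows_only_in_left` and the sample columns are hypothetical, not part of pandas:

```python
import pandas as pd

def rows_only_in_left(left, right, cols):
    """Return the rows of `left` whose values in `cols` have no match in `right`."""
    # Keep only the key columns of `right` and drop repeated keys,
    # so a matched row of `left` is never repeated in the merge result
    right_keys = right[cols].drop_duplicates()
    merged = left.merge(right_keys, on=cols, how='left', indicator=True)
    # 'left_only' marks rows that found no match in `right`
    only_left = merged[merged['_merge'] == 'left_only']
    return only_left.drop(columns='_merge')

# Hypothetical sample data
df1 = pd.DataFrame({'col1': [1, 2, 3], 'col2': ['x', 'y', 'z']})
df2 = pd.DataFrame({'col1': [2, 4], 'col2': ['y', 'w']})

result = rows_only_in_left(df1, df2, ['col1', 'col2'])
print(result)
```

Unlike the basic syntax above, this sketch selects only the key columns from the second DataFrame before merging, so any extra columns it has are not carried into the result.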

The following example shows how to use this syntax in practice.

Example: Get Rows in Pandas DataFrame Which Are Not in Another DataFrame

Suppose we have the following two pandas DataFrames:

import pandas as pd

#create first DataFrame
df1 = pd.DataFrame({'team' : ['A', 'B', 'C', 'D', 'E'], 
                    'points' : [12, 15, 22, 29, 24]}) 

print(df1)

  team  points
0    A      12
1    B      15
2    C      22
3    D      29
4    E      24

#create second DataFrame
df2 = pd.DataFrame({'team' : ['A', 'D', 'F', 'G', 'H'],
                    'points' : [12, 29, 15, 19, 10]})

print(df2)

  team  points
0    A      12
1    D      29
2    F      15
3    G      19
4    H      10

We can use the following syntax to merge the two DataFrames and create an indicator column that shows whether each row of the first DataFrame also appears in the second DataFrame:

#merge two DataFrames and create indicator column
df_all = df1.merge(df2.drop_duplicates(), on=['team','points'],
                   how='left', indicator=True)

#view result
print(df_all)

  team  points     _merge
0    A      12       both
1    B      15  left_only
2    C      22  left_only
3    D      29       both
4    E      24  left_only
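
The call to drop_duplicates() on the second DataFrame keeps the merged result the same length as the first DataFrame. As a minimal sketch, assuming a hypothetical second DataFrame `df2_dupes` that contains the same row twice, the difference looks like this:

```python
import pandas as pd

df1 = pd.DataFrame({'team': ['A', 'B'], 'points': [12, 15]})

# Hypothetical second DataFrame in which the row ('A', 12) appears twice
df2_dupes = pd.DataFrame({'team': ['A', 'A'], 'points': [12, 12]})

# Without dropping duplicates, the matched row 'A' is repeated once per match
without_dedup = df1.merge(df2_dupes, on=['team', 'points'],
                          how='left', indicator=True)

# With drop_duplicates(), each row of df1 appears exactly once
with_dedup = df1.merge(df2_dupes.drop_duplicates(), on=['team', 'points'],
                       how='left', indicator=True)

print(len(without_dedup))  # 3
print(len(with_dedup))     # 2
```

Only rows that find a match can be repeated this way, so the rows marked left_only are the same in both versions; dropping duplicates mainly keeps the intermediate merged DataFrame tidy.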

We can then use the following syntax to get only the rows in the first DataFrame that are not in the second DataFrame:

#create DataFrame with rows that exist in first DataFrame only
df1_only = df_all[df_all['_merge'] == 'left_only']

#view DataFrame
print(df1_only)

  team  points     _merge
1    B      15  left_only
2    C      22  left_only
4    E      24  left_only

Lastly, we can drop the _merge column if we’d like:

#drop '_merge' column
df1_only = df1_only.drop('_merge', axis=1)

#view DataFrame
print(df1_only)

  team  points
1    B      15
2    C      22
4    E      24

The result is a DataFrame in which all of the rows exist in the first DataFrame but not in the second DataFrame.
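
The merge, filter and drop steps can also be written as a single chained expression. This is a sketch of one way to combine them, using the same two DataFrames as in the example; it is a stylistic alternative, not a different method:

```python
import pandas as pd

df1 = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E'],
                    'points': [12, 15, 22, 29, 24]})
df2 = pd.DataFrame({'team': ['A', 'D', 'F', 'G', 'H'],
                    'points': [12, 29, 15, 19, 10]})

df1_only = (
    df1.merge(df2.drop_duplicates(), on=['team', 'points'],
              how='left', indicator=True)
       # keep rows that have no match in df2
       .loc[lambda d: d['_merge'] == 'left_only']
       # remove the helper column added by indicator=True
       .drop(columns='_merge')
)

print(df1_only)
```

Note that a row counts as a match only when both team and points agree, which is why team B with 15 points is kept even though the second DataFrame contains a different team with 15 points.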

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Add Column from One DataFrame to Another in Pandas
How to Change the Order of Columns in Pandas
How to Sort Columns by Name in Pandas
