Often you may want to compare two columns in a Pandas DataFrame and write the results of the comparison to a third column.
You can do this with NumPy's np.select function, using the following syntax:
conditions = [(condition1), (condition2)]
choices = ["choice1", "choice2"]
df["new_column_name"] = np.select(conditions, choices, default)
Here’s what this code does:
- conditions is a list of boolean comparisons between the two columns, checked in order
- choices is a list of the values to return, one for each condition
- np.select returns, for each row, the choice that matches the first condition that is True, or default if no condition is True
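The order of the conditions matters when more than one of them can be True for the same row. The sketch below uses a small made-up DataFrame (the column names x and y and the labels are hypothetical, chosen only for illustration) to show that np.select keeps the choice for the first matching condition:

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only to illustrate how conditions are resolved
df = pd.DataFrame({"x": [1, 5, 10], "y": [3, 3, 3]})

# The two conditions overlap: any row where x > 2*y also satisfies x > y
conditions = [df["x"] > df["y"], df["x"] > 2 * df["y"]]
choices = ["greater", "more than double"]

# For each row, np.select uses the first condition that is True
df["result"] = np.select(conditions, choices, default="not greater")
print(df)
```

In the last row both conditions are True, but the result is "greater" because that condition is listed first. If the more specific label is wanted, the more specific condition would need to come first in the list.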
The following example shows how to use this code in practice.
Example: Compare Two Columns in Pandas
Suppose we have the following DataFrame that shows the number of goals scored by two soccer teams in five different matches:
import numpy as np
import pandas as pd

#create DataFrame
df = pd.DataFrame({'A_points': [1, 3, 3, 3, 5],
                   'B_points': [4, 5, 2, 3, 2]})

#view DataFrame
df

   A_points  B_points
0         1         4
1         3         5
2         3         2
3         3         3
4         5         2
We can use the following code to compare the number of goals by row and output the winner of the match in a third column:
#define conditions
conditions = [df['A_points'] > df['B_points'],
              df['A_points'] < df['B_points']]

#define choices
choices = ['A', 'B']

#create new column in DataFrame that displays results of comparisons
df['winner'] = np.select(conditions, choices, default='Tie')

#view the DataFrame
df

   A_points  B_points winner
0         1         4      B
1         3         5      B
2         3         2      A
3         3         3    Tie
4         5         2      A
The results of the comparison are shown in the new column called winner.
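Once the comparison column exists, it can be summarized like any other column. As a quick check of the result, the sketch below rebuilds the same DataFrame and counts how often each outcome occurs with value_counts (the variable name counts is just an illustrative choice):

```python
import numpy as np
import pandas as pd

# Same data as in the example
df = pd.DataFrame({'A_points': [1, 3, 3, 3, 5],
                   'B_points': [4, 5, 2, 3, 2]})

conditions = [df['A_points'] > df['B_points'],
              df['A_points'] < df['B_points']]
df['winner'] = np.select(conditions, ['A', 'B'], default='Tie')

# Count the number of matches for each outcome
counts = df['winner'].value_counts()
print(counts)
```

Here A and B each win two matches and one match is a tie, which agrees with the rows shown in the table.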
Notes
Here are a few things to keep in mind when comparing two columns in a pandas DataFrame:
- The number of conditions and choices must be equal; otherwise np.select raises a ValueError.
- The default value specifies the value to display in the new column if none of the conditions are met. If it is omitted, np.select uses 0.
- Both NumPy and Pandas are required to make this code work.
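The notes above can be illustrated with a short sketch. It uses a hypothetical variation of the example data in which one score is missing, and shows two things: what happens when the lists have different lengths, and how rows with missing values end up with the default. The 'Unknown' label and the variable names are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second match has a missing score for team A
df = pd.DataFrame({'A_points': [1, np.nan, 3],
                   'B_points': [4, 2, 3]})

conditions = [df['A_points'] > df['B_points'],
              df['A_points'] < df['B_points']]

# Two conditions but only one choice: np.select raises a ValueError
try:
    np.select(conditions, ['A'], default='Tie')
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
print("Mismatch error:", mismatch_error)

# Comparisons involving NaN are False, so the row with the missing
# score matches no condition and receives the default value
naive = np.select(conditions, ['A', 'B'], default='Tie')
print(naive)

# One option: check for missing values first so they get their own label
missing = df['A_points'].isna() | df['B_points'].isna()
df['winner'] = np.select([missing] + conditions,
                         ['Unknown', 'A', 'B'], default='Tie')
print(df)
```

Without the extra condition, the match with the missing score would be reported as a tie, which may not be what is intended. Placing the missing-value check first relies on np.select using the first condition that is True.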