You can use the following basic syntax to split a pandas DataFrame by column value:
#define value to split on x = 20 #define df1 as DataFrame where 'column_name' is >= 20 df1 = df[df['column_name'] >= x] #define df2 as DataFrame where 'column_name' is df2 = df[df['column_name'] x]
The following example shows how to use this syntax in practice.
Example: Split Pandas DataFrame by Column Value
Suppose we have the following pandas DataFrame:
import pandas as pd #create DataFrame df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'], 'points': [22, 24, 19, 18, 14, 29, 31, 16], 'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]}) #view DataFrame print(df) team points rebounds 0 A 22 11 1 B 24 8 2 C 19 10 3 D 18 6 4 E 14 6 5 F 29 5 6 G 31 9 7 H 16 12
We can use the following code to split the DataFrame into two DataFrames where the first contains the rows where ‘points’ is greater than or equal to 20 and the second contains the rows where ‘points’ is less than 20:
#define value to split on x = 20 #define df1 as DataFrame where 'points' is >= 20 df1 = df[df['points'] >= x] print(df1) team points rebounds 0 A 22 11 1 B 24 8 5 F 29 5 6 G 31 9 #define df2 as DataFrame where 'points' is df2 = df[df['points'] x] print(df2) team points rebounds 2 C 19 10 3 D 18 6 4 E 14 6 7 H 16 12
Note that we can also use the reset_index() function to reset the index values for each resulting DataFrame:
#define value to split on x = 20 #define df1 as DataFrame where 'points' is >= 20 df1 = df[df['points'] >= x].reset_index(drop=True) print(df1) team points rebounds 0 A 22 11 1 B 24 8 2 F 29 5 3 G 31 9 #define df2 as DataFrame where 'points' is df2 = df[df['points'] x].reset_index(drop=True) print(df2) team points rebounds 0 C 19 10 1 D 18 6 2 E 14 6 3 H 16 12
Notice that the index for each resulting DataFrame now starts at 0.
Additional Resources
The following tutorials explain how to fix other common errors in Python:
How to Fix KeyError in Pandas
How to Fix: ValueError: cannot convert float NaN to integer
How to Fix: ValueError: operands could not be broadcast together with shapes