The pandas factorize() function encodes each distinct value in a column as an integer code, which makes it a convenient way to convert strings to numeric values.
You can use the following methods to apply the factorize() function to columns in a pandas DataFrame:
Method 1: Factorize One Column
df['col1'] = pd.factorize(df['col1'])[0]
Method 2: Factorize Specific Columns
df[['col1', 'col3']] = df[['col1', 'col3']].apply(lambda x: pd.factorize(x)[0])
Method 3: Factorize All Columns
df = df.apply(lambda x: pd.factorize(x)[0])
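Each method above takes element [0] of the result because pd.factorize() returns a pair: an array of integer codes and the distinct values those codes stand for. The short sketch below illustrates this on a small hypothetical Series named s, which is an assumption for illustration only:

```python
import pandas as pd

# hypothetical Series used only to illustrate the return value
s = pd.Series(['West', 'West', 'East', 'East'])

# factorize returns (codes, uniques)
codes, uniques = pd.factorize(s)

print(codes)    # one integer code per row
print(uniques)  # distinct values, in order of first appearance
```

Indexing with [0] keeps only the codes and discards the distinct values.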
The examples below show how to use each method with this pandas DataFrame:
import pandas as pd

#create DataFrame
df = pd.DataFrame({'conf': ['West', 'West', 'East', 'East'],
                   'team': ['A', 'B', 'C', 'D'],
                   'position': ['Guard', 'Forward', 'Guard', 'Center']})

#view DataFrame
df

   conf team position
0  West    A    Guard
1  West    B  Forward
2  East    C    Guard
3  East    D   Center
Example 1: Factorize One Column
The following code shows how to factorize one column in the DataFrame:
#factorize the conf column only
df['conf'] = pd.factorize(df['conf'])[0]

#view updated DataFrame
df

   conf team position
0     0    A    Guard
1     0    B  Forward
2     1    C    Guard
3     1    D   Center
Notice that only the ‘conf’ column has been factorized.
Every value that used to be ‘West’ is now 0 and every value that used to be ‘East’ is now 1.
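Because the codes replace the original labels, it can be useful to keep the second element returned by pd.factorize() so the codes can be translated back later. The following is a minimal sketch of one way to do that; the names mapping and decoded are hypothetical and chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({'conf': ['West', 'West', 'East', 'East']})

# keep both the codes and the distinct labels
codes, uniques = pd.factorize(df['conf'])
df['conf'] = codes

# build a code -> label lookup (hypothetical helper name)
mapping = dict(enumerate(uniques))

# translate the codes back to the original labels
decoded = df['conf'].map(mapping)
print(mapping)
print(decoded)
```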
Example 2: Factorize Specific Columns
The following code shows how to factorize specific columns in the DataFrame:
#factorize conf and team columns only
df[['conf', 'team']] = df[['conf', 'team']].apply(lambda x: pd.factorize(x)[0])

#view updated DataFrame
df

   conf  team position
0     0     0    Guard
1     0     1  Forward
2     1     2    Guard
3     1     3   Center
Notice that the ‘conf’ and ‘team’ columns have both been factorized.
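One behavior worth knowing when factorizing several columns at once: by default, pd.factorize() assigns the code -1 to missing values instead of giving them a code of their own. The example DataFrame has no missing values, so the sketch below uses a hypothetical Series with one missing entry to show this:

```python
import pandas as pd

# hypothetical Series with a missing value (not part of the example data)
s = pd.Series(['West', None, 'East', 'West'])

codes, uniques = pd.factorize(s)

print(codes)    # the missing entry receives the code -1
print(uniques)  # the missing value is not listed among the distinct values
```

If a column may contain missing values, checking for -1 after factorizing is a simple way to spot them.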
Example 3: Factorize All Columns
The following code shows how to factorize all columns in the DataFrame:
#factorize all columns
df = df.apply(lambda x: pd.factorize(x)[0])

#view updated DataFrame
df

   conf  team  position
0     0     0         0
1     0     1         1
2     1     2         0
3     1     3         2
Notice that all of the columns have been factorized.
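In the output above, codes are assigned in order of first appearance within each column, which is why ‘Guard’ became 0 in the position column. If codes that follow the sorted order of the values are preferred, pd.factorize() accepts a sort argument. The sketch below applies it to the position values from the example; the variable names are chosen for illustration only:

```python
import pandas as pd

position = pd.Series(['Guard', 'Forward', 'Guard', 'Center'])

# default: codes follow order of first appearance
default_codes, default_uniques = pd.factorize(position)

# sort=True: codes follow the sorted order of the distinct values
sorted_codes, sorted_uniques = pd.factorize(position, sort=True)

print(default_codes, list(default_uniques))
print(sorted_codes, list(sorted_uniques))
```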
Additional Resources
The following tutorials explain how to perform other common operations in pandas:
How to Convert Pandas DataFrame Columns to Strings
How to Convert Categorical Variable to Numeric in Pandas
How to Convert Pandas DataFrame Columns to Integer