In machine learning, label encoding is the process of converting the values of a categorical variable into integer values.
For example, label encoding a categorical variable called Team converts each unique value into an integer value based on alphabetical order.
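As a minimal sketch of this idea, the snippet below encodes a short list of Team values. The values are invented for illustration and are not taken from the example later in this tutorial. It assumes scikit-learn is installed and uses the encoder's classes_ attribute, which holds the sorted unique values, to show which integer each value receives:

```python
from sklearn.preprocessing import LabelEncoder

# hypothetical Team values, invented for illustration
teams = ['B', 'A', 'C', 'A', 'B']

encoder = LabelEncoder()
encoded = encoder.fit_transform(teams)

# classes_ stores the sorted unique values; a value's position is its integer code
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}

print(encoded.tolist())  # [1, 0, 2, 0, 1]
print(mapping)           # {'A': 0, 'B': 1, 'C': 2}
```

Because the unique values are sorted before they are numbered, "A" receives 0 even though "B" appears first in the list.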
You can use the following syntax to perform label encoding across multiple columns in Python:
from sklearn.preprocessing import LabelEncoder

#perform label encoding on col1, col2 columns
df[['col1', 'col2']] = df[['col1', 'col2']].apply(LabelEncoder().fit_transform)
The following example shows how to use this syntax in practice.
Example: Label Encoding in Python
Suppose we have the following pandas DataFrame that contains information about various basketball players:
import pandas as pd
#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'D'],
                   'position': ['G', 'F', 'G', 'F', 'F', 'G', 'G', 'F'],
                   'all_star': ['Y', 'N', 'Y', 'Y', 'Y', 'N', 'Y', 'N'],
                   'points': [11, 8, 10, 6, 6, 5, 9, 12]})
#view DataFrame
print(df)
  team position all_star  points
0    A        G        Y      11
1    A        F        N       8
2    B        G        Y      10
3    B        F        Y       6
4    B        F        Y       6
5    C        G        N       5
6    C        G        Y       9
7    D        F        N      12
We can use the following code to perform label encoding to convert each categorical value in the team, position, and all_star columns into integer values:
from sklearn.preprocessing import LabelEncoder

#perform label encoding across team, position, and all_star columns
df[['team', 'position', 'all_star']] = df[['team', 'position', 'all_star']].apply(LabelEncoder().fit_transform)

#view updated DataFrame
print(df)

   team  position  all_star  points
0     0         1         1      11
1     0         0         0       8
2     1         1         1      10
3     1         0         1       6
4     1         0         1       6
5     2         1         0       5
6     2         1         1       9
7     3         0         0      12
From the output we can see that each value in the team, position, and all_star columns has been converted into an integer value.
For example, in the team column we can see:
- Each “A” value has been converted to 0.
- Each “B” value has been converted to 1.
- Each “C” value has been converted to 2.
- Each “D” value has been converted to 3.
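The one-line apply() call does not keep a separate fitted encoder for each column, so the mappings cannot be looked up afterwards. One possible variation, sketched here rather than taken from the original example, is to fit one LabelEncoder per column and store them in a dictionary. The names cat_cols, encoders, team_mapping, and original_teams are chosen here for illustration:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# same DataFrame as in the example
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'D'],
                   'position': ['G', 'F', 'G', 'F', 'F', 'G', 'G', 'F'],
                   'all_star': ['Y', 'N', 'Y', 'Y', 'Y', 'N', 'Y', 'N'],
                   'points': [11, 8, 10, 6, 6, 5, 9, 12]})

cat_cols = ['team', 'position', 'all_star']

# keep one fitted encoder per column (illustrative naming, not a library convention)
encoders = {}
for col in cat_cols:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

# look up which integer each team value received
team_mapping = {str(label): code for code, label in enumerate(encoders['team'].classes_)}
print(team_mapping)  # {'A': 0, 'B': 1, 'C': 2, 'D': 3}

# convert the integers back into the original labels
original_teams = encoders['team'].inverse_transform(df['team'])
print([str(t) for t in original_teams])
```

Keeping the fitted encoders also makes it possible to apply the same mapping to new data with transform(), provided the new data contains only values that were seen during fitting.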
Note that in this example we performed label encoding on three columns in the DataFrame, but we can use similar syntax to perform label encoding on as many categorical columns as we’d like.
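When a DataFrame has many categorical columns, typing each name can become tedious. The sketch below assumes that every non-numeric column should be treated as categorical, which holds for this example but may not hold for other data. It uses select_dtypes() to collect those column names automatically, and cat_cols is a name chosen here for illustration:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# same DataFrame as in the example
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'D'],
                   'position': ['G', 'F', 'G', 'F', 'F', 'G', 'G', 'F'],
                   'all_star': ['Y', 'N', 'Y', 'Y', 'Y', 'N', 'Y', 'N'],
                   'points': [11, 8, 10, 6, 6, 5, 9, 12]})

# assumption: every non-numeric column is categorical
cat_cols = df.select_dtypes(exclude='number').columns.tolist()
print(cat_cols)  # ['team', 'position', 'all_star']

# perform label encoding on every selected column
df[cat_cols] = df[cat_cols].apply(LabelEncoder().fit_transform)

#view updated DataFrame
print(df)
```

The numeric points column is left unchanged because it is excluded from the selection.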