There are three common ways to visualize categorical data:
- Bar Charts
- Boxplots by Group
- Mosaic Plots
The following examples show how to create each of these plots for a pandas DataFrame in Python.
Example 1: Bar Charts
The following code shows how to create a bar chart to visualize the frequency of each team in a pandas DataFrame:
import pandas as pd
#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D'],
                   'points': [18, 22, 29, 25, 14, 11, 10, 15]})
#create bar plot to visualize frequency of each team
df['team'].value_counts().plot(kind='bar', xlabel='Team', ylabel='Count', rot=0)
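The x-axis displays each team name and the y-axis shows the number of rows for each team in the DataFrame. Because value_counts() sorts by count by default, the bars run from the most frequent team to the least frequent.
Note: The argument rot=0 keeps the x-axis labels horizontal. By default, pandas rotates the labels of a bar plot by 90 degrees.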
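It can help to look at the numbers behind the bars before plotting them. The following is a minimal sketch, assuming the same DataFrame as above; the names counts and shares are introduced here only for illustration. It orders the teams alphabetically instead of by frequency and also computes each team's share of the rows:

```python
import pandas as pd

# same DataFrame as in the bar chart example
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D'],
                   'points': [18, 22, 29, 25, 14, 11, 10, 15]})

# number of rows per team, ordered by team name rather than by count
counts = df['team'].value_counts().sort_index()

# share of rows per team (the values sum to 1)
shares = df['team'].value_counts(normalize=True).sort_index()

print(counts)
print(shares)
```

Calling .plot(kind='bar', rot=0) on either result should draw the corresponding bar chart, with the bars in alphabetical order.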
Example 2: Boxplots by Group
Grouped boxplots show how the distribution of a numeric variable differs across the levels of a categorical variable.
For example, the following code shows how to create boxplots that show the distribution of points scored, grouped by team:
import pandas as pd
#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [18, 22, 29, 25, 14, 11, 10, 15]})
#create boxplot of points, grouped by team
df.boxplot(column=['points'], by='team', grid=False, color='black')
The x-axis displays the teams and the y-axis displays the distribution of points scored by each team.
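Each box is drawn from the quartiles of points within a team, so a per-group summary table is a quick way to check what the plot should look like. The following is a rough sketch, assuming the same DataFrame as above; the name summary is used here only for illustration:

```python
import pandas as pd

# same DataFrame as in the boxplot example
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [18, 22, 29, 25, 14, 11, 10, 15]})

# minimum, quartiles, and maximum of points for each team
summary = df.groupby('team')['points'].describe()[['min', '25%', '50%', '75%', 'max']]

print(summary)
```

The 50% column holds the median of each team, which corresponds to the line drawn inside each box.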
Example 3: Mosaic Plots
A mosaic plot displays the joint frequencies of two categorical variables in one plot. The area of each tile is proportional to the number of observations with that combination of categories.
For example, the following code shows how to create a mosaic plot that shows the frequency of the categorical variables ‘result’ and ‘team’ in one plot:
import pandas as pd
from statsmodels.graphics.mosaicplot import mosaic
#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'result': ['W', 'L', 'L', 'W', 'W', 'L', 'L', 'W', 'W']})
#create mosaic plot
mosaic(df, ['team', 'result']);
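The x-axis displays the teams. Within each team's column, the height of each tile shows the proportion of that team's rows with each result. The width of each column reflects the number of rows for that team, so the three columns here are equally wide.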
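The tile sizes in a mosaic plot come from a contingency table of the two variables. As a hedged sketch, assuming the same DataFrame as above, pd.crosstab can be used to inspect those counts directly; the names counts and row_shares are introduced here only for illustration:

```python
import pandas as pd

# same DataFrame as in the mosaic plot example
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'result': ['W', 'L', 'L', 'W', 'W', 'L', 'L', 'W', 'W']})

# number of rows for each combination of team and result
counts = pd.crosstab(df['team'], df['result'])

# proportion of each result within a team (each row sums to 1)
row_shares = pd.crosstab(df['team'], df['result'], normalize='index')

print(counts)
print(row_shares)
```

The row proportions correspond to the tile heights within each team's column of the mosaic plot.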