You can use the following methods to plot a distribution of column values in a pandas DataFrame:
Method 1: Plot Distribution of Values in One Column
df['my_column'].plot(kind='kde')
Method 2: Plot Distribution of Values in One Column, Grouped by Another Column
df.groupby('group_column')['values_column'].plot(kind='kde')
The examples below show how to use each method in practice with the following pandas DataFrame:
import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                            'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
                   'points': [3, 3, 4, 5, 4, 7, 7, 7, 10, 11,
                              8, 7, 8, 9, 12, 12, 12, 14, 15, 17]})

#view DataFrame
print(df)

    team  points
0      A       3
1      A       3
2      A       4
3      A       5
4      A       4
5      A       7
6      A       7
7      A       7
8      A      10
9      A      11
10     B       8
11     B       7
12     B       8
13     B       9
14     B      12
15     B      12
16     B      12
17     B      14
18     B      15
19     B      17
Example 1: Plot Distribution of Values in One Column
The following code shows how to plot the distribution of values in the points column:
#plot distribution of values in points column
df['points'].plot(kind='kde')
Note that kind='kde' tells pandas to use kernel density estimation, which produces a smooth curve that summarizes the distribution of values for a variable. This plot type requires SciPy to be installed.
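To make kernel density estimation more concrete, the following minimal sketch builds the curve by hand with NumPy: it places a Gaussian bump on every observation and averages them. It assumes a Gaussian kernel and Scott's rule for the bandwidth, which should correspond to the SciPy default that pandas relies on, though the plotted curve may differ in small details. The function name gaussian_kde_sketch is hypothetical and used only for illustration.

```python
import numpy as np

# same values as the points column in the example DataFrame
points = np.array([3, 3, 4, 5, 4, 7, 7, 7, 10, 11,
                   8, 7, 8, 9, 12, 12, 12, 14, 15, 17], dtype=float)

def gaussian_kde_sketch(data, grid):
    """Evaluate a Gaussian kernel density estimate of data at each grid value."""
    n = data.size
    # assumption: Scott's rule bandwidth (sample std times n^(-1/5))
    h = data.std(ddof=1) * n ** (-1 / 5)
    # standardized distance from every grid value to every observation
    z = (grid[:, None] - data[None, :]) / h
    # one standard normal bump per observation
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    # average the bumps and rescale by the bandwidth
    return kernels.sum(axis=1) / (n * h)

# evaluate on a grid that extends well past the smallest and largest values
grid = np.linspace(points.min() - 10, points.max() + 10, 500)
density = gaussian_kde_sketch(points, grid)

# trapezoid rule: the area under a density curve should be close to 1
area = np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid))
print(round(area, 3))
```

A smaller bandwidth gives a bumpier curve and a larger one gives a smoother curve, which is why the same data can produce different-looking density plots.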
If you’d like to create a histogram instead, you can specify kind='hist' as follows:
#plot distribution of values in points column using histogram
df['points'].plot(kind='hist', edgecolor='black')
This method uses bars to represent frequencies of values in the points column as opposed to a smooth line that summarizes the shape of the distribution.
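To see the numbers behind the bars, you can compute the bin counts directly. The sketch below assumes 10 equal-width bins, which is the default for the pandas histogram plot; the variable names counts and edges are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# recreate the example DataFrame so the snippet runs on its own
df = pd.DataFrame({'team': ['A'] * 10 + ['B'] * 10,
                   'points': [3, 3, 4, 5, 4, 7, 7, 7, 10, 11,
                              8, 7, 8, 9, 12, 12, 12, 14, 15, 17]})

# count how many values fall in each of 10 equal-width bins
counts, edges = np.histogram(df['points'], bins=10)

# print each bin's range next to its frequency
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f'{left:.1f} to {right:.1f}: {count}')
```

Each printed count corresponds to the height of one bar, and the counts add up to the number of rows in the DataFrame.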
Example 2: Plot Distribution of Values in One Column, Grouped by Another Column
The following code shows how to plot the distribution of values in the points column, grouped by the team column:
import matplotlib.pyplot as plt

#plot distribution of points by team
df.groupby('team')['points'].plot(kind='kde')

#add legend
plt.legend(['A', 'B'], title='Team')

#add x-axis label
plt.xlabel('Points')
With the default matplotlib colors, the blue line shows the distribution of points for players on team A while the orange line shows the distribution of points for players on team B.
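The legend labels in the code above are typed by hand, so they only match the lines if the groups are drawn in the order A, then B. One possible alternative, sketched below, loops over the groups and passes each team name as the label so the legend is built from the data itself. The variable names fig, ax, team and pts are chosen here for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# recreate the example DataFrame so the snippet runs on its own
df = pd.DataFrame({'team': ['A'] * 10 + ['B'] * 10,
                   'points': [3, 3, 4, 5, 4, 7, 7, 7, 10, 11,
                              8, 7, 8, 9, 12, 12, 12, 14, 15, 17]})

fig, ax = plt.subplots()

# draw one density curve per team on the same axes, labelled by team name
for team, pts in df.groupby('team')['points']:
    pts.plot(kind='kde', ax=ax, label=team)

# the legend picks up the labels attached to each line
ax.legend(title='Team')

# add x-axis label
ax.set_xlabel('Points')
```

This approach keeps working without changes if more teams are added to the DataFrame.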