
A **box plot** is a type of plot that displays the five-number summary of a dataset, which includes:

- The minimum value
- The first quartile (the 25th percentile)
- The median value
- The third quartile (the 75th percentile)
- The maximum value
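
As a minimal sketch, the five values above can be computed with Python's standard library. The function name `five_number_summary` and the sample values are made up for illustration. Quartile conventions differ slightly between tools; this sketch assumes the `"inclusive"` method of `statistics.quantiles`, which uses the same linear interpolation as NumPy's default percentile calculation.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # quantiles() with n=4 returns the three cut points: Q1, median, Q3
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical values, used only to illustrate the function
values = [2, 4, 4, 5, 7, 9, 10]
print(five_number_summary(values))
```

For these hypothetical values, the script prints `(2, 4.0, 5.0, 8.0, 10)`.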

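To make a box plot, we draw a box from the first quartile to the third quartile. Then we draw a line across the box at the median. Lastly, we draw “whiskers” from the quartiles to the minimum and maximum values. (When outliers are plotted as separate points, the whiskers stop at the most extreme values that are not outliers.)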

Box plots are useful because they allow us to gain a quick understanding of the distribution of values in a dataset. They’re also useful for comparing two different datasets.

When comparing two or more box plots, we can answer four different questions:

**1. How do the median values compare?** We can compare the median line inside each box to determine which dataset has the higher median value.

**2. How does the dispersion compare?** We can compare the length of each box, which represents the interquartile range (the distance between Q1 and Q3), to determine which dataset has a more spread-out middle 50% of values.

**3. How does the skewness compare?** The closer the median line is to Q1, the more positively skewed the dataset. The closer the median line is to Q3, the more negatively skewed the dataset.

**4. Are outliers present?** In box plots, outliers are typically represented by small circles that lie beyond either whisker. An observation is defined to be an outlier if it meets one of the following criteria:

- An observation is less than Q1 – 1.5*IQR
- An observation is greater than Q3 + 1.5*IQR
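
The two criteria above can be sketched in a few lines of Python. The function name `find_outliers` and the sample values are hypothetical, and the quartiles are computed with the `"inclusive"` method of `statistics.quantiles`, so other software may place the fences slightly differently.

```python
from statistics import quantiles


def find_outliers(values):
    """Return the observations that fall outside the 1.5*IQR fences."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr  # anything below this is an outlier
    upper_fence = q3 + 1.5 * iqr  # anything above this is an outlier
    return [x for x in values if x < lower_fence or x > upper_fence]


# Hypothetical values with one unusually large observation
values = [10, 12, 13, 14, 15, 16, 18, 40]
print(find_outliers(values))
```

Here Q1 = 12.75 and Q3 = 16.5, so the IQR is 3.75 and the fences are 7.125 and 22.125. Only the value 40 falls outside them, so the script prints `[40]`.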

The following example shows how to compare two different box plots and answer these four questions.

**Example: Comparing Box Plots**

The following datasets display the exam scores for students who used one of two studying techniques to prepare for the exam:

**Method 1:** 78, 78, 79, 80, 80, 82, 82, 83, 83, 86, 86, 86, 86, 87, 87, 87, 88, 88, 88, 91

**Method 2:** 66, 66, 66, 67, 68, 70, 72, 75, 75, 78, 82, 83, 86, 88, 89, 90, 93, 94, 95, 98

If we create box plots for each dataset, here’s what they would look like:
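
One way to draw these two box plots is sketched below, assuming the matplotlib library is installed. The axis labels, title, and output file name `box_plots.png` are illustrative choices and do not come from the original example.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

method_1 = [78, 78, 79, 80, 80, 82, 82, 83, 83, 86,
            86, 86, 86, 87, 87, 87, 88, 88, 88, 91]
method_2 = [66, 66, 66, 67, 68, 70, 72, 75, 75, 78,
            82, 83, 86, 88, 89, 90, 93, 94, 95, 98]

fig, ax = plt.subplots(figsize=(6, 4))

# Draw one vertical box per dataset, at x positions 1 and 2
result = ax.boxplot([method_1, method_2])

ax.set_xticks([1, 2])
ax.set_xticklabels(["Study Method 1", "Study Method 2"])
ax.set_ylabel("Exam score")
ax.set_title("Exam scores by study method")

fig.savefig("box_plots.png", dpi=150)  # hypothetical output file name
```

By default, matplotlib draws the whiskers out to the most extreme observations within 1.5*IQR of the box and marks anything beyond them as individual points.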

We can compare these two box plots and answer the following four questions:

**1. How do the median values compare?** The line in the middle of the box plot for Study Method 1 is higher than the line for Study Method 2, which indicates that the students who used Study Method 1 had a higher median exam score (86 versus 80).

**2. How does the dispersion compare?** The box for Study Method 2 is much longer than the box for Study Method 1, which indicates that the exam scores are much more spread out among students who used Study Method 2.

**3. How does the skewness compare?** The line in the middle of the box plot for Study Method 1 is close to Q3, which indicates that the distribution of exam scores for students who used Study Method 1 is negatively skewed. Conversely, the line in the middle of the box plot for Study Method 2 is near the center of the box, which means the distribution of scores has little to no skew.

**4. Are outliers present?** Neither box plot has small circles beyond the top or bottom whiskers, which means neither dataset had any clear outliers.
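
The four answers above can also be checked numerically. The sketch below uses only Python's standard library. The helper `describe` and its `median_position` measure (0 when the median sits on Q1, 1 when it sits on Q3) are illustrative inventions rather than standard statistics, and the quartiles assume the `"inclusive"` method of `statistics.quantiles`.

```python
from statistics import quantiles

method_1 = [78, 78, 79, 80, 80, 82, 82, 83, 83, 86,
            86, 86, 86, 87, 87, 87, 88, 88, 88, 91]
method_2 = [66, 66, 66, 67, 68, 70, 72, 75, 75, 78,
            82, 83, 86, 88, 89, 90, 93, 94, 95, 98]


def describe(scores):
    """Summarize the quantities we read off a box plot."""
    q1, median, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "median": median,
        "iqr": iqr,
        # 0 = median on Q1, 0.5 = centered in the box, 1 = median on Q3
        "median_position": (median - q1) / iqr,
        "outliers": [x for x in scores if x < lower_fence or x > upper_fence],
    }


for name, scores in [("Study Method 1", method_1), ("Study Method 2", method_2)]:
    print(name, describe(scores))
```

Under these assumptions, Study Method 1 has a median of 86 and an IQR of 5.5, with the median sitting about 82% of the way from Q1 to Q3. Study Method 2 has a median of 80 and an IQR of 19.75, with the median about 53% of the way along the box. Both outlier lists come back empty, which agrees with what we read from the plots.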
