
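The **Mahalanobis distance** is the distance between two points in a multivariate space, measured in a way that accounts for the scale of each variable and the correlation between variables. It’s often used to find outliers in statistical analyses that involve several variables.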

This tutorial explains how to calculate the Mahalanobis distance in SPSS.

**Example: Mahalanobis Distance in SPSS**

Suppose we have the following dataset that displays the exam scores of 20 students along with the number of hours they spent studying, the number of prep exams they took, and their current grade in the course:

We can use the following steps to calculate the Mahalanobis distance for each observation in the dataset to determine if there are any multivariate outliers.

**Step 1: Select the linear regression option.**

Click the **Analyze** tab, then **Regression**, then **Linear**:

**Step 2: Select the Mahalanobis option.**

Drag the response variable *score* into the box labelled Dependent. Drag the other three predictor variables into the box labelled Independent(s). Then click the **Save** button. In the new window that pops up, make sure the box next to **Mahalanobis** is checked. Then click **Continue**. Then click **OK**.

Once you click **OK**, the Mahalanobis distance for each observation in the dataset will appear in a new column titled **MAH_1**:

We can see that some of the distances are much larger than others. To determine if any of the distances are statistically significant, we need to calculate their p-values.
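To make the contents of the **MAH_1** column less of a black box, here is a minimal sketch in plain Python. It assumes the saved value is the squared distance of each row of predictor values from the predictor means, scaled by the sample covariance matrix. The rows below are made-up (hours, prep exams, grade) values for illustration, not the dataset from this tutorial, and the helper names are hypothetical.

```python
# Illustrative only: made-up (hours, prep_exams, grade) rows,
# NOT the 20-student dataset used in this tutorial.
rows = [
    (1.0, 1.0, 70.0),
    (2.0, 3.0, 75.0),
    (2.0, 2.0, 82.0),
    (3.0, 4.0, 78.0),
    (4.0, 1.0, 85.0),
    (5.0, 3.0, 90.0),
    (6.0, 2.0, 88.0),
    (12.0, 0.0, 60.0),
]


def column_means(data):
    """Mean of each column (one mean per predictor)."""
    n = len(data)
    return [sum(col) / n for col in zip(*data)]


def sample_covariance(data, means):
    """Sample covariance matrix of the columns (divides by n - 1)."""
    n, p = len(data), len(means)
    cov = [[0.0] * p for _ in range(p)]
    for row in data:
        dev = [x - m for x, m in zip(row, means)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += dev[i] * dev[j]
    return [[v / (n - 1) for v in r] for r in cov]


def solve(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [list(matrix[i]) + [vector[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def squared_mahalanobis(data):
    """Squared Mahalanobis distance of every row from the column means."""
    means = column_means(data)
    cov = sample_covariance(data, means)
    distances = []
    for row in data:
        dev = [x - m for x, m in zip(row, means)]
        # dev^T * inverse(cov) * dev, computed without forming the inverse
        z = solve(cov, dev)
        distances.append(sum(d * zi for d, zi in zip(dev, z)))
    return distances


d2 = squared_mahalanobis(rows)
for i, value in enumerate(d2, start=1):
    print(f"row {i}: D^2 = {value:.3f}")
```

A row that sits far from the others, once the spread and correlation of the predictors are taken into account, gets a large value. The response variable is deliberately left out, since only the predictors enter the distance.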

**Step 3: Calculate the p-value of each Mahalanobis distance.**

Click the **Transform** tab, then **Compute Variable**.

In the **Target Variable** box, choose a new name for the variable you’re creating. We chose “pvalue.” In the **Numeric Expression** box, type the following:

**1 - CDF.CHISQ(MAH_1, 3)**

Then click **OK**.

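For each observation, this computes the p-value of its Mahalanobis distance under a Chi-Square distribution with 3 degrees of freedom, that is, the probability of seeing a distance at least that large by chance. We use **3** degrees of freedom because there are 3 predictor variables in our regression model.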
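For readers who want to check the expression outside SPSS, the sketch below mirrors `1 - CDF.CHISQ(MAH_1, 3)` using only Python's standard library. The function name `chi2_sf` and the sample distances are assumptions made for illustration. The function handles whole-number degrees of freedom only, which is enough here because the degrees of freedom equal the number of predictors.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Works for positive integer df only, using the recurrence
    Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / gamma(a + 1)
    for the regularized upper incomplete gamma function Q.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a = 1.0
        tail = math.exp(-half)             # Q(1, half)
    else:
        a = 0.5
        tail = math.erfc(math.sqrt(half))  # Q(1/2, half)
    while a < df / 2.0:
        tail += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return tail


# Hypothetical squared distances, standing in for a MAH_1 column.
mah_1 = [1.2, 3.4, 7.9, 18.5]
p_values = [chi2_sf(d, 3) for d in mah_1]

for d, p in zip(mah_1, p_values):
    print(f"MAH_1 = {d:5.1f}  ->  p-value = {p:.5f}")
```

Larger distances give smaller p-values. With 3 degrees of freedom, a distance of roughly 16.27 corresponds to a p-value of .001.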

**Step 4: Interpret the p-values.**

Once you click **OK**, the p-value for each Mahalanobis distance will be displayed in a new column:

By default, SPSS only displays the p-values to two decimal places. You can increase the number of decimal places by clicking **Variable View** at the bottom of SPSS and increasing the number in the **Decimals** column:

Once you return to the **Data View**, you can see each p-value shown to five decimal places. Any observation with a p-value **less than .001** is considered to be an outlier.

We can see that the first observation is the only outlier in the dataset because it has a p-value less than .001:
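The short sketch below illustrates why the extra decimal places matter before applying the .001 cutoff. The p-values are hypothetical, chosen only to show the effect, and are not the values from this tutorial's dataset.

```python
# Hypothetical p-values keyed by row number (not the tutorial's results).
p_values = {1: 0.00042, 2: 0.00400, 3: 0.21000, 4: 0.04900}

ALPHA = 0.001  # cutoff used in this tutorial

# At two decimals, rows 1 and 2 look identical; at five they do not.
for row, p in p_values.items():
    print(f"row {row}: 2 decimals = {p:.2f}, 5 decimals = {p:.5f}")

# Flag rows whose p-value falls below the cutoff.
outliers = [row for row, p in p_values.items() if p < ALPHA]
print("flagged rows:", outliers)
```

Two values that both display as 0.00 at two decimals can fall on opposite sides of the cutoff, so the comparison should always be made on the unrounded value.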

**How to Handle Outliers**

If an outlier is present in your data, you have a couple of options:

**1. Make sure the outlier is not the result of a data entry error.**

Sometimes an individual simply enters the wrong data value when recording data. If an outlier is present, first verify that the data value was entered correctly and that it wasn’t an error.

**2. Remove the outlier.**

If the value is a true outlier, you may choose to remove it if it will have a significant impact on your overall analysis. Just make sure to mention in your final report or analysis that you removed an outlier.