
The **Mahalanobis distance** is the distance between two points in a multivariate space, measured relative to the covariance of the variables.

It is often used to find outliers in statistical analyses that involve several variables.

This tutorial explains how to calculate the Mahalanobis distance in R.

**Example: Mahalanobis Distance in R**

Use the following steps to calculate the Mahalanobis distance for every observation in a dataset in R.

**Step 1: Create the dataset.**

First, we'll create a dataset that displays the exam score of 20 students along with the number of hours they spent studying, the number of prep exams they took, and their current grade in the course:

    #create data
    df = data.frame(score = c(91, 93, 72, 87, 86, 73, 68, 87, 78, 99,
                              95, 76, 84, 96, 76, 80, 83, 84, 73, 74),
                    hours = c(16, 6, 3, 1, 2, 3, 2, 5, 2, 5,
                              2, 3, 4, 3, 3, 3, 4, 3, 4, 4),
                    prep = c(3, 4, 0, 3, 4, 0, 1, 2, 1, 2,
                             3, 3, 3, 2, 2, 2, 3, 3, 2, 2),
                    grade = c(70, 88, 80, 83, 88, 84, 78, 94, 90, 93,
                              89, 82, 95, 94, 81, 93, 93, 90, 89, 89))

    #view first six rows of data
    head(df)

      score hours prep grade
    1    91    16    3    70
    2    93     6    4    88
    3    72     3    0    80
    4    87     1    3    83
    5    86     2    4    88
    6    73     3    0    84

**Step 2: Calculate the Mahalanobis distance for each observation.**

Next, we'll calculate the Mahalanobis distance for each observation using R's built-in mahalanobis() function, which has the following syntax:

**mahalanobis(x, center, cov)**

where:

- **x:** matrix of data
- **center:** mean vector of the distribution
- **cov:** covariance matrix of the distribution
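To make the three arguments concrete, here is a minimal sketch that computes the same quantity by hand and compares it with mahalanobis(). For each row x_i, the function evaluates (x_i - center)' cov^-1 (x_i - center), which is the *squared* Mahalanobis distance. The toy matrix `x` and the names `mu`, `S`, `d2_manual`, and `d2_builtin` are illustrative assumptions, not part of the tutorial's dataset.

```r
# Toy data: 5 observations on 2 variables (illustrative values only)
x <- cbind(a = c(2, 4, 4, 6, 9),
           b = c(1, 3, 2, 5, 4))

mu <- colMeans(x)  # mean vector (the 'center' argument)
S  <- cov(x)       # sample covariance matrix (the 'cov' argument)

# Manual calculation: subtract the mean from every row, then
# evaluate (x_i - mu)' S^-1 (x_i - mu) row by row
centered  <- sweep(x, 2, mu)
d2_manual <- rowSums((centered %*% solve(S)) * centered)

# Built-in calculation
d2_builtin <- mahalanobis(x, mu, S)

# Check that the two approaches agree
all.equal(unname(d2_manual), unname(d2_builtin))
```

Because the built-in function returns squared distances, take sqrt() of its result if you need the distance on the original scale.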

The following code shows how to implement this function for our dataset:

    #calculate Mahalanobis distance for each observation
    mahalanobis(df, colMeans(df), cov(df))

     [1] 16.5019630  2.6392864  4.8507973  5.2012612  3.8287341  4.0905633
     [7]  4.2836303  2.4198736  1.6519576  5.6578253  3.9658770  2.9350178
    [13]  2.8102109  4.3682945  1.5610165  1.4595069  2.0245748  0.7502536
    [19]  2.7351292  2.2642268
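Note that mahalanobis() returns the squared distance for each row. As a rough sketch of how you might inspect these values, the code below rebuilds the dataset from Step 1 so that it runs on its own, takes the square root to get the distance itself, and looks up the observation that lies farthest from the center. The names `d2`, `d`, and `farthest` are introduced here for illustration only.

```r
# Same data as in Step 1, repeated so this block is self-contained
df <- data.frame(score = c(91, 93, 72, 87, 86, 73, 68, 87, 78, 99,
                           95, 76, 84, 96, 76, 80, 83, 84, 73, 74),
                 hours = c(16, 6, 3, 1, 2, 3, 2, 5, 2, 5,
                           2, 3, 4, 3, 3, 3, 4, 3, 4, 4),
                 prep  = c(3, 4, 0, 3, 4, 0, 1, 2, 1, 2,
                           3, 3, 3, 2, 2, 2, 3, 3, 2, 2),
                 grade = c(70, 88, 80, 83, 88, 84, 78, 94, 90, 93,
                           89, 82, 95, 94, 81, 93, 93, 90, 89, 89))

# Squared Mahalanobis distance of every row from the column means
d2 <- mahalanobis(df, colMeans(df), cov(df))

# Distance on the original (unsquared) scale
d <- sqrt(d2)

# Row index of the observation farthest from the center
farthest <- which.max(d2)

# Show that observation alongside its distance
cbind(df[farthest, ], distance = d[farthest])
```

Ranking by `d2` or by `d` gives the same ordering, so either can be used to spot the most unusual observations.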

**Step 3: Calculate the p-value for each Mahalanobis distance.**