
You can use the **cor()** function in R to calculate correlation coefficients between variables.

Here are the most common ways to use this function:

**Method 1: Calculate Pearson Correlation Coefficient Between Two Variables**

cor(df$x, df$y)

Use the Pearson correlation coefficient when calculating the correlation between two continuous variables (e.g., height and weight).
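To make the definition concrete, the following minimal sketch computes the Pearson coefficient by hand as the covariance divided by the product of the standard deviations, then compares it with **cor()**. The `height` and `weight` vectors are made-up illustration values, not data from this tutorial.

```r
# Hypothetical example data (illustration only)
height <- c(150, 160, 165, 172, 180)
weight <- c(52, 58, 63, 70, 79)

# Pearson's r: covariance divided by the product of the standard deviations
r_manual <- cov(height, weight) / (sd(height) * sd(weight))

# The same quantity from cor(), whose default method is 'pearson'
r_cor <- cor(height, weight)

# TRUE if the two values agree up to floating-point tolerance
all.equal(r_manual, r_cor)
```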

**Method 2: Calculate Pearson Correlation Coefficient Between All Numeric Variables in Data Frame**

cor(df)

This method will return a correlation matrix that contains the Pearson correlation coefficient between each pairwise combination of numeric variables in a data frame.
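Note that **cor()** expects numeric input, so a data frame that also holds text columns has to be subset first. The sketch below shows one way to do that; the data frame `df2` and its column names are hypothetical and only serve as an illustration.

```r
# Hypothetical data frame with one non-numeric column (name)
df2 <- data.frame(name = c("a", "b", "c", "d"),
                  x = c(1, 2, 3, 4),
                  y = c(2, 4, 5, 9),
                  stringsAsFactors = FALSE)

# Flag the numeric columns, then keep only those
num_cols <- sapply(df2, is.numeric)
m <- cor(df2[num_cols])

# Correlation matrix for x and y only
m
```

Calling `cor(df2)` directly on this data frame would stop with an error because of the character column.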

**Method 3: Calculate Spearman Correlation Coefficient Between Two Variables**

cor(df$x, df$y, method='spearman')

Use the Spearman correlation coefficient when calculating the correlation between two ranked variables (e.g., rank of a student’s math exam score vs. rank of their science exam score in a class).
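One way to see why Spearman suits ranked data: it is the Pearson coefficient computed on the ranks of the values. The sketch below checks this with made-up `math` and `science` scores (hypothetical values, not from this tutorial).

```r
# Hypothetical exam scores for six students (illustration only)
math    <- c(88, 92, 75, 92, 60, 81)
science <- c(85, 95, 70, 80, 65, 90)

# Spearman correlation straight from cor()
rho <- cor(math, science, method = "spearman")

# Pearson correlation of the ranks (ties receive their average rank)
rho_ranks <- cor(rank(math), rank(science))

# TRUE if both approaches give the same value
all.equal(rho, rho_ranks)
```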

**Method 4: Calculate Kendall’s Correlation Coefficient Between Two Variables**

cor(df$x, df$y, method='kendall')

Use the Kendall correlation coefficient when you wish to use the Spearman correlation but the sample size is small and there are many tied ranks.
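To show how tied ranks enter the calculation, here is a rough sketch that counts concordant and discordant pairs by hand and compares the result with **cor()**. It assumes the tie-adjusted form of Kendall's coefficient (often called tau-b), and the vectors `x` and `y` are hypothetical.

```r
# Hypothetical ranked data with several ties (illustration only)
x <- c(1, 2, 2, 3, 4, 4, 5)
y <- c(2, 1, 3, 3, 5, 4, 4)

# Sign of every pairwise difference: +1, -1, or 0 for a tie
sx <- sign(outer(x, x, "-"))
sy <- sign(outer(y, y, "-"))

# Concordant pairs contribute +1, discordant pairs -1, tied pairs 0;
# the denominator counts the pairs that are untied in x and in y
tau_manual <- sum(sx * sy) / sqrt(sum(sx^2) * sum(sy^2))

# The same quantity from cor()
tau_cor <- cor(x, y, method = "kendall")

# TRUE if the hand calculation matches cor()
all.equal(tau_manual, tau_cor)
```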

The examples below show how to use each method in practice with the following data frame in R, which contains the number of hours spent studying, the number of practice exams taken, and the final exam score for eight different students:

    #create data frame
    df <- data.frame(hours=c(1, 1, 3, 2, 4, 3, 5, 6),
                     prac_exams=c(4, 3, 3, 2, 3, 2, 1, 4),
                     score=c(69, 74, 74, 70, 89, 85, 99, 90))

    #view data frame
    df

      hours prac_exams score
    1     1          4    69
    2     1          3    74
    3     3          3    74
    4     2          2    70
    5     4          3    89
    6     3          2    85
    7     5          1    99
    8     6          4    90

**Example 1: Calculate Pearson Correlation Coefficient Between Two Variables**

The following code shows how to use the **cor()** function to calculate the Pearson correlation coefficient between the **hours** and **score** variables:

    #calculate Pearson correlation coefficient between hours and score
    cor(df$hours, df$score)

    [1] 0.8600528

The Pearson correlation coefficient between **hours** and **score** turns out to be **0.86.**

Note that if there are NA values in your data frame, you can use the argument **use='complete.obs'** to only use the rows where there are no NA values:

    #calculate Pearson correlation coefficient and ignore any rows with NA
    cor(df$hours, df$score, use='complete.obs')
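The effect of this argument is easiest to see on data that actually contains missing values. In the minimal sketch below, the vectors `x` and `y` are hypothetical; by default **cor()** returns NA as soon as either vector has an NA, while **use='complete.obs'** drops the incomplete pairs first.

```r
# Hypothetical vectors with missing values (illustration only)
x <- c(1, 2, NA, 4, 5)
y <- c(2, 3, 6, NA, 9)

# Default behavior: the result is NA because of the missing values
r_default <- cor(x, y)

# Only the pairs where both x and y are present are used
r_complete <- cor(x, y, use = "complete.obs")

# Equivalent manual approach: drop incomplete pairs yourself
ok <- complete.cases(x, y)
r_manual <- cor(x[ok], y[ok])

is.na(r_default)                 # TRUE
all.equal(r_complete, r_manual)  # TRUE
```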

**Example 2: Calculate Pearson Correlation Coefficient Between All Numeric Variables**

The following code shows how to use the **cor()** function to create a correlation matrix that contains the Pearson correlation coefficient between all numeric variables in the data frame:

    #calculate Pearson correlation coefficient between all numeric variables
    cor(df)

                    hours prac_exams      score
    hours       1.0000000 -0.1336063  0.8600528
    prac_exams -0.1336063  1.0000000 -0.3951028
    score       0.8600528 -0.3951028  1.0000000

Here’s how to interpret the output:

- The Pearson correlation coefficient between **hours** and **prac_exams** is **-0.13**.
- The Pearson correlation coefficient between **hours** and **score** is **0.86**.
- The Pearson correlation coefficient between **prac_exams** and **score** is **-0.39**.

**Note**: The Pearson correlation coefficient between each individual variable and itself is always 1, which is why each value along the diagonal of the correlation matrix is 1.
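Since the result of **cor(df)** is an ordinary matrix with row and column names, individual coefficients can be pulled out by name. The sketch below rebuilds the tutorial's data frame so that it runs on its own; the object name `m` is only an illustrative choice.

```r
# Same data frame as in the tutorial
df <- data.frame(hours = c(1, 1, 3, 2, 4, 3, 5, 6),
                 prac_exams = c(4, 3, 3, 2, 3, 2, 1, 4),
                 score = c(69, 74, 74, 70, 89, 85, 99, 90))

# Store the correlation matrix
m <- cor(df)

# Extract one coefficient by row and column name
m["hours", "score"]

# Round the whole matrix to two decimals for easier reading
round(m, 2)
```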

**Example 3: Calculate Spearman Correlation Coefficient Between Two Variables**

The following code shows how to use the **cor()** function to calculate the Spearman correlation coefficient between the **hours** and **prac_exams** variables:

    #calculate Spearman correlation coefficient between hours and prac_exams
    cor(df$hours, df$prac_exams, method='spearman')

    [1] -0.1250391

The Spearman correlation coefficient between **hours** and **prac_exams** turns out to be **-0.125**.

**Example 4: Calculate Kendall’s Correlation Coefficient Between Two Variables**

The following code shows how to use the **cor()** function to calculate Kendall’s correlation coefficient between the **hours** and **prac_exams** variables:

    #calculate Kendall's correlation coefficient between hours and prac_exams
    cor(df$hours, df$prac_exams, method='kendall')

    [1] -0.1226791

Kendall’s correlation coefficient between **hours** and **prac_exams** turns out to be **-0.123**.
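To compare the three coefficients for the same pair of variables side by side, one option is to loop over the method names. This is only a convenience sketch; the names `methods` and `res` are illustrative, and the data frame is rebuilt here so the block runs on its own.

```r
# Same data frame as in the tutorial
df <- data.frame(hours = c(1, 1, 3, 2, 4, 3, 5, 6),
                 prac_exams = c(4, 3, 3, 2, 3, 2, 1, 4),
                 score = c(69, 74, 74, 70, 89, 85, 99, 90))

# The three values accepted by the method argument of cor()
methods <- c("pearson", "spearman", "kendall")

# One coefficient per method, returned as a named numeric vector
res <- sapply(methods, function(m) cor(df$hours, df$prac_exams, method = m))

round(res, 3)
```

For this data all three coefficients are small and negative, so the choice of method does not change the overall conclusion.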

**Additional Resources**

The following tutorials explain how to perform other common tasks in R:

How to Calculate Rolling Correlation in R

How to Calculate Autocorrelation in R

How to Calculate Partial Correlation in R