In statistics, correlation refers to the strength and direction of a relationship between two variables. The value of a correlation coefficient can range from -1 to 1, with -1 indicating a perfect negative relationship, 0 indicating no relationship, and 1 indicating a perfect positive relationship.
The most commonly used correlation coefficient is the Pearson Correlation Coefficient, which measures the linear association between two numerical variables.
A less commonly used correlation coefficient is Kendall’s Tau, which measures the strength and direction of the association between two columns of ranked data.
The formula to calculate Kendall’s Tau, often abbreviated τ, is as follows (assuming there are no tied ranks):
τ = (C-D) / (C+D)
where:
C = the number of concordant pairs
D = the number of discordant pairs
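To make the formula concrete, here is a minimal R sketch that counts concordant and discordant pairs by checking every pair directly and then applies τ = (C-D) / (C+D). The function name kendall_tau_pairs and the five-item rankings are hypothetical, and the sketch assumes there are no tied ranks.

```r
# Minimal sketch: Kendall's Tau from its definition, checking every pair.
# Assumes x and y are rank vectors of equal length with no ties.
kendall_tau_pairs <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  n <- length(x)
  concordant <- 0
  discordant <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      # Positive product: both rankings order items i and j the same way
      s <- sign(x[i] - x[j]) * sign(y[i] - y[j])
      if (s > 0) concordant <- concordant + 1
      if (s < 0) discordant <- discordant + 1
    }
  }
  tau <- (concordant - discordant) / (concordant + discordant)
  list(C = concordant, D = discordant, tau = tau)
}

# Hypothetical rankings of five items (not the coaches' data)
x <- c(1, 2, 3, 4, 5)
y <- c(1, 3, 2, 5, 4)
kendall_tau_pairs(x, y)
```

For these vectors the function finds C = 8 and D = 2, so τ = (8-2) / (8+2) = 0.6. Checking all n(n-1)/2 pairs is fine for small data sets like the one in this article.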
The following example illustrates how to use this formula to calculate Kendall’s Tau rank correlation coefficient for two columns of ranked data.
Example of Calculating Kendall’s Tau
Suppose two basketball coaches rank 12 of their players from worst to best. The following table shows the rankings that each coach assigned to the players:
Because we are working with two columns of ranked data, it’s appropriate to use Kendall’s Tau to calculate the correlation between the two coaches’ rankings. Use the following steps to calculate Kendall’s Tau:
Step 1: Count the number of concordant pairs.
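Look only at the ranks for Coach #2. (This counting method requires the players to be listed in order of Coach #1’s rankings.) Starting with the first player, count how many ranks below him are larger than his own. For example, there are 11 numbers below “1” that are larger, so we’ll write 11: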
Move to the next player and repeat the process. There are 10 numbers below “2” that are larger, so we’ll write 10:
Keep going until you reach a player whose rank is less than the rank of the player before him. For example, Elliot has a rank of “4”, which is less than the previous player’s rank of “5”. Because 4 and 5 are consecutive ranks, every rank below Elliot that is larger than 4 is also larger than 5, so Elliot gets the same value as the player before him. (This shortcut only works for consecutive ranks; otherwise, count directly.)
Repeat this process for all of the players:
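The counting in Step 1 can be sketched in R as follows. The function name count_larger_below and the example ranks are hypothetical; the sketch assumes the vector holds the second rater’s ranks listed in the order of the first rater’s ranking, with no ties.

```r
# Sketch of Step 1: for each position, count how many later ranks are larger.
# Assumes `second` is ordered by the other rater's ranking (1, 2, 3, ...).
count_larger_below <- function(second) {
  n <- length(second)
  sapply(seq_len(n), function(i) {
    below <- second[seq_len(n) > i]  # ranks listed after position i
    sum(below > second[i])
  })
}

ranks_2 <- c(1, 3, 2, 5, 4)  # hypothetical, not the coaches' data
counts <- count_larger_below(ranks_2)
counts       # 4 2 2 0 0
sum(counts)  # C = 8
```

Notice that the consecutive ranks 3 and 2 receive the same count (2), which is the shortcut described for Elliot above.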
Step 2: Count the number of discordant pairs.
Again, look only at the ranks for Coach #2. For each player, count how many ranks below him are smaller. For example, Coach #2 assigned AJ a rank of “1” and there are no players below him with a smaller rank. Thus, we assign him a value of 0:
Repeat this process for each player:
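Step 2 mirrors Step 1 with the comparison flipped. Here is a hypothetical R sketch under the same assumptions (ranks ordered by the first rater, no ties); count_smaller_below and the example ranks are illustrative only.

```r
# Sketch of Step 2: for each position, count how many later ranks are smaller.
# Assumes `second` is ordered by the other rater's ranking (1, 2, 3, ...).
count_smaller_below <- function(second) {
  n <- length(second)
  sapply(seq_len(n), function(i) {
    below <- second[seq_len(n) > i]  # ranks listed after position i
    sum(below < second[i])
  })
}

ranks_2 <- c(1, 3, 2, 5, 4)  # hypothetical, not the coaches' data
counts <- count_smaller_below(ranks_2)
counts       # 0 1 0 1 0
sum(counts)  # D = 2
```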
Step 3: Calculate the sum of each column and find Kendall’s Tau.
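Summing the two columns gives C = 63 concordant pairs and D = 3 discordant pairs. As a check, C + D = 66, which is the total number of pairs that can be formed from 12 players (12 × 11 / 2). Then:
Kendall’s Tau = (C-D) / (C+D) = (63-3) / (63+3) = 60/66 = 0.909.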
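The Step 3 arithmetic is easy to double-check in R. This short sketch only reuses the column sums reported above (C = 63, D = 3, n = 12) and the fact that, with no ties, every pair is either concordant or discordant.

```r
# Step 3 check using the column sums from the example
concordant <- 63
discordant <- 3
n <- 12

# With no ties, C + D must equal the number of pairs, choose(n, 2)
stopifnot(concordant + discordant == choose(n, 2))  # 66 pairs

tau <- (concordant - discordant) / (concordant + discordant)
round(tau, 3)  # 0.909
```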
Statistical Significance of Kendall’s Tau
When you have more than 10 pairs (n > 10), the sampling distribution of Kendall’s Tau is approximately normal under the null hypothesis of no association. You can use the following formula to calculate a z-score for Kendall’s Tau:
z = 3τ√(n(n-1)) / √(2(2n+5))
where:
τ = value you calculated for Kendall’s Tau
n = number of pairs
Here’s how to calculate z for the previous example:
z = 3(0.909)√(12(12-1)) / √(2(2*12+5)) = 4.11
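The z-score formula translates directly into a one-line R function. The name kendall_z is hypothetical; the sketch simply implements the formula above for a tau value and a number of pairs n.

```r
# Sketch: large-sample z-score for Kendall's Tau
# z = 3 * tau * sqrt(n * (n - 1)) / sqrt(2 * (2 * n + 5))
kendall_z <- function(tau, n) {
  3 * tau * sqrt(n * (n - 1)) / sqrt(2 * (2 * n + 5))
}

z <- kendall_z(60 / 66, 12)  # tau and n from the coaches example
round(z, 2)  # 4.11
```

Equivalently, z is τ divided by its standard deviation under the null hypothesis, √(2(2n+5) / (9n(n-1))).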
Using the Z Score to P Value Calculator, we see that the two-tailed p-value for this z-score is about 0.00004, which is statistically significant at the 0.05 alpha level. Thus, there is a statistically significant correlation between the ranks that the two coaches assigned to the players.
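If you prefer to skip the online calculator, base R’s pnorm() gives the same normal-approximation p-value. This sketch recomputes z from the example values and assumes a two-tailed test.

```r
# Two-tailed p-value from the normal approximation
tau <- 60 / 66
n <- 12
z <- 3 * tau * sqrt(n * (n - 1)) / sqrt(2 * (2 * n + 5))

# pnorm(-abs(z)) is the area in one tail; double it for a two-tailed test
p_two_tailed <- 2 * pnorm(-abs(z))
round(p_two_tailed, 5)  # about 0.00004
```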
Bonus: How to Calculate Kendall’s Tau in R
In the statistical software R, you can use the kendall.tau() function from the VGAM package to calculate Kendall’s Tau for two vectors. It uses the following syntax:
kendall.tau(x, y)
where x and y are two numerical vectors of equal length.
The following code illustrates how to calculate Kendall’s Tau for the exact data that we used in the previous example:
#load VGAM
library(VGAM)

#create vector for each coach's rankings
coach_1 <- c(…)
coach_2 <- c(…)

#calculate Kendall's Tau
kendall.tau(coach_1, coach_2)

#[1] 0.9090909
Notice how the value for Kendall’s Tau matches the value that we calculated by hand.
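If you’d rather not install VGAM, base R’s cor() function also accepts method = "kendall". The sketch below uses hypothetical five-item rankings (not the coaches’ data) with no ties, where the result matches (C-D) / (C+D). For small samples without ties, cor.test(x, y, method = "kendall") reports an exact p-value rather than the normal approximation.

```r
# Hypothetical rankings (not the coaches' data), no ties
x <- c(1, 2, 3, 4, 5)
y <- c(1, 3, 2, 5, 4)

# Base R: no extra packages needed
cor(x, y, method = "kendall")  # 0.6

# Same value by hand: these vectors have C = 8 and D = 2
(8 - 2) / (8 + 2)
```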