Cosine Similarity is a measure of the similarity between two vectors of an inner product space.
For two vectors, A and B, the Cosine Similarity is calculated as:
$$\text{Cosine Similarity} = \frac{\sum_i A_i B_i}{\sqrt{\sum_i A_i^2}\,\sqrt{\sum_i B_i^2}}$$
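To make the formula concrete, here is a minimal base R sketch that computes it directly. The function name `cosine_sim` and the example vectors are assumptions made up for illustration; they are not part of the lsa package.

```r
# Hypothetical helper: cosine similarity computed straight from the formula
cosine_sim <- function(a, b) {
  # sum of element-wise products, divided by the product of the two vector lengths
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Illustrative vectors (assumed values)
parallel   <- cosine_sim(c(1, 2, 3), c(2, 4, 6))  # vectors pointing the same way
orthogonal <- cosine_sim(c(1, 0), c(0, 1))        # vectors at a right angle

print(parallel)
print(orthogonal)
```

Vectors that point in the same direction give a value of 1 (up to floating-point rounding), while perpendicular vectors give 0.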
This tutorial explains how to calculate the Cosine Similarity between vectors in R using the cosine() function from the lsa library.
Cosine Similarity Between Two Vectors in R
The following code shows how to calculate the Cosine Similarity between two vectors in R:
    library(lsa)

    #define vectors a and b

    #calculate Cosine Similarity
    cosine(a, b)

             [,1]
    [1,] 0.965195
The Cosine Similarity between the two vectors turns out to be 0.965195.
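As a self-contained sketch of the same call, the example below uses two made-up vectors, `x` and `y` (assumed values for illustration, not the data behind the 0.965195 result). It assumes the lsa package is installed and falls back to the plain formula if it is not. Since cosine() returns a 1 x 1 matrix for two vectors, as.numeric() is used to get an ordinary number.

```r
# Hypothetical example vectors (illustrative values only)
x <- c(3, 4)
y <- c(4, 3)

if (requireNamespace("lsa", quietly = TRUE)) {
  sim <- lsa::cosine(x, y)      # returned as a 1 x 1 matrix
  sim_value <- as.numeric(sim)  # drop the matrix structure
} else {
  # Fallback when lsa is not installed: apply the formula directly
  sim_value <- sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

print(sim_value)  # 24 / 25 = 0.96
```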
Cosine Similarity of a Matrix in R
The following code shows how to calculate the Cosine Similarity between each pair of column vectors in a matrix:
    library(lsa)

    #define matrix data with columns a, b, and c

    #calculate Cosine Similarity
    cosine(data)

              a         b         c
    a 1.0000000 0.9651950 0.9812406
    b 0.9651950 1.0000000 0.9573478
    c 0.9812406 0.9573478 1.0000000
Here is how to interpret the output:
- The Cosine Similarity between vectors a and b is 0.9651950.
- The Cosine Similarity between vectors a and c is 0.9812406.
- The Cosine Similarity between vectors b and c is 0.9573478.
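The structure of this output (symmetric, with 1s on the diagonal) can be reproduced in base R. The sketch below is an assumption-laden illustration rather than the internals of cosine(): the matrix `m` holds made-up values, and the pairwise similarities are computed with crossprod() and outer().

```r
# Hypothetical 4 x 3 matrix; each column is one vector (illustrative values)
m <- cbind(a = c(1, 0, 0, 0),
           b = c(1, 1, 0, 0),
           c = c(0, 0, 1, 1))

# Length (Euclidean norm) of each column
norms <- sqrt(colSums(m^2))

# crossprod(m) holds every pairwise dot product between columns;
# dividing by the outer product of the norms gives the cosine similarities
sim <- crossprod(m) / outer(norms, norms)

print(round(sim, 4))
```

Each off-diagonal entry is the similarity between the two vectors named by its row and column, so the matrix is read the same way as the cosine() output above.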
Notes
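1. The cosine() function compares the columns of a matrix, so the matrix can have any number of rows and columns. The result is always a square matrix with one row and one column per input vector.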
2. The cosine() function will work on a matrix, but not on a data frame. However, you can easily convert a data frame to a matrix in R by using the as.matrix() function.
3. Refer to the Wikipedia page on Cosine Similarity to learn more about this measure.
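The conversion mentioned in note 2 might look like the sketch below. The data frame `df` and its values are assumptions made up for illustration, and the cosine() call only runs if the lsa package is installed.

```r
# Hypothetical data frame; each column is one vector (illustrative values)
df <- data.frame(a = c(1, 2, 3),
                 b = c(2, 4, 6),
                 c = c(3, 2, 1))

# Convert the data frame to a numeric matrix before calling cosine()
m <- as.matrix(df)
print(is.matrix(m))

# Run cosine() on the matrix only when lsa is available
if (requireNamespace("lsa", quietly = TRUE)) {
  print(lsa::cosine(m))
}
```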