Cosine Similarity is a measure of the similarity between two vectors of an inner product space.
For two vectors, A and B, the Cosine Similarity is calculated as:
$$\text{Cosine Similarity} = \frac{\sum_i A_i B_i}{\sqrt{\sum_i A_i^2}\,\sqrt{\sum_i B_i^2}}$$
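As a quick hand check of the formula, here is a small worked example. The vectors A = (1, 2, 3) and B = (4, 5, 6) are made up for illustration and do not appear elsewhere in this tutorial:

```latex
\begin{aligned}
\sum_i A_i B_i &= (1)(4) + (2)(5) + (3)(6) = 32 \\
\sum_i A_i^2 &= 1 + 4 + 9 = 14 \\
\sum_i B_i^2 &= 16 + 25 + 36 = 77 \\
\text{Cosine Similarity} &= \frac{32}{\sqrt{14}\,\sqrt{77}} = \frac{32}{\sqrt{1078}} \approx 0.9746
\end{aligned}
```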
This tutorial explains how to calculate the Cosine Similarity between vectors in Python using functions from the NumPy library.
Cosine Similarity Between Two Vectors in Python
The following code shows how to calculate the Cosine Similarity between two arrays in Python:
from numpy import dot
from numpy.linalg import norm

#define arrays
a = [23, 34, 44, 45, 42, 27, 33, 34]
b = [17, 18, 22, 26, 26, 29, 31, 30]

#calculate Cosine Similarity
cos_sim = dot(a, b)/(norm(a)*norm(b))

cos_sim

0.965195008357566
The Cosine Similarity between the two arrays turns out to be 0.965195.
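To see how the NumPy expression maps onto the formula, the same value can be computed term by term in plain Python. This is only an illustrative sketch: the variable names `numerator`, `sum_sq_a`, and `sum_sq_b` are introduced here and are not part of the original code.

```python
from math import sqrt

# same arrays as in the NumPy example
a = [23, 34, 44, 45, 42, 27, 33, 34]
b = [17, 18, 22, 26, 26, 29, 31, 30]

# numerator of the formula: sum of element-wise products
numerator = sum(x * y for x, y in zip(a, b))  # 7059

# sums of squares that appear under the two square roots
sum_sq_a = sum(x * x for x in a)  # 10384
sum_sq_b = sum(y * y for y in b)  # 5151

cos_sim = numerator / (sqrt(sum_sq_a) * sqrt(sum_sq_b))
print(round(cos_sim, 6))  # 0.965195
```

Here `dot(a, b)` corresponds to `numerator`, and `norm(a)` and `norm(b)` correspond to the two square roots. Note that `zip` silently stops at the shorter input, so unlike `dot` this version will not raise an error for arrays of unequal length.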
Note that this method will work on two arrays of any length:
import numpy as np
from numpy import dot
from numpy.linalg import norm

#define arrays
a = np.random.randint(10, size=100)
b = np.random.randint(10, size=100)

#calculate Cosine Similarity
cos_sim = dot(a, b)/(norm(a)*norm(b))

cos_sim

0.7340201613960431

Because the arrays are generated randomly, the exact value will differ from run to run.
However, it only works if the two arrays are of equal length:
import numpy as np
from numpy import dot
from numpy.linalg import norm

#define arrays
a = np.random.randint(10, size=90) #length=90
b = np.random.randint(10, size=100) #length=100

#calculate Cosine Similarity
cos_sim = dot(a, b)/(norm(a)*norm(b))

cos_sim

ValueError: shapes (90,) and (100,) not aligned: 90 (dim 0) != 100 (dim 0)
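One way to make this failure easier to diagnose is to wrap the calculation in a small function that checks its inputs first. The sketch below is an assumption about how such a wrapper might look, not part of NumPy: the name `cosine_similarity` and its error messages are hypothetical. It also rejects zero vectors, because their norm is 0 and the formula would divide by zero.

```python
import numpy as np

def cosine_similarity(a, b):
    """Return the cosine similarity of two 1-D, equal-length, nonzero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # only plain vectors are handled in this sketch
    if a.ndim != 1 or b.ndim != 1:
        raise ValueError("both inputs must be one-dimensional")

    # give a clearer message than the shape-alignment error from dot()
    if a.shape != b.shape:
        raise ValueError(f"length mismatch: {a.shape[0]} != {b.shape[0]}")

    # a zero vector has norm 0, which would make the denominator 0
    norm_product = np.linalg.norm(a) * np.linalg.norm(b)
    if norm_product == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")

    return float(np.dot(a, b) / norm_product)

a = [23, 34, 44, 45, 42, 27, 33, 34]
b = [17, 18, 22, 26, 26, 29, 31, 30]
print(round(cosine_similarity(a, b), 6))  # 0.965195
```

Raising `ValueError` in both cases keeps the behavior consistent with what `dot` already does for mismatched lengths, so existing error handling would not need to change.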
Notes
1. There are multiple ways to calculate the Cosine Similarity in Python, but as this Stack Overflow thread explains, the method shown in this post turns out to be the fastest.
2. Refer to this Wikipedia page to learn more details about Cosine Similarity.