Matthews correlation coefficient (MCC) is a metric we can use to assess the performance of a classification model.
It is calculated as:
MCC = (TP*TN – FP*FN) / √[(TP+FP)(TP+FN)(TN+FP)(TN+FN)]
where:
- TP: Number of true positives
- TN: Number of true negatives
- FP: Number of false positives
- FN: Number of false negatives
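As a minimal sketch, the formula above can be written as a small Python function. The name mcc_from_counts and the sample counts are hypothetical, chosen only for illustration. Returning 0.0 when the denominator is zero is a common convention and is treated here as an assumption, since the formula itself is undefined in that case.

```python
import math


def mcc_from_counts(tp, tn, fp, fn):
    """Return the Matthews correlation coefficient for four confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    # The square root covers the product of all four sums
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: if any sum is zero the ratio is undefined, so return 0.0
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical counts, used only to exercise the function
example = mcc_from_counts(tp=6, tn=3, fp=1, fn=2)
print(round(example, 4))  # 0.4781
```

Keeping the zero-denominator check explicit avoids a ZeroDivisionError when a model never predicts one of the classes.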
This metric is particularly useful when the two classes are imbalanced, that is, when one class appears much more often than the other.
The value for MCC ranges from -1 to 1 where:
- -1 indicates total disagreement between predicted classes and actual classes
- 0 indicates predictions that are no better than random guessing
- 1 indicates total agreement between predicted classes and actual classes
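The three reference values can be illustrated with a short sketch in plain Python. The helper names (mcc, accuracy) and the made-up label lists are assumptions for this illustration, and the zero-denominator case is assumed to return 0.0. The last case also shows why MCC helps with imbalanced classes: always predicting the majority class scores high on accuracy but gets an MCC of 0.

```python
import math


def mcc(actual, pred):
    """Compute MCC from two equal-length lists of 0/1 labels."""
    tp = sum(1 for a, p in zip(actual, pred) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, pred) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, pred) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, pred) if a == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: treat the undefined zero-denominator case as 0.0
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


def accuracy(actual, pred):
    """Share of predictions that match the actual labels."""
    return sum(1 for a, p in zip(actual, pred) if a == p) / len(actual)


# Made-up imbalanced data: 20 positives and 380 negatives
actual = [1] * 20 + [0] * 380

perfect = list(actual)                # every prediction correct
inverted = [1 - a for a in actual]    # every prediction wrong
always_negative = [0] * len(actual)   # always predict the majority class

print(mcc(actual, perfect))           # 1.0
print(mcc(actual, inverted))          # -1.0
print(mcc(actual, always_negative))   # 0.0
print(accuracy(actual, always_negative))  # 0.95
```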
For example, suppose a sports analyst uses a logistic regression model to predict whether or not 400 different college basketball players get drafted into the NBA.
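The following confusion matrix summarizes the predictions made by the model:

                       Predicted: Drafted    Predicted: Not Drafted
Actual: Drafted        15 (TP)               5 (FN)
Actual: Not Drafted    5 (FP)                375 (TN)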
To calculate the MCC of the model, we can use the following formula:
- MCC = (TP*TN – FP*FN) / √[(TP+FP)(TP+FN)(TN+FP)(TN+FN)]
- MCC = (15*375 – 5*5) / √[(15+5)(15+5)(375+5)(375+5)]
- MCC = 5600 / 7600
- MCC = 0.7368
The Matthews correlation coefficient turns out to be 0.7368. This value is reasonably close to 1, which indicates that the model does a decent job of predicting whether or not players will get drafted.
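The hand calculation can be checked step by step with a few lines of plain Python. This is only a sketch of the arithmetic; the variable names are chosen here for illustration.

```python
import math

# Counts used in the hand calculation
tp, tn, fp, fn = 15, 375, 5, 5

numerator = tp * tn - fp * fn  # 5625 - 25
denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))  # sqrt(20*20*380*380)
mcc = numerator / denominator

print(numerator, denominator, round(mcc, 4))  # 5600 7600.0 0.7368
```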
The following example shows how to calculate MCC for this exact scenario using the matthews_corrcoef() function from the sklearn library in Python.
Example: Calculating Matthews Correlation Coefficient in Python
The following code shows how to define an array of predicted classes and an array of actual classes, then calculate Matthews correlation coefficient of a model in Python:
import numpy as np
from sklearn.metrics import matthews_corrcoef

#define array of actual classes
actual = np.repeat([1, 0], repeats=[20, 380])

#define array of predicted classes
pred = np.repeat([1, 0, 1, 0], repeats=[15, 5, 5, 375])

#calculate Matthews correlation coefficient
matthews_corrcoef(actual, pred)

0.7368421052631579
The MCC is 0.7368. This matches the value that we calculated earlier by hand.
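To see how the two arrays encode the confusion matrix, the sketch below rebuilds the four counts from the labels and applies the formula directly. It uses plain Python lists as a stand-in for the NumPy arrays (an assumption made so the snippet needs no third-party packages), and the variable names are illustrative.

```python
import math
from collections import Counter

# Same label layout as the arrays built with np.repeat
actual = [1] * 20 + [0] * 380
pred = [1] * 15 + [0] * 5 + [1] * 5 + [0] * 375

# Count each (actual, predicted) combination
pairs = Counter(zip(actual, pred))
tp = pairs[(1, 1)]  # drafted, predicted drafted
fn = pairs[(1, 0)]  # drafted, predicted not drafted
fp = pairs[(0, 1)]  # not drafted, predicted drafted
tn = pairs[(0, 0)]  # not drafted, predicted not drafted
print(tp, fn, fp, tn)  # 15 5 5 375

mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
print(round(mcc, 4))  # 0.7368
```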
Note: The complete documentation for the matthews_corrcoef() function is available in the scikit-learn documentation.