After training an ML model for binary classification, you might have been evaluating its performance using accuracy. While accuracy is the most natural and intuitive metric, you should also inspect another set of metrics derived from the confusion matrix, which is the topic of this article.
Definition
At a quick glance, a 2×2 confusion matrix tabulates the model's predictions against the actual labels. Its four cells are:
- True positive (TP): actual class 1, predicted class 1.
- False positive (FP): actual class 0, predicted class 1.
- False negative (FN): actual class 1, predicted class 0.
- True negative (TN): actual class 0, predicted class 0.
An Example
We have a model that tries to predict whether a cancer is malignant (class 1) or benign (class 0) for 1000 cases. The model makes the following predictions:
- Model predicts 20 cases as malignant (class 1), and they are indeed malignant.
- Model predicts 10 cases as malignant (class 1), but they are actually benign.
- Model predicts 940 cases as benign (class 0), and they are indeed benign.
- Model predicts 30 cases as benign (class 0), but they are actually malignant.
The confusion matrix in this example would therefore be: TP = 20, FP = 10, FN = 30, and TN = 940.
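As a sanity check, here is a minimal sketch of how this matrix could be computed with scikit-learn (not part of the original example); the y_true and y_pred arrays are hypothetical reconstructions of the 1000 cases from the counts above:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Reconstruct the 1000 cases from the counts in the example
# (the ordering of the cases does not matter for the confusion matrix).
y_true = np.array([1] * 20 + [0] * 10 + [0] * 940 + [1] * 30)  # actual labels
y_pred = np.array([1] * 20 + [1] * 10 + [0] * 940 + [0] * 30)  # model predictions

# For binary labels [0, 1], scikit-learn lays the matrix out as
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 940 10 30 20
```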
Let’s evaluate the model
At first glance, the accuracy of the model seems quite high:
- The model correctly predicts the classification in 960 cases (20 + 940).
- The model incorrectly predicts the classification in 40 cases (30 + 10).
In terms of accuracy, the model is 96% (960/1000) accurate. But can we conclude that this is a good model?
Let’s focus on the cases that are actually in class 1:
Among the actually malignant cases, the model is correct only 40% = 20 / (20 + 30) of the time. This number is known as the true positive rate (TPR), which is defined as
TPR = TP / (TP + FN), also called sensitivity, recall, and hit rate.
This means that the other 60% of the patients who actually have cancer are diagnosed as benign. As a result, they would not receive further diagnosis/treatment, which is a very costly outcome.
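As a quick illustration, the TPR for this example can be computed directly from the counts already given above (a minimal sketch):

```python
# True positive rate (sensitivity/recall) computed from the example counts
tp, fn = 20, 30
tpr = tp / (tp + fn)
print(f"TPR = {tpr:.0%}")  # TPR = 40%
```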
Next, we focus on the cases that the model predicts to be class 1:
In the 30 cases that the model predicts to be class 1, the model is correct 66.67% = 20 / (20 + 10) of the time. This number is known as the positive predictive value (PPV), which is defined as
PPV = TP / (TP + FP), and often called precision.
This means that when the model predicts a patient to have cancer, 1/3 of those patients actually do not have cancer; they would undergo more extensive (and more expensive) diagnosis, which would eventually rule out their cases.
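Here is a minimal sketch that cross-checks both numbers with scikit-learn's precision_score and recall_score; as before, the y_true and y_pred arrays are hypothetical reconstructions of the 1000 cases from the counts in the example:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Rebuild the 1000 cases from the counts in the example
y_true = np.array([1] * 20 + [0] * 10 + [0] * 940 + [1] * 30)  # actual labels
y_pred = np.array([1] * 20 + [1] * 10 + [0] * 940 + [0] * 30)  # model predictions

print(precision_score(y_true, y_pred))  # 0.666... -> PPV = 20 / (20 + 10)
print(recall_score(y_true, y_pred))     # 0.4      -> TPR = 20 / (20 + 30)
```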
Why do we need the confusion matrix?
Accuracy is not enough
Accuracy is not a good metric on an imbalanced data set. Although the model has a seemingly high accuracy of 96% overall, it performs quite poorly on the cases that actually matter: the malignant (class 1) cases.
Not all errors are equal
In the first case, the false negatives are the most severe errors because they cause patients to miss the opportunity to detect the cancer at an early stage. In the second case, the cost is some anxiety for the false positive patients, who are mistakenly classified as having cancer. In practice, we should favor a model that limits the former by taking these costs into account.
Conclusion
When your data set is imbalanced (having many more examples of one class than the other), it’s always good practice to evaluate the model via the confusion matrix.