Confusion Matrix (Machine Learning)


A confusion matrix is a table that supports a more nuanced analysis of a machine learning (ML) classifier than a single overall accuracy ratio. Each axis of the table lists all possible classes of the data. Usually the horizontal axis represents the predicted values and the vertical axis represents the actual values. Each cell then holds the number of observations that belong to the given actual class and were assigned the given predicted class.


In a confusion matrix, the numbers of correct and incorrect predictions are tabulated so you can see the actual values versus the predicted values. For a binary problem, this lets you compare true positives and true negatives with false positives and false negatives.
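The tabulation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a replacement for a library routine: the function name `build_confusion_matrix` and the label lists are hypothetical, and rows are assumed to hold the actual classes while columns hold the predicted classes.

```python
def build_confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Tally predictions: rows are actual classes, columns are predicted classes."""
    position = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[position[actual]][position[predicted]] += 1
    return matrix


# Hypothetical labels for illustration: 1 = positive class, 0 = negative class.
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]

matrix = build_confusion_matrix(y_true, y_pred)
for label, row in zip((0, 1), matrix):
    print(f"actual {label}: {row}")
# actual 0: [6, 1]
# actual 1: [1, 2]
```

With this layout the diagonal cells count the correct predictions, and the off-diagonal cells count the two kinds of error.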

Suppose that you are training an algorithm to predict whether a credit card transaction is fraudulent. Below is an image of a 2 x 2 confusion matrix for this binary classification problem:

[Figure: sample confusion matrix of a binary classification problem]

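To interpret the results of the model, consider the four possible outcomes of a prediction:

True positive (TP): a fraudulent transaction that the model predicted as fraudulent.
True negative (TN): a legitimate transaction that the model predicted as legitimate.
False positive (FP): a legitimate transaction that the model predicted as fraudulent.
False negative (FN): a fraudulent transaction that the model predicted as legitimate.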


Several metrics that help analyze the model can be derived from a confusion matrix: accuracy, precision, recall, specificity, and F1 score.


Accuracy is the ratio of correct predictions to total predictions, much as a classical probability divides the favorable outcomes by all outcomes. Accuracy answers the question: what percentage of predictions were correct?

Accuracy can be calculated using the following formula (Shin 2020):

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Here TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives. In other words, count the correct predictions and divide that count by the total number of predictions.
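As a rough sketch of that calculation, the correct predictions sit on the diagonal of the confusion matrix, so accuracy is the diagonal sum divided by the sum of all cells. The function name `accuracy_from_matrix` and the counts are hypothetical, and the matrix is assumed to have actual classes as rows and predicted classes as columns.

```python
def accuracy_from_matrix(matrix):
    """Share of predictions on the diagonal (the correct ones)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Hypothetical counts laid out as [[TN, FP], [FN, TP]].
matrix = [[6, 1], [1, 2]]
print(accuracy_from_matrix(matrix))  # 0.8
```

Because it only reads the diagonal, the same sketch also works for a matrix with more than two classes.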


Precision is the ratio of correctly identified positive results to the total number of positive predictions. Precision answers the question: of all the results predicted as positive, what fraction is actually positive?

Precision can be calculated using the formula (Shin 2020):

$$\text{Precision} = \frac{TP}{TP + FP}$$

To put it simply, it's the ratio of true positives (TP) to all positive predictions, which include both true positives and false positives (FP).
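A minimal sketch of the precision calculation from raw counts follows. The function name `precision` and the counts are hypothetical, and returning 0.0 when the model makes no positive predictions is an assumption chosen here to avoid a division by zero.

```python
def precision(true_positives, false_positives):
    """True positives divided by all positive predictions."""
    predicted_positive = true_positives + false_positives
    if predicted_positive == 0:
        return 0.0  # assumption: report 0.0 when nothing was predicted positive
    return true_positives / predicted_positive


# Hypothetical fraud example: 4 transactions flagged, 3 of them truly fraudulent.
print(precision(true_positives=3, false_positives=1))  # 0.75
```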


Recall measures the proportion of actual positives that are correctly identified, so it drops as the number of false negatives grows. Recall can be calculated using this formula (Shin 2020):

$$\text{Recall} = \frac{TP}{TP + FN}$$

Basically, the number of true positives (TP) against the total number of actual positives, which includes true positives and false negatives (FN).
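Using the standard definition of recall, TP / (TP + FN), the calculation can also be sketched directly from label lists: look only at the observations that are actually positive and ask how many of them the model caught. The function name `recall`, its `positive` parameter, and the labels are hypothetical.

```python
def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted positive."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not predictions_for_positives:
        return 0.0  # assumption: report 0.0 when there are no actual positives
    hits = sum(1 for p in predictions_for_positives if p == positive)
    return hits / len(predictions_for_positives)


# Hypothetical labels: three actual positives, two of which are caught.
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]
print(recall(y_true, y_pred))
```

Here one of the three actual positives is missed (a false negative), so recall is two thirds.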

Precision vs. Recall

Precision and recall go hand in hand when analyzing a model, and their relative importance depends on the model being analyzed. For instance, if you are analyzing a model that classifies emails as spam or legitimate, you are likely interested in high precision, so that legitimate email is rarely flagged as spam, while a lower recall would be tolerable. A lower recall means the model would occasionally classify spam as legitimate.

In contrast, if your model is predicting whether a patient tests positive or negative for COVID-19, a high recall would be more important than a high precision. For public health, it's far more dangerous to give people a false sense of security than to occasionally give people a false positive, because a false negative could lead to the spread of the virus to others while the carrier is unaware.
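One way to see this trade-off is to assume the model outputs a score and that a decision threshold turns the score into a label. The text does not describe such a threshold, so this is an illustrative assumption, and the function name `precision_recall_at`, the labels, and the scores are all hypothetical.

```python
def precision_recall_at(threshold, labels, scores):
    """Precision and recall when scores at or above the threshold count as positive."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical actual labels and model scores.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

for threshold in (0.3, 0.5, 0.85):
    p, r = precision_recall_at(threshold, labels, scores)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.3: precision=0.57, recall=1.00
# threshold=0.5: precision=0.75, recall=0.75
# threshold=0.85: precision=1.00, recall=0.25
```

In this toy data, a low threshold favors recall, which suits the COVID-19 case, while a high threshold favors precision, which suits the spam case.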


Specificity measures the proportion of actual negatives that are correctly identified. Specificity can be calculated using this formula (Shin 2020):

$$\text{Specificity} = \frac{TN}{TN + FP}$$

Basically, the number of true negatives (TN) against the total number of actual negatives, which includes true negatives and false positives (FP).
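Using the standard definition of specificity, TN / (TN + FP), the value can be read off the first row of a 2 x 2 confusion matrix. This is a small sketch with hypothetical counts and a hypothetical function name `specificity_from_matrix`, assuming the layout [[TN, FP], [FN, TP]] with actual classes as rows.

```python
def specificity_from_matrix(matrix):
    """True negatives divided by all actual negatives, for a 2 x 2 matrix."""
    (tn, fp), (fn, tp) = matrix  # assumed layout: [[TN, FP], [FN, TP]]
    actual_negatives = tn + fp
    return tn / actual_negatives if actual_negatives else 0.0


# Hypothetical counts: 7 actual negatives, 6 of them correctly identified.
matrix = [[6, 1], [1, 2]]
print(specificity_from_matrix(matrix))
```

With these counts the result is 6/7, or about 0.857. The second row (FN and TP) plays no part in specificity, just as the first row plays no part in recall.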

F1 Score

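The F1 score is the harmonic mean of precision and recall, combining the two into a single measure of a test's accuracy. The F1 score is a value between 0 and 1, where 0 represents the lowest accuracy and 1 the highest.

The F1 score can be calculated using this formula (Shin 2020):

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$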
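As a sketch of the standard F1 definition, the harmonic mean of precision and recall, the function below shows how the score behaves when the two values diverge. The function name `f1_from` and the input values are hypothetical, and returning 0.0 when both inputs are zero is an assumption made to avoid a division by zero.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumption: report 0.0 when both inputs are zero
    return 2 * precision * recall / (precision + recall)


print(f1_from(0.75, 0.75))  # 0.75
print(f1_from(1.0, 0.25))   # 0.4
```

In the second case the arithmetic mean would be 0.625, but the harmonic mean is pulled toward the weaker value, so a model cannot reach a high F1 score by excelling at only one of the two metrics.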


Python Implementation of a Confusion Matrix

Use the code below to create a confusion matrix, given the true labels (y_true) and the predicted labels (y_pred). (Shin 2020)

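# Confusion Matrix
from sklearn.metrics import confusion_matrix
y_true = [0, 0, 0, 0, 1, 1, 1, 1]  # actual labels
y_pred = [0, 1, 0, 1, 0, 1, 0, 1]  # predicted labels
confusion_matrix(y_true, y_pred)  # rows: actual classes, columns: predicted classes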

Use the code below to compute the accuracy, recall, precision, specificity and F1 score. (Shin 2020)

# Accuracy
from sklearn.metrics import accuracy_score
accuracy_score(y_true, y_pred)

# Recall
from sklearn.metrics import recall_score
recall_score(y_true, y_pred)

# Precision
from sklearn.metrics import precision_score
precision_score(y_true, y_pred)

# Specificity - You can obtain it from the confusion matrix
from sklearn.metrics import confusion_matrix
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 0, 1, 0, 1]
# For binary labels, ravel() flattens the 2 x 2 matrix in the order TN, FP, FN, TP
true_negative, false_positive, false_negative, true_positive = confusion_matrix(y_true, y_pred).ravel()
specificity = true_negative / (true_negative + false_positive)

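# F1 Score
from sklearn.metrics import f1_score
# average=None returns one F1 score per class; omit it to get a single score for the positive class
f1_score(y_true, y_pred, average=None)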

