Everything you need to know about the confusion matrix!

We’ll take a trip into the world of the confusion matrix and, along the way, explore related ideas like precision, recall, and why one is sometimes preferred over the other.

A confusion matrix is a table used to describe the performance of a classification algorithm. The rows represent the actual values and the columns represent the predicted values, and each cell holds the number of observations that fall into that combination of actual and predicted values.

Here is an example of a confusion matrix for a binary classification problem:

|  | Predicted Positive | Predicted Negative |
| --- | --- | --- |
| Actual Positive | TP (True Positive) | FN (False Negative) |
| Actual Negative | FP (False Positive) | TN (True Negative) |
  • TP: The number of observations that are predicted as positive and are actually positive.
  • TN: The number of observations that are predicted as negative and are actually negative.
  • FP: The number of observations that are predicted as positive but are actually negative.
  • FN: The number of observations that are predicted as negative but are actually positive.
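
To make these definitions concrete, here is a minimal sketch (assuming Python with scikit-learn installed) that computes a confusion matrix for a few toy labels and unpacks the four cells. Note that scikit-learn follows the same convention as the table above: rows are actual values, columns are predicted values.

```python
from sklearn.metrics import confusion_matrix

# Toy ground-truth and predicted labels (1 = positive, 0 = negative)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

# For binary labels, ravel() yields the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=3, TN=3, FP=1, FN=1
```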

For example, let’s say we are building a model to predict whether a person is a dog lover or a cat lover. The actual and predicted classes are both “dog lover” and “cat lover”. The confusion matrix might look like this:

|  | Predicted Dog Lovers | Predicted Cat Lovers |
| --- | --- | --- |
| Actual Dog Lovers | 100 | 20 |
| Actual Cat Lovers | 10 | 150 |

Sample confusion matrix

The values in the diagonal cells represent correct predictions, while the values in the other cells represent incorrect predictions. In this example, 100 people who are dog lovers were correctly identified as dog lovers, 150 people who are cat lovers were correctly identified as cat lovers, 20 people who are dog lovers were incorrectly identified as cat lovers, and 10 people who are cat lovers were incorrectly identified as dog lovers.
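
To double-check these numbers in code, here is a minimal sketch, assuming scikit-learn and shortening the labels to “dog” and “cat”, that rebuilds the same matrix from label lists matching the counts above:

```python
from sklearn.metrics import confusion_matrix

# Reconstruct labels that match the counts in the table above
y_true = ["dog"] * 120 + ["cat"] * 160            # 120 dog lovers, 160 cat lovers
y_pred = (["dog"] * 100 + ["cat"] * 20            # 100 correct dogs, 20 missed
          + ["dog"] * 10 + ["cat"] * 150)         # 10 wrong cats, 150 correct

# labels=... fixes the order: rows are actual, columns are predicted
print(confusion_matrix(y_true, y_pred, labels=["dog", "cat"]))
# [[100  20]
#  [ 10 150]]
```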

We can also calculate metrics from the confusion matrix, such as accuracy, precision, recall, and F1-score, to evaluate the performance of the model.

Accuracy:

Accuracy is the ratio of correct predictions to total predictions.

In this example, the accuracy is (100 + 150) / (100 + 150 + 10 + 20) = 250 / 280 ≈ 0.89, which is about 89%.
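
A quick way to verify this arithmetic, reusing the same dog/cat labels (a sketch assuming scikit-learn):

```python
from sklearn.metrics import accuracy_score

# Same labels as the confusion-matrix sketch above
y_true = ["dog"] * 120 + ["cat"] * 160
y_pred = ["dog"] * 100 + ["cat"] * 20 + ["dog"] * 10 + ["cat"] * 150

# (100 + 150) correct out of 280 total predictions
print(accuracy_score(y_true, y_pred))  # 250 / 280 ≈ 0.893
```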

Precision:

Precision is the proportion of true positive predictions out of all positive predictions (true positives plus false positives). It measures how many of the positive predictions made by a classifier are actually correct.

In this example, the precision for the dog lover class is TP / (TP + FP) = 100 / (100 + 10) ≈ 0.91.
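
The same number can be checked with scikit-learn’s precision_score (a minimal sketch, labels again shortened to “dog” and “cat”):

```python
from sklearn.metrics import precision_score

# Same dog/cat labels as above
y_true = ["dog"] * 120 + ["cat"] * 160
y_pred = ["dog"] * 100 + ["cat"] * 20 + ["dog"] * 10 + ["cat"] * 150

# Precision for the "dog" class: TP / (TP + FP) = 100 / 110
print(precision_score(y_true, y_pred, pos_label="dog"))  # ~0.909
```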

Recall:

Recall is the proportion of true positive predictions out of all actual positive instances (true positive predictions and false negative predictions). It is a measure of how many of the actual positive instances are correctly predicted by a classifier.

In this example, the recall for the dog lover class is TP / (TP + FN) = 100 / (100 + 20) ≈ 0.83.
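
And the corresponding check with recall_score (same sketch and assumptions as above):

```python
from sklearn.metrics import recall_score

# Same dog/cat labels as above
y_true = ["dog"] * 120 + ["cat"] * 160
y_pred = ["dog"] * 100 + ["cat"] * 20 + ["dog"] * 10 + ["cat"] * 150

# Recall for the "dog" class: TP / (TP + FN) = 100 / 120
print(recall_score(y_true, y_pred, pos_label="dog"))  # ~0.833
```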

F1-score:

The F1-score is the harmonic mean of precision and recall, combining the two into a single number.

In this example, the F1-score for the dog lover class is 2 * ((0.91 * 0.83) / (0.91 + 0.83)) ≈ 0.87.
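
Once more in code, with f1_score (same sketch and assumptions as above):

```python
from sklearn.metrics import f1_score

# Same dog/cat labels as above
y_true = ["dog"] * 120 + ["cat"] * 160
y_pred = ["dog"] * 100 + ["cat"] * 20 + ["dog"] * 10 + ["cat"] * 150

# F1 = 2 * TP / (2 * TP + FP + FN) = 200 / 230
print(f1_score(y_true, y_pred, pos_label="dog"))  # ~0.870
```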

These metrics can help you understand the performance of your model and make decisions about how to improve it.

Too many numbers! Tell me: why would I want to prefer recall over precision?

An example of a scenario where high recall matters more than high precision, say in the healthcare sector, is the early detection of a disease. Here it is more important to find as many cases of the disease as possible, even if that means a higher number of false positives (people who are diagnosed with the disease but do not actually have it), because the consequences of missing an actual case are serious.

For example, let’s say a hospital is developing a screening test for a certain type of cancer. If the test is highly precise but has low recall, it will correctly identify only a small fraction of the people who actually have the disease (i.e., it will produce a high number of false negatives), and many people with the disease will go undiagnosed and untreated. In this case, it would be more important to have a screening test with high recall, even if it means that some people without the disease are incorrectly diagnosed.

Another example is disease outbreak detection. In the case of an outbreak like Covid-19, rapid detection and isolation of infected individuals is of utmost importance, even if that means some non-infected individuals are also quarantined. In this scenario, high recall would be more important than high precision.

Now, why would I want to prefer precision?

An example of a scenario where high precision is more important than high recall is the field of medical treatments or drug development. Here it is more important to avoid false positives (people who are prescribed a treatment or given a drug they do not actually need) because of the potential side effects and cost of the treatment or drug.

For example, let’s say a pharmaceutical company is developing a new drug for a certain condition. If the drug is highly effective but has a high rate of side effects, it would be more important to have a highly precise diagnostic test that identifies only those patients most likely to benefit from the drug and avoids prescribing it to patients who do not need it.

Another example is radiology, where radiologists use medical images to identify diseases such as tumors, blood clots, or fractures. In this field, high precision is more important, since a false positive (identifying a disease where it does not exist) can lead to unnecessary treatments, surgeries, and emotional stress for the patient, while a false negative (missing the disease) can lead to delayed treatment or further deterioration of the patient’s health.
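
In practice, this trade-off is often controlled by the classifier’s decision threshold. The sketch below, using made-up predicted probabilities purely for illustration (not from the examples above), shows that lowering the threshold raises recall at the cost of precision, and raising it does the reverse:

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical predicted probabilities of disease, and the true labels
probs  = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30, 0.20, 0.10]
y_true = [1,    1,    0,    1,    1,    0,    0,    0]

for threshold in (0.7, 0.3):
    # Predict positive whenever the probability clears the threshold
    y_pred = [1 if p >= threshold else 0 for p in probs]
    p = precision_score(y_true, y_pred)
    r = recall_score(y_true, y_pred)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")

# threshold=0.7: precision=1.00, recall=0.50  (cautious: misses cases)
# threshold=0.3: precision=0.67, recall=1.00  (aggressive: more false alarms)
```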

In summary, a confusion matrix is a table that describes the performance of a classification algorithm by breaking down correct and incorrect predictions, and it underlies overall performance metrics such as accuracy, precision, recall, and F1-score.
