Confusion Matrix

Published by Mario Oettler

Suppose we have an algorithm that recognizes cats in photos. Our data set consists of 1,000 photos. 250 of them show a cat, and the rest don’t. Our algorithm classifies 300 photos as pictures of a cat. Of these 300 photos, 190 are correctly classified.

What is the likelihood that a photo classified by our algorithm as a “cat photo” really shows a cat?

Example 1

For that purpose, we can create a matrix that lists all possible combinations of classification result and reality.

Here are some hints on how to fill the matrix.

  • As we said, we have 250 pictures with a cat. 190 of them are correctly recognized by our algorithm. This leaves 250 − 190 = 60 photos that were incorrectly classified as “No Cat”.
  • We also know that our algorithm classified 300 photos as “Cat”. Since only 190 of them really show a cat, the remaining 300 − 190 = 110 photos do not show a cat in reality.
  • In total, we have 1,000 photos, which means that 1,000 − 250 = 750 photos don’t show a cat in reality.
  • With this information, we can calculate the number of photos that our algorithm correctly classified as “No Cat”: 750 − 110 = 640.

The completed confusion matrix looks like this:

                      Cat (reality)   No Cat (reality)   Total
Classified “Cat”      190             110                300
Classified “No Cat”    60             640                700
Total                 250             750                1,000

To answer our question (what is the likelihood that a photo classified as “Cat” by our algorithm really shows a cat?), we calculate:

190/300 = 63.33%
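
For readers who prefer to check the arithmetic in code, here is a minimal Python sketch (the variable names are ours and purely illustrative) that derives the four cells of the matrix and the 63.33% figure:

```python
# Rebuild the confusion matrix of Example 1 from the counts given in the text.

total_photos = 1000   # size of the data set
actual_cats = 250     # photos that really show a cat
predicted_cats = 300  # photos the algorithm labels as "Cat"
true_positives = 190  # correctly labeled cat photos

# Derive the remaining cells of the confusion matrix.
false_negatives = actual_cats - true_positives                    # 60  cats labeled "No Cat"
false_positives = predicted_cats - true_positives                 # 110 non-cats labeled "Cat"
true_negatives = (total_photos - actual_cats) - false_positives   # 640 non-cats labeled "No Cat"

# Probability that a photo labeled "Cat" really shows a cat (precision / PPV).
ppv = true_positives / predicted_cats
print(f"TP={true_positives}, FP={false_positives}, FN={false_negatives}, TN={true_negatives}")
print(f"PPV = {ppv:.2%}")  # 63.33%
```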

This confusion matrix helps us to assess the quality of an algorithm or test.

The four fields of the matrix give us the following information:

  • True positives (TP): photos with a cat that were classified as “Cat” (here: 190).
  • False positives (FP): photos without a cat that were classified as “Cat” (here: 110).
  • False negatives (FN): photos with a cat that were classified as “No Cat” (here: 60).
  • True negatives (TN): photos without a cat that were classified as “No Cat” (here: 640).

Example 2

Let’s consider another example.

Suppose our data set now contains 1,000 photos of which only 100 show a cat (a prevalence of 10%), and our algorithm simply classifies every photo as “Cat”. What would the confusion matrix look like?

                      Cat (reality)   No Cat (reality)   Total
Classified “Cat”      100             900                1,000
Classified “No Cat”     0               0                    0
Total                 100             900                1,000

What is the likelihood of getting a cat photo if the test shows “Cat”?

100/1,000 = 0.1

Example 3

Now, our test randomly displays “Cat” or “No Cat” (50:50) for the same data set (100 cat photos, 900 without a cat). On average, the confusion matrix would look like this:

                      Cat (reality)   No Cat (reality)   Total
Classified “Cat”       50             450                  500
Classified “No Cat”    50             450                  500
Total                 100             900                1,000

What is the likelihood of getting a cat photo if the test shows “Cat”?

50/500 = 0.1
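
The following small Python sketch (illustrative only; the names always_cat, coin_flip, and ppv are our own) simulates both degenerate classifiers from Examples 2 and 3 on a data set with 100 cat photos out of 1,000 and shows that both end up with a PPV of roughly 0.1:

```python
import random

random.seed(42)  # make the random classifier reproducible

photos = [True] * 100 + [False] * 900  # True = the photo really shows a cat

def always_cat(photo):
    """Example 2: classify every photo as 'Cat'."""
    return True

def coin_flip(photo):
    """Example 3: answer 'Cat' or 'No Cat' at random (50:50)."""
    return random.random() < 0.5

def ppv(classifier, photos):
    """Share of photos labeled 'Cat' that really show a cat."""
    predicted_cat = [photo for photo in photos if classifier(photo)]
    true_positives = sum(predicted_cat)
    return true_positives / len(predicted_cat)

print(f"PPV (always 'Cat'): {ppv(always_cat, photos):.2f}")  # 0.10
print(f"PPV (coin flip):    {ppv(coin_flip, photos):.2f}")   # close to 0.10
```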

As you can see, two very different tests yield the same likelihood. Hence, in order to assess the quality of an algorithm (or test), we need to calculate some additional indicators:

Precision or Positive Prediction Value (PPV) = TP / (TP + FP)

Negative Prediction Value (NPV) = TN / (TN + FN)

Sensitivity (SE) or true positive rate (TPR) = TP / (TP + FN)

Specificity (SP) or true negative rate (TNR) = TN / (TN + FP)

KPI                                            Example 1   Example 2         Example 3
Precision or Positive Prediction Value (PPV)   0.633       0.1               0.1
Negative Prediction Value (NPV)                0.914       undefined (0/0)   0.9
Sensitivity (SE)                               0.76        1                 0.5
Specificity (SP)                               0.85        0                 0.5
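
The table can be reproduced with a short Python sketch. The cell values (TP, FP, FN, TN) are taken from the matrices above, and the helper safe_div is just an illustrative way of handling the undefined NPV of Example 2:

```python
# Compute PPV, NPV, sensitivity, and specificity for the three examples.

examples = {
    "Example 1": {"TP": 190, "FP": 110, "FN": 60, "TN": 640},
    "Example 2": {"TP": 100, "FP": 900, "FN": 0,  "TN": 0},
    "Example 3": {"TP": 50,  "FP": 450, "FN": 50, "TN": 450},
}

def safe_div(a, b):
    """Return a/b, or None when the denominator is zero (undefined KPI)."""
    return a / b if b else None

def fmt(x):
    """Format a KPI, writing 'undefined' for a 0/0 division."""
    return "undefined" if x is None else f"{x:.3f}"

for name, c in examples.items():
    ppv = safe_div(c["TP"], c["TP"] + c["FP"])  # precision
    npv = safe_div(c["TN"], c["TN"] + c["FN"])  # negative prediction value
    se = safe_div(c["TP"], c["TP"] + c["FN"])   # sensitivity / true positive rate
    sp = safe_div(c["TN"], c["TN"] + c["FP"])   # specificity / true negative rate
    print(name, "PPV:", fmt(ppv), "NPV:", fmt(npv), "SE:", fmt(se), "SP:", fmt(sp))
```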

The PPV (Positive Prediction Value) and NPV (Negative Prediction Value) can only be applied to other groups (or data sets) if the pre-test probability (prevalence) is the same in both groups.

To make up for this limitation, one can calculate likelihood ratios.

Likelihood ratios compare the probability that a subject with the disease (in our case, a picture with a cat) gets a particular test result with the probability that a subject without the disease (a picture without a cat) gets the same result.

They exist for a positive test result and for a negative test result.

The positive likelihood ratio (LR+) tells us how much more likely a positive test result (the algorithm says “Cat”) is for a picture that really shows a cat than for one that does not. It is calculated as LR+ = SE / (1 − SP).

The negative likelihood ratio (LR−) tells us how much more (or less) likely a negative test result (the algorithm says “No Cat”) is for a picture that really shows a cat than for one that does not. It is calculated as LR− = (1 − SE) / SP.

A likelihood ratio of 1.0 indicates that there is no difference in the probability of the particular test result (positive result for LR+ and negative result for LR−) between those with and without the tested feature (cat).

A likelihood ratio >1.0 indicates that the particular test result is more likely to occur in pictures with a cat than in those without a cat.

A likelihood ratio <1.0 indicates that the particular test result is less likely to occur in pictures with a cat than in those without a cat.

The further the likelihood ratio is from 1, the more informative the test result.
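
Applying these formulas to the three examples makes the difference visible. The following Python sketch (names are illustrative) computes LR+ and LR− from the sensitivities and specificities in the KPI table:

```python
# Likelihood ratios for the three examples, using LR+ = SE / (1 - SP)
# and LR- = (1 - SE) / SP.

kpis = {
    "Example 1": {"SE": 0.76, "SP": 640 / 750},
    "Example 2": {"SE": 1.0,  "SP": 0.0},
    "Example 3": {"SE": 0.5,  "SP": 0.5},
}

def likelihood_ratios(se, sp):
    lr_plus = se / (1 - sp) if sp < 1 else None   # undefined when SP = 1
    lr_minus = (1 - se) / sp if sp > 0 else None  # undefined when SP = 0
    return lr_plus, lr_minus

for name, k in kpis.items():
    lr_plus, lr_minus = likelihood_ratios(k["SE"], k["SP"])
    print(name, "LR+ =", lr_plus, "LR- =", lr_minus)

# Example 1 has LR+ ≈ 5.2 and LR- ≈ 0.28, i.e. the test is informative.
# Examples 2 and 3 have LR+ = 1 (and Example 3 also has LR- = 1): a positive
# result does not change the odds that the photo really shows a cat.
```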
