AUC and ROC - Complex systems and AI

AUC and ROC curve, interpretation and multiclass

This tutorial presents the AUC and ROC curve as well as how to interpret the results. The multiclass case is also presented.

Performance measures

In machine learning, measuring performance is an essential task. For a classification problem, we can rely on the AUC-ROC curve, where AUC stands for Area Under the Curve and ROC for Receiver Operating Characteristic. It is one of the most important evaluation metrics for checking or visualizing the performance of a classification model, including in the multiclass case. It is also written AUROC (Area Under the Receiver Operating Characteristic).

The AUC-ROC curve is a performance measure for classification problems at different threshold settings. ROC is a probability curve and AUC represents the degree of separability: it indicates to what extent the model is able to distinguish between classes. The higher the AUC, the better the model is at predicting class 0 as 0 and class 1 as 1. By analogy, the higher the AUC, the better the model distinguishes patients with the disease from those without it.

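The ROC curve plots the true positive rate, TPR = TP / (TP + FN), against the false positive rate, FPR = 1 - Specificity = FP / (TN + FP), with TPR on the y-axis and FPR on the x-axis. Here TP, FN, FP and TN are the numbers of true positives, false negatives, false positives and true negatives.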
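The two rates above can be sketched in a few lines. This is only an illustration: the function name tpr_fpr and the counts passed to it are made up for the example.

```python
def tpr_fpr(tp, fn, fp, tn):
    """Return (TPR, FPR) from the four confusion-matrix counts."""
    tpr = tp / (tp + fn)  # sensitivity: share of actual positives that were found
    fpr = fp / (fp + tn)  # 1 - specificity: share of actual negatives flagged as positive
    return tpr, fpr


# Hypothetical counts, chosen only for illustration
tpr, fpr = tpr_fpr(tp=90, fn=10, fp=40, tn=60)
print(tpr, fpr)
```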

ROC calculation

Sensitivity (TPR) and specificity (1 - FPR) trade off against each other: as we move the threshold to increase sensitivity, specificity decreases, and vice versa.

When we train a classification model, it outputs a probability for each prediction. In this case, our example will be the probability of repaying a loan.

The probabilities vary between 0 and 1. The higher the value, the more likely the person is to repay a loan.

The next step is to find a threshold to classify the probabilities as "will repay" or "will not repay".

In the example in the figure, we have selected a threshold of 0.35 (in practice, the threshold is usually tuned to give the best value of a chosen metric, such as accuracy):

All predictions at or above this threshold are classified as "will repay"
All predictions below this threshold are classified as "will not repay"

We then examine which of these predictions were correctly classified or misclassified. With such information we can construct a confusion matrix.

At this point we have:

correctly classified 90 % of all positives, those who "repaid" (TPR = 90 %)
misclassified 40 % of all negatives, those who "did not repay" (FPR = 40 %)
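The thresholding step and the resulting confusion matrix can be sketched as follows. The labels and scores are invented for illustration (they are not the data behind the figure), and confusion_counts is a hypothetical helper name.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FN, FP, TN when scores >= threshold are predicted positive."""
    tp = fn = fp = tn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


# Made-up data: 1 = repaid the loan, 0 = did not repay
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.2, 0.7, 0.5, 0.3, 0.2, 0.1]

tp, fn, fp, tn = confusion_counts(y_true, scores, threshold=0.35)
print("TP, FN, FP, TN =", tp, fn, fp, tn)
print("TPR =", tp / (tp + fn), "FPR =", fp / (fp + tn))
```

On this toy data the threshold of 0.35 catches four of the five positives and wrongly flags two of the five negatives.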

We can notice that both TPR and FPR decrease as the threshold increases. Consider the first case, where the threshold is 0:

All positives have been correctly classified, so TPR = 100 %
All negatives were misclassified, so FPR = 100 %

In the last example graph, where the threshold is 1:

All positives were misclassified, so TPR = 0 %
All negatives were correctly classified, so FPR = 0 %

To plot the ROC curve, we need to calculate the TPR and FPR for many different thresholds (this step is handled for you by libraries such as scikit-learn).

For each threshold, we plot the FPR value on the x-axis and the TPR value on the y-axis. We then join the points with a line. That's it!

In the figure below, we can see how each point on the ROC curve represents the FPR and TPR of a classification at a given threshold.

Notice how the threshold at 1 leads to the first point at (0, 0) and the threshold at 0 leads to the last point at (1, 1).
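The sweep over thresholds can be sketched in plain Python. This is a minimal illustration on made-up data, not the implementation used by any particular library. The name roc_points is hypothetical, and the first threshold is set above every score (infinity instead of 1) so that the curve always starts at (0, 0).

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, from the highest threshold to the lowest."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # One threshold above every score, then each distinct score in descending order
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Made-up data: 1 = positive class, 0 = negative class
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.2, 0.7, 0.5, 0.3, 0.2, 0.1]

points = roc_points(y_true, scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.1f}  TPR={tpr:.1f}")
```

Joining these points in order gives the ROC curve. In scikit-learn, the function sklearn.metrics.roc_curve performs a comparable sweep.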

The area covered below the line is called the Area Under the Curve (AUC). This is used to evaluate the performance of a classification model. The higher the AUC, the better the model is at distinguishing classes.

This means that, ideally, we would like our line to pass as close as possible to the upper left corner of the chart, which gives a higher AUC.
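One common way to compute the area under a curve made of straight segments is the trapezoidal rule. The sketch below assumes the ROC points are given as (FPR, TPR) pairs. The helper name auc_trapezoid and the toy points are illustrative only.

```python
def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) = (FPR, TPR) pairs."""
    pts = sorted(points)  # order by FPR, then by TPR
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid between two neighbouring points
    return area


perfect = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # hugs the upper left corner
diagonal = [(0.0, 0.0), (1.0, 1.0)]              # random guessing
# Hypothetical ROC points for an imperfect model
toy = [(0.0, 0.0), (0.0, 0.2), (0.0, 0.4), (0.2, 0.4), (0.2, 0.6),
       (0.4, 0.6), (0.4, 0.8), (0.6, 0.8), (0.8, 1.0), (1.0, 1.0)]

print(auc_trapezoid(perfect))   # 1.0
print(auc_trapezoid(diagonal))  # 0.5
print(auc_trapezoid(toy))       # about 0.74
```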

Mathematical interpretation

As we know, ROC is a probability curve. Let us therefore plot the distributions of these probabilities:

Note: The red distribution curve is of the positive class (patients with disease) and the green distribution curve is of the negative class (patients without disease).

In this scenario, a regression finds a clear distinction between the two classes. In the case of a decision tree, a single split is enough to have 100% success! Here the AUC is 1, which would give the following curve:

It's an ideal situation. When the two distributions do not overlap at all, it means that the model has an ideal measure of separability. It is perfectly capable of distinguishing the positive class from the negative class.

When the two distributions overlap, we introduce type 1 and type 2 errors (false positives and false negatives). Depending on the threshold, we can reduce one at the cost of increasing the other. When the AUC is 0.7, it means that there is a 70 % chance that the model will give a randomly chosen positive example a higher score than a randomly chosen negative one.

This is the worst situation. When the AUC is around 0.5, the model has no ability to distinguish the positive class from the negative class. This amounts to random prediction.

When the AUC is around 0, the model is actually inverting the classes. This means that the model predicts the negative class as positive and vice versa.
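One way to read these AUC values is as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The sketch below computes the AUC this way on made-up data, then reverses the ranking to show how an AUC below 0.5 mirrors one above it. The name auc_pairwise is hypothetical.

```python
def auc_pairwise(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up data: 1 = positive class, 0 = negative class
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.2, 0.7, 0.5, 0.3, 0.2, 0.1]

auc = auc_pairwise(y_true, scores)
# Negating the scores reverses the ranking, as a model that swaps the classes would
auc_inverted = auc_pairwise(y_true, [-s for s in scores])
print(auc, auc_inverted)
```

On this toy data the two values add up to 1, which is why a model with an AUC close to 0 can be turned into a good one simply by swapping its predictions.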

AUC - ROC curve for the multiclass problem

In a multiclass model, we can plot N AUC-ROC curves for N classes using the one-vs-all (also called one-vs-rest) methodology. For example, if you have three classes named X, Y, and Z, you will have one ROC curve for X against Y and Z, another for Y against X and Z, and a third for Z against X and Y.
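The one-vs-all procedure can be sketched as follows: for each class, the labels are turned into a binary problem (this class versus the rest) and the AUC is computed from the score the model gives to that class. The labels, the per-class scores and the helper names are all made up for illustration.

```python
def auc_pairwise(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(labels, class_scores):
    """Return one AUC per class, treating that class as positive and the rest as negative."""
    result = {}
    for cls, scores in class_scores.items():
        binary = [1 if label == cls else 0 for label in labels]
        result[cls] = auc_pairwise(binary, scores)
    return result


# Hypothetical true labels and predicted class probabilities for six samples
labels = ["X", "X", "Y", "Y", "Z", "Z"]
class_scores = {
    "X": [0.7, 0.5, 0.2, 0.3, 0.1, 0.4],
    "Y": [0.2, 0.3, 0.6, 0.3, 0.2, 0.3],
    "Z": [0.1, 0.2, 0.2, 0.4, 0.7, 0.3],
}

aucs = one_vs_rest_auc(labels, class_scores)
for cls, value in aucs.items():
    print(cls, value)
```

With scikit-learn, a comparable result can be obtained from sklearn.metrics.roc_auc_score with multi_class="ovr", which averages the per-class values.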