After studying a business problem, performing univariate and bivariate analysis, engineering and selecting features, and applying a model to obtain outputs as continuous values or classes, the next step is to evaluate how effective the model is using performance metrics.
There are different performance metrics for regression and classification machine learning models. In this article, we will discuss the performance metrics for classification algorithms. Performance metrics also guide how we optimize a model.
For example, suppose we train a model to predict whether or not a person has COVID. Accuracy is essential here, but the false cases the model predicts are even more important: if a person has COVID and the model predicts negative, that undetected case is dangerous for other people as well. In such scenarios, we need to reduce the false negatives.
Let’s have a look at these metrics and their use cases.
Accuracy is quite an essential metric and easy to understand as well: the proportion of correct predictions to the total number of cases.
Accuracy = (True Positives + True Negatives) / (True Positives + False Positives + False Negatives + True Negatives)
Use case: It is suited for both binary and multiclass classification problems.
We can use accuracy as a metric when the dataset is well balanced and not heavily skewed.
Accuracy alone cannot tell us how good the model's predictions are for each class; it only gives the overall fraction of correct predictions.
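As a minimal sketch (assuming scikit-learn is available; the labels below are made-up toy values), accuracy can be computed like this:

```python
from sklearn.metrics import accuracy_score

# Toy ground-truth labels and model predictions (made-up values)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Fraction of predictions that match the true labels
print(accuracy_score(y_true, y_pred))  # 0.75 (6 of 8 correct)
```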
Before moving to the confusion matrix, let's discuss its vocabulary with the help of examples:
True Positive (TP):
A transaction is fraudulent and is detected as fraud by the model.
False Positive (FP):
A transaction is not fraudulent but is detected as fraud. This is also called a Type I error.
True Negative (TN):
A transaction is not fraudulent and is detected as not fraud.
False Negative (FN):
A transaction is fraudulent but is detected as not fraud by the model. This is also called a Type II error.
Figure 1: Confusion matrix, with actual classes along one axis and predicted classes along the other.
Summary: The confusion matrix provides a detailed overview of the classification. For a better-performing model, TP and TN must be high, and FN and FP should be as low as possible.
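Reusing the same toy labels as above (1 = fraud, 0 = not fraud), scikit-learn can build the matrix directly:

```python
from sklearn.metrics import confusion_matrix

# 1 = fraud, 0 = not fraud (made-up toy values)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
# [[3 1]
#  [1 3]]
```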
Precision tells us the ratio of true positives to the total predicted positives.
Precision = TP / (TP + FP)
Use case: It is used when we want to be very sure of a positive prediction. For example, before decreasing the credit limit of a customer, we need to be very sure, as a wrong decision will result in a dissatisfied customer.
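With the same toy fraud labels, a minimal sketch of the calculation:

```python
from sklearn.metrics import precision_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# TP / (TP + FP) = 3 / (3 + 1)
print(precision_score(y_true, y_pred))  # 0.75
```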
Recall tells us the proportion of actual positives that the model correctly classifies. It is also called sensitivity.
Recall = TP / (TP + FN)
Use case: It is used when missing a positive is costly. One case would be a patient who has COVID receiving no treatment because the model classified them as negative; this situation must be avoided.
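Again with the toy labels, a minimal sketch:

```python
from sklearn.metrics import recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# TP / (TP + FN) = 3 / (3 + 1)
print(recall_score(y_true, y_pred))  # 0.75
```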
The F1 score is a trade-off between recall and precision. In some cases we need high precision, others may require high recall, but there are also cases in which recall and precision are equally important.
F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
Use case: The F1 score can be used when the data is imbalanced, or when precision and recall are both important and we want a single metric that balances them.
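Continuing with the toy labels, where precision and recall are both 0.75:

```python
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Harmonic mean of precision (0.75) and recall (0.75)
print(f1_score(y_true, y_pred))  # 0.75
```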
Log loss accounts for the performance of a model whose output is a probability value between 0 and 1. As the estimated probability deviates from the actual label, the log loss increases.
Log Loss = -(1/N) * Σ [Yi * log(Pi) + (1 - Yi) * log(1 - Pi)]
N: number of data points
Pi: estimated probability that the i-th point belongs to class 1
Yi: actual value of y for the i-th point (0 or 1)
The lower the log loss value, the better the model’s predictions.
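A minimal sketch with made-up probabilities (for binary problems, scikit-learn's log_loss accepts the predicted probability of the positive class):

```python
from sklearn.metrics import log_loss

# Toy true labels and the model's predicted probabilities of class 1
y_true = [1, 0, 1, 0]
p_pred = [0.9, 0.1, 0.8, 0.3]

# Average negative log-likelihood; lower is better
print(log_loss(y_true, p_pred))  # ≈ 0.198
```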
The receiver operating characteristic (ROC) curve is a plot of the true positive rate against the false positive rate at different classification thresholds. The y-axis shows the true positive rate (recall) and the x-axis shows the false positive rate.
Figure 4: ROC curve, plotting the true positive rate against the false positive rate.
The area under the ROC curve (AUC) is used as the metric. The higher the AUC, the better the performance of the model.
Use case: It is used when the data is imbalanced. It can also be used to compare various classification algorithms.
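A minimal sketch with made-up scores; here every positive outranks every negative, so the AUC is a perfect 1.0:

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Toy true labels and predicted probabilities of the positive class
y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.3]

# Points of the ROC curve at varying classification thresholds
fpr, tpr, thresholds = roc_curve(y_true, scores)
print(fpr, tpr)

# Single-number summary: area under the ROC curve
print(roc_auc_score(y_true, scores))  # 1.0
```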
I hope it is now clear what these metrics mean and how and when to use them.