After studying a business problem, performing univariate and bivariate analysis, engineering and selecting features, and applying a model to obtain outputs as continuous values or classes, the next step is to evaluate how effective the model is using performance metrics.
There are different performance metrics for regression and classification machine learning models. In this article, we will discuss the performance metrics for classification algorithms. Performance metrics also guide how we optimize a model.
For example, suppose we train a model to predict whether or not a person has COVID. Accuracy is essential here, but the false cases the model predicts are even more important: if a person has COVID and the model predicts negative, that undetected case is dangerous for other people as well. In such scenarios, we need to reduce the false negatives.
Let’s have a look at these metrics and their use cases.
Accuracy is quite an essential metric and easy to understand as well: the proportion of correct predictions to the total number of cases.
Accuracy = (True Positives + True Negatives) / (True Positives + False Positives + False Negatives + True Negatives)
Use case: It is suited for both binary and multiclass classification problems.
We can use accuracy as a metric when the dataset is well balanced and not heavily skewed.
Accuracy alone cannot tell us how good the model's predictions are for each class; it only gives the overall fraction of correct predictions.
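As a minimal sketch (assuming scikit-learn is available; the labels below are made-up toy values), accuracy can be computed like this:

```python
from sklearn.metrics import accuracy_score

# Toy ground-truth labels and model predictions (made-up values)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Fraction of predictions that match the true labels
print(accuracy_score(y_true, y_pred))  # 0.75 (6 of 8 correct)
```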
Before moving to the confusion matrix, let's discuss its vocabulary with the help of examples:
True Positive (TP):
A transaction is fraudulent and is detected as fraud by the model.
False Positive (FP):
A transaction is not fraudulent but is detected as fraud. This is also called a Type I error.
True Negative (TN):
A transaction is not fraudulent and is detected as not fraud.
False Negative (FN):
A transaction is fraudulent but is detected as not fraud by the model. This is also called a Type II error.
Figure 1: Confusion matrix, with actual classes along one axis and predicted classes along the other.
Summary: The confusion matrix provides a detailed overview of the classification. For a better-performing model, TP and TN must be high, and FN and FP should be as low as possible.
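Reusing the same toy labels as above (1 = fraud, 0 = not fraud), scikit-learn can build the matrix directly:

```python
from sklearn.metrics import confusion_matrix

# 1 = fraud, 0 = not fraud (made-up toy values)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
# [[3 1]
#  [1 3]]
```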
Precision tells us the ratio of true positives to the total predicted positives.
Precision = TP / (TP + FP)
Use case: It is used when we want to be very sure of a positive prediction. For example, before decreasing the credit limit of a customer, we need to be very sure, as a wrong decision will result in a dissatisfied customer.
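With the same toy fraud labels, a minimal sketch of the calculation:

```python
from sklearn.metrics import precision_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# TP / (TP + FP) = 3 / (3 + 1)
print(precision_score(y_true, y_pred))  # 0.75
```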
Recall tells us the proportion of actual positives that the model correctly classifies. It is also called sensitivity.
Recall = TP / (TP + FN)
Use case: It is used when missing a positive is costly. One case would be a patient who has COVID receiving no treatment because the model classified them as negative; this situation must be avoided.
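Again with the toy labels, a minimal sketch:

```python
from sklearn.metrics import recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# TP / (TP + FN) = 3 / (3 + 1)
print(recall_score(y_true, y_pred))  # 0.75
```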
The F1 score is a trade-off between recall and precision. In some cases we need high precision, others may require high recall, but there are also cases in which recall and precision are equally important.
F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
Use case: The F1 score can be used when the data is imbalanced, or when precision and recall are both important and we want a single metric that balances them.
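Continuing with the toy labels, where precision and recall are both 0.75:

```python
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Harmonic mean of precision (0.75) and recall (0.75)
print(f1_score(y_true, y_pred))  # 0.75
```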
Log loss accounts for the performance of a model whose output is a probability value between 0 and 1. As the estimated probability deviates from the actual label, the log loss increases.
Log Loss = -(1/N) * Σ [Yi * log(Pi) + (1 - Yi) * log(1 - Pi)]
N: number of data points
Pi: estimated probability that the i-th point belongs to class 1
Yi: actual value of y for the i-th point (0 or 1)
The lower the log loss value, the better the model’s predictions.
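A minimal sketch with made-up probabilities (for binary problems, scikit-learn's log_loss accepts the predicted probability of the positive class):

```python
from sklearn.metrics import log_loss

# Toy true labels and the model's predicted probabilities of class 1
y_true = [1, 0, 1, 0]
p_pred = [0.9, 0.1, 0.8, 0.3]

# Average negative log-likelihood; lower is better
print(log_loss(y_true, p_pred))  # ≈ 0.198
```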
The receiver operating characteristic (ROC) curve is a plot of the true positive rate against the false positive rate at different classification thresholds. The y-axis shows the true positive rate (recall) and the x-axis shows the false positive rate.
Figure 4: ROC curve, plotting the true positive rate against the false positive rate.
The area under the ROC curve (AUC) is used as the metric. The higher the AUC, the better the performance of the model.
Use case: It is used when the data is imbalanced. It can also be used to compare various classification algorithms.
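A minimal sketch with made-up scores; here every positive outranks every negative, so the AUC is a perfect 1.0:

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Toy true labels and predicted probabilities of the positive class
y_true = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.3]

# Points of the ROC curve at varying classification thresholds
fpr, tpr, thresholds = roc_curve(y_true, scores)
print(fpr, tpr)

# Single-number summary: area under the ROC curve
print(roc_auc_score(y_true, scores))  # 1.0
```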
I hope it is now clear what these metrics mean and how and when to use them.