What is logistic regression? Let’s break the problem down step by step in this article.
So first, what is an algorithm? In machine learning, it's a set of steps performed on data to build a model. Supervised learning algorithms broadly fall into two types: classification and regression.
Classification algorithms predict discrete values like yes or no, male or female, etc., whereas regression algorithms predict continuous values like age, salary, or temperature.
Let’s take a scenario where we need to classify whether a person is male or female. If we use linear regression for this problem, we have to pick a threshold value on which to base the classification. Say the actual class is male, the predicted continuous value is 0.3, and the threshold is 0.5: the data point will be classified as not male, which can lead to serious issues in a real application. Since linear regression’s output has no limiting values, it cannot be used for classification problems. Therefore, there is a need for logistic regression.
Now, let’s understand **logistic regression**. **Logistic regression** is a classification algorithm in which the dependent variable is categorical. It uses the logistic sigmoid function to transform its output into a probability value.

The sigmoid limits the model’s output to values between 0 and 1.
Figure 1
To map predicted values to probabilities, we use the sigmoid function. This function maps any real value into a value between 0 and 1.
Figure 2
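As a quick sketch, the sigmoid function can be written in a few lines of Python:

```python
import math

def sigmoid(z):
    """Map any real value z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))    # 0.5, right at the midpoint
print(sigmoid(6))    # close to 1
print(sigmoid(-6))   # close to 0
```

Large positive inputs are squashed toward 1 and large negative inputs toward 0, which is exactly the behavior needed for a probability.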
Our hypothesis should therefore output a value between 0 and 1:
Figure 3
The result of the hypothesis is the estimated probability, which tells us how confident we can be that the predicted value matches the actual value for a given input X.

Based on the input X, let’s say we get an estimated probability of 0.7. This tells us that there is a 70% chance that the transaction is fraudulent.
hΘ(x) = P(Y=1 | X; Θ)

That is, the probability that Y = 1 given X, with the model parameterized by Θ. Since the two classes are exhaustive, their probabilities sum to 1:

P(Y=1 | X; Θ) + P(Y=0 | X; Θ) = 1

P(Y=0 | X; Θ) = 1 − P(Y=1 | X; Θ)
We train our classifier so that, when we pass the independent variables through the sigmoid function, it returns an estimated probability between 0 and 1.
For example, we have two classes, let us take them as fraud and not fraud (1 – fraud, 0 – not fraud).
We decide on a threshold value: estimated probabilities above it are classified into class 1, and those below it into class 0. Most commonly, the threshold is taken as 0.5.
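The thresholding step can be sketched as follows (the 0.5 threshold and the fraud/not-fraud labels follow the example above):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    """Class 1 (fraud) if the estimated probability reaches the
    threshold, otherwise class 0 (not fraud)."""
    return 1 if sigmoid(z) >= threshold else 0

print(classify(2.0))   # probability ~0.88 -> 1 (fraud)
print(classify(-1.0))  # probability ~0.27 -> 0 (not fraud)
```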
There are three types of logistic regression:

Binary Logistic Regression:
The dependent variable has only two possible outcomes. Example: male or female
Multinomial Logistic Regression:
The dependent variable has three or more outcomes without ordering. Example: predicting which foods are most preferred (vegetarian, non-vegetarian, vegan).
Ordinal Logistic Regression:
The dependent variable has three or more outcomes with ordering. Example: product rating 1 to 5.
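For the multinomial case, the sigmoid generalizes to the softmax function, which turns one score per class into a probability distribution. A minimal sketch (the class scores here are made-up numbers for illustration):

```python
import math

def softmax(scores):
    """Generalize the sigmoid to K classes: exponentiate one score per
    class and normalize so the outputs form a probability distribution."""
    shifted = [s - max(scores) for s in scores]  # shift for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three unordered classes:
# vegetarian, non-vegetarian, vegan.
probs = softmax([2.0, 1.0, 0.1])
print(probs)                    # three probabilities summing to 1
print(probs.index(max(probs)))  # index of the predicted class: 0
```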
Cost Function:
Cost(hΘ(x), y) = −log(hΘ(x)) if y = 1

Cost(hΘ(x), y) = −log(1 − hΘ(x)) if y = 0
These two cases can be combined into a single function:
Figure 4
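The combined cost, Cost(hΘ(x), y) = −y·log(hΘ(x)) − (1 − y)·log(1 − hΘ(x)), can be checked with a small Python function:

```python
import math

def cost(h, y):
    """Cross-entropy cost for one example: reduces to -log(h) when
    y = 1 and to -log(1 - h) when y = 0."""
    return -(y * math.log(h) + (1 - y) * math.log(1 - h))

print(cost(0.9, 1))  # small cost: confident and correct
print(cost(0.9, 0))  # large cost: confident but wrong
```

The cost grows without bound as the estimated probability approaches the wrong extreme, which is what pushes training toward confident, correct predictions.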
Logistic regression makes a few key assumptions:

Binary logistic regression requires the dependent variable to be binary.
In a binary logistic regression, level 1 of the dependent variable should represent the desired outcome.
Only the meaningful independent variables should be included.
The independent variables should not be highly correlated with each other; that is, the data must not have multicollinearity.
The independent variables are related to the log odds in a linear manner.
Some example applications:

A credit company wants to detect whether a transaction is fraudulent or not.

To predict the rating of a product review on a scale of 1 to 5.

To predict weather types: stormy, sunny, cloudy, rainy.
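To tie the pieces together, here is a minimal from-scratch sketch of the fraud example: a one-feature logistic regression fitted by gradient descent on the cross-entropy cost. The data points and learning rate are made up for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy one-feature data: scaled transaction amount -> label (1 = fraud).
X = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
y = [0,   0,   0,   1,   1,   1]

# Fit weight w and bias b by gradient descent on the cross-entropy cost.
w, b, lr = 0.0, 0.0, 0.1
for _ in range(5000):
    grad_w = grad_b = 0.0
    for xi, yi in zip(X, y):
        error = sigmoid(w * xi + b) - yi  # gradient of the cost w.r.t. the score
        grad_w += error * xi
        grad_b += error
    w -= lr * grad_w / len(X)
    b -= lr * grad_b / len(X)

def predict(x, threshold=0.5):
    return 1 if sigmoid(w * x + b) >= threshold else 0

print(predict(1.0), predict(3.8))  # 0 (not fraud), 1 (fraud)
```

Because the toy data is separable around the middle of the range, the learned decision boundary lands between the two groups and new transactions on either side are classified accordingly.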