What is logistic regression? Let’s break the problem down step by step in this article.
So first, what is an algorithm? In machine learning, it's a set of steps performed on data to build a model. Supervised learning algorithms broadly fall into two types: classification and regression.
Classification algorithms predict discrete values like yes or no, male or female, etc., whereas regression algorithms predict continuous values like age, salary, or temperature.
Let’s take a scenario where we need to classify whether a person is male or female. If we use linear regression for this problem, we have to pick a threshold value on which to base the classification. Say the actual class is male, the predicted continuous value is 0.3, and the threshold is 0.5: the data point will be classified as not male, which can lead to serious issues in a real application. Since linear regression’s output has no limiting values, it cannot be used for classification problems. Therefore, there is a need for logistic regression.
Now, let’s understand **logistic regression**. **Logistic regression** is a classification algorithm in which the dependent variable is categorical. It uses the logistic sigmoid function to transform its output into a probability value.

The sigmoid limits the model’s output to values between 0 and 1.
Figure 1
To map predicted values to probabilities, we use the sigmoid function. This function maps any real value into a value between 0 and 1.
Figure 2
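As a quick sketch, the sigmoid function can be written in a few lines of Python:

```python
import math

def sigmoid(z):
    """Map any real value z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))    # 0.5, right at the midpoint
print(sigmoid(6))    # close to 1
print(sigmoid(-6))   # close to 0
```

Large positive inputs are squashed toward 1 and large negative inputs toward 0, which is exactly the behavior needed for a probability.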
Our hypothesis should therefore output a value between 0 and 1:
Figure 3
The result of the hypothesis is the estimated probability, which tells us how confident we can be that the predicted value matches the actual value for a given input X.

Based on the input X, let’s say we get an estimated probability of 0.7. This tells us that there is a 70% chance that the transaction is fraudulent.
hΘ(x) = P(Y=1 | X; Θ)

That is, the probability that Y = 1 given X, with the model parameterized by Θ. Since the two classes are exhaustive, their probabilities sum to 1:

P(Y=1 | X; Θ) + P(Y=0 | X; Θ) = 1

P(Y=0 | X; Θ) = 1 − P(Y=1 | X; Θ)
We train our classifier so that, when we pass the independent variables through the sigmoid function, it returns an estimated probability between 0 and 1.
For example, we have two classes, let us take them as fraud and not fraud (1 – fraud, 0 – not fraud).
We decide on a threshold value: estimated probabilities above it are classified into class 1, and those below it into class 0. Most commonly, the threshold is taken as 0.5.
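The thresholding step can be sketched as follows (the 0.5 threshold and the fraud/not-fraud labels follow the example above):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    """Class 1 (fraud) if the estimated probability reaches the
    threshold, otherwise class 0 (not fraud)."""
    return 1 if sigmoid(z) >= threshold else 0

print(classify(2.0))   # probability ~0.88 -> 1 (fraud)
print(classify(-1.0))  # probability ~0.27 -> 0 (not fraud)
```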
There are three types of logistic regression:

Binary Logistic Regression:
The dependent variable has only two possible outcomes. Example: male or female
Multinomial Logistic Regression:
The dependent variable has three or more outcomes without ordering. Example: predicting which foods are most preferred (vegetarian, non-vegetarian, vegan).
Ordinal Logistic Regression:
The dependent variable has three or more outcomes with ordering. Example: product rating 1 to 5.
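For the multinomial case, the sigmoid generalizes to the softmax function, which turns one score per class into a probability distribution. A minimal sketch (the class scores here are made-up numbers for illustration):

```python
import math

def softmax(scores):
    """Generalize the sigmoid to K classes: exponentiate one score per
    class and normalize so the outputs form a probability distribution."""
    shifted = [s - max(scores) for s in scores]  # shift for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three unordered classes:
# vegetarian, non-vegetarian, vegan.
probs = softmax([2.0, 1.0, 0.1])
print(probs)                    # three probabilities summing to 1
print(probs.index(max(probs)))  # index of the predicted class: 0
```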
Cost Function:
Cost(hΘ(x), y) = −log(hΘ(x)) if y = 1

Cost(hΘ(x), y) = −log(1 − hΘ(x)) if y = 0
These two cases can be combined into a single function:
Figure 4
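The combined cost, Cost(hΘ(x), y) = −y·log(hΘ(x)) − (1 − y)·log(1 − hΘ(x)), can be checked with a small Python function:

```python
import math

def cost(h, y):
    """Cross-entropy cost for one example: reduces to -log(h) when
    y = 1 and to -log(1 - h) when y = 0."""
    return -(y * math.log(h) + (1 - y) * math.log(1 - h))

print(cost(0.9, 1))  # small cost: confident and correct
print(cost(0.9, 0))  # large cost: confident but wrong
```

The cost grows without bound as the estimated probability approaches the wrong extreme, which is what pushes training toward confident, correct predictions.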
Logistic regression makes a few key assumptions:

Binary logistic regression requires the dependent variable to be binary.
In a binary logistic regression, level 1 of the dependent variable should represent the desired outcome.
Only the meaningful independent variables should be included.
The independent variables should not be highly correlated with each other; that is, the data must not have multicollinearity.
The independent variables are related to the log odds in a linear manner.
Some example applications:

A credit company wants to detect whether a transaction is fraudulent or not.

To predict the rating of a product review on a scale of 1 to 5.

To predict weather types: stormy, sunny, cloudy, rainy.
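To tie the pieces together, here is a minimal from-scratch sketch of the fraud example: a one-feature logistic regression fitted by gradient descent on the cross-entropy cost. The data points and learning rate are made up for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy one-feature data: scaled transaction amount -> label (1 = fraud).
X = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
y = [0,   0,   0,   1,   1,   1]

# Fit weight w and bias b by gradient descent on the cross-entropy cost.
w, b, lr = 0.0, 0.0, 0.1
for _ in range(5000):
    grad_w = grad_b = 0.0
    for xi, yi in zip(X, y):
        error = sigmoid(w * xi + b) - yi  # gradient of the cost w.r.t. the score
        grad_w += error * xi
        grad_b += error
    w -= lr * grad_w / len(X)
    b -= lr * grad_b / len(X)

def predict(x, threshold=0.5):
    return 1 if sigmoid(w * x + b) >= threshold else 0

print(predict(1.0), predict(3.8))  # 0 (not fraud), 1 (fraud)
```

Because the toy data is separable around the middle of the range, the learned decision boundary lands between the two groups and new transactions on either side are classified accordingly.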