Understanding Logistic Regression

Published in Towards Data Science

May 17, 2018

Logistic Regression is one of the basic and popular algorithms for solving classification problems. It is named ‘Logistic Regression’ because its underlying technique is much the same as Linear Regression. The term “Logistic” comes from the logistic function used in this method of classification; its inverse is the Logit function.

This blog aims to answer the following questions:

1. What is the Classification problem?

2. Why not use Linear Regression?

3. Logistic Regression Algorithm?

4. What is Decision Boundary?

5. How to check model performance?

Today, let’s understand Logistic Regression once and for all. Let’s start.

What is the Classification Problem?

We call a problem a classification problem when the dependent variable is categorical, i.e. it falls into classes such as a positive class and a negative class. The independent variables may be continuous or categorical. Real-life examples include categorizing mail as spam or not spam, a tumor as malignant or benign, and a transaction as fraudulent or genuine. The answer to each of these problems is categorical, i.e. Yes or No, which is why they are two-class classification problems.

Sometimes we come across more than two classes. These are still classification problems, and they are known as multi-class classification problems.

Why not use Linear Regression?

Suppose we have data on tumor size vs. its malignancy. As it is a classification problem, if we plot the data, all the values lie at either 0 or 1. If we fit the best-found regression line and assume a threshold of 0.5, the line does a pretty reasonable job.

We can then pick the point on the x-axis where the line crosses the threshold: all values to its left are considered the negative class and all values to its right the positive class.

But what if there is an outlier in the data? Things get pretty messy.

With a threshold of 0.5, even the best-found regression line no longer gives us a point that separates the classes: it puts some positive class examples into the negative class. The green dotted line (the decision boundary) divides malignant tumors from benign tumors, but it should have been at the yellow line, which clearly divides the positive and negative examples. So a single outlier disturbs the whole set of linear regression predictions. That is where logistic regression comes into the picture.
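The outlier effect described above can be reproduced with a small sketch. The tumor sizes and labels below are made-up illustration data, and `fit_line` and `cutoff` are hypothetical helper names, not taken from the original article. The sketch fits a least-squares line, finds where it crosses the 0.5 threshold, and then repeats the fit after adding one very large malignant tumor.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def cutoff(b0, b1, threshold=0.5):
    """Value of x at which the fitted line crosses the threshold."""
    return (threshold - b0) / b1


# Hypothetical tumor sizes and labels: 0 = benign, 1 = malignant.
sizes = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

clean_cutoff = cutoff(*fit_line(sizes, labels))

# Add a single very large malignant tumor (the outlier) and refit.
outlier_cutoff = cutoff(*fit_line(sizes + [30], labels + [1]))

print(round(clean_cutoff, 2))    # 4.5, midway between sizes 4 and 5
print(round(outlier_cutoff, 2))  # about 5.55, so size 5 now falls on the benign side
```

On this toy data, one extra point moves the cutoff past the malignant tumor of size 5, which is the kind of misclassification the paragraph above describes.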

Logistic Regression Algorithm

As discussed earlier, to deal with outliers, Logistic Regression uses the Sigmoid function.

An explanation of logistic regression can begin with an explanation of the standard logistic function. The logistic function is a Sigmoid function, which takes any real input t and outputs a value between zero and one. It is defined as

$$\sigma(t) = \frac{e^{t}}{e^{t} + 1} = \frac{1}{1 + e^{-t}}$$

If we plot it, the graph is an S-shaped curve.
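As a minimal sketch, the standard logistic function σ(t) = 1 / (1 + e^(-t)) can be written in plain Python. The two-branch form below is one common way to avoid overflow for large negative inputs; it is an implementation choice made here, not something the article prescribes.

```python
import math


def sigmoid(t):
    """Standard logistic function: squashes any real t into the range 0 to 1."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    # For negative t, use the equivalent form e^t / (1 + e^t) so that
    # math.exp never receives a large positive argument.
    z = math.exp(t)
    return z / (1.0 + z)


# Print a few points along the S-shaped curve.
for t in (-6, -2, 0, 2, 6):
    print(t, round(sigmoid(t), 4))
```

The output rises from near 0 for large negative t, through exactly 0.5 at t = 0, to near 1 for large positive t.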

Let’s consider t as a linear function in a univariate regression model:

$$t = \beta_0 + \beta_1 x$$

So the Logistic Equation will become

$$p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$

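Now, when the logistic regression model comes across an outlier, it handles it much better: because the sigmoid flattens out near 0 and 1, a point that lies far out on the correct side of the boundary has little effect on the fit.

But sometimes the curve will still shift to the left or right along the x-axis, depending on the outliers' positions.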
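One way to see why a far-away outlier matters less to the logistic model is to compare the prediction error each model assigns to it. The sketch below uses made-up coefficients (`LINE_B0`, `LINE_B1`, `LOGIT_B0`, `LOGIT_B1` are hypothetical values chosen so that both models place the cutoff near a tumor size of 4.5) and a made-up outlier of size 30. It does not fit anything; it only evaluates the two equations at the outlier.

```python
import math


def sigmoid(t):
    """Standard logistic function, written to avoid overflow for negative t."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


# Hypothetical coefficients, for illustration only.
LINE_B0, LINE_B1 = -0.36, 0.19    # straight line: y = b0 + b1 * x
LOGIT_B0, LOGIT_B1 = -9.0, 2.0    # logistic: p = sigmoid(b0 + b1 * x)


def line_prediction(x):
    return LINE_B0 + LINE_B1 * x


def logistic_prediction(x):
    return sigmoid(LOGIT_B0 + LOGIT_B1 * x)


# A very large malignant tumor: the outlier.
outlier_size, outlier_label = 30, 1

line_error = outlier_label - line_prediction(outlier_size)
logistic_error = outlier_label - logistic_prediction(outlier_size)

print(line_error)      # large: the line predicts far above 1
print(logistic_error)  # essentially zero: the sigmoid has saturated near 1
```

A least-squares fit tries to shrink the large error of the straight line, which drags the line toward the outlier. The logistic prediction is already almost exactly 1 at that point, so there is very little left to correct.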

What is Decision Boundary?

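A decision boundary helps to turn predicted probabilities into a positive class and a negative class: with a threshold of 0.5, points on one side of the boundary are predicted as positive and points on the other side as negative.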

Linear Decision Boundary

Non-Linear Decision Boundary
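Both kinds of boundary can be sketched with two input features. The coefficients below are hypothetical and chosen only for illustration: `linear_score` gives a straight-line boundary at x1 + x2 = 3, and `circular_score` gives a non-linear boundary on the unit circle by feeding squared features into the same sigmoid. In both cases the predicted class flips where the score passes through zero, because that is where the probability passes through 0.5.

```python
import math


def sigmoid(t):
    """Standard logistic function, written to avoid overflow for negative t."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def predict_class(score, threshold=0.5):
    """Turn a linear score into a class via the predicted probability."""
    return 1 if sigmoid(score) >= threshold else 0


def linear_score(x1, x2):
    # Hypothetical coefficients: the boundary is the line x1 + x2 = 3.
    return -3.0 + x1 + x2


def circular_score(x1, x2):
    # Hypothetical coefficients on squared features: the boundary is the
    # circle x1^2 + x2^2 = 1.
    return -1.0 + x1 ** 2 + x2 ** 2


print(predict_class(linear_score(1, 1)))    # 0: below the line
print(predict_class(linear_score(2, 2)))    # 1: above the line
print(predict_class(circular_score(0, 0)))  # 0: inside the circle
print(predict_class(circular_score(1, 1)))  # 1: outside the circle
```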

How to check performance?

To check the performance, we can use the confusion matrix and the AUC-ROC curve. To learn what they are, check my articles about the confusion matrix and the AUC-ROC curve.
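As a rough sketch of both checks, the snippet below counts the four cells of a confusion matrix and computes the area under the ROC curve for a handful of made-up labels and predicted probabilities. The function names are hypothetical, and the AUC is computed through its pairwise interpretation (the chance that a randomly chosen positive example gets a higher score than a randomly chosen negative one), which is fine for small examples but slow for large datasets.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up labels and predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(confusion_matrix(y_true, y_pred))  # (1, 0, 2, 1)
print(roc_auc(y_true, scores))           # 0.75
```

The confusion matrix depends on the chosen threshold (0.5 here), while the AUC summarizes how well the scores rank positives above negatives across all thresholds.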

References:

Images are taken from Andrew Ng's course and modified a bit, as they are easy to understand 😁.

Thanks for Reading.

I hope I’ve given you some understanding of what exactly Logistic Regression is. If you like this post, giving it some claps 👏 would be a tad of extra motivation. I am always open to your questions and suggestions. You can share this on Facebook, Twitter, or LinkedIn, so someone in need might stumble upon it.

You can reach me at:

LinkedIn : https://www.linkedin.com/in/narkhedesarang/

Twitter : https://twitter.com/narkhede_sarang

Github : https://github.com/TheSarang

FAQs

What is logistic regression?

Logistic regression is a data analysis technique that models the relationship between one or more input variables and a categorical outcome. It then uses this relationship to predict the outcome from the inputs. The prediction usually has a finite number of outcomes, like yes or no.

How do you interpret logistic regression?

Analysts often prefer to interpret the results of logistic regression using the odds and odds ratios rather than the logits (or log-odds) themselves. Applying an exponential (exp) transformation to the regression coefficient gives the odds ratio; you can do this using most hand calculators.
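The link between a coefficient and its odds ratio can be checked numerically. In the sketch below, the coefficients `b0` and `b1` are hypothetical. It computes the odds p / (1 - p) at two inputs that are one unit apart and compares their ratio with exp(b1).

```python
import math


def sigmoid(t):
    """Standard logistic function (fine for the small inputs used here)."""
    return 1.0 / (1.0 + math.exp(-t))


def odds(p):
    """Odds of an event with probability p."""
    return p / (1.0 - p)


# Hypothetical coefficients of a one-variable logistic model.
b0, b1 = -9.0, 2.0

p_at_4 = sigmoid(b0 + b1 * 4)
p_at_5 = sigmoid(b0 + b1 * 5)

# Increasing x by one unit multiplies the odds by exp(b1).
ratio = odds(p_at_5) / odds(p_at_4)

print(ratio)
print(math.exp(b1))
```

The two printed values agree up to floating-point rounding, which is why exp(coefficient) is read as the multiplicative change in odds per unit increase in the input.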

Is logistic regression easy to understand?

Even though the logistic regression model might seem complex, the relationship between the inputs (like age or height) and the outcome (like yes/no) is fairly simple to understand: the log-odds of the outcome are a straight-line function of the inputs, and the sigmoid bends that line into an S-shaped probability curve.

How does logistic regression work for dummies?

Logistic regression is all about probabilities. It looks at past data and calculates the likelihood of something happening based on the input factors. For example, it might say there's an 80% chance of a customer buying a product based on their age and income.

What are the 3 types of logistic regression?

There are three main types of logistic regression: binary, multinomial and ordinal. They differ in execution and theory. Binary logistic regression deals with two possible values, essentially yes or no. Multinomial logistic regression deals with three or more unordered values. Ordinal logistic regression deals with three or more values that have a natural order, such as low, medium and high.

What is the best explanation of logistic regression?

Logistic regression is defined as a supervised machine learning algorithm that accomplishes binary classification tasks by predicting the probability of an outcome, event, or observation.

What does p-value mean in logistic regression?

The p-value helps you decide whether there is a relationship between a predictor variable and the outcome. The smaller the p-value, the stronger the evidence that the relationship exists, i.e. that the predictor's coefficient differs from zero.

What is an example of logistic regression for beginners?

An example of logistic regression is predicting whether a person will default on their credit card payment. The probability of default can be modeled from the pending credit card balance, income, etc. When P(default = yes) ≥ 0.5, we say the person will default on their payment.

When not to use logistic regression?

If the number of observations is smaller than the number of features, logistic regression should not be used, as it may lead to overfitting. Note that logistic regression makes no assumptions about the distributions of classes in feature space.

What is an example of logistic regression in real life?

Logistic regression is used across many scientific fields. In Natural Language Processing (NLP), it's used to determine the sentiment of movie reviews, while in Medicine it can be used to determine the probability of a patient developing a particular disease.

What is logistic regression used for?

Logistic regression is used to predict a categorical dependent variable. It's used when the prediction is categorical, for example, yes or no, true or false, 0 or 1. For instance, insurance companies decide whether or not to approve a new policy based on a driver's history, credit history and other such factors.

What is the difference between linear regression and logistic regression?

Linear regression is used to handle regression problems, whereas logistic regression is used to handle classification problems. Linear regression provides a continuous output, but logistic regression provides a discrete output: a class, obtained by thresholding a predicted probability.

What is the purpose of the simple logistic regression?

Simple Logistic Regression is a statistical test used to predict a single binary variable using one other variable. It is also used to determine the numerical relationship between two such variables. The variable you want to predict should be binary.

Why is logistic regression very popular?

Logistic regression is a popular algorithm because it converts the log of odds, which can range from -inf to +inf, to a range between 0 and 1. Since logistic functions output the probability of occurrence of an event, they can be applied to many real-life scenarios, which is why these models are very popular.

Do I use logistic or linear regression?

Linear regression is used for continuous outcome variables (e.g., days of hospitalization or FEV1), and logistic regression is used for categorical outcome variables, such as death. Independent variables can be continuous, categorical, or a mix of both.

How do you interpret Z value in logistic regression?

In logistic regression, the z test can be used to interpret the significance of the coefficients in the model. It helps determine whether the coefficients are statistically different from zero, indicating a significant relationship between the predictor variables and the outcome variable.
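As a small sketch of how a z value and its p-value relate, the snippet below divides a coefficient by its standard error (often called the Wald z statistic) and converts the result into a two-sided p-value under a standard normal distribution. The coefficient 1.2 and standard error 0.4 are made-up numbers; in practice both would come from the output of a fitted model.

```python
import math


def wald_z(coefficient, standard_error):
    """z value: how many standard errors the coefficient is away from zero."""
    return coefficient / standard_error


def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical coefficient and standard error from a fitted model.
z = wald_z(1.2, 0.4)
p = two_sided_p_value(z)

print(round(z, 2))  # 3.0
print(round(p, 4))  # 0.0027
```

A z value near 3 corresponds to a p-value well below the common 0.05 cutoff, while a z value near 1.96 sits right at it.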

How do you interpret regression analysis?

In the linear regression equation y = β₀ + β₁x, β₀ is the y-intercept and refers to the estimated value of y when x is equal to 0. The coefficient β₁ is the regression coefficient and denotes the estimated increase in the dependent variable for every unit increase in the independent variable.