Published in · 5 min read · May 17, 2018
Logistic Regression is one of the most basic and popular algorithms for solving classification problems. It is called ‘Logistic Regression’ because its underlying technique is much the same as Linear Regression. The term “Logistic” comes from the logistic function, the inverse of the logit function, that is used in this method of classification.
This blog aims to answer the following questions:
1. What is the Classification problem?
2. Why not use Linear Regression?
3. Logistic Regression Algorithm?
4. What is Decision Boundary?
5. How to check model performance?
Today, let’s understand Logistic Regression once and for all. Let’s start.
What is the Classification Problem?
We call a problem a classification problem when the dependent variable is categorical, i.e. it falls into classes such as a positive class and a negative class. The independent variables are often continuous, but they do not have to be. Real-life examples of classification are categorizing a mail as spam or not spam, a tumor as malignant or benign, and a transaction as fraudulent or genuine. The answer to each of these problems is categorical, i.e. Yes or No, and that is why they are two-class classification problems.
Sometimes we come across more than two classes, and it is still a classification problem. These problems are known as multi-class classification problems.
Why not use Linear Regression?
Suppose we have data of tumor size vs. its malignancy. As it is a classification problem, if we plot the data, all the values lie at 0 or 1. If we fit the best-found regression line and assume a threshold of 0.5, the line does a pretty reasonable job.
We can pick the point on the x-axis where the line crosses the threshold: all values to its left are considered the negative class, and all values to its right the positive class.
But what if there is an outlier in the data? Things get pretty messy. For example, with the threshold still at 0.5:
Even if we fit the best-found regression line, it is no longer enough to find a point that separates the classes. It puts some positive class examples into the negative class. The green dotted line (Decision Boundary) is dividing malignant tumors from benign tumors, but the boundary should have been at the yellow line, which clearly divides the positive and negative examples. So just a single outlier disturbs the whole set of linear regression predictions. And that is where logistic regression comes into the picture.
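To make the outlier effect concrete, here is a minimal sketch in plain Python. The tumor sizes, the labels, and the outlier at size 50 are made-up numbers for illustration only, and `fit_line` and `threshold_crossing` are hypothetical helper names, not something from the original figures.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def threshold_crossing(slope, intercept, threshold=0.5):
    """x-value at which the fitted line reaches the threshold."""
    return (threshold - intercept) / slope


# Made-up data: tumor sizes and labels (0 = benign, 1 = malignant).
sizes = [1, 2, 3, 4, 6, 7, 8, 9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

boundary_clean = threshold_crossing(*fit_line(sizes, labels))

# Add one very large malignant tumor (the outlier) and fit again.
boundary_outlier = threshold_crossing(*fit_line(sizes + [50], labels + [1]))

print("boundary without outlier:", boundary_clean)
print("boundary with outlier:   ", boundary_outlier)
```

With these numbers the first boundary comes out at about 5.0 and the second at about 6.56, so the malignant tumor of size 6 now falls on the benign side of the threshold.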
Logistic Regression Algorithm
As discussed earlier, to deal with outliers, Logistic Regression uses the Sigmoid function.
An explanation of logistic regression can begin with an explanation of the standard logistic function. The logistic function is a Sigmoid function, which takes any real value as input and outputs a value between zero and one. It is defined as
$$\sigma(t) = \frac{e^{t}}{e^{t} + 1} = \frac{1}{1 + e^{-t}}$$
And if we plot it, the graph is an S-shaped curve.
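As a quick check of that S shape, here is a small sketch of the standard logistic function in plain Python. The name `sigmoid` and the sample inputs are my own choices for illustration.

```python
import math


def sigmoid(t):
    """Standard logistic function, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


# Values creep toward 0 on the left, pass through 0.5 at t = 0,
# and creep toward 1 on the right.
for t in (-6, -2, 0, 2, 6):
    print(t, round(sigmoid(t), 4))
```

Whatever real number goes in, the output stays between 0 and 1, which is what lets us read it as a probability.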
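Let’s consider t as a linear function of a single input x, as in a univariate regression model:
$$t = \beta_0 + \beta_1 x$$
So the Logistic Equation will become
$$p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$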
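Plugging a linear function into the logistic function can be sketched like this. The coefficient values `beta0 = -5` and `beta1 = 1` are hypothetical, picked only so that the probability is exactly 0.5 at a tumor size of 5; a real model would learn them from data.

```python
import math


def predict_proba(x, beta0, beta1):
    """Probability of the positive class for a single input x."""
    t = beta0 + beta1 * x  # the linear part
    return 1.0 / (1.0 + math.exp(-t))  # squashed into (0, 1)


beta0, beta1 = -5.0, 1.0  # hypothetical coefficients, not fitted to anything

for size in (2, 5, 8):
    print(size, round(predict_proba(size, beta0, beta1), 3))
```

Small sizes get a probability close to 0, large sizes a probability close to 1, and the switch happens around the size where the linear part equals zero.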
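Now, when the logistic regression model comes across an outlier, it copes with it much better: the S curve flattens out near 0 and 1, so a far-away point that is already on the correct side barely changes the fit.
But sometimes the curve will still shift to the left or right along the x-axis, depending on the outliers' positions.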
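One way to see why is to compare how far off each model is at the outlier, since in both models the size of that error (the residual) is what pulls the fit during training. The sketch below uses hypothetical coefficients for a straight line and for a logistic curve that both put the boundary at size 5, and a made-up outlier at size 50.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


outlier_size, outlier_label = 50.0, 1.0  # made-up large malignant tumor

# Hypothetical straight line that crosses 0.5 at size 5.
line_slope, line_intercept = 1.0 / 6.0, -1.0 / 3.0
line_residual = (line_slope * outlier_size + line_intercept) - outlier_label

# Hypothetical logistic model with the same boundary at size 5.
beta0, beta1 = -5.0, 1.0
logistic_residual = sigmoid(beta0 + beta1 * outlier_size) - outlier_label

# The same point labelled benign (0) would sit on the wrong side of the curve.
wrong_side_residual = sigmoid(beta0 + beta1 * outlier_size) - 0.0

print("linear residual:             ", line_residual)
print("logistic residual:           ", logistic_residual)
print("logistic residual, wrong side:", wrong_side_residual)
```

The line overshoots the outlier by about 7, so least squares tilts the line to reduce that error. The logistic curve is already at 1 there, so the residual is practically zero and the point has almost no pull. If the outlier sits on the wrong side, the residual is close to 1 and the curve does shift.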
What is Decision Boundary?
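The decision boundary is where the predicted probability equals the threshold (for example, 0.5). It splits the predictions into the positive class on one side and the negative class on the other.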
Linear Decision Boundary
Non-Linear Decision Boundary
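Both kinds of boundary can be sketched with two input features. The parameter values below are hypothetical, chosen so that the linear boundary is the line x1 + x2 = 3 and the non-linear one is the circle x1² + x2² = 1; `predict_linear` and `predict_circle` are names made up for this example.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def predict_linear(x1, x2, theta=(-3.0, 1.0, 1.0)):
    """Positive class when x1 + x2 >= 3 (a straight-line boundary)."""
    t = theta[0] + theta[1] * x1 + theta[2] * x2
    return int(sigmoid(t) >= 0.5)


def predict_circle(x1, x2, theta=(-1.0, 1.0, 1.0)):
    """Positive class when x1^2 + x2^2 >= 1 (a circular boundary)."""
    t = theta[0] + theta[1] * x1 ** 2 + theta[2] * x2 ** 2
    return int(sigmoid(t) >= 0.5)


print(predict_linear(1, 1), predict_linear(2, 2))
print(predict_circle(0.5, 0.5), predict_circle(1, 1))
```

Since the sigmoid is 0.5 exactly when its input is 0, the boundary is simply the set of points where the expression inside the sigmoid equals zero. Feeding squared features into the same model is what bends the boundary into a curve.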
How to check performance?
To check the performance, we can use the confusion matrix and the AUC-ROC curve. To learn what they are, check my articles about the confusion matrix and the AUC-ROC curve.
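As a rough sketch of both checks in plain Python, assuming we already have true labels and predicted probabilities: the `y_true` and `y_prob` values below are made up, and AUC is computed here as the fraction of (positive, negative) pairs that the model ranks in the right order, which is one standard way to define it.

```python
def confusion_matrix(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) for predictions cut at the given threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def roc_auc(y_true, y_prob):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

print("tp, fp, tn, fn:", confusion_matrix(y_true, y_prob))
print("AUC:", roc_auc(y_true, y_prob))
```

The confusion matrix depends on the threshold we pick, while AUC looks at the ranking of the probabilities across all thresholds.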
References:
Images are taken from Andrew Ng's course and modified a bit, as they are easy to understand😁.
Thanks for Reading.
I hope I’ve given you some understanding of what exactly Logistic Regression is. If you like this post, giving it some claps 👏 would be a tad of extra motivation. I am always open to your questions and suggestions. You can share this on Facebook, Twitter, or LinkedIn, so someone in need might stumble upon it.
You can reach me at:
LinkedIn : https://www.linkedin.com/in/narkhedesarang/
Twitter : https://twitter.com/narkhede_sarang
Github : https://github.com/TheSarang