Logistic regression
In the previous tutorials, we discussed linear regression and built a model to predict continuous values. In this tutorial we will learn about logistic regression.
Logistic regression:
Logistic regression is used to solve classification problems, where the model predicts discrete values. We cannot use linear regression for classification: a classification model should predict either zero ( False ) or one ( True ), whereas a linear regression model's output can fall below zero or rise above one. The output of logistic regression, in contrast, always lies between zero and one.
Cutoff point:
In classification models, we require a prediction of either zero or one, but the model outputs a probability. What if the value is 0.64? We set up a cutoff point ( here, say 0.5 ): if the value is above 0.5 it is considered one, otherwise zero.
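The thresholding step above can be sketched in a few lines of Python (the function name `classify` and the example probabilities are illustrative, not from any specific library):

```python
def classify(probability, cutoff=0.5):
    """Convert a predicted probability into a class label (0 or 1)."""
    return 1 if probability >= cutoff else 0

print(classify(0.64))  # 1, since 0.64 is above the 0.5 cutoff
print(classify(0.30))  # 0, since 0.30 is below the cutoff
```

Note that 0.5 is only the default choice; in practice the cutoff can be tuned, for example to trade false positives against false negatives.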
Sigmoid function:
f ( z ) = 1 / ( 1 + e^(-z) )
This function is also known as the logistic function: it takes any real-valued input and squeezes the output into the range ( 0, 1 ), which is exactly what we need for predicting probabilities.
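The formula above translates directly into Python; this small sketch uses only the standard library:

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: f(z) = 1 / (1 + e^(-z))."""
    return 1 / (1 + math.exp(-z))

print(sigmoid(0))    # 0.5 exactly, the midpoint of the curve
print(sigmoid(5))    # close to 1 for large positive inputs
print(sigmoid(-5))   # close to 0 for large negative inputs
```

However large or small the input, the output never leaves the interval between 0 and 1, which is why the cutoff rule above always applies.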
To evaluate our logistic regression model we have to use a confusion matrix.
Confusion matrix:
| | Predicted positive | Predicted negative |
| --- | --- | --- |
| Actually true | True positive | False negative |
| Actually false | False positive | True negative |
There are two types of errors:
- Type- 1 error ( False positive )
- Type -2 error ( False negative )
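The four cells of the confusion matrix can be computed by hand from actual and predicted labels. This is a minimal sketch (libraries such as scikit-learn provide a ready-made version, but writing it out shows where each count comes from):

```python
def confusion_counts(actual, predicted):
    """Count the four confusion-matrix cells for binary labels (0/1)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # Type-2 errors
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # Type-1 errors
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fn, fp, tn

actual    = [1, 1, 0, 0, 1]  # toy example labels, not real data
predicted = [1, 0, 0, 1, 1]
print(confusion_counts(actual, predicted))  # (2, 1, 1, 1)
```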
In these tutorials, we will work with the famous "Titanic" dataset from kaggle.com. This dataset lists passengers who died or survived on the Titanic, and we will use our model to predict whether a passenger survived.
We will start building our machine learning model in the next tutorial.