Logistic Regression
In this tutorial we briefly discuss logistic regression and then implement it.
Building a classification model:
Import the libraries and tools we will be using in logistic regression.
Pandas – for data cleaning and processing
NumPy – for mathematical and logical operations
Seaborn and Matplotlib – for data visualization in graphical form
Let us take an overview of our dataset.
Columns in the dataset:
survived | Whether the person survived or not (1 = survived) |
pclass | Class of the passenger |
sex | Male or female |
age | Age of the person |
sibsp | Number of siblings and spouses travelling with the passenger |
parch | Number of parents or children travelling with the passenger |
fare | Cost of the ticket |
embarked | Boarding point |
deck | Floor of the ship |
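As a minimal sketch of this overview step, the snippet below builds a tiny stand-in DataFrame with the same column names and inspects it. The four rows are hypothetical values used only for illustration; the real dataset has 891 rows.

```python
import pandas as pd

# Hypothetical stand-in rows with the column names listed above.
titanic = pd.DataFrame({
    "survived": [0, 1, 1, 0],
    "pclass": [3, 1, 3, 1],
    "sex": ["male", "female", "female", "male"],
    "age": [22.0, 38.0, None, 54.0],
    "sibsp": [1, 1, 0, 0],
    "parch": [0, 0, 0, 0],
    "fare": [7.25, 71.28, 7.92, 51.86],
    "embarked": ["S", "C", "S", None],
    "deck": [None, "C", None, "E"],
})

print(titanic.head())  # first rows of the table
titanic.info()         # column types and non-null counts
```

head() shows the first rows, and info() lists each column with its type and the number of non-null entries, which already hints at where data is missing.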
Empty data:
In total there are 891 rows in the dataset.
There are 177 missing values in the age column, so we have to fill them in using some logic. There are also a lot of null values in the deck column (eliminating this column would be a good option), and there are 2 null values in embarked (we will remove those 2 rows, which will not affect our dataset much).
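One way to obtain such counts is to chain isnull() and sum(). The sketch below uses a small hypothetical frame, so its counts differ from the 177 and 2 reported above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with gaps in the same three columns.
titanic = pd.DataFrame({
    "age": [22.0, np.nan, 38.0, np.nan],
    "deck": [np.nan, "C", np.nan, np.nan],
    "embarked": ["S", "C", None, "S"],
})

# isnull() marks each missing cell; sum() counts them per column.
null_counts = titanic.isnull().sum()
print(null_counts)
```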
Eliminating the deck column:
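A minimal sketch of this step, assuming the DataFrame is called titanic as in the later code, and using hypothetical rows:

```python
import pandas as pd

# Hypothetical sample; only the column names matter here.
titanic = pd.DataFrame({
    "age": [22.0, 38.0],
    "deck": [None, "C"],
    "fare": [7.25, 71.28],
})

# axis=1 tells pandas to drop a column rather than a row.
titanic = titanic.drop("deck", axis=1)
print(titanic.columns.tolist())
```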
Now we have to fill in the missing age values based on some other column.
The plots above show the relation between age and the other variables. The best fit is between age and pclass: older passengers tend to travel in a better class, plausibly because an older person has earned more and can afford a better class.
So we will assign the null values in the age column based on the passenger's class. For example, most people travelling in class 1 are between 30 and 50 years old (this can be estimated from the box plot).
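The same ranges that a box plot shows can also be read off numerically. The sketch below is an alternative to the plot, not the plot itself: it groups a hypothetical sample by pclass and prints the quartiles of age.

```python
import pandas as pd

# Hypothetical ages for two classes, for illustration only.
titanic = pd.DataFrame({
    "pclass": [1, 1, 1, 3, 3, 3],
    "age": [30.0, 40.0, 50.0, 18.0, 22.0, 26.0],
})

# describe() reports count, mean, std, min, quartiles and max per class.
age_by_class = titanic.groupby("pclass")["age"].describe()
print(age_by_class[["25%", "50%", "75%"]])
```

The 25% and 75% columns correspond to the edges of the box in a box plot, and 50% is the median line.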
Now we define a function to replace all the null values based on class.
Here we have filled the null age values with random integers drawn from the range that the box plot shows for the majority of each class.
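A function of this kind might look like the sketch below. Only the class 1 range (30 to 50) comes from the text; the ranges for classes 2 and 3, the names AGE_RANGES and fill_age, and the sample rows are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Class 1 range is from the text; classes 2 and 3 are placeholder guesses.
AGE_RANGES = {1: (30, 50), 2: (25, 40), 3: (18, 35)}
rng = np.random.default_rng(0)  # fixed seed so the fill is repeatable

def fill_age(row):
    """Return the age, or a random integer from the class range if it is missing."""
    if pd.isnull(row["age"]):
        low, high = AGE_RANGES[int(row["pclass"])]
        return int(rng.integers(low, high + 1))  # upper bound is exclusive
    return row["age"]

# Hypothetical sample with two missing ages.
titanic = pd.DataFrame({
    "pclass": [1, 3, 2, 1],
    "age": [np.nan, 22.0, np.nan, 54.0],
})
titanic["age"] = titanic.apply(fill_age, axis=1)
print(titanic)
```

Existing ages are left untouched; only rows where age is null receive a random value from the range for their class.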
Now we drop all the rows that still contain null values.
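A minimal sketch of this step with dropna(), on a hypothetical sample where one row is missing its embarked value:

```python
import pandas as pd

# Hypothetical sample; the second row has no boarding point.
titanic = pd.DataFrame({
    "age": [22.0, 38.0, 26.0],
    "embarked": ["S", None, "C"],
})

# dropna() removes every row that contains at least one null value.
titanic = titanic.dropna()
print(len(titanic))
```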
The machine does not understand categories such as male and female; it only works with numbers. So we have to convert those columns. pandas provides a function, get_dummies, which creates one column per category (here 2) and assigns each row a value of zero or one.
If the value is zero for male, it is definitely one for female, so the second column carries no extra information for the machine learning model. To drop the redundant column, pass drop_first=True to get_dummies. We should encode embarked and pclass in the same way.
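The encoding described above can be sketched as follows, on hypothetical rows. The dtype=int argument and the prefix="class" argument are choices made for this sketch: the first forces 0/1 values (newer pandas versions otherwise return True/False), and the second gives the pclass dummy columns string names instead of integer ones.

```python
import pandas as pd

# Hypothetical sample covering every category once.
titanic = pd.DataFrame({
    "sex": ["male", "female", "female"],
    "embarked": ["S", "C", "Q"],
    "pclass": [3, 1, 2],
})

# drop_first=True drops the first category column, which is redundant.
sex = pd.get_dummies(titanic["sex"], drop_first=True, dtype=int)
embarked = pd.get_dummies(titanic["embarked"], drop_first=True, dtype=int)
pclass = pd.get_dummies(titanic["pclass"], prefix="class", drop_first=True, dtype=int)

print(sex)
print(embarked)
print(pclass)
```

Categories are ordered alphabetically or numerically, so the dropped columns here are female, C, and class 1.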
Once we have encoded these columns, we have to concatenate them with the actual dataset.
There is now no need for the original columns, such as sex and embarked, so we can drop them.
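A sketch of the concatenate-and-drop step, on hypothetical rows. Dropping pclass along with sex and embarked is an assumption here, made because pclass has also been encoded.

```python
import pandas as pd

# Hypothetical sample; the real dataset has more columns and rows.
titanic = pd.DataFrame({
    "survived": [0, 1],
    "sex": ["male", "female"],
    "embarked": ["S", "C"],
    "pclass": [3, 1],
    "fare": [7.25, 71.28],
})

sex = pd.get_dummies(titanic["sex"], drop_first=True, dtype=int)
embarked = pd.get_dummies(titanic["embarked"], drop_first=True, dtype=int)
pclass = pd.get_dummies(titanic["pclass"], prefix="class", drop_first=True, dtype=int)

# axis=1 places the new columns side by side with the existing ones.
titanic = pd.concat([titanic, sex, embarked, pclass], axis=1)

# The original categorical columns are now redundant.
titanic = titanic.drop(["sex", "embarked", "pclass"], axis=1)
print(titanic)
```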
Now there are no null values in the dataset, and the first and most important part of building a model is complete.
You are encouraged to try out some other logic for filling in the missing passenger ages.
Now we divide our data into two parts: the target variable and the other parameters.
X = titanic.drop('survived', axis=1)
Y = titanic['survived']
Now we import a few classes from sklearn, as we did in linear regression.
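The imports and the train/test split might look like the sketch below. The test_size and random_state values are assumptions, and the X and Y built here are hypothetical stand-ins for the ones defined above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the feature table and the target column.
X = pd.DataFrame({"age": np.arange(20, 40), "fare": np.linspace(5, 100, 20)})
Y = pd.Series([0, 1] * 10, name="survived")

# Hold out 25% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=101
)
print(len(X_train), len(X_test))
```

The model is fitted only on X_train and Y_train, so X_test and Y_test remain unseen data for the evaluation step.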
Now we have to train the model:
logre = LogisticRegression()
logre.fit(X_train, Y_train)
Now we would test and evaluate our model:
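One common way to test and evaluate a classifier is to predict on the held-out rows and print a classification report. The sketch below uses synthetic data rather than the Titanic features, so its scores say nothing about the real model; the split parameters are also assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic, linearly separable data standing in for the real features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=101
)

logre = LogisticRegression()
logre.fit(X_train, Y_train)

# Predict labels for rows the model has not seen during training.
predictions = logre.predict(X_test)
print(classification_report(Y_test, predictions))
print(accuracy_score(Y_test, predictions))
```

The report lists precision, recall and F1 score for each class, which is more informative than accuracy alone when the classes are unbalanced.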
We can also print the confusion matrix for this model.
This completes the evaluation of our model.
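As a small illustration of how to read a confusion matrix, the sketch below uses hand-written labels in place of the real Y_test and predictions.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predicted labels.
Y_test = [0, 0, 1, 1, 1, 0]
predictions = [0, 1, 1, 1, 0, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(Y_test, predictions)
print(cm)
# [[2 1]
#  [1 2]]
```

Here 2 passengers are correctly predicted as not survived, 2 are correctly predicted as survived, and 1 is misclassified in each direction.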
Topics covered in upcoming tutorials:
- KNN
This is the end of this tutorial on logistic regression. For more information, visit the Data Science section.