Logistic Regression

In this tutorial we will briefly discuss logistic regression and implement it.

Building a classification model:

Import the libraries and tools we will use for logistic regression.

Pandas – for data cleaning and processing

NumPy – for mathematical and logical operations

Seaborn and Matplotlib – for data visualization

Let us take a look at our data set.

Columns in the dataset:

survived – whether the person survived (1 = survived)
pclass – class of the passenger
sex – male or female
age – age of the person
sibsp – number of siblings and spouses
parch – number of parents or children travelling with the passenger
fare – cost of the ticket
embarked – boarding point
deck – floor of the ship

Empty data:

In total there are 891 rows in the data set.

There are 177 missing values in the age column, so we have to fill them using some logic. There are also many null values in the deck column (eliminating this column is a good option), and there are 2 null values in embarked (we will remove those 2 rows, which will not affect our data set much).
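The counts above can be obtained by summing the null flags per column. The following is a minimal sketch on a tiny made-up data frame; the variable name titanic matches the rest of this tutorial, but the values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the Titanic data; the values are made up for illustration.
titanic = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1],
    "age": [22.0, np.nan, 26.0, np.nan, 35.0],
    "embarked": ["S", "C", None, "S", "S"],
    "deck": [None, "C", None, None, "E"],
})

# isnull() flags each missing cell; sum() counts the flags per column.
missing_counts = titanic.isnull().sum()
print(missing_counts)
```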

Eliminating the deck column:
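One way to do this is with the drop method of a pandas data frame. This is a sketch on a small made-up frame, not the full data set.

```python
import pandas as pd

# Small made-up frame with a mostly empty deck column.
titanic = pd.DataFrame({
    "survived": [0, 1, 1],
    "pclass": [3, 1, 2],
    "deck": [None, "C", None],
})

# axis=1 tells pandas to drop a column rather than a row.
titanic = titanic.drop("deck", axis=1)
print(titanic.columns.tolist())
```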

Now we have to fill in the missing age values based on some other column.

Plotting age against the other variables shows how they are related. The best fit is between age and pclass: the better the class, the higher the age tends to be. A plausible explanation is that an older person has had more time to earn money and can therefore afford to travel in a better class.

So now we will assign the null values in the age column based on passenger class. For example, most people travelling in class 1 are between 30 and 50 years old (this can be estimated from the box plot).
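If you prefer numbers to a box plot, the same ranges can be read from the quartiles of age within each class. This is a sketch on made-up values, so the printed ranges are not those of the real data set.

```python
import numpy as np
import pandas as pd

# Made-up ages per class, including one missing value (ignored by quantile).
titanic = pd.DataFrame({
    "pclass": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "age": [30.0, 40.0, 50.0, 25.0, 30.0, np.nan, 18.0, 22.0, 26.0],
})

# The 25% and 75% quantiles are the edges of the box in a box plot.
age_quartiles = titanic.groupby("pclass")["age"].quantile([0.25, 0.75]).unstack()
print(age_quartiles)
```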

Now we define a function to replace all the null values based on class.

Here we have filled the null age values with random integers drawn from the range that holds the majority of ages in the box plot for each class.
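A function of this kind could look like the sketch below. The names fill_age and AGE_RANGES are hypothetical, and only the 30 to 50 range for class 1 comes from the text above; the ranges for classes 2 and 3 are placeholders that you should replace with the values you read from your own box plot.

```python
import numpy as np
import pandas as pd

# Hypothetical age ranges per class. Only the class 1 range (30 to 50) comes
# from the tutorial text; the ranges for classes 2 and 3 are placeholders.
AGE_RANGES = {1: (30, 50), 2: (25, 40), 3: (18, 30)}

rng = np.random.default_rng(0)  # fixed seed so the fill is reproducible


def fill_age(row):
    """Return the row's age, or a random integer from its class range if missing."""
    if pd.isnull(row["age"]):
        low, high = AGE_RANGES[int(row["pclass"])]
        # The upper bound of rng.integers is exclusive, hence high + 1.
        return int(rng.integers(low, high + 1))
    return row["age"]


# Made-up rows: two passengers have a missing age.
titanic = pd.DataFrame({
    "pclass": [1, 2, 3, 1],
    "age": [np.nan, 28.0, np.nan, 45.0],
})

# Apply the function row by row and overwrite the age column.
titanic["age"] = titanic[["age", "pclass"]].apply(fill_age, axis=1)
print(titanic)
```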

Now we drop all the remaining rows that contain null values.
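In pandas this can be done with dropna, which by default removes every row that contains at least one null value. A sketch on a small made-up frame:

```python
import pandas as pd

# Made-up rows: two passengers have no boarding point recorded.
titanic = pd.DataFrame({
    "survived": [0, 1, 1, 0],
    "age": [22.0, 38.0, 26.0, 35.0],
    "embarked": ["S", None, "C", None],
})

# Drop every row that still contains a null value.
titanic = titanic.dropna()
print(len(titanic))
```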

The model does not understand text categories such as male and female; it only works with numbers. So now we have to convert those columns. Pandas has a function called get_dummies, which creates one column per category (here 2) and assigns each row a value of zero or one.

As we can see, if the value is zero for male it must be one for female, so the second column carries no extra information, and this redundancy can also affect the machine learning model. One way to drop the redundant column is to pass drop_first=True to get_dummies. We should modify embarked and pclass in the same way.

Once we have converted these columns, we have to concatenate the new columns with the original data set.

Now there is no need for the original columns such as sex and embarked, so we can drop them.
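The three steps above (create the dummy columns, concatenate them, drop the originals) can be sketched as follows on a small made-up frame. The prefix argument is an addition of this sketch, used so that every new column gets a readable string name.

```python
import pandas as pd

# Small made-up frame with the three categorical columns.
titanic = pd.DataFrame({
    "survived": [0, 1, 1, 0],
    "pclass": [3, 1, 2, 3],
    "sex": ["male", "female", "female", "male"],
    "embarked": ["S", "C", "Q", "S"],
})

# drop_first=True removes the first category of each column, which is
# implied by the remaining ones.
sex = pd.get_dummies(titanic["sex"], prefix="sex", drop_first=True)
embarked = pd.get_dummies(titanic["embarked"], prefix="embarked", drop_first=True)
pclass = pd.get_dummies(titanic["pclass"], prefix="pclass", drop_first=True)

# Attach the dummy columns, then remove the original categorical columns.
titanic = pd.concat([titanic, sex, embarked, pclass], axis=1)
titanic = titanic.drop(["sex", "embarked", "pclass"], axis=1)
print(titanic.columns.tolist())
```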

Now there are no null values in the data set, and the first and most important part of building a model is complete.

As an exercise, try some other logic to fill in the missing ages of the passengers.

Now we divide our data into two parts: the target variable and the other parameters.

X = titanic.drop('survived', axis=1)

Y = titanic['survived']

Now we have to import a few classes from sklearn, as we did in linear regression.

Now we have to train the model:
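A sketch of the imports and of the train/test split follows. The data is randomly generated for illustration, and the values of test_size and random_state are assumptions; choose whatever suits your data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression  # used when training the model
from sklearn.model_selection import train_test_split

# Randomly generated stand-in for the cleaned data set.
rng = np.random.default_rng(0)
titanic = pd.DataFrame({
    "survived": rng.integers(0, 2, size=20),
    "age": rng.integers(18, 60, size=20),
    "fare": rng.uniform(5, 100, size=20),
})

X = titanic.drop("survived", axis=1)
Y = titanic["survived"]

# Hold out 25% of the rows for testing (an assumed split ratio).
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=101
)
print(X_train.shape, X_test.shape)
```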

logre = LogisticRegression()

logre.fit(X_train,Y_train)

Now we test and evaluate our model:

We can also print the confusion matrix for this model.
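The evaluation step can be sketched as below, using classification_report and confusion_matrix from sklearn.metrics. The data here is synthetic, so the scores say nothing about the Titanic data, and max_iter=1000 is an assumption of this sketch to give the solver room to converge.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic data: the target depends on fare plus some noise.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "age": rng.integers(1, 80, size=n),
    "fare": rng.uniform(5, 100, size=n),
})
Y = (X["fare"] + rng.normal(0, 10, size=n) > 50).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=101
)

logre = LogisticRegression(max_iter=1000)  # max_iter is an assumed setting
logre.fit(X_train, Y_train)

# Predict on the held-out rows and compare with the true labels.
predictions = logre.predict(X_test)
print(classification_report(Y_test, predictions))

# Rows are the true classes, columns are the predicted classes.
cm = confusion_matrix(Y_test, predictions)
print(cm)
```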

This completes the evaluation of our model.

Topics covered in upcoming tutorials:

  • KNN

This is the end of this tutorial on logistic regression. For more information, visit the Data Science Section.
