LINEAR REGRESSION WITH PYTHON

Linear regression builds a model that predicts one variable from one or more other variables. The variable to be predicted is known as the dependent variable (or target), and the variables used to predict it are the independent variables (or features). In simple words, linear regression fits a linear relationship between the features and the dependent variable and uses it to predict the dependent variable.

In this tutorial, we will learn how to fit the regression, how to split the data into training and testing sets, and other important things related to linear regression. The dataset we have chosen for this tutorial records how much time customers spend online on a website and on an application, and our target variable is Yearly Amount Spent.

Step 1:

Import a few important libraries in the Jupyter notebook:

pandas – for cleaning the data and other manipulations.

NumPy – for mathematical operations on data sets.

Matplotlib and seaborn – for data visualization.

%matplotlib inline – so that graphs are displayed inside the Jupyter notebook.
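The imports described above would typically look like the following sketch. The aliases (np, pd, plt, sns) are the usual community conventions, not something the tutorial specifies.

```python
import numpy as np               # mathematical operations on arrays
import pandas as pd              # loading, cleaning and manipulating tabular data
import matplotlib.pyplot as plt  # base plotting library
import seaborn as sns            # statistical plots built on top of Matplotlib

# In a Jupyter notebook, also run the magic below in a cell so that plots
# are rendered inside the notebook. It is commented out here because it is
# notebook syntax, not plain Python.
# %matplotlib inline
```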

Step 2:

Now let us check whether this dataset has a linear relationship or not. To test this we can use the linear model plot, which we have already learned about. Let us plot a linear model graph between the target variable and a few of the other variables.

From this we can say that Yearly Amount Spent tends to increase as Length of Membership increases.

As we can see in the linear relations between the variables, there is a good linear relationship between Yearly Amount Spent and Length of Membership.
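A linear model plot of this kind can be sketched with seaborn's lmplot, which draws a scatter plot together with a fitted regression line. The data below is randomly generated as a stand-in for the tutorial's dataset, so the numbers are illustrative only; just the two column names come from the tutorial.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; drop this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (an assumption, NOT the tutorial's dataset):
# spending grows roughly linearly with membership length, plus noise.
rng = np.random.default_rng(0)
length = rng.uniform(1, 6, size=200)
spent = 300 + 60 * length + rng.normal(0, 20, size=200)
customers = pd.DataFrame({
    "Length of Membership": length,
    "Yearly Amount Spent": spent,
})

# Scatter plot with a fitted regression line.
grid = sns.lmplot(data=customers, x="Length of Membership", y="Yearly Amount Spent")

# A correlation close to 1 backs up what the plot shows visually.
corr = customers["Length of Membership"].corr(customers["Yearly Amount Spent"])
print(round(corr, 3))
```

Points that cluster tightly around the fitted line, as they do for this synthetic data, are the visual sign of a good linear relationship.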

Step 3:

Let us divide this data into two sets, an X set and a Y set. The Y set is our target variable, and the X set holds the features that have an impact on the target variable.
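The split into features and target could look like the sketch below. The table is a tiny made-up example; the column names "Avg. Session Length" and "Time on Website" are assumptions, since the tutorial only names "Time on App", "Length of Membership" and "Yearly Amount Spent".

```python
import pandas as pd

# Made-up rows for illustration only (not real customer data).
customers = pd.DataFrame({
    "Avg. Session Length": [30.0, 32.0, 34.0, 33.0],   # assumed column name
    "Time on App": [11.0, 12.0, 13.0, 12.5],
    "Time on Website": [36.0, 37.0, 38.0, 37.5],       # assumed column name
    "Length of Membership": [2.0, 3.0, 4.0, 3.5],
    "Yearly Amount Spent": [400.0, 480.0, 560.0, 520.0],
})

feature_columns = [
    "Avg. Session Length",
    "Time on App",
    "Time on Website",
    "Length of Membership",
]

X = customers[feature_columns]         # features (X set)
y = customers["Yearly Amount Spent"]   # target (Y set)
```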

Step 4:

Now we have to split the X and Y data so that one part is used to train the model and the other part is used to test it. To do this we import a helper from the sklearn library. To import train_test_split we have to write

from sklearn.model_selection import train_test_split

Once it is imported, we declare the train and test data by calling the function as shown in the figure and giving it the parameters. test_size is the fraction of the data held out for testing (0.4 = 40%), which means 40% of the data becomes test data and the remaining 60% is used for training.
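A minimal sketch of the split, using placeholder arrays instead of the tutorial's data. The random_state value is an arbitrary choice made here so that the split is reproducible; the tutorial does not mention it.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 rows with 2 feature columns, and 100 target values.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# test_size=0.4 holds out 40% of the rows for testing.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=101
)

print(len(x_train), len(x_test))
```

With 100 rows this gives 60 rows for training and 40 rows for testing.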

Step 5:

Now import LinearRegression from sklearn by writing

from sklearn.linear_model import LinearRegression

Then we have to create a variable that holds an instance of the LinearRegression class.

Now pass the training data (the training parts of X and Y) to the model's fit method.

This means the training of the model is completed. Now we have to predict the y_test data by passing the x_test data (which is still unknown to the model) to the model's predict method.
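The create, fit and predict sequence can be sketched as follows. The data is synthetic with a known linear relationship, and the variable name lm follows the tutorial's later use of lm.coef_; everything else is an assumption made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption): y depends linearly on two features, plus small noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
y = 5.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(0, 0.1, size=100)

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

lm = LinearRegression()           # create the model object
lm.fit(x_train, y_train)          # train on the training part only
predictions = lm.predict(x_test)  # predict targets for data the model has not seen
```

Note that fit receives only the training data; y_test is kept aside so the predictions can be checked against it afterwards.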

Step 5 (continued):

Data prediction by the trained model.

Here a new term we come across is lm.coef_, which holds one coefficient for each feature. Let us consider the coefficient of "Time on App": if all the other 3 features are held constant, a one-unit increase in "Time on App" raises the predicted target variable (here, Yearly Amount Spent) by 25.554076, and a one-unit decrease lowers it by the same amount.
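To make the meaning of lm.coef_ concrete, here is a small sketch on noise-free synthetic data. The feature names and the true coefficients (2 and -3) are invented for the example and are unrelated to the tutorial's dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Noise-free synthetic data: y = 10 + 2 * feature_a - 3 * feature_b
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.uniform(0, 10, size=(50, 2)), columns=["feature_a", "feature_b"])
y = 10.0 + 2.0 * X["feature_a"] - 3.0 * X["feature_b"]

lm = LinearRegression().fit(X, y)

# One coefficient per feature, labelled with the feature name.
coefficients = pd.DataFrame(lm.coef_, index=X.columns, columns=["Coefficient"])
print(coefficients)

# Raise feature_a by one unit while holding feature_b constant:
# the prediction changes by the coefficient of feature_a.
base = pd.DataFrame({"feature_a": [4.0], "feature_b": [5.0]})
bumped = base.assign(feature_a=5.0)
delta = lm.predict(bumped)[0] - lm.predict(base)[0]
```

Here delta equals the coefficient of feature_a (about 2), which is exactly the "one unit change with the other features held constant" reading described above.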

As we can see, we get an almost linear plot, which suggests that we have built a good model.
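The plot itself is not reproduced here. A common way to produce one, assuming it is a scatter of predicted values against actual values, is sketched below with placeholder numbers.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Placeholder values (assumptions), standing in for y_test and the model's predictions.
y_test = np.array([410.0, 455.0, 500.0, 530.0, 590.0])
predictions = np.array([405.0, 460.0, 495.0, 538.0, 585.0])

fig, ax = plt.subplots()
ax.scatter(y_test, predictions)
ax.set_xlabel("Actual values (y_test)")
ax.set_ylabel("Predicted values")

# Reference line where predicted == actual; points close to it mean good predictions.
low, high = y_test.min(), y_test.max()
ax.plot([low, high], [low, high], linestyle="--")
```

The closer the points lie to the dashed diagonal, the closer the predictions are to the actual values.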

Step 6:

Let us check the error metrics of the model we built.

First, we have to import the metrics module

from sklearn import metrics 

As we can see, the errors we got are very small, which shows that our model performs very well.
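The tutorial does not name the metrics it computes. The sketch below assumes three common choices (mean absolute error, mean squared error and root mean squared error) and uses three made-up values so that the arithmetic is easy to follow by hand.

```python
import numpy as np
from sklearn import metrics

# Made-up values for illustration; the errors are 1, 0 and -2.
y_test = np.array([3.0, 5.0, 7.0])
predictions = np.array([2.0, 5.0, 9.0])

mae = metrics.mean_absolute_error(y_test, predictions)  # (1 + 0 + 2) / 3 = 1.0
mse = metrics.mean_squared_error(y_test, predictions)   # (1 + 0 + 4) / 3
rmse = np.sqrt(mse)                                     # square root of the MSE

print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)
```

Whether an error counts as small depends on the scale of the target, so it helps to compare these numbers with typical values of the target variable.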

Step 7:

Comparison of actual values with predicted values.
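One simple way to make this comparison is to place the actual and predicted values side by side in a DataFrame and look at the first few rows. The numbers below are placeholders, not results from the tutorial's model.

```python
import numpy as np
import pandas as pd

# Placeholder values (assumptions), standing in for y_test and the model's predictions.
y_test = np.array([410.0, 455.0, 500.0, 530.0, 590.0, 620.0, 475.0])
predictions = np.array([405.0, 460.0, 495.0, 538.0, 585.0, 611.0, 480.0])

comparison = pd.DataFrame({"Actual": y_test, "Predicted": predictions})
comparison["Difference"] = comparison["Actual"] - comparison["Predicted"]

first_five = comparison.head(5)  # first 5 rows only
print(first_five)
```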

Comparing the first few values (about 5 values), we can see that our model has performed really well. This is the end of this tutorial on linear regression.
