Introduction to Seaborn in Python
Seaborn is a Python library for statistical plotting built on top of Matplotlib, so a working knowledge of Matplotlib is the foundation for learning seaborn. Compared with Matplotlib, seaborn offers higher-level plotting functions and more polished default styles, and it is designed to work well with pandas DataFrame objects. Seaborn is bundled with the Anaconda distribution, so it is usually already available in a Jupyter notebook started from Anaconda; otherwise it can be installed with pip install seaborn. Let us import the seaborn library into the notebook.
Topics covered in the last tutorial:
- The Matplotlib library
- Plotting using Matplotlib
- Functional method of plotting
- Object-oriented method of plotting
Seaborn in Python:
Importing the library alone is not always enough to display plots in a Jupyter notebook. To see the plots inline, we run the same magic command we used when working with Matplotlib, %matplotlib inline.
Here we will work with a dataset named ‘tips’, which records information about restaurant customers and the tips they gave to the waiters.
Let us learn how to import this dataset using seaborn.
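The two steps above can be sketched as follows. This is a minimal sketch that assumes a Jupyter notebook and an internet connection the first time it runs, because seaborn downloads its example datasets and then caches them locally. The helper name load_tips is chosen here only for illustration; the variable name a matches the calls used later in this tutorial.

```python
import seaborn as sns

# In a Jupyter notebook, run this magic once so plots render inline:
# %matplotlib inline


def load_tips():
    """Return seaborn's example 'tips' dataset as a pandas DataFrame.

    seaborn fetches the file online on first use, so this needs network access.
    """
    return sns.load_dataset("tips")
```

In a notebook cell, a = load_tips() followed by a.head() shows the first rows of the dataset, including the tip column used below.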
In these seaborn tutorials we will work with different types of plots, such as:
- Histograms
- Heatmaps
- Linear model plots (covered in more depth in the machine learning tutorials)
- Box plots
- Violin plots
- Scatter plots
- Hexbin plots
- Matrix plots
- And many more
Let us start by plotting a histogram of the numerical column 'tip' from the dataset we imported.
Syntax: sns.distplot(dataset_name['numeric_col_label'])
Along with the histogram, this plot also draws a line known as the KDE (kernel density estimate). To remove this line, we pass kde=False to the distplot function.
Parameters of distplot():
Parameter | Its use |
hist | True by default, which draws the histogram. Set it to False to hide the histogram. |
kde | True by default, which draws the KDE line. Set it to False to remove the KDE line. |
bins | Specification of the histogram bins. If unspecified, a reference rule is used to find a useful default. |
rug | False by default. If True, a rug plot is drawn: a small tick mark for every single data point along the data axis. |
color | Changes the color of the plot. |
vertical | False by default. If True, the data is plotted on the y-axis instead of the x-axis. |
norm_hist | False by default. If True, the histogram height shows a density instead of a count. This is implied whenever the KDE is drawn. |
axlabel | Takes a string and uses it as the label of the data axis (the x-axis by default). |
label | Takes a string that is shown for this plot when the legend function is called. |
Now let us vary these parameters one at a time.
- Let us set hist = False, which hides the histogram and leaves only the KDE line.
- We have already seen kde = False in the plots above.
- Let us set bins = 100.
- Now we shall add a rug plot to this plot by passing rug = True.
- To change the color of the plot, pass sns.distplot(a['tip'], color='red').
- By default the data is plotted along the x-axis. To plot along the y-axis, pass sns.distplot(a['tip'], kde=False, vertical=True).
- In the plots drawn with kde = False, the scale is the count of tips (from the dataset). To get the density instead of the count, pass sns.distplot(a['tip'], kde=False, norm_hist=True).
These were a few of the important parameters of distplot().