Operations using pandas
Importing external files in Jupyter Notebook:
In Jupyter Notebook, pandas lets us read and edit many types of files, such as Excel, CSV, and HTML files, among others.
To read a CSV file in Jupyter Notebook,
import pandas as pd
cars = pd.read_csv("file_name.csv")  # read the CSV file into a DataFrame named cars
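As a minimal sketch of the read step, the example below loads a tiny CSV from an in-memory string instead of a real file, so it runs anywhere. The column names (car_name, price, mileage) and the values are made up for illustration and are not taken from the actual cars data set.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file such as "file_name.csv".
csv_text = """car_name,price,mileage
Swift,550000,22.0
City,1100000,17.8
Swift,600000,21.5
"""

# pd.read_csv accepts a file path or any file-like object.
cars = pd.read_csv(io.StringIO(csv_text))

print(cars.shape)          # (rows, columns)
print(list(cars.columns))  # column names taken from the header row
```

With a real file, the path string goes where io.StringIO(csv_text) is used here. For other formats, pandas offers pd.read_excel and pd.read_html.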
Once the data is loaded, there are many operations we can perform on it.
To view the first few rows, we call
cars.head(x)  # where x is the number of rows we need
Similarly, to view the last few rows, we call
cars.tail(x)  # where x is the number of bottom rows we need
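The two calls above can be tried on a small made-up table. The DataFrame below is a hypothetical stand-in for the imported cars data; only head and tail themselves come from the text.

```python
import pandas as pd

# Hypothetical stand-in for the imported data: ten rows, two columns.
cars = pd.DataFrame({
    "car_name": ["Swift", "City", "Creta", "Baleno", "Nexon",
                 "Verna", "Polo", "Seltos", "Amaze", "Tiago"],
    "price": [550000, 1100000, 1500000, 700000, 900000,
              1200000, 650000, 1600000, 750000, 500000],
})

top_rows = cars.head(3)     # first 3 rows
bottom_rows = cars.tail(2)  # last 2 rows
default_rows = cars.head()  # with no argument, head returns 5 rows

print(top_rows)
print(bottom_rows)
```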
To get an overview of the imported file (column names, non-null counts, and data types), call
cars.info()
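A short sketch of info() on a made-up table follows. The data is hypothetical. The second call captures the report as text through the buf parameter, which is handy when the summary needs to be saved or searched rather than just displayed.

```python
import io

import pandas as pd

# Hypothetical stand-in for the imported cars data.
cars = pd.DataFrame({
    "car_name": ["Swift", "City", "Creta"],
    "price": [550000, 1100000, 1500000],
    "mileage": [22.0, 17.8, 16.8],
})

# info() prints its report and returns None.
cars.info()

# To keep the report as a string, write it into a buffer instead.
buffer = io.StringIO()
cars.info(buf=buffer)
summary = buffer.getvalue()
```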
For the numeric columns, we can get summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) by calling
cars.describe()
By default this covers only the numeric columns, and the text columns are dropped. To summarize the text columns instead, pass
cars.describe(include="O")  # "O" stands for the object dtype, which in pandas usually holds strings or mixed data types
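Both forms of describe can be compared on a small made-up table. The data below is hypothetical. The text column is created with an explicit object dtype as an assumption of this sketch, so that include="O" selects it regardless of how the installed pandas version infers string columns.

```python
import pandas as pd

# Hypothetical stand-in for the imported cars data.
cars = pd.DataFrame({
    # Explicit object dtype so the column is matched by include="O".
    "car_name": pd.Series(["Swift", "City", "Swift", "Creta"], dtype="object"),
    "price": [550000, 1100000, 600000, 1500000],
    "mileage": [22.0, 17.8, 21.5, 16.8],
})

# Default: statistics for the numeric columns only.
numeric_summary = cars.describe()

# include="O": count, unique, top, and freq for the object columns.
object_summary = cars.describe(include="O")

print(numeric_summary)
print(object_summary)
```

Passing include="all" instead returns one table covering every column, with NaN where a statistic does not apply.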
We can also check for the presence of null values in the imported file.
To list all the column names, use
cars.columns
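The text does not name a method for the null check, so the following is only one common way to do it, shown on made-up data: isnull() marks each missing cell, and sum() then counts them per column.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing entries.
cars = pd.DataFrame({
    "car_name": ["Swift", None, "City"],
    "price": [550000.0, 1100000.0, np.nan],
    "mileage": [22.0, 17.8, 21.5],
})

# Number of missing values in each column.
null_counts = cars.isnull().sum()

# True if any cell in the whole table is missing.
has_nulls = bool(cars.isnull().any().any())

print(null_counts)
print(has_nulls)
```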
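As a small illustration on made-up data, cars.columns returns an Index object, which can be converted to a plain Python list when that is more convenient. The column names here are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the imported cars data.
cars = pd.DataFrame({
    "car_name": ["Swift", "City"],
    "price": [550000, 1100000],
    "mileage": [22.0, 17.8],
})

column_index = cars.columns         # a pandas Index of column labels
column_names = list(column_index)   # the same labels as a plain list

print(column_names)
```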
Some main functions:
Unique values in a data frame:
In data manipulation, a data set often contains thousands of repeated values, and it sometimes becomes important to extract the unique values in a particular column. Here we will be working with a data set named "cars", which holds a collection of data about cars.
Let us see the number of unique cars present in the imported data set.
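A minimal sketch of this count, assuming the car names live in a column called car_name (a hypothetical name, as the real column name is not given in the text). The toy table has only a handful of rows, so its result will differ from the count for the real data set.

```python
import pandas as pd

# Hypothetical stand-in for the imported cars data.
cars = pd.DataFrame({
    "car_name": ["Swift", "City", "Swift", "Creta", "City", "Swift"],
})

# Number of distinct values in the column.
unique_count = cars["car_name"].nunique()

# The distinct values themselves, in order of first appearance.
unique_names = list(cars["car_name"].unique())

print(unique_count)
print(unique_names)
```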
There are 300 unique cars in the data set. Next, let us check which car name is repeated most often.
value_counts() gives the count of each car, sorted from most to least frequent, but now we need only the top 5 most repeated cars.
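The counting step can be sketched as follows, again on made-up data with a hypothetical car_name column. Because value_counts() sorts in descending order by default, the first entry of the result is the most repeated name.

```python
import pandas as pd

# Hypothetical stand-in for the imported cars data.
cars = pd.DataFrame({
    "car_name": ["Swift", "City", "Swift", "Creta", "City", "Swift"],
})

# Occurrences of each name, most frequent first.
counts = cars["car_name"].value_counts()

most_repeated_name = counts.index[0]
most_repeated_count = int(counts.iloc[0])

print(counts)
print(most_repeated_name, most_repeated_count)
```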
Here we created another DataFrame from the output of the previous function and then called head() on the new DataFrame.
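The approach described above might look roughly like the sketch below. The original code is not reproduced in the text, so this is a reconstruction under assumptions: the car_name column and the data are hypothetical, and the name of the resulting count column depends on the pandas version, which is why the sketch reads it by position.

```python
import pandas as pd

# Hypothetical data: six car names with 6, 5, 4, 3, 2, and 1 occurrences.
names = (["Swift"] * 6 + ["City"] * 5 + ["Creta"] * 4
         + ["Baleno"] * 3 + ["Nexon"] * 2 + ["Polo"] * 1)
cars = pd.DataFrame({"car_name": names})

# Wrap the value_counts() result in a new DataFrame ...
counts_df = pd.DataFrame(cars["car_name"].value_counts())

# ... and keep only its first five rows, i.e. the five most repeated cars.
top5 = counts_df.head(5)

top5_names = list(top5.index)
top5_counts = top5.iloc[:, 0].tolist()

print(top5)
```

The intermediate DataFrame is optional: calling cars["car_name"].value_counts().head(5) directly returns the same five entries as a Series.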