2. Introduction to pandas
Pandas is an open-source Python library that provides high-performance data
manipulation and analysis tools built on its powerful data structures. Pandas is
the backbone of most data projects.
Through pandas, you get acquainted with your data by cleaning, transforming,
and analyzing it. Python with pandas is used in a wide range of academic and
commercial fields, including finance, economics, statistics, and analytics.
3. Pandas package
Install the library from the command line, then import it in Python:
pip install pandas
import pandas as pd
Here, pd is an alias for pandas. The alias is not required; it simply reduces
the amount of code written each time a method or property is called.
Pandas generally provides two data structures for manipulating data:
1) Series
2) DataFrame
5. Pandas Series
A pandas Series is a one-dimensional labeled array capable of holding data of any
type (integer, string, float, Python objects, etc.). The axis labels are collectively
called the index. A Series is comparable to a single column in an Excel sheet.
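The parts of a Series described above (labels, values, and data type) can be inspected directly. The following is a minimal sketch; the product names and prices are hypothetical example data, not taken from the slides.

```python
import pandas as pd

# Hypothetical example data: three prices labeled by product name.
prices = pd.Series([1.5, 2.0, 3.25], index=["pen", "book", "bag"])

print(prices.index.tolist())   # the axis labels (the index)
print(prices.to_numpy())       # the underlying values
print(prices.dtype)            # float64

# A Series can also hold mixed Python objects; its dtype is then "object".
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)
```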
6. Creating a series from array
To create a Series from an array, import the numpy module and build the array
with its array() function.
import pandas as pd
import numpy as np
# simple array
data = np.array(['p','y','t','h','o','n'])
ser = pd.Series(data)
print(ser)
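When no index is passed, pandas labels the elements with their positions, starting at 0. The sketch below contrasts that default with explicit labels; the labels 'a' to 'f' are an assumption chosen for illustration.

```python
import numpy as np
import pandas as pd

data = np.array(['p', 'y', 't', 'h', 'o', 'n'])

# No index given: pandas assigns the positions 0..5 as labels.
default_ser = pd.Series(data)
print(default_ser.index.tolist())   # [0, 1, 2, 3, 4, 5]

# Hypothetical custom labels, one per element.
labeled_ser = pd.Series(data, index=['a', 'b', 'c', 'd', 'e', 'f'])
print(labeled_ser['c'])             # t
```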
7. Creating a series from list
To create a Series from a list, first create the list and then pass it to
pd.Series().
import pandas as pd
# a simple list (named letters so the built-in list type is not shadowed)
letters = ['p', 'y', 't', 'h', 'o', 'n']
# create a Series from the list
ser = pd.Series(letters)
print(ser)
8. Creating a series from dictionary
# import the pandas lib as pd
import pandas as pd
# create a dictionary
dictionary = {'A' : 10, 'B' : 20, 'C' : 30}
# create a series
series = pd.Series(dictionary)
print(series)
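When a Series is built from a dictionary, the keys become the index labels. As a small sketch of a related option, the index argument can select and order keys; a label that is not in the dictionary (here the hypothetical key 'D') produces a missing value.

```python
import pandas as pd

scores = {'A': 10, 'B': 20, 'C': 30}

# Keys become the index labels; values become the data.
ser = pd.Series(scores)
print(ser.index.tolist())   # ['A', 'B', 'C']

# index= selects and orders keys; 'D' is not in the dict, so it becomes NaN.
subset = pd.Series(scores, index=['C', 'A', 'D'])
print(subset)
```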
9. Accessing elements from series with position
# import pandas and numpy
import pandas as pd
import numpy as np
# creating simple array
data = np.array(['p','y','t','h','o','n', 'p','r','o','g','r','a','m'])
ser = pd.Series(data)
# retrieve the first five elements
print(ser[:5])
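Positional access can also be written explicitly with iloc, which always counts positions regardless of the index labels. A minimal sketch, using the same letters as the example above:

```python
import pandas as pd

ser = pd.Series(list("pythonprogram"))

first_five = ser.iloc[:5]    # positions 0 to 4
last_three = ser.iloc[-3:]   # the final three elements
third = ser.iloc[2]          # single element at position 2

print(first_five.tolist())   # ['p', 'y', 't', 'h', 'o']
print(last_three.tolist())   # ['r', 'a', 'm']
print(third)                 # t
```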
10. Accessing Element Using Label (index)
# import pandas and numpy
import pandas as pd
import numpy as np
# creating simple array
data = np.array(['p','y','t','h','o','n'])
ser = pd.Series(data, index=[10, 11, 12, 13, 14, 15])
print(ser)
# access an element using its index label
print(ser[12])
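With an integer index, it is easy to confuse a label with a position. The sketch below uses loc for labels and iloc for positions to make the difference explicit; it assumes the same index values 10 to 15 as the example above.

```python
import pandas as pd

ser = pd.Series(list("python"), index=[10, 11, 12, 13, 14, 15])

by_label = ser.loc[12]      # the element whose label is 12
by_position = ser.iloc[2]   # the element at position 2 (the third one)
print(by_label, by_position)   # t t

# The label 2 does not exist in this index, so a label lookup fails.
try:
    ser.loc[2]
    label_2_found = True
except KeyError:
    label_2_found = False
print(label_2_found)   # False
```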
11. To find maximum value
import pandas as pd
# Creating the Series
sr = pd.Series([10, 25, 3, 25, 24, 6],
               index=['Coca Cola', 'Sprite', 'Coke', 'Fanta', 'Dew', 'ThumbsUp'])
# Print the series
print(sr)
# maximum value
result = sr.max()
print(result)
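Beyond the maximum value itself, it is often useful to know which label holds it. The following sketch uses Series.max() and Series.idxmax() on the same data; note that two labels share the maximum here, and idxmax() reports only the first.

```python
import pandas as pd

sr = pd.Series([10, 25, 3, 25, 24, 6],
               index=['Coca Cola', 'Sprite', 'Coke', 'Fanta', 'Dew', 'ThumbsUp'])

largest = sr.max()            # the maximum value
first_label = sr.idxmax()     # label of the first occurrence of the maximum
# Boolean filtering finds every label that ties for the maximum.
all_labels = sr[sr == largest].index.tolist()

print(largest)       # 25
print(first_label)   # Sprite
print(all_labels)    # ['Sprite', 'Fanta']
```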
14. Data frame
A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous
tabular data structure with labeled axes (rows and columns). In other words, the
data is aligned in a tabular fashion in rows and columns, and each column may
hold a different data type.
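The properties named above can be checked on a small DataFrame. This is a minimal sketch with hypothetical data; the column names and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data: each key is a column, each list holds that column's values.
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [30, 25]})

shape_before = df.shape
print(shape_before)          # (2, 2): two rows, two columns
print(df.columns.tolist())   # column labels
print(df.index.tolist())     # row labels

# Size-mutable: a new column can be added after creation.
df["city"] = ["Pune", "Delhi"]
print(df.shape)              # (2, 3)
```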
15. Creating a data frame using List
import pandas as pd
# list of strings
lst = ['data', 'visualization', 'using', 'python',
'and', 'r', 'programming']
# converting list into data frame
df = pd.DataFrame(lst)
print(df)
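A DataFrame built from a flat list has a single column, which pandas labels 0 by default. As a sketch, the columns argument gives that column a readable name; the name 'word' is a hypothetical choice.

```python
import pandas as pd

lst = ['data', 'visualization', 'using', 'python',
       'and', 'r', 'programming']

# Default: the single column is labeled 0.
df_default = pd.DataFrame(lst)
print(df_default.columns.tolist())   # [0]

# Hypothetical column name supplied through columns=.
df_named = pd.DataFrame(lst, columns=['word'])
print(df_named.columns.tolist())     # ['word']
print(df_named.shape)                # (7, 1)
```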
16. Creating a dataframe from dictionary
import pandas as pd
# initialise a dictionary of lists (one list per column)
data = {'Name': ['Tom', 'nick', 'krish', 'jack'],
        'Age': [20, 21, 19, 18]}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
print(df)
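One thing to watch when building a DataFrame from a dictionary of lists is that every list must have the same length. The sketch below uses deliberately broken, hypothetical data to show that pandas rejects mismatched lengths with a ValueError.

```python
import pandas as pd

# Hypothetical data with a deliberate mistake: 'Age' has only three values.
bad_data = {'Name': ['Tom', 'nick', 'krish', 'jack'],
            'Age': [20, 21, 19]}

try:
    pd.DataFrame(bad_data)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Prints the explanation pandas gives for the failure.
print(error_message)
```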
17. Column selection
import pandas as pd
# Define a dictionary containing employee data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
print(df)
# select two columns
print(df[['Name', 'Qualification']])
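The number of brackets changes what comes back: a single column name returns a Series, while a list of names returns a DataFrame, even when the list has one entry. A minimal sketch on a shortened version of the employee data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi'],
                   'Age': [27, 24],
                   'Qualification': ['Msc', 'MA']})

one_column = df['Name']           # single label: a Series
one_column_frame = df[['Name']]   # list of labels: a DataFrame

print(type(one_column).__name__)         # Series
print(type(one_column_frame).__name__)   # DataFrame
```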
18. Working with missing data
# import pandas and numpy (np.nan marks a missing value)
import pandas as pd
import numpy as np
# dictionary of lists
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}
# create a DataFrame from the dictionary
df = pd.DataFrame(scores)
print(df)
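Before missing values are handled, they usually need to be located. The sketch below uses isna(), which returns True wherever a value is missing, and counts the missing entries per column on the same score data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

mask = df.isna()                  # boolean DataFrame, True where a value is missing
missing_per_column = mask.sum()   # True counts as 1, so this counts missing values
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print(total_missing)   # 3
```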
20. Filling missing values
To fill null values in a data set, use the fillna() function. It returns a new
DataFrame, so the result must be printed or assigned.
import pandas as pd
import numpy as np
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(scores)
# replace every missing value with 0 and show the result
print(df.fillna(0))
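Filling with 0 is only one option. As a hedged sketch of common alternatives, the code below fills each column with its own mean and, separately, drops incomplete rows with dropna(). Which strategy is appropriate depends on the data; these choices are illustrative assumptions, not a recommendation from the slides.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

filled_zero = df.fillna(0)                   # new DataFrame with NaN replaced by 0
still_missing = int(df.isna().sum().sum())   # the original df is unchanged

filled_mean = df.fillna(df.mean())           # each column filled with its own mean
complete_rows = df.dropna()                  # keep only rows with no missing value

print(still_missing)                         # 3
print(filled_mean.loc[2, 'First Score'])     # 95.0
print(complete_rows.index.tolist())          # [1]
```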