For the Python programming language, the most popular library for working with 1D and 2D data sets is pandas.
• For 1D data, such as a sequence of numbers, the pandas.Series object is the appropriate choice.
• For 2D data, the corresponding object is pandas.DataFrame.
• 3D data
# A plain Python list
myList = ["The", "earth", "revolves", "around", "sun"]
print(myList)  # printing list
Output:
['The', 'earth', 'revolves', 'around', 'sun']
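The pandas.Series object mentioned above can be sketched as follows. This is a minimal illustration; the values and the variable name s are assumptions, not taken from the slides.

```python
import pandas as pd

# A Series is a 1D labeled array; these values are illustrative
s = pd.Series([10, 20, 30, 40])
print(s)

# Each value has an index label (0, 1, 2, 3 by default)
third = s.loc[2]    # value stored under label 2
total = s.sum()     # sum of all values
print(third, total)
```

Unlike a plain list, a Series carries an index and a single dtype, which is what allows vectorized operations such as sum().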
• A DataFrame is a two-dimensional data structure: data is aligned in tabular fashion in rows and columns.
Features of a DataFrame
• Columns can be of different types
• Size is mutable
• Labeled axes (rows and columns)
• Arithmetic operations can be performed on rows and columns
• Let us assume that we are creating a DataFrame with student data.
• A pandas DataFrame can be created using the following constructor:
pandas.DataFrame(data, index, columns, dtype, copy)
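As a rough sketch of how the data, index and columns arguments of the constructor fit together, consider the example below. The student records and the row labels s1 to s3 are hypothetical; dtype and copy are left at their defaults.

```python
import pandas as pd

# Hypothetical student records: one inner list per row
data = [["Aman", 10], ["Ajay", 12], ["Abhi", 13]]

students = pd.DataFrame(
    data,
    index=["s1", "s2", "s3"],   # row labels
    columns=["Name", "Age"],    # column labels
)
print(students)
print(students.dtypes)          # each column keeps its own type

# Look up one cell by row label and column label
age_s2 = students.loc["s2", "Age"]
print(age_s2)
```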
• Create an Empty DataFrame
The most basic DataFrame that can be created is an empty DataFrame.
Example:
# import the pandas library, aliased as pd
import pandas as pd
df = pd.DataFrame()
print(df)
Its output is as follows:
Empty DataFrame
Columns: []
Index: []
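import pandas as pd
data = [['Aman', 10], ['Ajay', 12], ['Abhi', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
# Convert only the numeric column to float; passing dtype=float to the
# constructor can fail in recent pandas versions because 'Name' holds strings
df['Age'] = df['Age'].astype(float)
print(df)
Its output is as follows:
   Name   Age
0  Aman  10.0
1  Ajay  12.0
2  Abhi  13.0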
import pandas as pd
names = ['Bob', 'Jessica', 'Mary', 'John', 'Mel']
births = [968, 155, 77, 578, 973]
# Pair each name with its birth count
BabyDataSet = list(zip(names, births))
print(BabyDataSet)
df = pd.DataFrame(data=BabyDataSet, columns=['Names', 'Births'])
print(df)
# Write the DataFrame to a CSV file
df.to_csv('demo.csv')
Output:
[('Bob', 968), ('Jessica', 155), ('Mary', 77), ('John', 578), ('Mel', 973)]
     Names  Births
0      Bob     968
1  Jessica     155
2     Mary      77
3     John     578
4      Mel     973
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
# Add a new column to an existing DataFrame by assigning a Series to a new column label
print("Adding a new column by passing as Series:")
df['three'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(df)
# Add a new column computed from existing columns
print("Adding a new column using the existing columns in DataFrame:")
df['four'] = df['one'] + df['three']
print(df)
Output:
Adding a new column by passing as Series:
   one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN
Adding a new column using the existing columns in DataFrame:
   one  two  three  four
a  1.0    1   10.0  11.0
b  2.0    2   20.0  22.0
c  3.0    3   30.0  33.0
d  NaN    4    NaN   NaN
# importing pandas as pd
import pandas as pd
# Creating the DataFrame
df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [7, 2, 54, 3, None],
                   "C": [20, 16, 11, 3, 8],
                   "D": [14, 3, None, 2, 6]})
# Row-wise maximum (axis=1); NaN values are skipped by default
print(df.max(axis=1))
Output:
0    20.0
1    16.0
2    54.0
3     3.0
4     8.0
dtype: float64
max() is used to find the maximum value. Similarly, to find the minimum value, use min() in place of max().
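The min() counterpart and the effect of the axis and skipna arguments can be sketched on the same data. The variable names row_min, col_max and strict_max are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [7, 2, 54, 3, None],
                   "C": [20, 16, 11, 3, 8],
                   "D": [14, 3, None, 2, 6]})

# Minimum of each row; missing values are skipped by default
row_min = df.min(axis=1)

# Maximum of each column (axis=0 is the default)
col_max = df.max(axis=0)

# With skipna=False, any row containing NaN yields NaN
strict_max = df.max(axis=1, skipna=False)

print(row_min)
print(col_max)
print(strict_max)
```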
Mean Function in Python pandas (DataFrame, row-wise and column-wise mean)
mean() calculates the arithmetic mean of a set of numbers. It can be applied to a whole DataFrame, to a single column, or across rows.
import pandas as pd
# Create a DataFrame
d = {'Name': ['Alisa', 'Bobby', 'Cathrine', 'Madonna', 'Rocky',
              'Sebastian', 'Jaqluine', 'Rahul', 'David', 'Andrew', 'Ajay', 'Teresa'],
     'Score1': [62, 47, 55, 74, 31, 77, 85, 63, 42, 32, 71, 57],
     'Score2': [89, 87, 67, 55, 47, 72, 76, 79, 44, 92, 99, 69]}
df = pd.DataFrame(d)
# Mean of each numeric column; numeric_only=True skips the 'Name' column
print(df.mean(numeric_only=True))
Output:
Score1    58.0
Score2    73.0
dtype: float64
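The row-wise and single-column means named in the slide title could look like this. It is a small sketch using the first three students only; the names scores, col_mean and row_mean are assumptions.

```python
import pandas as pd

scores = pd.DataFrame({"Name": ["Alisa", "Bobby", "Cathrine"],
                       "Score1": [62, 47, 55],
                       "Score2": [89, 87, 67]})

# Mean of a single column
col_mean = scores["Score1"].mean()

# Mean of each row across the two score columns (axis=1)
row_mean = scores[["Score1", "Score2"]].mean(axis=1)

print(col_mean)
print(row_mean)
```

Selecting the numeric columns first avoids mixing the text column into the row-wise calculation.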
Sorting:
import pandas as pd
d = {'one': [2, 3, 1, 4, 5], 'two': [5, 4, 3, 2, 1], 'letter': ['a', 'a', 'b', 'b', 'c']}
df = pd.DataFrame(d)
# Sort by column 'one' in descending order
test = df.sort_values(['one'], ascending=[False])
print(test)
The output is:
   one  two  letter
4    5    1       c
3    4    2       b
1    3    4       a
0    2    5       a
2    1    3       b
If ascending=False, the data is sorted in descending order. Otherwise, by default, the data is sorted in ascending order.
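To compare the two sort directions side by side, and to sort on more than one column, something like the following could be used. The names asc, desc and multi are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"one": [2, 3, 1, 4, 5],
                   "two": [5, 4, 3, 2, 1],
                   "letter": ["a", "a", "b", "b", "c"]})

asc = df.sort_values("one")                    # ascending is the default
desc = df.sort_values("one", ascending=False)  # descending

# Sort by 'letter' ascending, breaking ties by 'one' descending
multi = df.sort_values(["letter", "one"], ascending=[True, False])

print(asc)
print(desc)
print(multi)
```

sort_values returns a new DataFrame; the original df keeps its row order.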
Groupby
Input data (datasets/stackdatasetexample.csv):
name     age  employment_status  state
Anush    23   emp                pb
Ankush   32   unemp              pb
Alisha   21   emp                pb
Rohit    34   emp                hp
Komal    26   unemp              hr
Karthik  29   emp                hr
import pandas as pd
df1 = pd.read_csv('datasets/stackdatasetexample.csv')
print(df1)
# Count the number of names in each state
print(df1.groupby(["state"])[['name']].count())
# Count how often each state occurs, most frequent first
j = df1['state'].value_counts()
print(j)
Output:
      name  age  employment_status  state
0    Anush   23                emp     pb
1   Ankush   32              unemp     pb
2   Alisha   21                emp     pb
3    Rohit   34                emp     hp
4    Komal   26              unemp     hr
5  Karthik   29                emp     hr
       name
state
hp        1
hr        2
pb        3
pb    3
hr    2
hp    1
Name: state, dtype: int64
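The same grouping can be tried without the CSV file by building the table inline. The sketch below also shows a mean per group and grouping on two columns, which go slightly beyond the slide; the variable names are assumptions.

```python
import pandas as pd

# Same rows as the input table, built inline instead of read from CSV
df1 = pd.DataFrame({
    "name": ["Anush", "Ankush", "Alisha", "Rohit", "Komal", "Karthik"],
    "age": [23, 32, 21, 34, 26, 29],
    "employment_status": ["emp", "unemp", "emp", "emp", "unemp", "emp"],
    "state": ["pb", "pb", "pb", "hp", "hr", "hr"],
})

# Number of people per state
counts = df1.groupby("state")["name"].count()

# Average age per state
mean_age = df1.groupby("state")["age"].mean()

# Group on two columns at once; size() counts the rows in each group
by_status = df1.groupby(["state", "employment_status"]).size()

print(counts)
print(mean_age)
print(by_status)
```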
Drop duplicates and missing values
Duplicate data:
A    B  C
foo  0  A
foo  1  A
foo  1  B
bar  1  A
foo  0  A
Missing data:
Aman CSE Python
Anu IT
Anuradha CSE PHP
Nisha BigData
Pankaj CSE
Ankit Java
Rohit IT Android
Anu IT
import pandas as pd
df = pd.read_csv('datasets/dropduplicatesexample.csv')
print(df)
# Drop rows that are duplicated across all columns
ee = df.drop_duplicates()
# print(ee)
# Drop rows that match on columns A and C only
e = df.drop_duplicates(subset=['A', 'C'])
print(e)
e.to_csv("aaa.csv")
import pandas as pd
# header=None because the file has no header row
df = pd.read_csv('datasets/dropnaexample.csv', header=None)
print(df)
# Drop every row that contains a missing value
df_drop_missing = df.dropna()
# print(df_drop_missing)
# Replace missing values with a constant (any value can be used)
df_fill = df.fillna(1)
print(df_fill)
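A self-contained version of both steps, with the data built inline, might look like this. The duplicate table follows the slide; the missing-value table, its column names, and the fill value "unknown" are hypothetical.

```python
import numpy as np
import pandas as pd

# Duplicate data, as in the table on the slide
dup = pd.DataFrame({"A": ["foo", "foo", "foo", "bar", "foo"],
                    "B": [0, 1, 1, 1, 0],
                    "C": ["A", "A", "B", "A", "A"]})

full = dup.drop_duplicates()                      # compares whole rows
partial = dup.drop_duplicates(subset=["A", "C"])  # compares A and C only

# Hypothetical table with missing cells
miss = pd.DataFrame({"name": ["Aman", "Anu", "Pankaj"],
                     "branch": ["CSE", "IT", "CSE"],
                     "language": ["Python", np.nan, np.nan]})

dropped = miss.dropna()           # keep only complete rows
filled = miss.fillna("unknown")   # replace missing cells with a constant

print(full)
print(partial)
print(dropped)
print(filled)
```

By default drop_duplicates keeps the first occurrence of each duplicate group.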
Filters
Input data (datasets/filtersexample.csv):
   name   year  salary
0  Aman   2017  40000
1  Raman  2017  24000
2  Anita  2017  31000
3  Kajal  2017  20000
4  Arun   2017  30000
5  Aman   2017  25000
import pandas as pd
df = pd.read_csv('datasets/filtersexample.csv')
print(df)
# Rows with salary greater than 30,000
filtered = df.query('salary > 30000')
print(filtered)
# Rows with salary of at least 30,000 in the year 2017
df_filtered = df[(df.salary >= 30000) & (df.year == 2017)]
print(df_filtered)
print(df.salary.unique())     # array of unique values
print(df['name'].nunique())   # count of unique values
Output:
   Unnamed: 0   name  year  salary
0           0   Aman  2017   40000
1           1  Raman  2017   24000
2           2  Anita  2017   31000
3           3  Kajal  2017   20000
4           4   Arun  2017   30000
5           5   Aman  2017   25000
   Unnamed: 0   name  year  salary
0           0   Aman  2017   40000
2           2  Anita  2017   31000
   Unnamed: 0   name  year  salary
0           0   Aman  2017   40000
2           2  Anita  2017   31000
4           4   Arun  2017   30000
[40000 24000 31000 20000 30000 25000]
5
The Unnamed: 0 column is the unnamed first column of the CSV file, typically an index that was saved along with the data.
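The filters can be reproduced without the CSV file, as sketched below. The isin() filter is an extra illustration that does not appear on the slide, and the variable names are assumptions.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Aman", "Raman", "Anita", "Kajal", "Arun", "Aman"],
                   "year": [2017] * 6,
                   "salary": [40000, 24000, 31000, 20000, 30000, 25000]})

# String-expression filter
high = df.query("salary > 30000")

# Boolean-mask filter; each condition needs its own parentheses
mask = (df["salary"] >= 30000) & (df["year"] == 2017)
at_least = df[mask]

# Keep rows whose name is in a given list
selected = df[df["name"].isin(["Aman", "Arun"])]

print(high)
print(at_least)
print(selected)
```

Use & and | rather than the Python keywords and / or when combining masks, since the comparison is element-wise.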
Joins
df_a
   subject_id  first_name  last_name
0           1        Ajay   Anderson
1           2        Abhi   Ackerman
2           3        Aman        Ali
3           4         Avi       Aoni
4           5        Aksh    Atiches
df_b
   subject_id  first_name  last_name
0           4       Billy     Bonder
1           5        Navi      Black
2           6       Swati    Balwner
3           7     Shivali      Brice
4           8       Kamal     Btisan
df_new = pd.concat([df_a, df_b])
df_new
   subject_id  first_name  last_name
0           1        Ajay   Anderson
1           2        Abhi   Ackerman
2           3        Aman        Ali
3           4         Avi       Aoni
4           5        Aksh    Atiches
0           4       Billy     Bonder
1           5        Navi      Black
2           6       Swati    Balwner
3           7     Shivali      Brice
4           8       Kamal     Btisan
pd.concat([df_a, df_b], axis=1)
   subject_id  first_name  last_name  subject_id  first_name  last_name
0           1        Ajay   Anderson           4       Billy     Bonder
1           2        Abhi   Ackerman           5        Navi      Black
2           3        Aman        Ali           6       Swati    Balwner
3           4         Avi       Aoni           7     Shivali      Brice
4           5        Aksh    Atiches           8       Kamal     Btisan
Merge with right join
pd.merge(df_a, df_b, on='subject_id', how='right')
   subject_id  first_name_x  last_name_x  first_name_y  last_name_y
0           4           Avi         Aoni         Billy       Bonder
1           5          Aksh      Atiches          Navi        Black
2           6           NaN          NaN         Swati      Balwner
3           7           NaN          NaN       Shivali        Brice
4           8           NaN          NaN         Kamal       Btisan
Merge with left join
pd.merge(df_a, df_b, on='subject_id', how='left')
   subject_id  first_name_x  last_name_x  first_name_y  last_name_y
0           1          Ajay     Anderson           NaN          NaN
1           2          Abhi     Ackerman           NaN          NaN
2           3          Aman          Ali           NaN          NaN
3           4           Avi         Aoni         Billy       Bonder
4           5          Aksh      Atiches          Navi        Black
“Left outer join produces a complete set of records from Table A, with the matching records (where available) in Table B. If there is no match, the right side will contain null.”
Merge with inner join
pd.merge(df_a, df_b, on='subject_id', how='inner')
   subject_id  first_name_x  last_name_x  first_name_y  last_name_y
0           4           Avi         Aoni         Billy       Bonder
1           5          Aksh      Atiches          Navi        Black
“Inner join produces only the set of records that match in both Table A and Table B.”
Merge with outer join
pd.merge(df_a, df_b, on='subject_id', how='outer')
   subject_id  first_name_x  last_name_x  first_name_y  last_name_y
0           1          Ajay     Anderson           NaN          NaN
1           2          Abhi     Ackerman           NaN          NaN
2           3          Aman          Ali           NaN          NaN
3           4           Avi         Aoni         Billy       Bonder
4           5          Aksh      Atiches          Navi        Black
5           6           NaN          NaN         Swati      Balwner
6           7           NaN          NaN       Shivali        Brice
7           8           NaN          NaN         Kamal       Btisan
“Full outer join produces the set of all records in Table A and Table B, with matching records from both sides where available. If there is no match, the missing side will contain null.”
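The slides do not show how df_a and df_b were built, so the sketch below assumes they were created from dictionaries with the values shown in the tables. It runs each join type and, as an extra not covered on the slides, uses ignore_index and indicator to make the results easier to inspect.

```python
import pandas as pd

# Assumed construction of the two tables shown on the slides
df_a = pd.DataFrame({"subject_id": [1, 2, 3, 4, 5],
                     "first_name": ["Ajay", "Abhi", "Aman", "Avi", "Aksh"],
                     "last_name": ["Anderson", "Ackerman", "Ali", "Aoni", "Atiches"]})
df_b = pd.DataFrame({"subject_id": [4, 5, 6, 7, 8],
                     "first_name": ["Billy", "Navi", "Swati", "Shivali", "Kamal"],
                     "last_name": ["Bonder", "Black", "Balwner", "Brice", "Btisan"]})

# Stack rows; ignore_index=True renumbers the rows 0..9
stacked = pd.concat([df_a, df_b], ignore_index=True)

# Row count produced by each join type
sizes = {}
for how in ["inner", "left", "right", "outer"]:
    merged = pd.merge(df_a, df_b, on="subject_id", how=how)
    sizes[how] = len(merged)
print(sizes)

# indicator=True adds a _merge column showing where each row came from
outer = pd.merge(df_a, df_b, on="subject_id", how="outer", indicator=True)
print(outer[["subject_id", "_merge"]])
```

The _x and _y suffixes in the merged tables come from the overlapping column names first_name and last_name.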

Presentation on Pandas in _ detail .pptx
