SlideShare a Scribd company logo
PANDAS
A POWERFUL DATA MANIPULATION TOOL
/ /EmmaJasonAMyers @jasonamyers
WHAT'S SO SPECIAL ABOUT PANDAS?
1. Tabular/Matrix
2. DataFlexibility
3. DataManipulation
4. Time Series
INSTALLATIONpipinstallpandas
pipinstallpandasaspd
PANDAS DATA STRUCTURES
Series -basicallyan ordered dictthatcan be named
Dataframe -Alabeled two dimensionaldatatype
SERIES
importpandasaspd
cookies=pd.Series(
[
'ChocolateChip,'
'PeanutButter,'
'GingerMolasses,'
'OatmealRaisin,'
'Sugar',
'Oreo',
]
)
WHAT DOES IT LOOK LIKE?
0 ChocolateChip
1 PeanutButter
2 GingerMolasses
3 OatmealRaisin
4 Sugar
5 Oreo
dtype:object
PROPERTIES
>>>cookies.values
array(['ChocolateChip','PeanutButter','GingerMolasses',
'OatmealRaisin','Sugar','Oreo'],dtype=object)
>>>cookies.index
Int64Index([0,1,2,3,4,5],dtype='int64')
SPECIFYING THE INDEX
cookies=pd.Series([12,10,8,6,4,2],index=['ChocolateChip',
'PeanutButter',
'GingerMolasses',
'OatmealRaisin',
'Sugar',
'PowderSugar'
])
INDEXED SERIES
ChocolateChip 12
PeanutButter 10
GingerMolasses 8
OatmealRaisin 6
Sugar 4
PowderSugar 2
dtype:int64
NAMING THE VALUES AND INDEXES
>>>cookies.name='counts'
>>>cookies.index.name='type'
type
ChocolateChip 12
PeanutButter 10
GingerMolasses 8
OatmealRaisin 6
Sugar 4
PowderSugar 2
Name:counts,dtype:int64
ACCESSING ELEMENTS
>>>cookies[[name.endswith('Sugar')fornameincookies.index]]
Sugar 4
PowderSugar 2
dtype:int64
>>>cookies[cookies>10]
ChocolateChip 12
Name:counts,dtype:int64
DATAFRAMES
df=pd.DataFrame({
'count':[12,10,8,6,2,2,2],
'type':['ChocolateChip','PeanutButter','GingerMolasses','OatmealRaisin','Sug
'owner':['Jason','Jason','Jason','Jason','Jason','Jason','Marvin']
})
count owner type
0 12 Jason ChocolateChip
1 10 Jason PeanutButter
2 8 Jason GingerMolasses
3 6 Jason OatmealRaisin
4 2 Jason Sugar
5 2 Jason PowderSugar
6 2 Marvin Sugar
ACCESSING COLUMNS
>>>df['type']
0 ChocolateChip
1 PeanutButter
2 GingerMolasses
3 OatmealRaisin
4 Sugar
5 PowderSugar
6 Sugar
Name:type,dtype:object
ACCESSING ROWS
>>>df.loc[2]
count 8
owner Jason
type GingerMolasses
Name:2,dtype:object
SLICING ROWS
>>>df.loc[2:5]
count owner type
2 8 Jason GingerMolasses
3 6 Jason OatmealRaisin
4 2 Jason Sugar
5 2 Jason PowderSugar
PIVOTING
>>>df.loc[3:4].T
3 4
count 6 2
owner Jason Jason
type OatmealRaisin Sugar
GROUPING
>>>df.groupby('owner').sum()
count
owner
Jason 40
Marvin 2
>>>df.groupby(['type','owner']).sum()
count
type owner
ChocolateChip Jason 12
GingerMolassesJason 8
OatmealRaisin Jason 6
PeanutButter Jason 10
PowderSugar Jason 2
Sugar Jason 2
Marvin 2
RENAMING COLUMNS
>>>g_sum=df.groupby(['type']).sum()
>>>g_sum.columns=['Total']
Total
sum
ChocolateChip 12
GingerMolasses 8
OatmealRaisin 6
PeanutButter 10
PowderSugar 2
Sugar 4
PIVOT TABLES
>>>pd.pivot_table(df,values='count',index=['type'],columns=['owner'])
Owner Jason Marvin
type
ChocolateChip 12 NaN
GingerMolasses 8 NaN
OatmealRaisin 6 NaN
PeanutButter 10 NaN
PowderSugar 2 NaN
Sugar 2 2
JOINING
>>>df=pivot_t.join(g_sum)
>>>df.fillna(0,inplace=True)
Jason Marvin Total
type
ChocolateChip 12 0 12
GingerMolasses 8 0 8
OatmealRaisin 6 0 6
PeanutButter 10 0 10
PowderSugar 2 0 2
Sugar 2 2 4
REAL WORLD PROBLEM
OUR DATASOURCE
2014-06-2417:20:23.014642,0,34,102,0,0,0,60
2014-06-2417:25:01.176772,0,32,174,0,0,0,133
2014-06-2417:30:01.370235,0,28,57,0,0,0,75
2014-07-2114:35:01.797838,0,39,74,0,0,0,30,0,262,2,3,3,0
2014-07-2114:40:02.000434,0,54,143,0,0,0,44,0,499,3,9,9,0
READING FROM A CSV
df=pd.read_csv('results.csv',header=0,quotechar=''')
datetime abuse_passthrough any_abuse_handled ...
0 2014-06-2417:20:23.014642 0 34 ...
SETTING THE DATETIME AS THE INDEX
>>>df['datetime']=pandas.to_datetime(df.datetime)
>>>df.index=df.datetime
>>>deldf['datetime']
abuse_passthrough any_abuse_handled...
datetime ...
2014-06-2417:20:23.014642 0 34...
2014-06-2417:25:01.176772 0 32...
TIME SLICING
>>>df['2014-07-2113:55:00':'2014-07-2114:10:00']
abuse_passthrough any_abuse_handled...
datetime ...
2014-07-2113:55:01.153706 0 24...
2014-07-2114:00:01.372624 0 24...
2014-07-2114:05:01.910827 0 32...
HandlingMissingDataPoints
>>>df.fillna(0,inplace=True)
FUNCTIONS
>>>df.sum()
abuse_passthrough 39
any_abuse_handled 81537
handle_bp_message_handled 271689
handle_bp_message_corrupt_handled 0
error 0
forward_all_unhandled 0
original_message_handled 136116
list_unsubscribe_optout 71
default_handler_dropped 1342285
default_unhandled 2978
default_opt_out_bounce 22044
default_opt_out 23132
default_handler_pattern_dropped 0
dtype:float64
>>>df.sum().sum()
1879891.0
>>>df.mean()
abuse_passthrough 0.009673
any_abuse_handled 20.222470
handle_bp_message_handled 67.383185
handle_bp_message_corrupt_handled 0.000000
error 0.000000
forward_all_unhandled 0.000000
original_message_handled 33.758929
list_unsubscribe_optout 0.017609
default_handler_dropped 332.907986
default_unhandled 0.738591
default_opt_out_bounce 5.467262
default_opt_out 5.737103
default_handler_pattern_dropped 0.000000
dtype:float64
>>>df['2014-07-2113:55:00':'2014-07-2114:10:00'].apply(np.cumsum)
abuse_passthrough any_abuse_handled...
datetime ...
2014-07-2113:55:01.153706 0 24...
2014-07-2114:00:01.372624 0 48...
2014-07-2114:05:01.910827 0 80...
RESAMPLING
>>>d_df=df.resample('1D',how='sum')
abuse_passthrough any_abuse_handled...
datetime ...
2014-07-07 0 3178...
2014-07-08 1 6536...
2014-07-09 2 6857...
SORTING
>>>d_df.sort('any_abuse_handled',ascending=False)
abuse_passthrough any_abuse_handled...
datetime ...
2014-07-15 21 7664...
2014-07-17 5 7548...
2014-07-10 0 7106...
2014-07-11 10 6942...
DESCRIBE
>>>d_df.describe()
abuse_passthrough any_abuse_handled...
count 15.00000 15.000000...
mean 2.60000 5435.800000...
std 5.79162 1848.716358...
min 0.00000 2174.000000...
25% 0.00000 3810.000000...
50% 0.00000 6191.000000...
75% 1.50000 6899.500000...
max 21.00000 7664.000000...
OUTPUT TO CSV
>>>d_df.to_csv(path_or_buf='output.csv')
ONE MORE THING...
Vincent
importvincent
CHARTS
chart=vincent.StackedArea(d_df)
chart.legend(title='Legend')
chart.colors(brew='Set3')
EXAMPLE STACKED AREA
EXAMPLE LINE
EXAMPLE PIE
QUESTIONS
JASON A MYERS / @JASONAMYERS

More Related Content

More from Jason Myers

Filling the flask
Filling the flaskFilling the flask
Filling the flask
Jason Myers
 
Building CLIs that Click
Building CLIs that ClickBuilding CLIs that Click
Building CLIs that Click
Jason Myers
 
Spanning Tree Algorithm
Spanning Tree AlgorithmSpanning Tree Algorithm
Spanning Tree Algorithm
Jason Myers
 
SQLAlchemy Core: An Introduction
SQLAlchemy Core: An IntroductionSQLAlchemy Core: An Introduction
SQLAlchemy Core: An Introduction
Jason Myers
 
Generating Power with Yield
Generating Power with YieldGenerating Power with Yield
Generating Power with Yield
Jason Myers
 
Introduction to SQLAlchemy and Alembic Migrations
Introduction to SQLAlchemy and Alembic MigrationsIntroduction to SQLAlchemy and Alembic Migrations
Introduction to SQLAlchemy and Alembic Migrations
Jason Myers
 
Diabetes and Me: My Journey So Far
Diabetes and Me: My Journey So FarDiabetes and Me: My Journey So Far
Diabetes and Me: My Journey So Far
Jason Myers
 
Selenium testing
Selenium testingSelenium testing
Selenium testing
Jason Myers
 
Coderfaire Data Networking for Developers
Coderfaire Data Networking for DevelopersCoderfaire Data Networking for Developers
Coderfaire Data Networking for DevelopersJason Myers
 

More from Jason Myers (9)

Filling the flask
Filling the flaskFilling the flask
Filling the flask
 
Building CLIs that Click
Building CLIs that ClickBuilding CLIs that Click
Building CLIs that Click
 
Spanning Tree Algorithm
Spanning Tree AlgorithmSpanning Tree Algorithm
Spanning Tree Algorithm
 
SQLAlchemy Core: An Introduction
SQLAlchemy Core: An IntroductionSQLAlchemy Core: An Introduction
SQLAlchemy Core: An Introduction
 
Generating Power with Yield
Generating Power with YieldGenerating Power with Yield
Generating Power with Yield
 
Introduction to SQLAlchemy and Alembic Migrations
Introduction to SQLAlchemy and Alembic MigrationsIntroduction to SQLAlchemy and Alembic Migrations
Introduction to SQLAlchemy and Alembic Migrations
 
Diabetes and Me: My Journey So Far
Diabetes and Me: My Journey So FarDiabetes and Me: My Journey So Far
Diabetes and Me: My Journey So Far
 
Selenium testing
Selenium testingSelenium testing
Selenium testing
 
Coderfaire Data Networking for Developers
Coderfaire Data Networking for DevelopersCoderfaire Data Networking for Developers
Coderfaire Data Networking for Developers
 

Recently uploaded

zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
Jen Stirrup
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 

Recently uploaded (20)

zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...The Metaverse and AI: how can decision-makers harness the Metaverse for their...
The Metaverse and AI: how can decision-makers harness the Metaverse for their...
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 

Introduction to Pandas