SlideShare a Scribd company logo
Data Transformation
Workflow
(from gsheet to jupyter)
July 2019, @DinisCruz
OSBOT
(OWASP Security Bot)
Start Jupyter Server
Jupyter Lab Dev Environment
Step 1
Loading Data from GSheet
Original Data Set
Viewing Data in Slack
Viewing data in Jupyter
Pandas
DataFrame
Jupyter QGrid
All code and data is auto-saved in Git
Step 2
Parsing dataset
Explaining what we are doing in an Jupyter Notebook
Powerful development environment
Starting refactoring code into methods
Get 3 datasets as separate DataFrames
Helper function to transform raw data into objects
Refactored code that extracts data into 3 DataFrames
3 DataFrames with data
Step 3
Merging data
Example 1 - Merge 3 datasets
All fields from all 3 data sets
(in this case there was only one exact match on email)
Example 2 - Merging data loses user with different email
When Merging DF_1 with DF_3 we lose `Alan Lee` due
to different email
Example 3 - Fixing data before merge (version 1)
Lost user due to bad data in Name
Lost `Bruno Lyon`
User
Because `Name`
value is wrong
Example 4 - Fixing data before merge (version 2)
The merge of the two DataFrames now successfully finds
all 4 users (including Alan who has two emails)
By using email to find the first and last names values

More Related Content

More from Dinis Cruz

The benefits of police and industry investigation - NPCC Conference
The benefits of police and industry investigation - NPCC ConferenceThe benefits of police and industry investigation - NPCC Conference
The benefits of police and industry investigation - NPCC Conference
Dinis Cruz
 
Serverless Security Workflows - cyber talks - 19th nov 2019
Serverless  Security Workflows - cyber talks - 19th nov 2019Serverless  Security Workflows - cyber talks - 19th nov 2019
Serverless Security Workflows - cyber talks - 19th nov 2019
Dinis Cruz
 
Modern security using graphs, automation and data science
Modern security using graphs, automation and data scienceModern security using graphs, automation and data science
Modern security using graphs, automation and data science
Dinis Cruz
 
Using Wardley Maps to Understand Security's Landscape and Strategy
Using Wardley Maps to Understand Security's Landscape and StrategyUsing Wardley Maps to Understand Security's Landscape and Strategy
Using Wardley Maps to Understand Security's Landscape and Strategy
Dinis Cruz
 
Dinis Cruz (CV) - CISO and Transformation Agent v1.2
Dinis Cruz (CV) - CISO and Transformation Agent v1.2Dinis Cruz (CV) - CISO and Transformation Agent v1.2
Dinis Cruz (CV) - CISO and Transformation Agent v1.2
Dinis Cruz
 
Making fact based decisions and 4 board decisions (Oct 2019)
Making fact based decisions and 4 board decisions (Oct 2019)Making fact based decisions and 4 board decisions (Oct 2019)
Making fact based decisions and 4 board decisions (Oct 2019)
Dinis Cruz
 
CISO Application presentation - Babylon health security
CISO Application presentation - Babylon health securityCISO Application presentation - Babylon health security
CISO Application presentation - Babylon health security
Dinis Cruz
 
Using OWASP Security Bot (OSBot) to make Fact Based Security Decisions
Using OWASP Security Bot (OSBot) to make Fact Based Security DecisionsUsing OWASP Security Bot (OSBot) to make Fact Based Security Decisions
Using OWASP Security Bot (OSBot) to make Fact Based Security Decisions
Dinis Cruz
 
GSBot Commands (Slack Bot used to access Jira data)
GSBot Commands (Slack Bot used to access Jira data)GSBot Commands (Slack Bot used to access Jira data)
GSBot Commands (Slack Bot used to access Jira data)
Dinis Cruz
 
(OLD VERSION) Dinis Cruz (CV) - CISO and Transformation Agent v0.6
(OLD VERSION) Dinis Cruz (CV) - CISO and Transformation Agent v0.6 (OLD VERSION) Dinis Cruz (CV) - CISO and Transformation Agent v0.6
(OLD VERSION) Dinis Cruz (CV) - CISO and Transformation Agent v0.6
Dinis Cruz
 
Jira schemas - Open Security Summit (Working Session 21th May 2019)
Jira schemas  - Open Security Summit (Working Session 21th May 2019)Jira schemas  - Open Security Summit (Working Session 21th May 2019)
Jira schemas - Open Security Summit (Working Session 21th May 2019)
Dinis Cruz
 
Template for "Sharing anonymised risk theme dashboards v0.8"
Template for "Sharing anonymised risk theme dashboards v0.8"Template for "Sharing anonymised risk theme dashboards v0.8"
Template for "Sharing anonymised risk theme dashboards v0.8"
Dinis Cruz
 
Owasp and summits (may 2019)
Owasp and summits (may 2019)Owasp and summits (may 2019)
Owasp and summits (may 2019)
Dinis Cruz
 
Creating a graph based security organisation - Apr 2019 (OWASP London chapter...
Creating a graph based security organisation - Apr 2019 (OWASP London chapter...Creating a graph based security organisation - Apr 2019 (OWASP London chapter...
Creating a graph based security organisation - Apr 2019 (OWASP London chapter...
Dinis Cruz
 
Open security summit 2019 owasp london 25th feb
Open security summit 2019   owasp london 25th febOpen security summit 2019   owasp london 25th feb
Open security summit 2019 owasp london 25th feb
Dinis Cruz
 
Owasp summit 2019 - OWASP London 25th feb
Owasp summit 2019  - OWASP London 25th febOwasp summit 2019  - OWASP London 25th feb
Owasp summit 2019 - OWASP London 25th feb
Dinis Cruz
 
Evolving challenges for modern enterprise architectures in the age of APIs
Evolving challenges for modern enterprise architectures in the age of APIsEvolving challenges for modern enterprise architectures in the age of APIs
Evolving challenges for modern enterprise architectures in the age of APIs
Dinis Cruz
 
How to not fail at security data analytics (by CxOSidekick)
How to not fail at security data analytics (by CxOSidekick)How to not fail at security data analytics (by CxOSidekick)
How to not fail at security data analytics (by CxOSidekick)
Dinis Cruz
 
Thinking in graphs v1.0
Thinking in graphs v1.0Thinking in graphs v1.0
Thinking in graphs v1.0
Dinis Cruz
 
Open Security Summit - April 2018
Open Security Summit - April 2018 Open Security Summit - April 2018
Open Security Summit - April 2018
Dinis Cruz
 

More from Dinis Cruz (20)

The benefits of police and industry investigation - NPCC Conference
The benefits of police and industry investigation - NPCC ConferenceThe benefits of police and industry investigation - NPCC Conference
The benefits of police and industry investigation - NPCC Conference
 
Serverless Security Workflows - cyber talks - 19th nov 2019
Serverless  Security Workflows - cyber talks - 19th nov 2019Serverless  Security Workflows - cyber talks - 19th nov 2019
Serverless Security Workflows - cyber talks - 19th nov 2019
 
Modern security using graphs, automation and data science
Modern security using graphs, automation and data scienceModern security using graphs, automation and data science
Modern security using graphs, automation and data science
 
Using Wardley Maps to Understand Security's Landscape and Strategy
Using Wardley Maps to Understand Security's Landscape and StrategyUsing Wardley Maps to Understand Security's Landscape and Strategy
Using Wardley Maps to Understand Security's Landscape and Strategy
 
Dinis Cruz (CV) - CISO and Transformation Agent v1.2
Dinis Cruz (CV) - CISO and Transformation Agent v1.2Dinis Cruz (CV) - CISO and Transformation Agent v1.2
Dinis Cruz (CV) - CISO and Transformation Agent v1.2
 
Making fact based decisions and 4 board decisions (Oct 2019)
Making fact based decisions and 4 board decisions (Oct 2019)Making fact based decisions and 4 board decisions (Oct 2019)
Making fact based decisions and 4 board decisions (Oct 2019)
 
CISO Application presentation - Babylon health security
CISO Application presentation - Babylon health securityCISO Application presentation - Babylon health security
CISO Application presentation - Babylon health security
 
Using OWASP Security Bot (OSBot) to make Fact Based Security Decisions
Using OWASP Security Bot (OSBot) to make Fact Based Security DecisionsUsing OWASP Security Bot (OSBot) to make Fact Based Security Decisions
Using OWASP Security Bot (OSBot) to make Fact Based Security Decisions
 
GSBot Commands (Slack Bot used to access Jira data)
GSBot Commands (Slack Bot used to access Jira data)GSBot Commands (Slack Bot used to access Jira data)
GSBot Commands (Slack Bot used to access Jira data)
 
(OLD VERSION) Dinis Cruz (CV) - CISO and Transformation Agent v0.6
(OLD VERSION) Dinis Cruz (CV) - CISO and Transformation Agent v0.6 (OLD VERSION) Dinis Cruz (CV) - CISO and Transformation Agent v0.6
(OLD VERSION) Dinis Cruz (CV) - CISO and Transformation Agent v0.6
 
Jira schemas - Open Security Summit (Working Session 21th May 2019)
Jira schemas  - Open Security Summit (Working Session 21th May 2019)Jira schemas  - Open Security Summit (Working Session 21th May 2019)
Jira schemas - Open Security Summit (Working Session 21th May 2019)
 
Template for "Sharing anonymised risk theme dashboards v0.8"
Template for "Sharing anonymised risk theme dashboards v0.8"Template for "Sharing anonymised risk theme dashboards v0.8"
Template for "Sharing anonymised risk theme dashboards v0.8"
 
Owasp and summits (may 2019)
Owasp and summits (may 2019)Owasp and summits (may 2019)
Owasp and summits (may 2019)
 
Creating a graph based security organisation - Apr 2019 (OWASP London chapter...
Creating a graph based security organisation - Apr 2019 (OWASP London chapter...Creating a graph based security organisation - Apr 2019 (OWASP London chapter...
Creating a graph based security organisation - Apr 2019 (OWASP London chapter...
 
Open security summit 2019 owasp london 25th feb
Open security summit 2019   owasp london 25th febOpen security summit 2019   owasp london 25th feb
Open security summit 2019 owasp london 25th feb
 
Owasp summit 2019 - OWASP London 25th feb
Owasp summit 2019  - OWASP London 25th febOwasp summit 2019  - OWASP London 25th feb
Owasp summit 2019 - OWASP London 25th feb
 
Evolving challenges for modern enterprise architectures in the age of APIs
Evolving challenges for modern enterprise architectures in the age of APIsEvolving challenges for modern enterprise architectures in the age of APIs
Evolving challenges for modern enterprise architectures in the age of APIs
 
How to not fail at security data analytics (by CxOSidekick)
How to not fail at security data analytics (by CxOSidekick)How to not fail at security data analytics (by CxOSidekick)
How to not fail at security data analytics (by CxOSidekick)
 
Thinking in graphs v1.0
Thinking in graphs v1.0Thinking in graphs v1.0
Thinking in graphs v1.0
 
Open Security Summit - April 2018
Open Security Summit - April 2018 Open Security Summit - April 2018
Open Security Summit - April 2018
 

Recently uploaded

LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
DanBrown980551
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
UiPathCommunity
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
Mydbops
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
zjhamm304
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
AI in the Workplace Reskilling, Upskilling, and Future Work.pptx
AI in the Workplace Reskilling, Upskilling, and Future Work.pptxAI in the Workplace Reskilling, Upskilling, and Future Work.pptx
AI in the Workplace Reskilling, Upskilling, and Future Work.pptx
Sunil Jagani
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
Fwdays
 
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
Fwdays
 
Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
UiPathCommunity
 
Getting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
Getting the Most Out of ScyllaDB Monitoring: ShareChat's TipsGetting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
Getting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
ScyllaDB
 
AWS Certified Solutions Architect Associate (SAA-C03)
AWS Certified Solutions Architect Associate (SAA-C03)AWS Certified Solutions Architect Associate (SAA-C03)
AWS Certified Solutions Architect Associate (SAA-C03)
HarpalGohil4
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Neo4j
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
Fwdays
 
Discover the Unseen: Tailored Recommendation of Unwatched Content
Discover the Unseen: Tailored Recommendation of Unwatched ContentDiscover the Unseen: Tailored Recommendation of Unwatched Content
Discover the Unseen: Tailored Recommendation of Unwatched Content
ScyllaDB
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
BibashShahi
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
Neo4j
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
DianaGray10
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 

Recently uploaded (20)

LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
AI in the Workplace Reskilling, Upskilling, and Future Work.pptx
AI in the Workplace Reskilling, Upskilling, and Future Work.pptxAI in the Workplace Reskilling, Upskilling, and Future Work.pptx
AI in the Workplace Reskilling, Upskilling, and Future Work.pptx
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
 
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
 
Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
 
Getting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
Getting the Most Out of ScyllaDB Monitoring: ShareChat's TipsGetting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
Getting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
 
AWS Certified Solutions Architect Associate (SAA-C03)
AWS Certified Solutions Architect Associate (SAA-C03)AWS Certified Solutions Architect Associate (SAA-C03)
AWS Certified Solutions Architect Associate (SAA-C03)
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
 
Discover the Unseen: Tailored Recommendation of Unwatched Content
Discover the Unseen: Tailored Recommendation of Unwatched ContentDiscover the Unseen: Tailored Recommendation of Unwatched Content
Discover the Unseen: Tailored Recommendation of Unwatched Content
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 

OSBot - Data transformation workflow (from GSheet to Jupyter)