TO PARSE HTML
LIZZIE SIEGLE
@LIZZIEPIKA
BEAUTIFUL SOUP 101
What is Beautiful Soup?
- Web-scraping library
- Extract desired data from page
- Clean content
How Will We Use It?
- Find every link on a webpage
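A minimal sketch of the link-finding use case, assuming Beautiful Soup 4 is installed. The HTML string is a made-up example rather than a real page; `find_all("a", href=True)` keeps only the `a` tags that actually carry an `href` attribute.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; a real page would come from an HTTP request.
html = """
<p>
  <a href="https://example.com/a">First</a>
  <a name="anchor-only">No href here</a>
  <a href="/b">Second</a>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

# href=True matches every <a> tag that has an href attribute, whatever its value.
links = [a["href"] for a in soup.find_all("a", href=True)]
print(links)
```

The second `a` tag has no `href`, so it is left out of `links`.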
Some of Our Tools
- Requests to access HTML
- bs4 to parse HTML
SEND HTTP REQUESTS
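Roughly what this step looks like in code, going by the speaker notes: Requests fetches the page and Beautiful Soup parses it. The function names `make_soup` and `fetch_soup` and the 10-second timeout are illustrative assumptions, not taken from the talk.

```python
import requests
from bs4 import BeautifulSoup


def make_soup(html: str) -> BeautifulSoup:
    """Parse an HTML string with the parser built into Python."""
    return BeautifulSoup(html, "html.parser")


def fetch_soup(url: str) -> BeautifulSoup:
    """Download a page and parse it (needs network access)."""
    response = requests.get(url, timeout=10)  # timeout value is an assumption
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return make_soup(response.text)


# Example call (not run here because it hits the network):
# soup = fetch_soup("https://www.goodreads.com/quotes")
```

Splitting parsing from fetching is a design choice for this sketch: `make_soup` can be tried on any HTML string without going online.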
Inspect Page
Find each `div`
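A small sketch of this step, assuming the "quoteText" class named in the speaker notes. The sample markup is hypothetical and only loosely mimics the nesting described there; it is not copied from Goodreads.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modeled on the nesting the notes describe.
SAMPLE_HTML = """
<div class="quote">
  <div class="quoteText">
    First quote.
    <br>
    <a href="/author/1">Author One</a>
  </div>
</div>
<div class="quote">
  <div class="quoteText">Second quote, no author.</div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Calling the soup object directly is shorthand for find_all().
div_quotes = soup("div", class_="quoteText")
same_result = soup.find_all("div", class_="quoteText")
print(len(div_quotes))
```

Only the inner divs match, because `class_="quoteText"` does not match the outer `quote` class.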
Find each `a` tag
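One way to sketch the author lookup described in the speaker notes: take the first `a` tag inside each quote div and skip any div without one. The sample markup and the `authors` list are illustrative assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first quote has an author link, the second does not.
SAMPLE_HTML = """
<div class="quoteText">First quote.<br><a href="/author/1"> Author One </a></div>
<div class="quoteText">Second quote, no author.</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

authors = []
for div in soup("div", class_="quoteText"):
    author_tag = div.find("a")  # returns None when the div has no <a> tag
    if author_tag is None:
        continue  # no author found, so skip this quote
    # get_text() returns the tag's text as one string; strip=True trims whitespace.
    authors.append(author_tag.get_text(strip=True))

print(authors)
```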
Multi-line -> Single String
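A hedged sketch of the multi-line-to-single-string step as the speaker notes describe it: walk the tag's `.contents`, drop non-ASCII characters, and skip children that are markup. The helper name `quote_text` and the sample markup are assumptions made for illustration.

```python
from bs4 import BeautifulSoup


def quote_text(div) -> str:
    """Join the text children of a tag into one ASCII-only string."""
    parts = []
    for child in div.contents:
        # Drop anything that cannot be represented in ASCII (curly quotes, etc.).
        line = str(child).encode("ascii", errors="ignore").decode("ascii").strip()
        if not line or line.startswith("<"):
            continue  # blank text, or markup such as <br/> and <a>...</a>
        parts.append(line)
    return " ".join(parts)


# Hypothetical markup: one quote broken across two lines by a <br> tag.
SAMPLE_HTML = (
    '<div class="quoteText">\u201cLine one of the quote,<br>'
    'line two.\u201d<a href="/author/1">Author One</a></div>'
)
div = BeautifulSoup(SAMPLE_HTML, "html.parser").find("div", class_="quoteText")
print(quote_text(div))
```

The curly quotes disappear because they are not ASCII, and the `br` and `a` children are skipped because their string form starts with `<`.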
Clean, Format Quotes
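A sketch of the clean-and-format step using only the standard library. The " - " between quote and author, and the helper names `format_quote`, `join_quotes`, and `split_quotes`, are assumptions; the "#" separator and the printable-character filter come from the speaker notes.

```python
import string

# Digits, ASCII letters, punctuation, and whitespace.
PRINTABLE = set(string.printable)


def format_quote(quote: str, author: str) -> str:
    """Strip whitespace, keep printable characters, and append the '#' marker."""
    cleaned = "".join(ch for ch in quote.strip() if ch in PRINTABLE)
    return f"{cleaned} - {author.strip()}#"


def join_quotes(pairs) -> str:
    """Concatenate formatted (quote, author) pairs into one string."""
    return "".join(format_quote(quote, author) for quote, author in pairs)


def split_quotes(blob: str) -> list:
    """Split the joined string back into individual quotes at each '#'."""
    return [quote for quote in blob.split("#") if quote]


blob = join_quotes([("  Hello\u201d world  ", " Jane "), ("Second", "Bob")])
print(split_quotes(blob))
```

A quote that itself contains "#" would be split in the wrong place, so this separator only works when the scraped text is known not to include it.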
Text +1(813)906-5107
“harry potter”
“crazy rich asians”
“jane austen”
etc.
@LIZZIEPIKA
LSIEGLE@TWILIO.COM
- HTTPS://WWW.CRUMMY.COM/SOFTWARE/BEAUTIFULSOUP/
- HTTPS://WWW.TWILIO.COM/BLOG/PARSE-HTML-FOR-BOOK-QUOTES-PYTHON-BEAUTIFUL-SOUP-WHATSAPP
YOU!

Editor's Notes

  1. Hi everyone, I’m Lizzie, and today we’ll be talking about how to parse HTML to extract book quotes (or any data you wish) with Beautiful Soup.
  2. So first off, what is this mysterious Python module? Has anyone used Beautiful Soup? Hmm... Slytherin GIF instead?
  3. A simple yet powerful third-party Python library for pulling specific content out of a webpage written in HTML, XML, or other markup languages. It does the behind-the-scenes job of stripping out the markup for us. Yay!
  4. For example, you can use it to find all the links on a website (every “a” element with an “href” attribute), or to extract data like stock indices, sports stats, specific words, etc. We’ll use it to find book quotes from goodreads.com/quotes! How many people use Goodreads? Read?
  5. Requests to access the HTML pages we want to scrape; bs4 (Beautiful Soup 4) to parse those pages. Install both on the command line with pip: `pip install requests beautifulsoup4`.
  6. With requests.get, we fetch the webpage. Then we make a BeautifulSoup object representing the document as a nested data structure. An optional second parameter names the parser. "html.parser" is the parser built into Python's standard library; if you leave the parameter out, Beautiful Soup picks the best parser that is installed and warns that none was specified. Other parsers you could use include lxml and html5lib, but both require an external dependency.
  7. Next, we want to get quotes from the page. If you visit https://goodreads.com/quotes and right-click -> View Page Source, you would see that the HTML classes are nested like this.
  8. With this nested structure in mind, we will find every div with the class "quoteText". We can do this by calling the soup object directly: soup("div") is shorthand for Beautiful Soup's find_all method, and a class filter narrows the match to "quoteText".
  9. Let's loop through the quotes and find each quote’s author. As shown in the HTML code we inspected, the only `a` tag in each div of div_quotes is the author. If an `a` tag exists, then we know an author exists. We can access all the text inside a tag as a single Unicode string with the get_text() method. If we can't find an author, we skip the quote because it may not be a good one.
  10. By default, Python 3 uses Unicode, so every string is a sequence of Unicode characters. We loop through a given tag's children with its .contents attribute, then encode each child as ASCII, ignoring any non-ASCII characters. If a line starts with a tag symbol (<), it is markup rather than part of the quote, so it is ignored. Otherwise, that line is added to the quote we return.
  11. To clean and format each quote, strip() removes leading and trailing whitespace, which could mess up our data. We also format the quote along with the author, and add a "#" character so we know where to split the list. Finally, we filter the quotes so that only characters considered printable (digits, letters, punctuation, and whitespace) are kept, and return them.
  12. Time for audience participation!! Get out your phones. Text this number a book title, series, or author.
  13. Hype the repo; Twitter, email, around the conf.