Web Scrape to “Make UI Great Again”
6 December 2017
Gavin Wiener
Goal
With web scraping, you rarely have to put up with an
unfriendly user interface
What is Web Scraping?
“Web scraping, web harvesting, or web data extraction is data
scraping used for extracting data from websites”
https://en.wikipedia.org/wiki/Web_scraping
Tools
● Python + BeautifulSoup (minimal prior knowledge required)
○ https://www.python.org/
○ https://www.crummy.com/software/BeautifulSoup/bs4/doc/
● Mobile-Friendly CSS - Spectre.css
○ https://picturepan2.github.io/spectre/
● 1 or more badly designed websites
○ https://myciti.org.za/en/home/
○ https://myciti.org.za/en/timetables/route-stop-timetables/
● (Optional) Hosting
○ I essentially created a website
MyCiti: Initial
Simulator: iPhone 5
MyCiti: Goal
Simulator: iPhone 5
Identify Structure: Inspecting Elements
Get Data: Investigate URLs
Encoded
https://myciti.org.za/en/timetables/route-stop-timetables/?timetable%5Bweekday%5D=sunday&timetable%5Bstation%5D=493&timetable%5Broute%5D=&timetable%5Bdirection%5D=
Decoded
https://myciti.org.za/en/timetables/route-stop-timetables/?timetable[weekday]=sunday&timetable[station]=493&timetable[route]=&timetable[direction]=
Tool: https://meyerweb.com/eric/tools/dencoder/
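The same decoding can be done in Python's standard library instead of an online tool. The sketch below decodes the URL from this slide and rebuilds it from its parts. The helper name `timetable_url` is hypothetical, and it assumes the site keeps accepting the four `timetable[...]` parameters exactly as they appear in the URL above.

```python
from urllib.parse import parse_qs, unquote, urlencode, urlsplit

# The encoded URL from the slide, joined back into one string.
ENCODED = (
    "https://myciti.org.za/en/timetables/route-stop-timetables/"
    "?timetable%5Bweekday%5D=sunday&timetable%5Bstation%5D=493"
    "&timetable%5Broute%5D=&timetable%5Bdirection%5D="
)

# %5B and %5D are the percent-encoded forms of "[" and "]".
decoded = unquote(ENCODED)

# Split the query string into a dict; keep the empty route/direction values.
params = parse_qs(urlsplit(ENCODED).query, keep_blank_values=True)


def timetable_url(weekday, station, route="", direction=""):
    """Build a timetable URL (hypothetical helper, parameter names taken from the slide)."""
    base = "https://myciti.org.za/en/timetables/route-stop-timetables/"
    query = urlencode({
        "timetable[weekday]": weekday,
        "timetable[station]": station,
        "timetable[route]": route,
        "timetable[direction]": direction,
    })
    return base + "?" + query


print(decoded)
print(params["timetable[station]"])
print(timetable_url("sunday", 493) == ENCODED)
```

Changing `weekday` or `station` in `timetable_url` should then give the URL for a different stop or day, which is the hook the scraper needs.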
I Can Haz Your Data - Code
Getting the timetable of a stop
I Can Haz Your Data - Raw
Getting the timetable of a stop
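The code shown on these slides did not survive as text, so what follows is only a rough sketch of the idea, not the talk's actual code. To stay runnable without extra installs it uses the standard library's `html.parser` instead of BeautifulSoup. The class `TableRows`, the function `scrape_rows`, and the sample HTML are all invented for illustration; the real MyCiti markup may differ.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_rows(html):
    """Return all table rows found in an HTML string."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Placeholder markup, NOT real MyCiti data.
SAMPLE_HTML = """
<table>
  <tr><th>Route</th><th>Departs</th></tr>
  <tr><td>Route A</td><td>06:10</td></tr>
  <tr><td>Route B</td><td>06:25</td></tr>
</table>
"""

rows = scrape_rows(SAMPLE_HTML)
print(rows)
```

With BeautifulSoup the same walk is shorter: `soup.find("table")`, then `find_all("tr")` on the result, then `get_text()` on each cell. In practice the HTML string would come from downloading the timetable URL rather than from a sample constant.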
Create a New Interface
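One possible way to build the new interface is to render the scraped rows into a small static page styled with Spectre.css. This is a sketch under assumptions: the function `render_page` and the sample rows are hypothetical, and the stylesheet URL is the unpkg CDN path for Spectre.css, which should be checked against the Spectre documentation before use.

```python
from html import escape

# Assumed CDN location of the Spectre.css stylesheet.
SPECTRE_CSS = "https://unpkg.com/spectre.css/dist/spectre.min.css"


def render_page(title, header, rows):
    """Render rows of strings as a mobile-friendly HTML table."""
    head_cells = "".join(f"<th>{escape(h)}</th>" for h in header)
    body_rows = "\n".join(
        "<tr>" + "".join(f"<td>{escape(c)}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return f"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>{escape(title)}</title>
<link rel="stylesheet" href="{SPECTRE_CSS}">
</head>
<body>
<div class="container">
<h1>{escape(title)}</h1>
<table class="table table-striped">
<thead><tr>{head_cells}</tr></thead>
<tbody>
{body_rows}
</tbody>
</table>
</div>
</body>
</html>
"""


# Placeholder rows, NOT real MyCiti data.
page = render_page(
    "Stop timetable",
    ["Route", "Departs"],
    [["Route A", "06:10"], ["Route <B>", "06:25"]],
)
print(page)
```

Escaping every cell with `html.escape` matters here because the text comes from someone else's website. The viewport `meta` tag is what lets the page scale properly on a phone-sized screen such as the iPhone 5 simulator used earlier.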
Summary
1. Find a website, e.g. MyCiti
2. Identify the structure and the interesting components, e.g. <table>
3. Identify how to reach the data, e.g. URL query parameters
4. ‘Scrape’ the data with code, e.g. Python + BeautifulSoup
5. Create your new interface
And You Have a Website
gavinwiener@gmail.com
http://github.com/divisionMax/