Paul Bradshaw
Leanpub.com/scrapingforjournalists*
Scraping
in 60 mins
Saturday, 10 May 14
https://www.youtube.com/watch?v=Efr-VEkwWoM
Function (Arguments)
(aka parameters)
Function (arguments)
=SUM(A2:A50)
=AVERAGE(B2:B300)
=COUNTIF(A10:A3000,"Smith")
Function (parameters)
=SUM(range of cells to be summed)
=AVERAGE(range of cells to be averaged)
=COUNTIF(range of cells to be counted, what to count)
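The same function-and-argument pattern carries over from spreadsheets to Python. As a rough sketch (the lists `ages` and `surnames` are made-up sample data, not from the talk), each line below names a function and hands it the values to work on:

```python
from statistics import mean

# Hypothetical sample data standing in for spreadsheet ranges
ages = [34, 27, 45, 27]
surnames = ["Smith", "Jones", "Smith"]

total = sum(ages)                  # like =SUM(range)
average = mean(ages)               # like =AVERAGE(range)
smiths = surnames.count("Smith")   # like =COUNTIF(range,"Smith")

print(total, average, smiths)
```

In each case the part in brackets is the argument: the thing the function needs in order to do its job.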
("string", index)
Tip: search for
documentation
Variable
Variables
Jargon checklist:
Function
Arguments
Parameters
String
Index
Variable
Documentation
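Several of these terms can be seen together in a few lines of Python. This is only an illustrative sketch; the variable names and the sample text are assumptions chosen for the example:

```python
# A variable is a named container; here it holds a string (text in quotes)
team = "Fortuna Sittard"

# split() is a function; " " is the argument telling it where to cut
words = team.split(" ")

# An index picks one item by position, counting from 0
first_word = words[0]
second_word = words[1]

print(first_word, second_word)
```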
Vote:
=importXML
or
Python?
Another function?
Search for documentation!
https://www.distilled.net/blog/distilled/guide-to-google-docs-importxml/
Query (XPath)
XPath is a path through
XML (or HTML)
<table> = //table
<table><tr> = //table//tr
<table><tr><td> = //table//tr//td
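To see these paths at work outside a spreadsheet, here is a minimal sketch using Python's built-in `xml.etree.ElementTree`, which supports only a limited subset of XPath. The sample markup is invented, and ElementTree needs a leading `.` on each path (so `//table` becomes `.//table`):

```python
import xml.etree.ElementTree as ET

# Invented, well-formed sample page (real HTML is often messier)
page = """
<html><body>
  <table>
    <tr><td>Reporter</td><td>London</td></tr>
    <tr><td>Editor</td><td>Leeds</td></tr>
  </table>
</body></html>
""".strip()

root = ET.fromstring(page)

tables = root.findall(".//table")          # every <table>
rows = root.findall(".//table//tr")        # every <tr> inside a table
cells = root.findall(".//table//tr//td")   # every <td> inside those rows

cell_text = [cell.text for cell in cells]
print(len(tables), len(rows), cell_text)
```

Each extra step in the path narrows the selection: one table, then its two rows, then the four cells inside them.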
Search for documentation!
http://www.w3schools.com/XPath/xpath_syntax.asp
Tip: search for structure
around data
http://www.w4mpjobs.org/SearchJobs.aspx?search=alljobs
http://www.w4mpjobs.org/SearchJobs.aspx?
http://www.w4mpjobs.org/SearchJobs.aspx?search=alljobs
"//div[@class='leftcolumn']"
//div[starts-with(@class, 'jobWrap')]
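Attribute queries like these can be tried on a small invented page. This sketch assumes Python's built-in `xml.etree.ElementTree`, which handles an exact match such as `[@class='leftcolumn']` but has no `starts-with()` function, so that test is imitated with the string method `startswith()`. The class names `jobWrap odd`, `jobWrap even` and `advert` are assumptions for illustration, not taken from the real site:

```python
import xml.etree.ElementTree as ET

# Invented sample markup, loosely modelled on a jobs listing
page = """
<html><body>
  <div class="leftcolumn">
    <div class="jobWrap odd">Researcher</div>
    <div class="jobWrap even">Press officer</div>
    <div class="advert">Not a job</div>
  </div>
</body></html>
""".strip()

root = ET.fromstring(page)

# Exact match on the class attribute
left = root.findall(".//div[@class='leftcolumn']")

# Stand-in for //div[starts-with(@class, 'jobWrap')]
jobs = [
    div.text
    for div in root.iter("div")
    if div.get("class", "").startswith("jobWrap")
]

print(len(left), jobs)
```

The exact match finds only the one wrapper div, while the starts-with test picks up both job entries even though their full class values differ.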
A crib sheet:
Chrome extension:
#!/usr/bin/env python
import scraperwiki
'This is a Python script' (Shebang)
import the Scraperwiki library
#!/usr/bin/env python
import scraperwiki
html = scraperwiki.scrape('http://uk.soccerway.com/teams/netherlands/fortuna-sittard/1551/')
print html
Function (argument)
#!/usr/bin/env python
import scraperwiki
html = scraperwiki.scrape('http://uk.soccerway.com/teams/netherlands/fortuna-sittard/1551/')
print html
Comes from Scraperwiki library (check documentation)
#!/usr/bin/env python
import scraperwiki
html = scraperwiki.scrape('http://uk.soccerway.com/teams/netherlands/fortuna-sittard/1551/')
print html
Variable (assigned with = sign)
Statement used to show variable
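For readers without the Scraperwiki library, the same fetch can be approximated with Python 3's standard library. This is a sketch under assumptions: the helper name `scrape` is borrowed purely for familiarity and is not Scraperwiki's own code, and falling back to UTF-8 when the server names no encoding is a guess that may not suit every site:

```python
from urllib.request import urlopen


def scrape(url):
    """Fetch a URL and return its contents as text (hypothetical helper)."""
    with urlopen(url) as response:
        raw = response.read()
        # Use the encoding the server declares, else assume UTF-8
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset)


# Usage, mirroring the slide (needs a network connection):
# html = scrape('http://uk.soccerway.com/teams/netherlands/fortuna-sittard/1551/')
# print(html)
```

In Python 3, `print` is a function rather than a statement, so the slide's `print html` becomes `print(html)`.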
Jargon checklist:
Library
Shebang
List
Paul Bradshaw
Leanpub.com/scrapingforjournalists*
Thank you.