Scraping the Olympics
Paul Bradshaw
Presentation for a workshop at the BBC Data Journalism Day, July 2012
1.
Scraping the Olympics
Paul Bradshaw, author: Scraping for Journalists
Leanpub.com/scrapingforjournalists
2.
Scraping basics
Combining data
Finding stories in data
4.
Function (Parameters)
5.
Function (Parameters)
=SUM(A2:A50)
=AVERAGE(B2:B300)
=COUNTIF(A10:A3000,"Smith")
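The Function (Parameters) pattern on this slide carries over directly to code: a function name followed by its inputs in brackets. A minimal Python sketch of the three formulas follows, assuming small made-up lists in place of the spreadsheet ranges. The `countif` helper is hypothetical and does a simple exact match only.

```python
# Hypothetical values standing in for the spreadsheet ranges on the slide.
column_a = [10, 20, 30]
names = ["Smith", "Jones", "Smith"]


def countif(values, criterion):
    """Count how many values equal the criterion (simplified exact match)."""
    return sum(1 for value in values if value == criterion)


total = sum(column_a)                    # same shape as =SUM(A2:A50)
average = sum(column_a) / len(column_a)  # same shape as =AVERAGE(B2:B300)
smiths = countif(names, "Smith")         # same shape as =COUNTIF(A10:A3000,"Smith")

print(total, average, smiths)
```

In each case the name says what to do and the parameters say what to do it to.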
6.
("string", index)
7.
Tip: search for documentation
8.
Tip: search for structure around data
10.
//div[starts-with(@class, 'jobWrap')]
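The XPath expression on this slide selects every div whose class attribute begins with jobWrap. As a rough sketch of the same selection in Python: the standard library's ElementTree does not support the XPath starts-with() function, so the version below filters the div elements in Python instead. The sample markup and the `divs_with_class_prefix` helper are hypothetical, and real pages are often messier HTML that would need a more forgiving parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup invented for illustration.
SAMPLE = """
<html><body>
  <div class="jobWrap odd">First listing</div>
  <div class="sidebar">Not a listing</div>
  <div class="jobWrap even">Second listing</div>
</body></html>
""".strip()


def divs_with_class_prefix(markup, prefix):
    """Return div elements whose class attribute starts with the prefix."""
    root = ET.fromstring(markup)
    return [
        div for div in root.iter("div")
        if div.get("class", "").startswith(prefix)
    ]


matches = divs_with_class_prefix(SAMPLE, "jobWrap")
texts = [div.text for div in matches]
print(texts)
```

Matching on the start of the class value picks up both "jobWrap odd" and "jobWrap even", which an exact match on the class would miss.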
12.
Combining data
13.
Question: Which torchbearers are from Dorset?
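The transcript does not preserve how the slides answer this question, so the following is only one possible sketch of combining two datasets: a scraped list of torchbearers with hometowns, and a lookup from town to county. All names and rows are hypothetical placeholders, and `from_county` is an illustrative helper rather than the method used in the workshop.

```python
# Hypothetical scraped rows: one dictionary per torchbearer.
torchbearers = [
    {"name": "Torchbearer A", "hometown": "Weymouth"},
    {"name": "Torchbearer B", "hometown": "Leeds"},
    {"name": "Torchbearer C", "hometown": "Poole"},
]

# Hypothetical second dataset: which county each town belongs to.
town_to_county = {
    "Weymouth": "Dorset",
    "Poole": "Dorset",
    "Leeds": "West Yorkshire",
}


def from_county(rows, lookup, county):
    """Return names of rows whose hometown maps to the given county."""
    return [
        row["name"] for row in rows
        if lookup.get(row["hometown"]) == county
    ]


dorset = from_county(torchbearers, town_to_county, "Dorset")
print(dorset)
```

Neither dataset answers the question alone; the shared hometown column is what lets them be joined.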
22.
Finding leads: Corporate torchbearers?
27.
New entries, or disappearing ones
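One way to spot new or disappearing entries is to scrape the same page at two points in time and compare the results. The sketch below assumes each scrape has been reduced to a list of identifying strings. The `compare_scrapes` helper and the sample values are hypothetical.

```python
def compare_scrapes(previous, current):
    """Compare two scrapes and report entries that appeared or vanished."""
    previous_set, current_set = set(previous), set(current)
    return {
        "new": sorted(current_set - previous_set),
        "disappeared": sorted(previous_set - current_set),
    }


# Hypothetical scrapes of the same list taken on two different days.
monday = ["Torchbearer A", "Torchbearer B", "Torchbearer C"]
friday = ["Torchbearer A", "Torchbearer C", "Torchbearer D"]

changes = compare_scrapes(monday, friday)
print(changes)
```

Entries that vanish between scrapes can be as much of a lead as those that appear, so the sketch reports both.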
32.
Leanpub.com/scrapingforjournalists
@paulbradshaw
onlinejournalismblog.com
helpmeinvestigate.com
slideshare.net/onlinejournalist
linkedin.com/in/onlinejournalist