New information for new journalists pt2: data
Presentation to ESCACC, Barcelona, 2010

Published in: Education, Technology

  1. Introduction; Paul Bradshaw; Data journalism
  2. Ivy Lee
  3. “Each weekday, my computer program goes to the Chicago Police Department's website and gathers all crimes reported in Chicago.” Adrian Holovaty
  4. Why? Great stories; Engagement; Targeting/relevance
  5. “The Tribune’s biggest magnet by far has been its more than three dozen interactive databases, which collectively have drawn three times as many page views as the site’s stories.” http://bit.ly/dj2dmz
  6. Times film genres
  7. Data Journalism Continuum
  8. Part 1. Finding data
  9. What is data?
  10. Numbers; Text; Connections; Live data; Behavioural data; Images, audio, video; Anything that a computer can work with
  11. Passive vs active data journalism: Start with the data and look for the stories? (MPs’ expenses) Or start with a lead and look for the data?
  12. Some UK projects: Data.gov.uk; What Do They Know; Openlylocal, Scraperwiki; Disclosure logs; RSS feeds, XML, structured data
  13. CAR: Delicious.com/paulb/car
  14. Advanced search by file type: “Performance figures”; Filetype: pdf; Filetype: xls; Filetype: doc; Filetype: ppt; Filetype: rdf OR xml
  15. Advanced search by domain: “Disclosure logs” site: .gov.es; Database site: .org.cat OR .org; +Tables –chairs site:; Health, police, military domains
  16. Use overseas sources: US medicine databases; EU subsidy databases; Swedish people data; International police agency correspondence
  17. Scraping: Scraping can automate & schedule the gathering process if there are multiple sources. Tools: OutWit Hub plugin, Yahoo! Pipes, Scraperwiki, Google Spreadsheets formulae
  18. Interrogating data
  19. Humans collect data; Humans enter data; Human error; Time spent now...
  20. Different words for the same thing; Double spaces, punctuation; Wrong data type; Mistyped; Duplicate entries; Default entries (1/1/00); ...Saves time later
  21. "Because we take the time to clean the data, we are able to do lobbying stories no other news organisation can do." David Donald, Center for Public Integrity
  22. Cleaning methods: Group by term then sort to see duplications; Find & replace double spaces, etc.; Select column/row & check data type; Sort to find unusually large/small, and neighbouring misspellings
  23. Never publish a name from data without running a background check. Check.
  24. Other tools: Freebase Gridworks (see http://vimeo.com/10081183)
  25. Visualising data
  26. or http://chartchooser.juiceanalytics.com/
  27. (trends, dips, correlations)
  28. (comparison, themes)
  29. (proportions, comparison)
  30. Mashing data
  31. Combining data sources: Geocoded data with map (live data, e.g. Twitter API; static data, e.g. Google Docs; dynamic data, e.g. Google Form); 2 spreadsheets with common data (tools: MySQL, Access, etc.)
  32. Combining data sources: Twittermap; Wikipedia map; NYT Property; Guardian vs Nature; BBC Most Read; BBC Olympic Village
  33. What mashes well? Big events (protests, Olympics, inauguration); Comparisons; Geocoded data; Connections
  34. Yahoo! Pipes: Aggregates; Maps; Filters; Counts; Cleans or reformats (regex)
  35. Other tools: Scraperwiki – mapping library; Maptube – combine maps; Google Docs – publish in different formats; +++
  36. Semantic web & linked data: Computer-readable data; Paris – France, Texas, or Hilton?; Unique identifiers – usually URI; RDF, RDFa, XML, etc.
  37. API: Application Programming Interface; Build on top of data; Google Maps, Twitter, Facebook, Digg, Guardian, NYT, NPR, They Work For You, etc.
  38. Q&A: Slideshare.net/onlinejournalist; Twitter.com/paulbradshaw
  39. Bookmarks: Delicious.com/paulb/datajournalism; Delicious.com/paulb/visualisation; Delicious.com/paulb/statistics
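
Slides 14 and 15 list search operators for narrowing results by file type and by domain. As a rough illustration, the sketch below assembles such queries as plain strings. The helper name `build_query` is hypothetical, and the code assumes the operators are written without a space after the colon (`filetype:pdf`, `site:gov.es`), which is the form major search engines generally accept; exact behaviour varies by engine.

```python
def build_query(phrase="", filetypes=(), site=""):
    """Assemble a search string from an exact phrase, file types and a domain.

    Hypothetical helper for illustration; it only builds text and does not
    contact any search engine.
    """
    parts = []
    if phrase:
        # Quotation marks ask for the exact phrase
        parts.append(f'"{phrase}"')
    if filetypes:
        # Several file types are alternatives, so join them with OR
        parts.append(" OR ".join(f"filetype:{ext}" for ext in filetypes))
    if site:
        # Restrict results to one domain or domain suffix
        parts.append(f"site:{site}")
    return " ".join(parts)


queries = [
    build_query("Performance figures", filetypes=["pdf", "xls"]),
    build_query("Disclosure logs", site="gov.es"),
]

for q in queries:
    print(q)
```

The resulting strings can be pasted into a search box or saved as a checklist of recurring searches.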
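
Slide 17 says scraping can automate the gathering process. The tools named there are OutWit Hub, Yahoo! Pipes, Scraperwiki and Google Spreadsheets; the sketch below swaps in Python's standard `html.parser` module to show the core step of turning an HTML table into rows. The class name `TableRows` and the sample page fragment are invented for illustration, and a real scraper would download the page from each source on a schedule rather than read an inline string.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (such as whitespace between tags) is ignored
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page fragment, invented for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Offence</th></tr>
  <tr><td>2010-03-01</td><td>Burglary</td></tr>
  <tr><td>2010-03-02</td><td>Theft</td></tr>
</table>
"""

parser = TableRows()
parser.feed(SAMPLE_HTML)
header, *records = parser.rows
print(header)
print(records)
```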
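
The cleaning methods on slide 22 (group by term to see duplications, find and replace double spaces, check data type, sort to find unusually large or small values) are usually done in a spreadsheet. The following is a minimal sketch of the same checks in Python, assuming a short list of donor names and amounts; the sample values and the function names `normalise`, `group_variants` and `non_numeric` are hypothetical.

```python
import re
from collections import Counter


def normalise(name):
    """Collapse repeated whitespace, trim, drop trailing punctuation, lowercase."""
    collapsed = re.sub(r"\s+", " ", name).strip()
    return collapsed.rstrip(".,;").lower()


def group_variants(names):
    """Group raw spellings under a normalised key so near-duplicates sit together."""
    groups = {}
    for raw in names:
        groups.setdefault(normalise(raw), Counter())[raw] += 1
    return groups


def non_numeric(values):
    """Return the values that cannot be read as numbers (a data type check)."""
    bad = []
    for value in values:
        try:
            float(value.replace(",", ""))
        except ValueError:
            bad.append(value)
    return bad


# Hypothetical sample data, invented for illustration only.
donors = ["Acme Ltd", "Acme  Ltd", "acme ltd.", "Bolt plc"]
amounts = ["1,200", "950", "n/a", "4000000"]

# Keys with more than one raw spelling are candidates for merging
suspect = {
    key: dict(counts)
    for key, counts in group_variants(donors).items()
    if len(counts) > 1
}

# Sorting the readable amounts puts unusually large values first
bad_amounts = non_numeric(amounts)
largest_first = sorted(
    (float(a.replace(",", "")) for a in amounts if a not in bad_amounts),
    reverse=True,
)

print(suspect)
print(bad_amounts)
print(largest_first)
```

Flagged groups and outliers are prompts for a manual check, not automatic corrections, in line with the "Check." on slide 23.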
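
Slide 31 mentions combining two spreadsheets that share common data, with MySQL or Access as tools. The sketch below uses SQLite through Python's standard `sqlite3` module as a stand-in for those databases. The table names, column names and figures are all invented for illustration; the shared column here is assumed to be a ward name.

```python
import sqlite3

# Hypothetical rows standing in for two spreadsheets that share a "ward" column.
spending = [("Abbey", 120000), ("Castle", 95000), ("Market", 143000)]
population = [("Abbey", 8000), ("Castle", 9500), ("Romsey", 7200)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spending (ward TEXT, pounds INTEGER)")
conn.execute("CREATE TABLE population (ward TEXT, residents INTEGER)")
conn.executemany("INSERT INTO spending VALUES (?, ?)", spending)
conn.executemany("INSERT INTO population VALUES (?, ?)", population)

# LEFT JOIN keeps every spending row; wards with no population match get None
joined = conn.execute(
    """
    SELECT s.ward, s.pounds, p.residents
    FROM spending AS s
    LEFT JOIN population AS p ON p.ward = s.ward
    ORDER BY s.ward
    """
).fetchall()
conn.close()

# Spending per resident for matched wards only
per_head = {
    ward: pounds / residents
    for ward, pounds, residents in joined
    if residents
}

# Rows that failed to match often point to spelling differences worth cleaning
unmatched = [ward for ward, _, residents in joined if residents is None]

print(per_head)
print(unmatched)
```

Listing the unmatched rows is a deliberate choice: a join silently drops or blanks records whose keys differ by a space or a misspelling, which ties back to the cleaning step.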