New information for new journalists pt2: data
Presentation to ESCACC, Barcelona, 2010

Published in: Education, Technology

  • 1. Introduction Paul Bradshaw Data journalism
  • 2. Ivy Lee
  • 3. “Each weekday, my computer program goes to the Chicago Police Department's website and gathers all crimes reported in Chicago.” Adrian Holovaty
  • 4. Why? Great stories; engagement; targeting/relevance
  • 5. “The Tribune’s biggest magnet by far has been its more than three dozen interactive databases, which collectively have drawn three times as many page views as the site’s stories.”
  • 6. Times film genres
  • 7. Data Journalism Continuum
  • 8. 1. Finding data
  • 9. What is data?
  • 10. Numbers; text; connections; live data; behavioural data; images, audio, video: anything that a computer can work with
  • 11. Passive vs active data journalism: Start with the data and look for the stories (MPs’ expenses)? Or start with a lead and look for the data?
  • 12. Some UK projects: What Do They Know; Openlylocal, Scraperwiki; disclosure logs; RSS feeds, XML, structured data
  • 13. CAR
  • 14. Advanced search by file type: “Performance figures” filetype:pdf; filetype:xls; filetype:doc; filetype:ppt; filetype:rdf OR filetype:xml
  • 15. Advanced search by domain “Disclosure logs” site: Database site: OR .org +Tables –chairs site: Health, police, military domains
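
The file-type and domain operators on the two slides above can be assembled programmatically. The following is a minimal sketch, not something from the presentation: the function name build_query and the domain example.org are hypothetical, and the exact behaviour of each operator depends on the search engine used.

```python
def build_query(phrase, filetypes=(), site=None, include=(), exclude=()):
    """Assemble a search query string from a phrase and optional operators."""
    parts = [f'"{phrase}"']  # quotes ask for the exact phrase
    if filetypes:
        # Several file types are joined with OR, e.g. filetype:rdf OR filetype:xml
        parts.append(" OR ".join(f"filetype:{ft}" for ft in filetypes))
    if site:
        parts.append(f"site:{site}")  # restrict results to one domain
    parts.extend(include)  # extra words that should appear
    parts.extend(f"-{word}" for word in exclude)  # words to leave out
    return " ".join(parts)


# example.org is a placeholder domain, not one named on the slides
query_pdf = build_query("performance figures", filetypes=["pdf"])
query_logs = build_query(
    "disclosure logs", site="example.org", include=["tables"], exclude=["chairs"]
)
print(query_pdf)
print(query_logs)
```

Building queries this way makes it easy to loop over a list of domains or file types and keep a record of exactly which searches were run.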
  • 16. Use overseas sources: US medicine databases; EU subsidy databases; Swedish people data; international police agency correspondence
  • 17. Scraping: Scraping can automate and schedule the gathering process if there are multiple sources. Tools: OutWit Hub plugin, Yahoo! Pipes, Scraperwiki, Google Spreadsheets formulae
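
To make the idea of scraping concrete, here is a minimal sketch using only Python's standard library. It is not one of the tools named on the slide. The class name TableScraper, the function scrape_rows and the sample HTML are all made up for illustration; a real scraper would first download the page and would need to cope with messier markup.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Invented sample page standing in for a downloaded web page
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Offence</th></tr>
  <tr><td>2010-03-01</td><td>Burglary</td></tr>
  <tr><td>2010-03-02</td><td>Theft</td></tr>
</table>
"""


def scrape_rows(html):
    """Return the table rows found in an HTML string as lists of cell text."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = scrape_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

Running the same function on a schedule against several pages is what turns a one-off copy-and-paste job into the automated gathering the slide describes.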
  • 18. Interrogating data
  • 19. Humans collect data; humans enter data; human error. Time spent now...
  • 20. ...saves time later. Different words for the same thing; double spaces, punctuation; wrong data type; mistyped; duplicate entries; default entries (1/1/00)
  • 21. "Because we take the time to clean the data, we are able to do lobbying stories no other news organisation can do." David Donald, Center for Public Integrity
  • 22. Cleaning methods: Group by term then sort to see duplications; find & replace double spaces, etc.; select column/row & check data type; sort to find unusually large/small, and neighbouring misspellings
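
The cleaning methods above are usually applied in a spreadsheet, but the same checks can be sketched in code. The example below is an illustration under stated assumptions: the function names, the column names "date" and "amount", and the sample records are all invented, and only the default date 1/1/00 comes from the slides.

```python
import re
from collections import Counter

PLACEHOLDER_DATES = {"1/1/00"}  # the default entry mentioned on the slides


def normalise(name):
    """Collapse repeated spaces, strip stray punctuation and lower-case."""
    name = re.sub(r"\s+", " ", name).strip()
    name = name.strip(".,;:")
    return name.lower()


def find_duplicates(names):
    """Group by normalised term and return the terms seen more than once."""
    counts = Counter(normalise(n) for n in names)
    return sorted(term for term, n in counts.items() if n > 1)


def flag_suspect_rows(rows):
    """Return indexes of rows with a non-numeric amount or a default date."""
    suspect = []
    for i, row in enumerate(rows):
        try:
            float(row["amount"])  # wrong data type or mistyped value
        except ValueError:
            suspect.append(i)
            continue
        if row["date"] in PLACEHOLDER_DATES:
            suspect.append(i)
    return suspect


# Invented sample data
names = ["Acme  Ltd", "acme ltd.", "Bolt & Co", "Acme Ltd"]
rows = [
    {"date": "3/2/10", "amount": "1200"},
    {"date": "1/1/00", "amount": "950"},
    {"date": "4/2/10", "amount": "12OO"},  # letter O typed instead of zero
]

duplicates = find_duplicates(names)
suspect = flag_suspect_rows(rows)
print(duplicates)
print(suspect)
```

Flagged rows still need a human to decide what the correct value is; the code only narrows down where to look.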
  • 23. Check: never publish a name from data without running a background check.
  • 24. Other tools Freebase Gridworks: see
  • 25. Visualising data
  • 26. or
  • 27. (trends, dips, correlations)
  • 28. (comparison, themes)
  • 29. (proportions, comparison)
  • 30. Mashing data
  • 31. Combining data sources: Geocoded data with map: live data (e.g. Twitter API), static data (e.g. Google Docs), dynamic data (e.g. Google Form). 2 spreadsheets with common data: tools include MySQL, Access, etc.
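
Combining two spreadsheets that share a common column amounts to a join. The slide names MySQL and Access for this; the sketch below does the same thing with Python's csv module instead. The council names, figures and the helper names read_rows and join_on are invented for illustration.

```python
import csv
import io

# Invented sample spreadsheets that share a "council" column
SPENDING_CSV = """council,spend
Northtown,1200
Southby,800
Westford,450
"""

POPULATION_CSV = """council,population
Northtown,60000
Southby,20000
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on(left, right, key):
    """Inner join: keep only rows whose key value appears in both tables."""
    lookup = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = lookup.get(row[key])
        if match is not None:
            joined.append({**row, **match})
    return joined


combined = join_on(read_rows(SPENDING_CSV), read_rows(POPULATION_CSV), "council")

# Once joined, the two sources can answer a question neither answers alone
spend_per_head = {
    r["council"]: float(r["spend"]) / float(r["population"]) for r in combined
}
print(spend_per_head)
```

Rows without a match (Westford here) drop out of an inner join, so it is worth counting rows before and after to see what was lost. Names must also match exactly, which is one reason cleaning comes before mashing.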
  • 32. Combining data sources: Twittermap Wikipedia map NYT Property Guardian vs Nature BBC Most Read BBC Olympic Village
  • 33. What mashes well? Big events (protests, Olympics, inauguration); comparisons; geocoded data; connections
  • 34. Yahoo! Pipes: aggregates; maps; filters; counts; cleans or reformats (regex)
  • 35. Other tools: Scraperwiki – mapping library; Maptube – combine maps; Google Docs – publish in different formats; +++
  • 36. Semantic web & linked data: Computer-readable data; Paris – France, Texas, or Hilton?; unique identifiers – usually URI; RDF, RDFa, XML, etc.
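
The "Paris" problem can be shown in a few lines: a bare name is ambiguous to a computer, while a URI is not. This is a rough sketch only. The DBpedia-style URIs and the descriptions are chosen for illustration and are not taken from the slides, and the helper candidates is hypothetical.

```python
# Each entity gets its own identifier, even though all share the name "Paris".
# URIs are DBpedia-style examples chosen for illustration.
ENTITIES = {
    "http://dbpedia.org/resource/Paris": "capital city of France",
    "http://dbpedia.org/resource/Paris,_Texas": "city in Texas, USA",
    "http://dbpedia.org/resource/Paris_Hilton": "person",
}


def candidates(name):
    """Return every URI whose final path segment starts with the given name."""
    return [uri for uri in ENTITIES if uri.rsplit("/", 1)[1].startswith(name)]


matches = candidates("Paris")
print(len(matches), "possible meanings of 'Paris'")

# A record that stores the URI rather than the name is unambiguous
story_location = "http://dbpedia.org/resource/Paris,_Texas"
print(ENTITIES[story_location])
```

Formats such as RDF and RDFa exist to attach identifiers like these to published data so that different datasets can be linked reliably.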
  • 37. API: Application Programming Interface; build on top of data; Google Maps, Twitter, Facebook, Digg, Guardian, NYT, NPR, They Work For You, etc.
  • 38. Q&A