New information for new journalists pt2: data

Presentation to ESCACC, Barcelona, 2010

Published in: Education, Technology

Transcript

  • 1. Introduction Paul Bradshaw Data journalism
  • 2. Ivy Lee
  • 3. “Each weekday, my computer program goes to the Chicago Police Department's website and gathers all crimes reported in Chicago.” Adrian Holovaty
  • 4. Why? Great stories; engagement; targeting/relevance
  • 5. “The Tribune’s biggest magnet by far has been its more than three dozen interactive databases, which collectively have drawn three times as many page views as the site’s stories.” http://bit.ly/dj2dmz
  • 6. Times film genres
  • 7. Data Journalism Continuum
  • 8. 1. Finding data
  • 9. What is data?
  • 10. Numbers; text; connections; live data; behavioural data; images, audio, video. Anything that a computer can work with
  • 11. Passive vs active data journalism: start with the data and look for the stories (MPs’ expenses)? Or start with a lead and look for the data?
  • 12. Some UK projects: Data.gov.uk; What Do They Know; Openlylocal, Scraperwiki; disclosure logs; RSS feeds, XML, structured data
  • 13. CAR (computer-assisted reporting): Delicious.com/paulb/car
  • 14. Advanced search by file type: “Performance figures” filetype:pdf; filetype:xls; filetype:doc; filetype:ppt; filetype:rdf OR filetype:xml
  • 15. Advanced search by domain: “Disclosure logs” site:.gov.es; Database site:.org.cat OR site:.org; +Tables -chairs site: health, police, military domains
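The operator patterns on slides 14 and 15 can be generated rather than typed by hand when the same phrase has to be checked across several file types or domains. The following is a minimal sketch, not part of the original deck: the helper name `build_queries` is hypothetical, and it assumes the `filetype:` and `site:` operators are written with no space after the colon.

```python
from itertools import product


def build_queries(phrase, filetypes=(), sites=()):
    """Return one search string per (filetype, site) combination.

    An empty tuple means "do not restrict on this operator".
    """
    # A single None placeholder lets product() still yield one combination
    filetypes = filetypes or (None,)
    sites = sites or (None,)
    queries = []
    for filetype, site in product(filetypes, sites):
        parts = [f'"{phrase}"']  # quote the phrase for an exact match
        if filetype:
            parts.append(f"filetype:{filetype}")
        if site:
            parts.append(f"site:{site}")
        queries.append(" ".join(parts))
    return queries


if __name__ == "__main__":
    for query in build_queries("disclosure logs", ("pdf", "xls"), (".gov.es",)):
        print(query)
```

Each resulting string can be pasted into a search engine as is; the sketch only builds the text and does not send any request.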
  • 16. Use overseas sources: US medicine databases; EU subsidy databases; Swedish people data; international police agency correspondence
  • 17. Scraping: scraping can automate & schedule the gathering process if there are multiple sources. Tools: OutWit Hub plugin, Yahoo! Pipes, Scraperwiki, Google Spreadsheets formulae
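As an illustration of what the scraping tools on slide 17 do under the hood, here is a minimal sketch that pulls the rows out of an HTML table using only the Python standard library. It is an assumption-laden example, not the method used by any of the tools named: `TableScraper`, `scrape_table` and the `SAMPLE_PAGE` contents are all made up for illustration, and a real scraper would first download the page and would need to cope with messier markup.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table in `html` as a list of rows (lists of strings)."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up page content standing in for a downloaded web page
SAMPLE_PAGE = """
<table>
  <tr><th>Date</th><th>Offence</th></tr>
  <tr><td>2010-03-01</td><td>Burglary</td></tr>
  <tr><td>2010-03-02</td><td>Theft</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in scrape_table(SAMPLE_PAGE):
        print(row)
```

Running the same function on a schedule against several pages is what turns a one-off copy-and-paste into the automated gathering the slide describes.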
  • 18. Interrogating data
  • 19. Humans collect data; humans enter data; human error. Time spent now...
  • 20. ...saves time later. Different words for the same thing; double spaces, punctuation; wrong data type; mistyped; duplicate entries; default entries (1/1/00)
  • 21. "Because we take the time to clean the data, we are able to do lobbying stories no other news organisation can do." David Donald, Center for Public Integrity
  • 22. Cleaning methods: group by term then sort to see duplications; find & replace double spaces, etc.; select column/row & check data type; sort to find unusually large/small, and neighbouring misspellings
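The cleaning methods on slide 22 are usually done by hand in a spreadsheet, but the same checks can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the sample values are invented, and the default-date check assumes the placeholder is written exactly as on slide 20 (1/1/00).

```python
import re
from collections import defaultdict


def normalise(value):
    """Collapse repeated whitespace, trim stray punctuation, ignore case."""
    collapsed = re.sub(r"\s+", " ", value).strip(" .,;")
    return collapsed.casefold()


def find_duplicates(values):
    """Group raw values that normalise to the same key; keep groups of 2+."""
    groups = defaultdict(list)
    for value in values:
        groups[normalise(value)].append(value)
    return {key: raw for key, raw in groups.items() if len(raw) > 1}


def non_numeric(values):
    """Return entries in a supposedly numeric column that fail to parse."""
    bad = []
    for value in values:
        try:
            float(value.replace(",", ""))  # allow thousands separators
        except ValueError:
            bad.append(value)
    return bad


def default_entries(values, defaults=("1/1/00",)):
    """Return entries that look like an untouched default value."""
    return [value for value in values if value.strip() in defaults]


def extremes(numbers, n=1):
    """Sort and return the n smallest and n largest values for inspection."""
    ordered = sorted(numbers)
    return ordered[:n], ordered[-n:]


# Made-up sample columns
NAMES = ["Acme Ltd", "Acme  Ltd.", "ACME Ltd", "Bravo plc"]
AMOUNTS = ["1,200", "350", "n/a", "12O"]  # last entry has a letter O
DATES = ["3/4/09", "1/1/00", "12/6/09"]

if __name__ == "__main__":
    print(find_duplicates(NAMES))
    print(non_numeric(AMOUNTS))
    print(default_entries(DATES))
    print(extremes([12, 15, 11, 900, 14]))
```

The functions only flag suspect entries; deciding whether "Acme Ltd" and "ACME Ltd" really are the same organisation remains a reporting judgement.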
  • 23. Check: never publish a name from data without running a background check.
  • 24. Other tools: Freebase Gridworks (see http://vimeo.com/10081183)
  • 25. Visualising data
  • 26. or http://chartchooser.juiceanalytics.com/
  • 27. (trends, dips, correlations)
  • 28. (comparison, themes)
  • 29. (proportions, comparison)
  • 30. Mashing data
  • 31. Combining data sources: geocoded data with map: live data (e.g. Twitter API), static data (e.g. Google Docs), dynamic data (e.g. Google Form). 2 spreadsheets with common data: tools such as MySQL, Access, etc.
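Combining two spreadsheets that share a column, as slide 31 suggests, amounts to a database join. The sketch below uses SQLite through Python's built-in `sqlite3` module in place of the MySQL or Access tools named on the slide, simply because it needs no installation. The table names, column names and figures are all made up for illustration.

```python
import csv
import io
import sqlite3

# Made-up spreadsheets, exported as CSV, sharing the "council" column
SPENDING_CSV = """council,spend
Ashby,1200
Brent,950
Corby,400
"""

POPULATION_CSV = """council,population
Ashby,60
Brent,50
"""


def load(conn, table, text):
    """Create `table` from CSV text; the header row supplies column names."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', data)


def spend_per_head():
    """Join the two sheets on council and divide spend by population."""
    conn = sqlite3.connect(":memory:")
    load(conn, "spending", SPENDING_CSV)
    load(conn, "population", POPULATION_CSV)
    # LEFT JOIN keeps councils with no population match, so gaps stay visible
    query = """
        SELECT s.council,
               CAST(s.spend AS REAL) / CAST(p.population AS REAL)
        FROM spending AS s
        LEFT JOIN population AS p ON p.council = s.council
        ORDER BY s.council
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for council, per_head in spend_per_head():
        print(council, per_head)
```

A council that appears in only one sheet comes back with an empty value instead of silently disappearing, which is often where the cleaning problems described earlier show up.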
  • 32. Combining data sources: Twittermap; Wikipedia map; NYT Property; Guardian vs Nature; BBC Most Read; BBC Olympic Village
  • 33. What mashes well? Big events (protests, Olympics, inauguration); comparisons; geocoded data; connections
  • 34. Yahoo! Pipes: aggregates; maps; filters; counts; cleans or reformats (regex)
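Most of the operations listed for Yahoo! Pipes on slide 34 have direct equivalents in a short script. The sketch below is not Pipes itself and does not use its API; it is a plain-Python illustration of the aggregate, clean (regex), filter and count steps, with made-up feed items and hypothetical function names. The mapping step is left out.

```python
import re
from collections import Counter
from itertools import chain


def aggregate(*feeds):
    """Merge several lists of items into one."""
    return list(chain.from_iterable(feeds))


def clean(item):
    """Reformat a title with regex: drop a leading 'RT @user:' and extra spaces."""
    title = re.sub(r"^RT @\w+:\s*", "", item["title"])
    title = re.sub(r"\s+", " ", title).strip()
    return {**item, "title": title}


def keep(items, keyword):
    """Filter: keep items whose title mentions the keyword, ignoring case."""
    return [item for item in items if keyword.lower() in item["title"].lower()]


def count_by(items, field):
    """Count items per value of a field, e.g. per source."""
    return Counter(item[field] for item in items)


def run_pipe(keyword, *feeds):
    """Aggregate, clean, filter, then count the surviving items by source."""
    items = [clean(item) for item in aggregate(*feeds)]
    matched = keep(items, keyword)
    return matched, count_by(matched, "source")


# Made-up feed items
FEED_A = [
    {"source": "a", "title": "RT @news:  Olympics  tickets on sale"},
    {"source": "a", "title": "Weather update"},
]
FEED_B = [{"source": "b", "title": "Olympics village opens"}]

if __name__ == "__main__":
    matched, counts = run_pipe("olympics", FEED_A, FEED_B)
    print(matched)
    print(counts)
```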
  • 35. Other tools: Scraperwiki – mapping library; Maptube – combine maps; Google Docs – publish in different formats; +++
  • 36. Semantic web & linked data: computer-readable data; Paris – France, Texas, or Hilton?; unique identifiers – usually URI; RDF, RDFa, XML, etc.
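The "Paris" problem on slide 36 can be shown in a few lines: a text label matches several things, while a unique identifier matches one. This is a minimal sketch with hypothetical identifiers; the `example.org` URIs are placeholders, not real linked-data URIs, and the dictionary stands in for a proper RDF store.

```python
# Hypothetical identifiers: example.org URIs stand in for real linked-data URIs
ENTITIES = {
    "http://example.org/place/paris-france": {"label": "Paris", "type": "city"},
    "http://example.org/place/paris-texas": {"label": "Paris", "type": "city"},
    "http://example.org/person/paris-hilton": {"label": "Paris", "type": "person"},
}


def lookup_by_label(label):
    """A plain-text label can match several entities."""
    return sorted(uri for uri, data in ENTITIES.items() if data["label"] == label)


def lookup_by_uri(uri):
    """A URI identifies at most one entity (None if it is unknown)."""
    return ENTITIES.get(uri)


if __name__ == "__main__":
    print(lookup_by_label("Paris"))  # three candidates: ambiguous
    print(lookup_by_uri("http://example.org/place/paris-texas"))  # exactly one
```

Two datasets that both record the URI can be combined without guessing which Paris each row means.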
  • 37. API: Application Programming Interface. Build on top of data. Google Maps, Twitter, Facebook, Digg, Guardian, NYT, NPR, They Work For You, etc.
  • 38. Q&A: Slideshare.net/onlinejournalist; Twitter.com/paulbradshaw
  • 39. Bookmarks: Delicious.com/paulb/datajournalism; Delicious.com/paulb/visualisation; Delicious.com/paulb/statistics