New information for new journalists pt2: data


Presentation to ESCACC, Barcelona, 2010

Published in Education, Technology
Transcript

  • 1. Introduction Paul Bradshaw Data journalism
  • 2. Ivy Lee
  • 3. “Each weekday, my computer program goes to the Chicago Police Department's website and gathers all crimes reported in Chicago.” Adrian Holovaty
  • 4.  
  • 5. Why? Great stories; engagement; targeting/relevance
  • 6.  
  • 7.  
  • 8.  
  • 9.  
  • 10.  
  • 11. “The Tribune’s biggest magnet by far has been its more than three dozen interactive databases, which collectively have drawn three times as many page views as the site’s stories.” http://bit.ly/dj2dmz
  • 12.  
  • 13.  
  • 14. Times film genres
  • 15.  
  • 16. Data Journalism Continuum
  • 17. 1. Finding data
  • 18. What is data?
  • 19. Numbers; text; connections; live data; behavioural data; images, audio, video; anything that a computer can work with
  • 20.  
  • 21. Passive vs active data journalism: Start with the data and look for the stories? (MPs’ expenses) Or start with a lead and look for the data?
  • 22. Some UK projects: Data.gov.uk; What Do They Know; Openlylocal, Scraperwiki; disclosure logs; RSS feeds, XML, structured data
  • 23. CAR (computer-assisted reporting): Delicious.com/paulb/car
  • 24. Advanced search by file type: “Performance figures” filetype:pdf; filetype:xls; filetype:doc; filetype:ppt; filetype:rdf OR filetype:xml
  • 25. Advanced search by domain: “Disclosure logs” site:.gov.es; Database site:.org.cat OR site:.org; +Tables -chairs site: health, police, military domains
  • 26. Use overseas sources
      • US medicine databases
      • EU subsidy databases
      • Swedish people data
      • International police agency correspondence
  • 27. Scraping: Scraping can automate & schedule the gathering process if there are multiple sources. Tools: OutWit Hub plugin, Yahoo! Pipes, Scraperwiki, Google Spreadsheets formulae
  • 28. Interrogating data
  • 29. Humans collect data; humans enter data; human error. Time spent now...
  • 30. Different words for the same thing; double spaces, punctuation; wrong data type; mistyped; duplicate entries; default entries (1/1/00). ...Saves time later
  • 31. "Because we take the time to clean the data, we are able to do lobbying stories no other news organisation can do." David Donald, Center for Public Integrity
  • 32. Cleaning methods: Group by term then sort to see duplications; find & replace double spaces, etc.; select column/row & check data type; sort to find unusually large/small values and neighbouring misspellings
  • 33. Never publish a name from data without running a background check. Check.
  • 34. Other tools: Freebase Gridworks (see http://vimeo.com/10081183)
  • 35. Visualising data
  • 36.  
  • 37. or http://chartchooser.juiceanalytics.com/
  • 38.  
  • 39. (trends, dips, correlations)
  • 40.  
  • 41. (comparison, themes)
  • 42. (proportions, comparison)
  • 43. Mashing data
  • 44. Combining data sources: Geocoded data with map: live data (e.g. Twitter API), static data (e.g. Google Docs), dynamic data (e.g. Google Form). 2 spreadsheets with common data: tools include MySQL, Access, etc.
  • 45.  
  • 46.  
  • 47.  
  • 48.  
  • 49.  
  • 50.  
  • 51.  
  • 52. Combining data sources: Twittermap; Wikipedia map; NYT Property; Guardian vs Nature; BBC Most Read; BBC Olympic Village
  • 53. What mashes well? Big events (protests, Olympics, inauguration); comparisons; geocoded data; connections
  • 54. Yahoo! Pipes: aggregates; maps; filters; counts; cleans or reformats (regex)
  • 55. Other tools: Scraperwiki – mapping library; Maptube – combine maps; Google Docs – publish in different formats; +++
  • 56. Semantic web & linked data: Computer-readable data; Paris – France, Texas, or Hilton? Unique identifiers – usually URI; RDF, RDFa, XML, etc.
  • 57. API: Application Programming Interface; build on top of data; Google Maps, Twitter, Facebook, Digg, Guardian, NYT, NPR, They Work For You, etc.
  • 58. Q&A: Slideshare.net/onlinejournalist; Twitter.com/paulbradshaw
  • 59. Bookmarks: Delicious.com/paulb/datajournalism; Delicious.com/paulb/visualisation; Delicious.com/paulb/statistics
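
The cleaning checks on slides 30 and 32 and the "2 spreadsheets with common data" case on slide 44 could be sketched in Python roughly as follows. This is a minimal illustration and not the presenter's own method: the slides name spreadsheet and database tools, not code. The sample rows, the column names (`name`, `amount`, `date`, `sector`) and all function names are hypothetical.

```python
import re
from collections import defaultdict

# Placeholder date given as an example of a default entry on slide 30.
DEFAULT_DATES = {"1/1/00"}


def normalise(text):
    """Collapse runs of whitespace, trim, and lowercase so variants compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_duplicates(rows, field):
    """Group rows by the normalised value of `field`; keep groups with more than one row."""
    groups = defaultdict(list)
    for row in rows:
        groups[normalise(row[field])].append(row)
    return {key: grp for key, grp in sorted(groups.items()) if len(grp) > 1}


def non_numeric(rows, field):
    """Return rows whose `field` does not parse as a number (wrong data type)."""
    bad = []
    for row in rows:
        try:
            float(row[field].replace(",", ""))
        except ValueError:
            bad.append(row)
    return bad


def default_dates(rows, field):
    """Return rows whose `field` holds a known default entry."""
    return [row for row in rows if row[field].strip() in DEFAULT_DATES]


def join_on(left, right, key):
    """Combine two tables that share a column, matching on the normalised key."""
    index = {normalise(r[key]): r for r in right}
    return [
        {**index[normalise(l[key])], **l}  # values from `left` win on shared columns
        for l in left
        if normalise(l[key]) in index
    ]


# Hypothetical sample data, invented for illustration only.
rows = [
    {"name": "Acme  Ltd", "amount": "1,200", "date": "3/2/10"},
    {"name": "acme ltd", "amount": "n/a", "date": "1/1/00"},
    {"name": "Bolt plc", "amount": "950", "date": "4/2/10"},
]
sectors = [{"name": "ACME LTD", "sector": "energy"}]

if __name__ == "__main__":
    print(find_duplicates(rows, "name"))
    print(non_numeric(rows, "amount"))
    print(default_dates(rows, "date"))
    print(join_on(rows, sectors, "name"))
```

Matching on a normalised key is a deliberate simplification: it catches double spaces and capitalisation differences but not genuine misspellings, which still need the sort-and-inspect step from slide 32.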