New information for new journalists pt2: data

Presentation to ESCACC, Barcelona, 2010

Transcript

  • 1. Introduction Paul Bradshaw Data journalism
  • 2. Ivy Lee
  • 3. “Each weekday, my computer program goes to the Chicago Police Department's website and gathers all crimes reported in Chicago.” Adrian Holovaty
  • 5. Why? Great stories; engagement; targeting/relevance
  • 11. “The Tribune’s biggest magnet by far has been its more than three dozen interactive databases, which collectively have drawn three times as many page views as the site’s stories.” http://bit.ly/dj2dmz
  • 14. Times film genres
  • 16. Data Journalism Continuum
  • 17. 1. Finding data
  • 18. What is data?
  • 19. Numbers; text; connections; live data; behavioural data; images, audio, video; anything that a computer can work with
  • 21. Passive vs active data journalism: Start with the data and look for the stories (MPs’ expenses)? Or start with a lead and look for the data?
  • 22. Some UK projects: Data.gov.uk; What Do They Know; Openlylocal; Scraperwiki; disclosure logs; RSS feeds, XML, structured data
  • 23. CAR (computer-assisted reporting): Delicious.com/paulb/car
  • 24. Advanced search by file type: “Performance figures” filetype:pdf; filetype:xls; filetype:doc; filetype:ppt; filetype:rdf OR filetype:xml
  • 25. Advanced search by domain: “Disclosure logs” site:.gov.es; Database site:.org.cat OR site:.org; +Tables -chairs; site: health, police, military domains
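The operators on slides 24 and 25 can be combined into a single query string. The sketch below is one possible way to do that; `build_query` and its parameters are hypothetical names invented for illustration, and the operator syntax (no space after the colon, `OR` in capitals, a leading hyphen to exclude a word) is assumed to match what the search engine in use expects.

```python
def build_query(phrase, filetypes=(), sites=(), exclude=()):
    """Compose a search query from a quoted phrase plus optional operators."""
    parts = [f'"{phrase}"']
    if filetypes:
        # Alternatives are joined with OR so any one file type may match.
        parts.append(" OR ".join(f"filetype:{ft}" for ft in filetypes))
    if sites:
        parts.append(" OR ".join(f"site:{s}" for s in sites))
    # A leading hyphen asks the engine to exclude pages containing the word.
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


print(build_query("performance figures", filetypes=["xls", "pdf"]))
print(build_query("disclosure logs", sites=[".gov.es"]))
print(build_query("tables", sites=[".org.cat", ".org"], exclude=["chairs"]))
```

The resulting strings can be pasted into a search box as they are; how several `OR` groups interact in one query varies between engines, so it is worth testing each combination by hand.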
  • 26. Use overseas sources
      • US medicine databases
      • EU subsidy databases
      • Swedish people data
      • International police agency correspondence
  • 27. Scraping: Scraping can automate & schedule the gathering process if there are multiple sources. Tools: OutWit Hub plugin, Yahoo! Pipes, Scraperwiki, Google Spreadsheets formulae
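As a rough illustration of what a scraper does, the sketch below pulls the rows out of an HTML table using only Python's standard library. It is not how any of the tools named on slide 27 work internally; the `TableScraper` class, the `scrape_table` helper, and the sample HTML (with made-up rows) are all assumptions for the example. A real scraper would fetch the page over the network and run on a schedule.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Invented sample page standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Offence</th></tr>
  <tr><td>2010-03-01</td><td>Burglary</td></tr>
  <tr><td>2010-03-02</td><td>Theft</td></tr>
</table>
"""


def scrape_table(html):
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = scrape_table(SAMPLE_HTML)
for row in rows:
    print(row)
```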
  • 28. Interrogating data
  • 29. Humans collect data; humans enter data; human error. Time spent now...
  • 30. Different words for the same thing; double spaces, punctuation; wrong data type; mistyped; duplicate entries; default entries (1/1/00). ...Saves time later
  • 31. "Because we take the time to clean the data, we are able to do lobbying stories no other news organisation can do." David Donald, Center for Public Integrity
  • 32. Cleaning methods: group by term then sort to see duplications; find & replace double spaces, etc.; select column/row & check data type; sort to find unusually large/small values, and neighbouring misspellings
  • 33. Check. Never publish a name from data without running a background check.
  • 34. Other tools: Freebase Gridworks (see http://vimeo.com/10081183)
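Some of the cleaning methods on slide 32 can be approximated in a few lines of code. The sketch below is a minimal example under stated assumptions: the helper names (`normalise`, `find_duplicates`, `non_numeric`) and the sample values are invented for illustration, and the rules it applies (collapse whitespace, strip trailing punctuation, ignore case) are only one possible definition of "the same entry".

```python
import re
from collections import Counter


def normalise(value):
    """Trim, collapse runs of whitespace, drop trailing punctuation, ignore case."""
    collapsed = re.sub(r"\s+", " ", value.strip())
    return collapsed.rstrip(".,;:").casefold()


def find_duplicates(values):
    """Group entries by their normalised form and report those seen more than once."""
    counts = Counter(normalise(v) for v in values)
    return sorted(term for term, n in counts.items() if n > 1)


def non_numeric(values):
    """Return entries that cannot be read as numbers (a wrong-data-type check)."""
    bad = []
    for v in values:
        try:
            float(v.replace(",", ""))
        except ValueError:
            bad.append(v)
    return bad


# Invented sample columns containing the kinds of errors listed on slide 30.
names = ["Acme Ltd", "acme  ltd.", "Bolt & Co", "Acme Ltd ", "Bolt and Co"]
amounts = ["1,200", "350", "n/a", "12O0"]

print(find_duplicates(names))
print(non_numeric(amounts))
```

Note that "Bolt & Co" and "Bolt and Co" are not flagged: different words for the same thing still need a human eye, or a sort that puts neighbouring variants side by side.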
  • 35. Visualising data
  • 37. or http://chartchooser.juiceanalytics.com/
  • 39. (trends, dips, correlations)
  • 41. (comparison, themes)
  • 42. (proportions, comparison)
  • 43. Mashing data
  • 44. Combining data sources: geocoded data with a map (live data, e.g. Twitter API; static data, e.g. Google Docs; dynamic data, e.g. Google Form); 2 spreadsheets with common data (tools: MySQL, Access, etc.)
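Combining two spreadsheets that share a column amounts to a join on that column, which is what MySQL or Access would do. The sketch below shows the idea with Python's standard `csv` module; the two CSV extracts, the place names, the figures, and the `join_on` helper are all invented for illustration.

```python
import csv
import io

# Two made-up spreadsheet exports that share a "council" column.
SPENDING_CSV = """council,spend
Northtown,120
Eastville,80
Southport,60
"""

POPULATION_CSV = """council,population
Northtown,1000
Eastville,800
Westbury,500
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on(left, right, key):
    """Keep only rows whose key appears in both sheets (an inner join)."""
    lookup = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = lookup.get(row[key])
        if match is not None:
            joined.append({**row, **match})
    return joined


merged = join_on(read_rows(SPENDING_CSV), read_rows(POPULATION_CSV), "council")
for row in merged:
    per_head = int(row["spend"]) / int(row["population"])
    print(row["council"], per_head)
```

Rows that appear in only one sheet (Southport and Westbury here) are dropped, so it is worth listing the unmatched keys separately: they are often misspellings of names that should have matched.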
  • 52. Combining data sources: Twittermap; Wikipedia map; NYT Property; Guardian vs Nature; BBC Most Read; BBC Olympic Village
  • 53. What mashes well? Big events (protests, Olympics, inauguration); comparisons; geocoded data; connections
  • 54. Yahoo! Pipes: aggregates; maps; filters; counts; cleans or reformats (regex)
  • 55. Other tools: Scraperwiki (mapping library); Maptube (combine maps); Google Docs (publish in different formats); +++
  • 56. Semantic web & linked data: computer-readable data; Paris: France, Texas, or Hilton?; unique identifiers (usually URI); RDF, RDFa, XML, etc.
  • 57. API (Application Programming Interface): build on top of data. Google Maps, Twitter, Facebook, Digg, Guardian, NYT, NPR, They Work For You, etc.
  • 58. Q&A: Slideshare.net/onlinejournalist; Twitter.com/paulbradshaw
  • 59. Bookmarks: Delicious.com/paulb/datajournalism; Delicious.com/paulb/visualisation; Delicious.com/paulb/statistics