Data Journalism (City Online Journalism wk8)


Week 8 lecture to students on the 8 MAs at City University

  1. Online Journalism, City University: Data journalism (Paul Bradshaw)
  2. Themes: 1. What is it? 2. Where to get it. 3. How to get it.
  3. “Each weekday, my computer program goes to the Chicago Police Department's website and gathers all crimes reported in Chicago.” Adrian Holovaty
  4. Times film genres
  5. Times Data Blog
  6. “QUOTE” Now is a good time.
  7. “The Tribune’s more than three dozen interactive databases collectively have drawn three times as many page views as the site’s stories.” [75% of traffic]
  8. What is data?
  9. Numbers; text; live data; behavioural data; images, audio, video; anything that a computer can work with
  10. Passive vs active data journalism: start with the data and look for the stories (MPs’ expenses), or start with a lead and look for the data?
  11. Data Journalism Continuum
  12. Finding: Guardian Datastore; OpenlyLocal, OpenCorporates, OpenCharities, Who's Lobbying etc.; FOI requests (WhatDoTheyKnow), disclosure logs; books, e.g. British Political Facts
  13. Finding (data communities): WDMMG (Where Does My Money Go?) forums; mySociety mailing lists; Open Data Cookbook; Wolfram Alpha forum
  14. Government, national and local; 'monitors', i.e. regulators and other bodies; charities and pressure groups; institutions (academic, scientific, health); business and finance; media, entertainment and sport; other secondary sources
  15. Advanced search: filetype:pdf (etc.); imagine the page you hope to find, including the jargon it would use; database contents are invisible to search engines; Google News alerts, e.g. report OR review
  16. Advanced search: "quotes" search for exact phrases, e.g. "disclosure logs"; + ensures the page contains a word, e.g. +logs; - omits results containing a word, e.g. -wooden; * is a wildcard, e.g. "deaths * custody"; ~ finds synonyms, e.g. ~deaths
  17. Tip: use overseas sources, e.g. US medicine databases; EU subsidy databases; Swedish people data; international police agency correspondence with the UK
  18. Feeds and scrapers: RSS, XML, JSON, RDF and APIs; ScraperWiki; OutWit Hub; Yahoo! Pipes; spreadsheet formulae (look them up)
  19. 'Structured' data: Format? Table? Pattern? URL?
  21. 'Structured' HTML? (Use Firebug)
      <p><strong>Case Ref: FS50295557 <br />Date: 04/11/2010 <br />Public Authority: London Borough of Southwark <br />Summary: </strong>
      The complainant requested a copy of the authorities approved business plan [...]<br /><strong>Section of Act/EIR &amp; Finding: </strong>FOI 1 - Complaint Upheld , FOI 10 - Complaint Upheld <br />
      <a title="Opens in new window" href="~/media/documents/decisionnotices/2010/fs_50295557.ashx" target="_blank">View PDF of Decision Notice FS50295557</a></p>
  22. Spreadsheet formulae: =ImportHTML("…", "table", 1); =ImportXML("…"); =ImportFeed("…"&A2)
  23. Yahoo! Pipes: Fetch Page module; Regex
  24. Ethics: "A problem for sites who want to provide privacy while allowing new users to join easily. Scraping services may constitute a violation of terms of service; tactics often resemble a denial-of-service attack or a security exploit."
  25. Questions?
  26. Links
  27. Lab: use advanced search to find data; use tools to scrape data; visualise a politician's speeches using Wordle or Many Eyes; read up on some of the tools or technologies before the lab
  28. Books: Darrell Huff, How to Lie with Statistics; Blastland & Dilnot, The Tiger That Isn't; Dona M. Wong, The Wall Street Journal Guide to Information Graphics; Brian Suda, A Practical Guide to Designing with Data
  29. Assignments
  30. Enough time? 10 credits = 100 hours: lectures = 15 hours; group blog = 60 hours (75%); strategy = 20 hours (25%), some in labs; + 5 hours on other issues
  31. Enough time? Blog (just an example): 10 posts ranging from simple links to interviews, analysis and experiments; an average of 5.5 hours per week x 10 weeks = 55 hours; + 5 hours to write the evaluation
  32. Enough time? Strategy (just an example): 12.5 hours researching the community; 15 mins per week x 10 weeks with the community (2.5 hours); 5 hours on analysis and write-up
  33. Group blogs, 8 areas: 1. Online video; 2. Online audio; 3. Data; 4. UGC; 5. Community management; 6. Mobile; 7. Social media; 8. Infographics and photography
  34. Criteria. Assignment 1: newsgathering/research; production; law, ethics and strategy. Assignment 2: research; analysis; execution.
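The Holovaty quote describes a program that revisits the same police website every weekday. One plausible way to stop such daily runs from double-counting, sketched here in Python, is to store each report under a unique case number and let the database ignore repeats. This is not Holovaty's actual code: the table layout, the field names (`case_number`, `date`, `category`, `block`) and the sample records are all hypothetical.

```python
import sqlite3

def store_reports(conn, reports):
    """Add one day's scraped reports, skipping case numbers already stored.

    Returns how many new rows were actually inserted.
    """
    # Hypothetical schema: one row per reported crime, keyed on its case number.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS crimes ("
        "case_number TEXT PRIMARY KEY, date TEXT, category TEXT, block TEXT)"
    )
    before = conn.total_changes
    # INSERT OR IGNORE silently skips rows whose primary key already exists.
    conn.executemany(
        "INSERT OR IGNORE INTO crimes VALUES (:case_number, :date, :category, :block)",
        reports,
    )
    conn.commit()
    return conn.total_changes - before

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    monday = [
        {"case_number": "HYPO-001", "date": "2010-11-01", "category": "theft", "block": "100 N EXAMPLE ST"},
        {"case_number": "HYPO-002", "date": "2010-11-01", "category": "burglary", "block": "200 W SAMPLE AVE"},
    ]
    # Tuesday's scrape repeats one of Monday's reports and adds one new one.
    tuesday = [monday[1], {"case_number": "HYPO-003", "date": "2010-11-02", "category": "theft", "block": "300 S DEMO RD"}]
    print(store_reports(conn, monday), store_reports(conn, tuesday))  # 2 1
```

Keying on a stable identifier means the scraper can simply re-collect everything each day and let the database decide what is new.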
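The bracketed "[75% of traffic]" on the Chicago Tribune slide follows from the "three times as many page views" claim if, as assumed here, the percentage means the databases' share of the combined page views of databases and stories. Writing s for the stories' page views:

```latex
\text{database share} = \frac{3s}{3s + s} = \frac{3}{4} = 75\%
```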
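The advanced-search slides list operators that can be combined in a single query. The Python sketch below assembles them and percent-encodes the result into a search URL; the function names and the `https://www.google.com/search?q=` URL format are assumptions for illustration. Google has since withdrawn some of these operators (notably + and ~), so the sketch leaves ~ out and a + term may simply be ignored today.

```python
from urllib.parse import urlencode

def build_query(phrase=None, required=(), excluded=(), filetype=None):
    """Combine the slide's search operators into one query string."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')           # exact phrase
    parts += [f"+{w}" for w in required]      # page must contain the word
    parts += [f"-{w}" for w in excluded]      # drop results containing the word
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict to one document type
    return " ".join(parts)

def search_url(query):
    """Percent-encode the query into a results-page URL (format assumed)."""
    return "https://www.google.com/search?" + urlencode({"q": query})

q = build_query(phrase="disclosure logs", excluded=["wooden"], filetype="pdf")
print(q)              # "disclosure logs" -wooden filetype:pdf
print(search_url(q))  # https://www.google.com/search?q=%22disclosure+logs%22+-wooden+filetype%3Apdf
```

Generating queries this way makes it easy to run the same search across many phrases, for example one per public body.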
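The "Feeds and scrapers" slide names RSS, XML and JSON as formats that arrive already structured. A minimal Python sketch of reading two of them with the standard library follows; the feed and the API response are made-up samples, and the `results`/`name` layout of the JSON is a hypothetical shape, not any particular API's.

```python
import json
import xml.etree.ElementTree as ET

# Made-up samples standing in for a real feed and a real API response.
RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example council feed</title>
<item><title>Planning decision: 12 High St</title><link>http://example.org/p/1</link></item>
<item><title>Budget report published</title><link>http://example.org/p/2</link></item>
</channel></rss>"""

API_JSON = '{"results": [{"name": "Example Ltd"}, {"name": "Sample plc"}]}'

def rss_items(xml_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("title"), item.findtext("link")) for item in root.iter("item")]

def api_names(json_text):
    """Pull the 'name' field out of each record in a JSON response."""
    return [record["name"] for record in json.loads(json_text)["results"]]

print(rss_items(RSS)[0])    # ('Planning decision: 12 High St', 'http://example.org/p/1')
print(api_names(API_JSON))  # ['Example Ltd', 'Sample plc']
```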
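The "'Structured' HTML?" slide shows an Information Commissioner's decision notice where every field follows a `Label: value <br />` pattern, and the PDF link appears to be built from the case reference and year. The Python sketch below exploits both patterns with regular expressions. It is an illustration under assumptions: the PDF path rule is inferred from this single example, and real pages may vary.

```python
import re
from html import unescape

# The decision-notice markup from the slide (the summary is truncated there too).
SAMPLE = (
    '<p><strong>Case Ref: FS50295557 <br />Date: 04/11/2010 <br />'
    'Public Authority: London Borough of Southwark <br />Summary: </strong>'
    'The complainant requested a copy of the authorities approved business plan [...]<br />'
    '<strong>Section of Act/EIR &amp; Finding: </strong>'
    'FOI 1 - Complaint Upheld , FOI 10 - Complaint Upheld <br />'
    '<a title="Opens in new window" '
    'href="~/media/documents/decisionnotices/2010/fs_50295557.ashx" '
    'target="_blank">View PDF of Decision Notice FS50295557</a></p>'
)

def parse_notice(html):
    """Turn one decision-notice block into a dict of its labelled fields."""
    # The PDF link lives in an attribute, so read it before tags are stripped.
    link = re.search(r'href="([^"]+\.ashx)"', html)
    # Treat each <br /> as a line break, drop other tags, decode entities like &amp;.
    text = re.sub(r"<br\s*/?>", "\n", html)
    text = unescape(re.sub(r"<[^>]+>", "", text))
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep:  # only "Label: value" lines become fields
            fields[label.strip()] = value.strip()
    fields["PDF"] = link.group(1) if link else None
    return fields

def pdf_path(case_ref, year):
    """Rebuild the PDF path from a case reference (pattern inferred from one example)."""
    prefix, digits = case_ref[:2].lower(), case_ref[2:]
    return f"~/media/documents/decisionnotices/{year}/{prefix}_{digits}.ashx"

notice = parse_notice(SAMPLE)
print(notice["Public Authority"])  # London Borough of Southwark
print(pdf_path(notice["Case Ref"], notice["Date"][-4:]) == notice["PDF"])  # True
```

If the path rule held across other notices, a scraper could construct PDF links directly from a list of case references instead of visiting every page.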
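The spreadsheet-formulae slide uses `=ImportHTML(…, "table", 1)` to pull the first table from a page into a sheet. A rough Python analogue, working on HTML that has already been fetched, is sketched below with the standard-library `HTMLParser`. The class and function names are hypothetical, and the sample table is made up. As with the spreadsheet formula, the table index counts from 1.

```python
from html.parser import HTMLParser

class TableGrabber(HTMLParser):
    """Collect each <table> as a list of rows, each row a list of cell strings.

    Simplifications: nested tables and unclosed <td>/<tr> tags are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # every table seen so far
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

def import_html_table(html, index):
    """Rough analogue of =ImportHTML(url, "table", index); index counts from 1."""
    grabber = TableGrabber()
    grabber.feed(html)
    grabber.close()
    return grabber.tables[index - 1]

page = ("<table><tr><th>Ward</th><th>Complaints</th></tr>"
        "<tr><td>North</td><td>12</td></tr></table>")
print(import_html_table(page, 1))  # [['Ward', 'Complaints'], ['North', '12']]
```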
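The Yahoo! Pipes slide pairs the Fetch Page module with a Regex step: keep only the relevant part of a page, split it into items, then clean each item with a regular expression. Pipes itself closed in 2015, so here is a minimal Python sketch of that cut-split-replace idea. It is not the Pipes implementation; the function names, markers and sample page are hypothetical, and this version drops the marker strings from the output.

```python
import re

def cut_and_split(page, cut_from, cut_to, delimiter):
    """Keep the text between two marker strings, then split it into items.

    The markers themselves are dropped, and empty items are discarded.
    """
    start = page.find(cut_from)
    if start == -1:
        return []
    start += len(cut_from)
    end = page.find(cut_to, start)
    if end == -1:
        return []
    return [item for item in page[start:end].split(delimiter) if item.strip()]

def regex_replace(items, pattern, replacement):
    """Apply one regular-expression substitution to every item."""
    return [re.sub(pattern, replacement, item).strip() for item in items]

page = "<body><ul id='results'><li>Southwark: upheld</li><li>Camden: not upheld</li></ul></body>"
items = cut_and_split(page, "<ul id='results'>", "</ul>", "</li>")
print(regex_replace(items, r"<[^>]+>", ""))  # ['Southwark: upheld', 'Camden: not upheld']
```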
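The ethics slide warns that scraping can resemble a denial-of-service attack. One common courtesy, sketched here in Python with the standard library's `urllib.robotparser`, is to honour a site's robots.txt rules and pause between requests. The robots.txt text, the user-agent name and the one-second fallback delay are hypothetical choices, and `fetch` is whatever download function the scraper already uses. Obeying robots.txt does not by itself settle the terms-of-service question the slide raises.

```python
import time
from urllib import robotparser

# Hypothetical robots.txt; a real scraper would load the site's own file with read().
ROBOTS_TXT = """User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

AGENT = "example-newsroom-bot"  # hypothetical user-agent name

def polite_plan(urls, robots_text, agent=AGENT):
    """Return the URLs robots.txt allows, plus the pause to leave between requests."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_text.splitlines())
    rp.modified()  # record a check time so the parsed rules count as loaded
    delay = rp.crawl_delay(agent) or 1  # fall back to one second if no Crawl-delay
    allowed = [u for u in urls if rp.can_fetch(agent, u)]
    return allowed, delay

def polite_fetch(urls, robots_text, fetch, agent=AGENT):
    """Fetch allowed URLs one at a time, sleeping between consecutive requests."""
    allowed, delay = polite_plan(urls, robots_text, agent)
    pages = {}
    for i, url in enumerate(allowed):
        if i:
            time.sleep(delay)
        pages[url] = fetch(url)
    return pages
```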
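The lab asks students to visualise a politician's speeches with Wordle or Many Eyes. Those tools size words by how often they appear, so a quick word count shows roughly what a cloud will emphasise. The Python sketch below assumes a deliberately small, hypothetical stop-word list and a made-up sample sentence.

```python
import re
from collections import Counter

# A deliberately small stop-word list; extend it for real speeches.
STOPWORDS = {"a", "and", "the", "of", "to", "in", "is", "it", "we", "our", "will", "that"}

def word_counts(speech, top=10):
    """Return the most frequent non-stop-words in a speech, most common first."""
    words = re.findall(r"[a-z']+", speech.lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(top)

speech = "We will cut the deficit. The deficit is our priority, and we will cut it."
print(word_counts(speech))  # [('cut', 2), ('deficit', 2), ('priority', 1)]
```

Changing the stop-word list changes the picture considerably, which is worth remembering before reading meaning into a word cloud.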
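The three "Enough time?" slides break a 10-credit module into hour budgets, and the figures can be checked by simple addition. The Python sketch below does that, assuming 10 hours per credit and modelling the community time as 15 minutes a week over ten weeks, the rate that gives the stated 2.5 hours. It also shows that the 75%/25% weights are consistent with the blog's and strategy's shares of the 80 assessed hours.

```python
# Hour budgets from the three "Enough time?" slides.
course = {"lectures": 15, "group blog": 60, "strategy": 20, "other issues": 5}
blog = {"posts: 5.5 hours a week for 10 weeks": 5.5 * 10, "evaluation": 5}
strategy = {
    "researching the community": 12.5,
    "time with the community: 0.25 hours a week for 10 weeks": 0.25 * 10,
    "analysis and write-up": 5,
}

assessed = course["group blog"] + course["strategy"]  # hours on the two assignments
print(sum(course.values()))             # 100, the 10-credit total
print(sum(blog.values()))               # 60.0, the group-blog allowance
print(sum(strategy.values()))           # 20.0, the strategy allowance
print(course["group blog"] / assessed)  # 0.75, matching the blog's 75%
```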