Data Journalism (City Online Journalism wk8)

Week 8 lecture to students on the 8 MAs at City University

Transcript

  • 1. Online Journalism, City University: Data journalism (Paul Bradshaw)
  • 3. Themes: (1) What is it? (2) Where to get it (3) How to get it
  • 10. “Each weekday, my computer program goes to the Chicago Police Department's website and gathers all crimes reported in Chicago.” (Adrian Holovaty)
  • 13. Times film genres
  • 14. Times Data Blog
  • 17. “QUOTE” Now is a good time.
  • 18. “The Tribune’s more than three dozen interactive databases collectively have drawn three times as many page views as the site’s stories.” [75% of traffic] http://bit.ly/dj2dmz
  • 19. What is data?
  • 20. Numbers; text; live data; behavioural data; images, audio and video; anything that a computer can work with
  • 22. Passive vs active data journalism: start with the data and look for the stories (MPs’ expenses)? Or start with a lead and look for the data?
  • 23. Data Journalism Continuum
  • 24. Finding: Data.gov.uk; Guardian Datastore; OpenlyLocal, OpenCorporates, Open Charities, Who's Lobbying etc.; FOI requests (WhatDoTheyKnow), disclosure logs; books, e.g. British Political Facts
  • 25. Finding: data communities. GetTheData.org; Where Does My Money Go? (WDMMG) forums; mySociety mailing lists; Open Data Cookbook; Wolfram Alpha forum
  • 27. Government, national and local; 'monitors': regulators and other bodies; charities, pressure groups; institutions: academic, scientific, health; business, finance; media, entertainment, sport; other secondary sources
  • 28. Advanced search: site:gov.uk (etc.); filetype:pdf (etc.); imagine the page you hope to find, including its jargon; database contents are invisible to search engines; Google News alerts: report OR review
  • 29. Advanced search: "quotes" search for exact phrases, e.g. "disclosure logs" site:nhs.uk; + ensures the page contains a word, e.g. +logs; - omits results containing a word, e.g. -wooden; * is a wildcard, e.g. "deaths * custody"; ~ finds synonyms, e.g. ~deaths
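The operators on slides 28 and 29 can be combined into a single query. This Python sketch only assembles the text you would type into a search box; `build_query` and its parameters are hypothetical names, and support for `+` and `~` varies between search engines and has changed over time, so check the engine's current advanced-search help.

```python
def build_query(phrase=None, site=None, filetype=None,
                required=(), excluded=(), synonyms=()):
    """Assemble a query string from advanced-search operators (hypothetical helper)."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')                   # exact phrase; may contain * as a wildcard
    parts.extend(f"+{word}" for word in required)     # page must contain the word
    parts.extend(f"-{word}" for word in excluded)     # omit pages containing the word
    parts.extend(f"~{word}" for word in synonyms)     # also match synonyms
    if site:
        parts.append(f"site:{site}")                  # restrict to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")          # restrict to one document type
    return " ".join(parts)


print(build_query(phrase="disclosure logs", site="nhs.uk", filetype="pdf"))
# prints: "disclosure logs" site:nhs.uk filetype:pdf
print(build_query(phrase="deaths * custody", excluded=["wooden"]))
# prints: "deaths * custody" -wooden
```

Building queries this way is mainly useful when you want to run the same search across many domains or file types.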
  • 31. Tip: use overseas sources
      • US medicine databases
      • EU subsidy databases
      • Swedish people data
      • International police agency correspondence with UK
  • 32. Feeds and scrapers: RSS, XML, JSON, RDF, and APIs; ScraperWiki; OutWit Hub; Yahoo! Pipes; spreadsheet formulae (look them up)
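A minimal sketch of reading two of the feed formats on slide 32 with Python's standard library. The RSS and JSON samples are made up for illustration; in practice you would download the feed or API response first, and real feeds often use namespaces or extra fields that this sketch ignores.

```python
import json
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 feed standing in for a downloaded one.
RSS_SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example council notices</title>
    <item><title>Planning application A</title><link>http://example.com/a</link></item>
    <item><title>Planning application B</title><link>http://example.com/b</link></item>
  </channel>
</rss>"""

# The same kind of records as JSON, the format many APIs return.
JSON_SAMPLE = '[{"title": "Planning application A", "link": "http://example.com/a"}]'


def rss_items(xml_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


def json_items(json_text):
    """Return (title, link) pairs from a JSON list of records."""
    return [(record["title"], record["link"]) for record in json.loads(json_text)]


print(rss_items(RSS_SAMPLE))
print(json_items(JSON_SAMPLE))
```

Both functions return the same shape of data, which is the point: once a feed is parsed, its source format stops mattering.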
  • 33. 'Structured' data: Format? Table? Pattern? URL?
  • 34. http://www.eib.org/projects/pipeline/?start=2009&end=2010&status=&region=&country=united+kingdom&sector=
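Slide 34's URL shows why 'structured' URLs matter: the query-string parameters (`start`, `end`, `country` and so on) can be edited to request other slices of the same database. This Python sketch rewrites those parameters with `urllib.parse`; `with_params` is a hypothetical helper, and whether the site still accepts these parameters, or a value such as `france`, is an assumption.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

URL = ("http://www.eib.org/projects/pipeline/?start=2009&end=2010&status="
       "&region=&country=united+kingdom&sector=")


def with_params(url, **changes):
    """Return url with some query-string parameters replaced (hypothetical helper)."""
    parts = urlsplit(url)
    # keep_blank_values keeps empty parameters such as status= and sector=
    params = parse_qs(parts.query, keep_blank_values=True)
    flat = {key: values[0] for key, values in params.items()}
    flat.update(changes)
    return urlunsplit(parts._replace(query=urlencode(flat)))


# One URL per year, all for the same country.
for year in (2008, 2009, 2010):
    print(with_params(URL, start=str(year), end=str(year + 1)))
```

Looping over parameter values like this is how a scraper can collect every page of a database that exposes its search through the URL.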
  • 35. 'Structured' HTML? (Use Firebug)
    <p><strong>
    Case Ref: FS50295557 <br />Date: 04/11/2010 <br />Public Authority: London Borough of Southwark <br />Summary: </strong>
    The complainant requested a copy of the authorities approved business plan [...]<br /><strong>Section of Act/EIR &amp; Finding: </strong>FOI 1 - Complaint Upheld , FOI 10 - Complaint Upheld <br />
    <a title="Opens in new window" href="~/media/documents/decisionnotices/2010/fs_50295557.ashx" target="_blank">View PDF of Decision Notice FS50295557</a></p>
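Because every decision notice on that page repeats the same markup, a parser can pick out the pieces reliably. A minimal sketch with Python's built-in `html.parser`, assuming the snippet on slide 35 is representative; `LinkCollector` is a hypothetical class name, and the `href` is relative, so it would need resolving against the site's address before you could download the PDF.

```python
from html.parser import HTMLParser

# Shortened copy of the slide 35 markup ([...] marks text left out).
SNIPPET = ('<p><strong>Case Ref: FS50295557 <br />Date: 04/11/2010 <br />'
           'Public Authority: London Borough of Southwark <br />Summary: </strong>'
           '[...] <a title="Opens in new window" '
           'href="~/media/documents/decisionnotices/2010/fs_50295557.ashx" '
           'target="_blank">View PDF of Decision Notice FS50295557</a></p>')


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


parser = LinkCollector()
parser.feed(SNIPPET)
print(parser.links)
```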
  • 36. Spreadsheet formulae
    • =ImportHTML("http://bob.com/mytable", "table", 1)
    • =ImportXML("http://backtweets.com/search.xml?itemsperpage=100&...")
    • =ImportFeed("http://search.twitter.com/search.atom?rpp=20&page=1&q="&A2)
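`=ImportHTML(url, "table", 1)` pulls the first table on a page into a spreadsheet. The Python sketch below does the parsing half of that job on HTML you have already fetched; `TableGrabber` and the sample table are hypothetical, and it does not handle nested tables or omitted closing `</td>` tags, which real pages sometimes have.

```python
from html.parser import HTMLParser


class TableGrabber(HTMLParser):
    """Collect the rows of the n-th <table> (1-based, like ImportHTML's index)
    as lists of cell strings. Rough sketch: does not handle nested tables."""

    def __init__(self, index=1):
        super().__init__()
        self.index = index
        self.tables_seen = 0
        self.in_target = False
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables_seen += 1
            self.in_target = self.tables_seen == self.index
        elif self.in_target and tag == "tr":
            self._row = []
        elif self.in_target and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if not self.in_target:
            return
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "table":
            self.in_target = False


SAMPLE_HTML = """<table><tr><th>Ward</th><th>Crimes</th></tr>
<tr><td>Ward A</td><td>12</td></tr></table>
<table><tr><td>unrelated</td></tr></table>"""

grabber = TableGrabber(index=1)
grabber.feed(SAMPLE_HTML)
print(grabber.rows)
```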
  • 37. Yahoo! Pipes: Fetch Page module; Regex module
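In Yahoo! Pipes, a Fetch Page module (download a page) could be chained with a Regex module (pattern matching) to turn HTML into items. A rough Python equivalent of the pattern-matching step, applied to the decision-notice markup from slide 35; this is an approximation, not the Pipes modules themselves, and the regular expression is an assumption based on that one snippet, so it will break if the page layout changes.

```python
import re

# Markup as a page-fetching step would return it, tags included.
PAGE = ("<p><strong>Case Ref: FS50295557 <br />Date: 04/11/2010 <br />"
        "Public Authority: London Borough of Southwark <br />Summary: </strong>")

# Named groups capture the three labelled fields; <br\s*/?> matches <br>, <br/> or <br />.
NOTICE = re.compile(
    r"Case Ref:\s*(?P<ref>FS\d+)\s*<br\s*/?>\s*"
    r"Date:\s*(?P<date>\d{2}/\d{2}/\d{4})\s*<br\s*/?>\s*"
    r"Public Authority:\s*(?P<authority>[^<]+?)\s*<br"
)


def extract_notices(page):
    """Return one dict per decision notice found in the fetched page."""
    return [match.groupdict() for match in NOTICE.finditer(page)]


print(extract_notices(PAGE))
```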
  • 38. Ethics: "A problem for sites who want to provide privacy while allowing new users to join easily. Scraping services may constitute a violation of terms of service; tactics often resemble a denial-of-service attack or a security exploit."
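One practical response to these concerns is to scrape politely: check the site's robots.txt and space out your requests. A sketch using Python's `urllib.robotparser`; the robots.txt text, user-agent string and example.com URLs are made up, and `fetch` stands for whatever download function you use. Note that robots.txt is a convention, not permission: it does not override a site's terms of service.

```python
import time
from urllib.robotparser import RobotFileParser

# Made-up robots.txt; for a real site use set_url(".../robots.txt") and read().
ROBOTS_TXT = """User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def polite_plan(urls, user_agent="oj-student-scraper", robots_txt=ROBOTS_TXT):
    """Split URLs into allowed and disallowed lists and pick a delay in seconds."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    allowed = [u for u in urls if rp.can_fetch(user_agent, u)]
    blocked = [u for u in urls if not rp.can_fetch(user_agent, u)]
    delay = rp.crawl_delay(user_agent) or 1  # fall back to 1 second if unspecified
    return allowed, blocked, delay


def fetch_politely(urls, fetch, delay):
    """Call fetch(url) for each URL, sleeping between requests."""
    results = []
    for i, url in enumerate(urls):
        if i:
            time.sleep(delay)  # pause so requests don't arrive in a burst
        results.append(fetch(url))
    return results


allowed, blocked, delay = polite_plan([
    "http://example.com/reports/2010",
    "http://example.com/private/members",
])
print(allowed, blocked, delay)
```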
  • 39. Questions?
  • 40. Links: OnlineJournalismClasses.tumblr.com; Delicious.com/paulb/cityoj08; Delicious.com/paulb/datajournalism; Delicious.com/paulb/visualisation; Delicious.com/paulb/data
  • 41. Lab
    • Use advanced search to find data
    • Use tools to scrape data
    • Visualise a politician's speeches using Wordle or Many Eyes
    • Read up on some of the tools or technologies before the lab
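Word clouds such as Wordle size each word roughly by how often it appears, so the underlying data is a word count. A minimal Python sketch of that counting step; the sample "speech" and the short stop-word list are made up, and real tools use much longer stop-word lists and their own tokenising rules.

```python
import re
from collections import Counter

# Deliberately short stop-word list for illustration.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "we", "is", "that", "will", "our"}


def word_frequencies(speech, top=10):
    """Count the most common words in a speech, ignoring case and stop words."""
    words = re.findall(r"[a-z']+", speech.lower())
    return Counter(w for w in words if w not in STOP_WORDS).most_common(top)


sample = "We will cut the deficit. We will cut waste, and we will protect the NHS."
print(word_frequencies(sample, top=3))
```

The resulting (word, count) pairs are what you would paste into a visualisation tool that accepts weighted word lists.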
  • 42. Books: Darrell Huff, How to Lie with Statistics; Blastland & Dilnot, The Tiger That Isn't; Dona Wong, The WSJ Guide to Information Graphics; Brian Suda, A Practical Guide to Designing with Data
  • 43. Assignments
  • 44. Enough time? 10 credits = 100 hours: lectures = 15 hours; group blog = 60 hours (75%); strategy = 20 hours (25%) (some in labs); plus 5 hours on other issues
  • 45. Enough time? Blog. Just an example: 10 posts ranging from simple links to interviews, analysis and experiments; an average of 5.5 hours per week × 10 weeks = 55 hours, plus 5 hours to write the evaluation
  • 46. Enough time? Strategy. Just an example: 12.5 hours researching the community; 15 minutes per week × 10 weeks with the community (2.5 hours); 5 hours on analysis and write-up
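The time budgets on slides 44 to 46 can be checked with the figures given there (lectures 15 hours, blog 60, strategy 20, other issues 5); the 75%/25% split on slide 44 matches the blog's and strategy's share of their combined 80 hours:

```latex
\begin{aligned}
\text{Module:}\quad   & 15 + 60 + 20 + 5 = 100 \text{ hours (10 credits)}\\
\text{Blog:}\quad     & 5.5 \times 10 + 5 = 55 + 5 = 60 \text{ hours}\\
\text{Strategy:}\quad & 12.5 + 2.5 + 5 = 20 \text{ hours}\\
\text{Weighting:}\quad & \tfrac{60}{60 + 20} = 75\%, \qquad \tfrac{20}{60 + 20} = 25\%
\end{aligned}
```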
  • 47. Group blogs
    • 8 areas:
    • 1. Online video; 2. Online audio
    • 3. Data; 4. UGC
    • 5. Community management
    • 6. Mobile; 7. Social media
    • 8. Infographics and photography
  • 48. Criteria. Assignment 1: newsgathering/research; production; law, ethics and strategy. Assignment 2: research; analysis; execution