City Journalism - Magazines MA - week 8 - Data journalism

Presentation Transcript

  • Data Journalism
    Online Journalism - Magazines MA, City University, February 16 2012
  • What is data journalism?
  • The key thing here is to learn how to solve your own problems. Asking a tutor should be your last resort - they will not be there for the rest of your life!
  • 1. Coming up with a question
    You need to find a data source. But where? Spend 15 minutes mapping out potential data sources related to your field. They might be commercial or governmental; they might need collecting or already be compiled somewhere. For example, if your field is cycling there will be:
    ● transport data
    ● crime data
    ● health data (encouraging people to cycle as part of a healthy lifestyle, for example)
    ● environmental data (pollution)
    ● community data (things being shared online by cyclists)
    Also take a look at the examples at http://delicious.com/paulb/foieg
  • 2. Use advanced search techniques to find data for a journalistic question
    There are lots of different ways to search, not just typing things into Google. You can limit by file type, domain or site, and use Boolean operators.
  • ● Limit by filetype:
      ○ filetype:xls will restrict results to Excel spreadsheets
      ○ filetype:csv to comma-separated values spreadsheets
      ○ filetype:doc to Word documents - often used for internal documents
      ○ filetype:pdf to PDFs - often used for official reports
    ● Limit by domain:
      ○ site:gov.uk will restrict results to UK government websites
      ○ site:ac.uk to UK educational establishments (not all of them reputable) - the US equivalent is .edu
      ○ site:org.uk to (mostly) nonprofit organisations - again, this is not guaranteed. You can also try .org, although this will include results from other countries.
      ○ site:mod.uk - the Ministry of Defence
      ○ site:nhs.uk - NHS sites
      ○ site:dh.gov.uk - Department of Health
      ○ site:police.uk - police websites, including British Transport Police and the Met
    ● Limit by website:
      ○ site:bolton.gov.uk will further limit results to just one website, rather than all local authority websites
      ○ Likewise, site:city.ac.uk would only return results from City University's website
    ● You can limit your search further by using quotation marks so that only pages containing the exact phrase are returned, e.g. "annual report"
    ● You can also expand it by using Boolean operators like OR, e.g.
  • Then put it all together, e.g. deaths in police custody filetype:xls site:gov.uk
    Try other operators such as:
    ● + before a search term to ensure it is in the pages themselves, e.g. +custody
    ● phrases in quotes, e.g. "deaths in custody"
    ● the * wildcard, e.g. "deaths in * custody"
    ● the ~ operator for synonyms, e.g. ~deaths
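The same combination can be assembled in code when you want to run many variations of a search. This is a minimal sketch in Python; `build_query` is a hypothetical helper written for illustration, not part of any search engine's API, and it only joins the operators listed above into one string.

```python
from urllib.parse import quote_plus


def build_query(terms, phrase=None, filetype=None, site=None):
    """Join plain terms and search operators into one query string.

    build_query is a hypothetical helper for illustration only.
    """
    parts = list(terms)
    if phrase:
        # Quotation marks ask for the exact phrase.
        parts.append(f'"{phrase}"')
    if filetype:
        parts.append(f"filetype:{filetype}")
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


query = build_query(["police"], phrase="deaths in custody",
                    filetype="xls", site="gov.uk")
print(query)
# URL-encode the query before adding it to a search URL.
print(quote_plus(query))
```

Swapping the `filetype` or `site` argument gives a quick way to repeat one question across several file types or domains.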
  • 3. Making sense of the data
    Chances are that the data you've found will raise further questions. There may be:
    ● jargon that you need to understand
    ● codes that need translating
    ● holes in the data
    ● contextual data needed: the populations of different regions, data for previous years, etc.
    ● questions about how it was gathered - the methodology
    Use your journalistic skills to answer those questions.
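One reason contextual data such as population matters is that raw counts from regions of different sizes cannot be compared directly. The sketch below, in Python, turns counts into a rate per 100,000 people. The figures and region names are invented for illustration, and `rate_per_100k` is a hypothetical helper name.

```python
def rate_per_100k(count, population):
    """Return the number of incidents per 100,000 people."""
    return count * 100_000 / population


# Invented figures for illustration only, not real data.
regions = {
    "Region A": {"incidents": 120, "population": 400_000},
    "Region B": {"incidents": 90, "population": 150_000},
}

for name, row in regions.items():
    rate = rate_per_100k(row["incidents"], row["population"])
    print(f"{name}: {rate:.1f} per 100,000")
```

In this made-up case Region A has more incidents in total but Region B has the higher rate, which is the kind of reversal that contextual data can reveal.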
  • Spreadsheet skills
    You can also use some spreadsheet techniques to put the data into a form that is going to be easier to interrogate - for example, try the following:
    ● Split addresses so that the postcode is in a separate column (Data > Text to Columns in Excel, or =SPLIT in Google Docs) - or separate forename and surname.
    ● Count how many times a value appears (=COUNTIF), or how many values are above a certain number.
    ● Work out the total using =SUM(D:D) if your numbers are in column D, for example.
    ● Work out the amount per day by using =SUM(D:D)/30 for a 30-day month, etc.
    ● Work out a median average by using a formula like =MEDIAN(D:D). Compare that with other types of average like =AVERAGE(D:D) or =MODE(D:D).
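For comparison, the same calculations can be sketched outside a spreadsheet. The Python below uses only the standard library; the `amounts` list stands in for column D, and both the numbers and the address are invented for illustration. The address split assumes the postcode is the last comma-separated part, which will not hold for every dataset.

```python
import statistics

# Invented sample values standing in for column D.
amounts = [120, 80, 80, 200, 50, 310]

total = sum(amounts)                           # =SUM(D:D)
per_day = total / 30                           # =SUM(D:D)/30
median = statistics.median(amounts)            # =MEDIAN(D:D)
mean = statistics.mean(amounts)                # =AVERAGE(D:D)
mode = statistics.mode(amounts)                # =MODE(D:D)
over_100 = sum(1 for a in amounts if a > 100)  # =COUNTIF(D:D,">100")

# Split a made-up address so the postcode sits in its own field,
# assuming the postcode is the last comma-separated part.
address = "10 Example Street, Exampletown, AB1 2CD"
street_part, postcode = address.rsplit(", ", 1)

print(total, per_day, median, mean, mode, over_100)
print(street_part, "|", postcode)
```

Here the median (100) and the mean (140) differ because one large value pulls the mean upwards, which is why it is worth comparing the different types of average.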
  • 4. Basic visualisations
    Find a transcript of a politician's - or two politicians' - speeches and visualise them using Wordle.com, Tagxedo or ManyEyes. (The advanced search techniques mentioned above may help.)
    You can either compare one politician's speeches on a particular issue before and after taking office - or one politician's speech with that of his or her replacement.
    Spend some time tweaking the visualisation:
    ● Are similar words treated differently, e.g. "patient" and "patients" or "choice" and "options"? Should you combine the counts to clarify the emphases? What are the ethical issues of doing so?
    ● Should you reduce your sample to the top 10 or 20 words or phrases to make it clearer?
    ● Can you customise the words included (try copying into a text editor first), colour scheme, arrangement, fonts, etc. to greater effect?
    ● Is a word cloud best - or should you use a bar chart based on word counts?
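The word counts behind a word cloud or bar chart can be sketched in a few lines of Python. Everything named here is an assumption made for illustration: the short `STOPWORDS` set, the `MERGE` map that counts "patients" together with "patient", and the sample sentence, which is invented and not taken from a real speech.

```python
import re
from collections import Counter

# A tiny stopword list for illustration; real tools use longer lists.
STOPWORDS = {"the", "a", "and", "of", "to", "for", "is", "will", "we"}

# Hypothetical merge map: which variants to count together is an
# editorial choice, and it changes the picture the reader sees.
MERGE = {"patients": "patient"}


def word_counts(text, top=10):
    """Return the most common words as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    words = [MERGE.get(w, w) for w in words if w not in STOPWORDS]
    return Counter(words).most_common(top)


# Invented sample sentence, not a real speech.
sample = ("Patients will have choice. "
          "The patient is the priority and choice matters.")

# Print a crude text bar chart of the top three words.
for word, n in word_counts(sample, top=3):
    print(f"{word:<10} {'#' * n} {n}")
```

Emptying the `MERGE` map and rerunning shows how much the ranking depends on the decision to combine similar words.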
  • Advanced tutorial 1 - GDoc webscraper
    Follow the tutorials tagged importHTML on Excel Notes: http://excelnotes.posterous.com/tag/importhtml
    ...and importXML on the Online Journalism Blog - http://onlinejournalismblog.com/tag/importxml (start from the bottom)
    For a really live scraper, see instructions on how to grab XML from Backtweets or RSS from a Twitter search in this tutorial: http://www.brelson.com/2009/11/using-google-spreadsheets-to-extract-twitter-data/
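The idea behind a table scraper, pulling the rows of an HTML table into a grid, can also be sketched outside a spreadsheet. This Python example is a rough analogue only and does not show how the spreadsheet functions work internally. `TableParser` is a hypothetical class name, and the HTML snippet is invented to stand in for a downloaded page.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every table row in a page."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Invented HTML standing in for a downloaded page.
html = """
<table>
  <tr><th>Council</th><th>Spend</th></tr>
  <tr><td>Example Council</td><td>1200</td></tr>
</table>
"""

parser = TableParser()
parser.feed(html)
print(parser.rows)
```

Each inner list is one table row, so the result can be written straight to a CSV file or pasted into a spreadsheet.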
  • Advanced tutorial 2 - interrogating data
    Follow the tutorial at http://excelnotes.posterous.com/tag/filters
    And the one at http://excelnotes.posterous.com/tag/sumifs
    Or if you want to play with Google Refine, search for Getting Started With Local Council Spending Data or go to http://blog.ouseful.info/2011/01/28/getting-started-with-local-council-spending-data/
  • Advanced tutorial 3 - Scraper tools
    Data can come in all sorts of forms. Based on the data you found already, try one or more of the following:
    ● Using a PDF conversion service to get to the data within - a list here: http://helpmeinvestigate.posterous.com/tag/pdfs - also: http://www.pdftoexcelonline.com/
    ● Grabbing tables from a database search: try the Firefox plugin Outwit Hub (free version stores 100 results; buy a licence for more)