City Journalism - Magazines MA - week 8 - Data journalism


Published in: Technology
  • 1. Data Journalism. Online Journalism - Magazines MA, City University, February 16 2012
  • 2. What is data journalism?
  • 3. The key thing here is to learn how to solve your own problems. Asking a tutor should be your last resort - they will not be there for the rest of your life!
  • 4. 1. Coming up with a question
    You need to find a data source. But where?
    Spend 15 minutes mapping out potential data sources related to your field. They might be commercial or governmental; they might need collecting or already be compiled somewhere. For example, if your field were cycling, there would be:
      ● transport data
      ● crime data
      ● health data (encouraging people to cycle as part of a healthy lifestyle, for example)
      ● environmental data (pollution)
      ● community data (things being shared online by cyclists)
    Also take a look at the examples at
  • 5. 2. Use advanced search techniques to find data for a journalistic question
    There are lots of different ways to search, not just typing things into Google.
    You can limit by file type, domain or site, and use Boolean operators.
  • 6.
    ● Limit by filetype:
        ○ filetype:xls will restrict results to Excel spreadsheets;
        ○ filetype:csv to comma separated values spreadsheets;
        ○ filetype:doc to Word documents - often used for internal documents
        ○ filetype:pdf to PDFs - often used for official reports
    ● Limit by domain:
        ○ will restrict results to UK government websites
        ○ to UK educational establishments (not all of them reputable) - the US equivalent is .edu
        ○ to (mostly) nonprofit organisations - again, this is not guaranteed. You can also try .org although this will include results from other countries.
        ○ - the Ministry of Defence
        ○ - NHS sites
        ○ - Department of Health
        ○ - police websites, including British Transport Police, the Met
    ● Limit by website:
        ○ will further limit results to just one website, rather than all local authority websites.
        ○ Likewise would only return results from City University's website
    ● You can limit your search further by using quotation marks so that only pages containing the exact phrase are returned, e.g. "annual report"
    ● You can also expand it by using Boolean operators like OR, e.g.
  • 7. Then put it all together: e.g. "deaths in police custody filetype:xls"
    Try other operators such as:
      ● + before a search term to ensure it is in the pages themselves, e.g. +custody
      ● phrases in quotes, e.g. "deaths in custody"
      ● The * wildcard, e.g. "deaths in * custody"
      ● The ~ operator for synonyms, e.g. ~deaths
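    The "put it all together" step can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function name build_query and its parameters are hypothetical, and it covers just the quoted-phrase, OR and filetype: operators described above.

```python
def build_query(phrase=None, terms=(), any_of=(), filetype=None):
    """Assemble a search string from the operators described on the slides.

    All names here are hypothetical helpers invented for this sketch.
    """
    parts = []
    if phrase:
        # Quotation marks: only pages containing the exact phrase are returned
        parts.append(f'"{phrase}"')
    # Plain search terms are passed through unchanged
    parts.extend(terms)
    if any_of:
        # Boolean OR: widen the search to any of these alternatives
        parts.append(" OR ".join(any_of))
    if filetype:
        # filetype: restricts results to one kind of file, e.g. xls or csv
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


query = build_query(phrase="deaths in custody",
                    any_of=["police", "prison"],
                    filetype="xls")
print(query)  # "deaths in custody" police OR prison filetype:xls
```

    Paste the printed string into the search box; swapping filetype="xls" for "csv" or "pdf" repeats the same search for other kinds of file.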
  • 8. 3. Making sense of the data
    Chances are that the data you've found will raise further questions. There may be:
      ● jargon that you need to understand,
      ● codes that need translating,
      ● holes in the data,
      ● contextual data needed: the populations of different regions; data for previous years; etc.
      ● questions about how it was gathered - the methodology
    Use your journalistic skills to answer those questions.
  • 9. Spreadsheet skills
    You can also use some spreadsheet techniques to put the data into a form that is going to be easier to interrogate - for example, try the following:
      ● Split addresses so that the postcode is in a separate column (Data > Text to Columns in Excel, or =SPLIT in Google Docs) - or separate forename and surname.
      ● Count how many times a value appears (=COUNTIF), or how many values are above a certain number.
      ● Work out the total using =SUM(D:D) if your numbers are in column D, for example.
      ● Work out the amount per day by using =SUM(D:D)/30 for a 30 day month, etc.
      ● Work out a median average by using a formula like =MEDIAN(D:D). Compare that with other types of average like =AVERAGE(D:D) or =MODE(D:D).
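    For anyone who would rather check the same sums outside a spreadsheet, here is a rough Python equivalent of the formulas above. It is a sketch only: the addresses and amounts are invented for illustration, and it assumes each address ends with a comma followed by the postcode.

```python
import statistics

# Hypothetical sample rows, invented for illustration: (address, amount)
rows = [
    ("12 High Street, London, EC1V 0HB", 120.0),
    ("3 Mill Lane, Leeds, LS1 4AP", 80.0),
    ("7 Park Road, Bristol, BS1 5TR", 80.0),
    ("1 Quay Side, Newcastle, NE1 3DE", 300.0),
    ("9 Church Row, Norwich, NR2 1TF", 45.0),
]

# Like Text to Columns / =SPLIT: take whatever follows the last comma
postcodes = [address.rsplit(", ", 1)[1] for address, _ in rows]
amounts = [amount for _, amount in rows]

total = sum(amounts)                                # =SUM(D:D)
per_day = total / 30                                # =SUM(D:D)/30
count_80 = sum(1 for a in amounts if a == 80.0)     # =COUNTIF(D:D, 80)
count_over_100 = sum(1 for a in amounts if a > 100) # =COUNTIF(D:D, ">100")
median = statistics.median(amounts)                 # =MEDIAN(D:D)
mean = statistics.mean(amounts)                     # =AVERAGE(D:D)
mode = statistics.mode(amounts)                     # =MODE(D:D)

print(postcodes)
print(total, round(per_day, 2), count_80, count_over_100)
print(median, mean, mode)
```

    In this invented sample the mean (125) sits well above the median (80) because of the single 300 value, which is exactly the kind of gap worth spotting before you quote an "average".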
  • 10. 4. Basic visualisations
    Find a transcript of a politician's - or two politicians' - speeches and visualise them using, Tagxedo or ManyEyes. (The advanced search techniques mentioned above may help.)
    You can either compare one politician's speeches on a particular issue before and after taking office - or one politician's speech with that of his or her replacement.
    Spend some time tweaking the visualisation:
      ● Are similar words treated differently, e.g. "patient" and "patients" or "choice" and "options"? Should you combine the counts to clarify the emphases? What are the ethical issues of doing so?
      ● Should you reduce your sample to the top 10 or 20 words or phrases to make it clearer?
      ● Can you customise the words included (try copying into a text editor first), colour scheme, arrangement, fonts, etc. to greater effect?
      ● Is a word cloud best - or should you use a bar chart based on word counts?
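    The word counts behind a cloud or bar chart can also be produced by hand, which makes the tweaking questions above concrete. The Python below is a minimal sketch under stated assumptions: the speech text, the stopword list and the MERGE table are all invented for illustration, and deciding which words to merge is your own editorial call.

```python
import re
from collections import Counter

# Hypothetical choices for this sketch: edit both to suit your own transcript
STOPWORDS = {"the", "a", "and", "of", "to", "in", "is", "we", "for", "our", "will"}
MERGE = {"patients": "patient"}  # treat these as the same word when counting


def top_words(text, n=10, merge=MERGE, stopwords=STOPWORDS):
    """Return the n most frequent words as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    kept = [merge.get(w, w) for w in words if w not in stopwords]
    return Counter(kept).most_common(n)


# Invented example text, not a real speech
speech = ("We will give every patient a choice. "
          "Patients deserve choice, and patients deserve care.")

counts = top_words(speech, n=3)

# A crude text bar chart: one # per occurrence
for word, count in counts:
    print(f"{word:<10} {'#' * count}")
```

    Run it once with MERGE emptied and once as written to see how much combining "patient" and "patients" changes the emphasis.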
  • 11. Advanced tutorial 1 - GDoc webscraper
    Follow the tutorials tagged importHTML on Excel Notes: importXML on the Online Journalism Blog - (start from the bottom)
    For a really live scraper, see instructions on how to grab XML from Backtweets or RSS from a Twitter search in this tutorial:
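    To see roughly what a table scraper like importHTML is doing, the sketch below pulls the rows of a table out of a scrap of HTML using Python's standard library. It is a swapped-in illustration, not the Google Docs function itself: the class TableParser, the function import_html_table and the sample table are hypothetical, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect every table in a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_html_table(html, index=1):
    """Return the index-th table on the page, counting from 1."""
    parser = TableParser()
    parser.feed(html)
    return parser.tables[index - 1]


# Invented sample page, for illustration only
SAMPLE = """
<table>
  <tr><th>Council</th><th>Spend</th></tr>
  <tr><td>Northtown</td><td>1200</td></tr>
  <tr><td>Southville</td><td>950</td></tr>
</table>
"""

table = import_html_table(SAMPLE, index=1)
for row in table:
    print(row)
```

    The sample is fed in as a string so the sketch runs offline; to use it on a live page you would first have to download the page's HTML yourself.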
  • 12. Advanced tutorial 2 - interrogating data
    Follow the tutorial at the one at if you want to play with Google Refine, search for "Getting Started With Local Council Spending Data" or go to
  • 13. Advanced tutorial 3 - Scraper tools
    Data can come in all sorts of forms. Based on the data you found already, try one or more of the following:
      ● Using a PDF conversion service to get to the data within - a list here: http: // - also: http://www.
      ● Grabbing tables from a database search: try the Firefox plugin Outwit Hub (free version stores 100 results; buy a licence for more)