Open Data Journalism: Introducing Key Concepts
By Gabriella Razzano
Middelburg: 20 October
ODAC is a specialist law centre working in the areas of access to information, open data and whistle blowing.
We provide legal advice and support to access public and private information through the Promotion of Access to Information Act (PAIA).
We also provide training on effective implementation of PAIA, the PDA and open data issues.
We support and provide legal advice to bona fide whistleblowers using the Protected Disclosures Act (PDA).
State of journalism
• Mpumalanga:
 – While 71% of stories were potentially investigative, only 18% were actually investigative.
• Limpopo:
 – While 73% of stories from papers were potentially investigative, only about a quarter (24%) were actually investigative.
 – Look at the event, not the issue
How do we move forward?
(Photo: Footprints on the beach near Coral Bay, Australia, by Peter Nijenhuis)
Open Data Information library 1912 2012
Information in Africa
Data is machine-readable.
Open data is free for anyone to reuse or redistribute for any purpose.
Sources of open data
Open Government Data
 – UK, Kenya, USA
 – World Bank
 – OGP
 – Stats SA
Community generated data
 – Open Street Map
 – Flickr, SlideShare
1s and 0s everywhere…so?
Data Journalism
• “Data journalism is obtaining, reporting on, curating and publishing data in the public interest.” (Jonathan Stray, professional journalist and computer scientist)
• “Data driven journalism is a workflow that consists of the following elements: digging deep into data by scraping, cleansing and structuring it, filtering by mining for specific information, visualizing it and making a story.” (Mirko Lorenz, information architect and multimedia journalist)
Breaking news has already broken… so what are we contributing?
When we are deluged with information, it is the connecting of these different forms of data that becomes really valuable. It’s not about events, but contexts and trends.
(Photo: Butterfly, from Charlene N Simmons’ photostream)
Why bother?
“The Tribune’s more than three dozen interactive websites have drawn three times as many page views as the site’s stories [75% of traffic]” - http://bit.ly/dj2dmz
(Photo by Evan P. Cordes through Flickr)
“Data-driven journalism is the future. Journalists need to be data-savvy. It used to be that you would get stories by chatting to people in bars, and it still might be that you’ll do it that way some times. But now it’s also going to be about poring over data and equipping yourself with the tools to analyze it and picking out what’s interesting. And keeping it in perspective, helping people out by really seeing where it all fits together, and what’s going on in the country.” (Tim Berners-Lee, founder of the World Wide Web)
“I think it’s important to stress the “journalism” or reporting aspect of ‘data journalism’. The exercise should not be about just analyzing data or visualizing data for the sake of it, but to use it as a tool to get closer to the truth of what is going on in the world. I see the ability to be able to analyze and interpret data as an essential part of today’s journalists’ toolkit, rather than a separate discipline. Ultimately, it is all about good reporting, and telling stories in the most appropriate way.” (Cynthia O’Murchu, Financial Times)
The “Murder Mysteries” project by Tom Hargrove of the Scripps Howard News Service (Figure 8). From government data and public records requests, he built a demographically detailed database of more than 185,000 unsolved murders, and then designed an algorithm to search it for patterns suggesting the possible presence of serial killers. This project has it all: hard work gathering a database better than the government’s own, clever analysis using social science techniques, and interactive presentation of the data online.
http://www.guardian.co.uk/news/datablog/interactive/2012/sep/07/full-list-mps-expenses-ipsa-data-interactive - Go Play!
And… the Expenses Scandal again!
Using ATI to get information, using data journalism to process it. The leaked release of MPs’ expense statements by the Telegraph in May 2009 (Rayner, 2009) brought widespread attention to a perceived lack of transparency by Government on how they spent the money paid to them in taxes. This ‘scandal’ led to changes throughout the political spectrum, with much of the resulting data now available (with regular updates) on data.gov.uk.
So a data story is...
• Typical examples: Census, election results, service delivery, budget reporting, crime stats (see Follow the Money).
• However, narrative is not excluded - the age-old news formula 5W+H remains:
 – What
   • History, dimensions, ...
 – Who
   • Individuals, crowds, ...
 – When
   • Dates, times, intervals, ...
 – Where
   • Locations; country, town, property, ...
 – Why
 – How
Journalism = data gathering and data distribution, in story format (Izak Minaar)
Data in, analysis, information out
Data
 – Gathering information for a story
 – Connecting information that is gathered
 – Expressing information as a story
 – Localising and personalising news
How to? See: http://datajournalismhandbook.org/1.0/en/index.html
1. Finding
• Wobbing (PAIA)
• Browse data sites and services:
 – http://databank.worldbank.org/ddp/home.do
 – http://www.africaopendata.org/pt_BR/
 – You’d be surprised what you can find on SA sites! Lots of big databases are online, though usability is often an issue.
• Scraping
 – ScraperWiki: an online tool to make the process of extracting "useful bits of data easier so they can be reused in other apps, or rummaged through by journalists and researchers." Most of the scrapers and their databases are public and can be re-used. Also, Hacks/Hackers may be able to help you find someone to scrape a particular site.
• Ask a Forum
 – Search for existing answers or ask a question at Get The Data or on Quora. However, most of these will not have an African focus, so there is no harm in exploiting journalist networks!
• Ask a Mailing List
 – Mailing lists combine the wisdom of a whole community on a particular topic. For data journalists, the Data Driven Journalism List and the NICAR-L lists are excellent starting points. Both of these lists are filled with data journalists and Computer Assisted Reporting (CAR) geeks, who work on all kinds of projects. You could also try Project Wombat (“a discussion list for difficult reference questions”), the Open Knowledge Foundation’s many mailing lists, mailing lists at theInfo, or searching for mailing lists on the topic, or in the region, that you are interested in.
• Join Hacks/Hackers
 – Hacks/Hackers is a rapidly expanding international grassroots journalism organization with dozens of chapters and thousands of members across four continents. Its mission is to create a network of journalists ("hacks") and technologists ("hackers") who rethink the future of news and information. With such a broad network, you stand a strong chance of someone knowing where to look for the thing you seek.
 – There are Johannesburg (Guy) and Cape Town (Raymond) branches.
• Ask an Expert
• Streamlining Your Search
Here are a few tips:
 – When searching for data, make sure that you include both search terms relating to the content of the data you’re trying to find as well as some information on the format or source that you would expect it to be in. Google and other search engines allow you to search by file type. For example, you can look only for spreadsheets (by appending your search with ‘filetype:XLS filetype:CSV’), geodata (‘filetype:shp’), or database extracts (‘filetype:MDB, filetype:SQL, filetype:DB’). If you’re so inclined, you can even look for PDFs (‘filetype:pdf’).
 – You can also search by part of a URL. Googling for ‘inurl:downloads filetype:xls’ will try to find all Excel files that have “downloads” in their web address (if you find a single download, it’s often worth just checking what other results exist for the same folder on the web server). You can also limit your search to only those results on a single domain name, by searching for, e.g. ‘site:agency.gov’.
 – “quotes” search for an exact phrase
 – + ensures the results contain a word: +logs
 – - ensures a word is omitted: -wooden
 – ~ searches for synonyms: ~death
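The search operators above can be combined mechanically. What follows is only a minimal sketch in Python: `build_query` is a hypothetical helper written for this illustration (it is not part of any library), it only assembles the query text and does not contact a search engine, and the search terms and `agency.gov` are placeholders.

```python
def build_query(terms, filetypes=(), site=None, inurl=None, exact=None, exclude=()):
    """Join search terms and operators into one query string.

    Hypothetical helper for illustration only: it builds text and
    performs no search itself.
    """
    parts = list(terms)
    if exact:
        # Quotes ask for an exact phrase
        parts.append('"{}"'.format(exact))
    # One filetype: operator per requested format
    parts.extend("filetype:{}".format(ft) for ft in filetypes)
    if inurl:
        parts.append("inurl:{}".format(inurl))
    if site:
        parts.append("site:{}".format(site))
    # A leading minus omits a word
    parts.extend("-{}".format(word) for word in exclude)
    return " ".join(parts)


# Placeholder terms and domain, chosen only to show the shape of the result
query = build_query(["crime", "statistics"], filetypes=["xls", "csv"], site="agency.gov")
print(query)  # crime statistics filetype:xls filetype:csv site:agency.gov
```

The resulting string can be pasted into a search box as is.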
2. Connecting and interrogating
• Numeracy skills
• Learn to love Excel http://www.openoffice.org/
• DocumentCloud if you don’t have a database
 – Sorts through OpenCalais; you can annotate and reference your story from the source doc, then share
• Newsrooms to develop toolboxes for:
 – Data gathering and capturing (e.g. spreadsheets in Google Docs for team collaboration)
 – Analysis
 – Visualisation
The main contribution of Excel for your data:
1. Sorting
 • Organises into more revealing order.
2. Filtering
 • Gets rid of unnecessary data
3. Using math and text functions
 • AutoSum, median, maximum, minimum
4. Pivot tables
 • Creates new tables from your ‘labels’ or variables
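The same four operations can be tried outside a spreadsheet. The sketch below uses only the Python standard library; the rows, province labels and amounts are invented purely for illustration and are not real figures.

```python
from collections import defaultdict
from statistics import median

# Invented example rows (not real data): one amount per province per year
rows = [
    {"province": "Province A", "year": 2011, "amount": 120},
    {"province": "Province B", "year": 2011, "amount": 80},
    {"province": "Province A", "year": 2012, "amount": 150},
    {"province": "Province B", "year": 2012, "amount": 95},
]

# 1. Sorting: largest amounts first
by_amount = sorted(rows, key=lambda r: r["amount"], reverse=True)

# 2. Filtering: keep only the 2012 rows
only_2012 = [r for r in rows if r["year"] == 2012]

# 3. Math functions: the equivalents of AutoSum, median, maximum, minimum
amounts = [r["amount"] for r in rows]
summary = {
    "sum": sum(amounts),
    "median": median(amounts),
    "max": max(amounts),
    "min": min(amounts),
}

# 4. Pivot table: provinces as rows, years as columns, summed amounts as cells
pivot = defaultdict(dict)
for r in rows:
    cell = pivot[r["province"]].get(r["year"], 0)
    pivot[r["province"]][r["year"]] = cell + r["amount"]

print(summary)
print(dict(pivot))
```

A spreadsheet does the same work with menus and formulas; the point is only that each step is a small, repeatable transformation of the table.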
Data visualisation
Always remember, it’s essentially just charts.
• Interactive – UK riots
• Google Public Data (Google charts)
• The Joy of Data (more visualisation gospel)
• World Bank data, maps
• UN data
• Stats SA
Also about applications for delivering stories.
What not to do… Where’s the story?
Tool | Category | Multi-purpose visualization | Mapping | Platform | Skill level | Data stored or processed | Designed for Web publishing?
Data Wrangler | Data cleaning | No | No | Browser | 2 | External server | No
Google Refine | Data cleaning | No | No | Browser | 2 | Local | No
R Project | Statistical analysis | Yes | With plugin | Linux, Mac OS X, Unix, Windows XP or later | 4 | Local | No
Google Fusion Tables | Visualization app/service | Yes | Yes | Browser | 1 | External server | Yes
Impure | Visualization app/service | Yes | No | Browser | 3 | Varies | Yes
Many Eyes | Visualization app/service | Yes | Limited | Browser | 1 | Public external server | Yes
Tableau Public | Visualization app/service | Yes | Yes | Windows | 3 | Public external server | Yes
VIDI | Visualization app/service | Yes | Yes | Browser | 1 | External server | Yes
Zoho Reports | Visualization app/service | Yes | No | Browser | 2 | External server | Yes
Choosel | Framework | Yes | Yes | Chrome, Firefox, Safari | 4 | Local or external server | Not yet
Exhibit | Library | Yes | Yes | Code editor and browser | 4 | Local or external server | Yes
Google Chart Tools | Library and Visualization app/service | Yes | Yes | Code editor and browser | 2 | Local or external server | Yes
Tool | Category | Multi-purpose visualization | Mapping | Platform | Skill level | Data stored or processed
OpenHeatMap | GIS/mapping: Web | No | Yes | Browser | 1 | External server
OpenLayers | GIS/mapping: Web, Library | No | Yes | Code editor and browser | 4 | Local or external server
OpenStreetMap | GIS/mapping: Web | No | Yes | Browser or desktops running Java | 3 | Local or external server
TimeFlow | Temporal data analysis | No | No | Desktops running Java | 1 | Local
IBM Word-Cloud Generator | Word clouds | No | No | Desktops running Java | 2 | Local
Gephi | Network analysis | No | No | Desktops running Java | 4 | Local
NodeXL | Network analysis | No | No | Excel 2007 and 2010 on Windows | 4 | Local
CSVKit | CSV file analysis | No | No | Linux, Mac OS X or Linux with Python installed | 3 | Local
DataTables | Create sortable, searchable tables | No | No | Code editor and browser | 3 | Local or external server
FreeDive | Create sortable, searchable tables | No | No | Browser | 2 | External server
Highcharts* | Library | Yes | No | Code editor and browser | 3 | Local or external server
Mr. Data Converter | Data reformatting | No | No | Browser | 1 | Local or external server
Panda Project | Create | | | Browser with Amazon EC2 | |
4. Personalisation
• Your users are an additional source of data: “Give me a headline to a story that I have no interest in and I’m not likely to click it; suggest a topic that I know something about and I’ll read the article.” (Sarah Marshall)
• Personalised content is King
• Solution to “info glut”: filters out noise
• About developing personal connections between publication and reader
• Link to local content
How to
• Start with data and look for stories?
 – MP expenses scandal again
• Or start with a lead and look for data?
• Or redirect because of data?
• Deductive v inductive
Starter Tools
• ICFJ Anywhere
 – Online lessons
• Many Eyes
 – Visualisation
• Google Fusion Tables
 – Mapping
 – Don’t forget Open Street Map
• Google Refine
 – Tool for cleaning up data
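To give a feel for what “cleaning up data” means in practice, here is a minimal sketch in Python. It is not how Google Refine works internally; `clean_names` is a hypothetical function written for this illustration, and the sample values are invented.

```python
def clean_names(values):
    """Trim and collapse whitespace, normalise capitalisation, and drop
    blanks and duplicates while keeping first-seen order.

    Hypothetical helper for illustration only.
    """
    seen = set()
    cleaned = []
    for value in values:
        # split() with no argument removes leading, trailing and repeated whitespace
        name = " ".join(value.split()).title()
        if name and name not in seen:
            seen.add(name)
            cleaned.append(name)
    return cleaned


# Invented messy input: the same two places typed in different ways
raw = ["  cape town", "Cape  Town ", "JOHANNESBURG", "", "johannesburg"]
print(clean_names(raw))  # ['Cape Town', 'Johannesburg']
```

Dedicated cleaning tools add far more, such as clustering near-identical spellings, but the underlying goal is the same: one consistent label per real-world thing.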
What to do?
1. Publish your own data using an open license
 • Creative Commons
2. Work with existing communities
 • ODADI, Hacks/Hackers
3. Use and support existing initiatives and technologies
 • ODADI, CKAN
4. Keep innovating
Let’s rethink
Let’s pick two or four of the reported stories and rethink them in terms of the four steps of data journalism.