Feb. 12, 2013
DATA-DRIVEN JOURNALISM: THE BASICS
WHAT IS DATA-DRIVEN JOURNALISM?
• “Data-driven journalism enables reporters to
tell untold stories, find new angles or
complete stories via a workflow of finding,
processing and presenting significant
amounts of data….”
• Henk van Ess, Dutch reporter
FIRST: DATA OR STORY IDEA?
• “Data journalism begins in one of two ways: either
you have a question that needs data, or a dataset
that needs questioning.” –Paul Bradshaw
• Data has to be found, which may involve computer
research skills, good old-fashioned reporting or FOI requests.
• The reporter has to get to know the data.
• Analysis: What story does the data tell?
• Presentation: Make the data accessible and understandable to readers.
FINDING THE DATA
• Bradshaw outlines the ways you might get data. They might be:
• Supplied by an organization (“how long until we see ‘data
releases’ alongside press releases?”)
• “Found through using advanced search techniques to plough
into the depths of government websites”
• “Compiled by scraping databases hidden behind online forms
or pages of results” using specialized tools.
• Converted from documents into a form that can be analyzed
• Pulled from APIs (application programming interfaces)
• Collected by the reporter
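The scraping route can be sketched with nothing but Python's standard library. This is a minimal sketch, not a production scraper. It assumes the page of results has already been downloaded (for example with `urllib.request.urlopen`). The sample page, names and salaries are made up. The parser collects the text of each table cell, row by row, so the results can be saved as a spreadsheet.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collects the text of every <td>/<th> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being built
        self._cell = None   # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text between tags only counts when we are inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table rows found in an HTML string as lists of cell text."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented page of results, standing in for one returned by an online form.
SAMPLE_PAGE = """
<table>
  <tr><th>Name</th><th>Salary</th></tr>
  <tr><td>A. Smith</td><td>95,000</td></tr>
  <tr><td>B. Jones</td><td>120,500</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in scrape_table(SAMPLE_PAGE):
        print(row)
```

Real sites often split results across many pages or build them with JavaScript. The specialized scraping tools handle those cases better than a hand-written parser.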
GETTING TO KNOW THE DATA
• CLEAN IT UP:
• Removing human error:
• Removing duplicate entries
• Deleting blanks
• Converting descriptions to a uniform format/language (e.g.,
BBC, B.B.C. or British Broadcasting Corporation)
• Converting the data into a format that is consistent with other
data you are using.
• TOOLS: Find and Replace in Excel or Google Refine
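The same clean-up steps can be scripted when a file is too large for Find and Replace. This is a minimal Python sketch. It assumes the records arrive as dictionaries with a hypothetical `organization` field. The lookup table of spelling variants is also hypothetical: the reporter would build it by hand after reading the data.

```python
# Hypothetical lookup table, built by hand: every spelling seen -> the one to keep.
CANONICAL_NAMES = {
    "bbc": "BBC",
    "b.b.c": "BBC",
    "b.b.c.": "BBC",
    "british broadcasting corporation": "BBC",
}

# Invented sample records with a hypothetical "organization" field.
RAW = [
    {"organization": "BBC", "mentions": "3"},
    {"organization": "B.B.C", "mentions": "3"},
    {"organization": "", "mentions": "1"},
    {"organization": "British Broadcasting Corporation", "mentions": "5"},
]


def normalize(name):
    """Collapse extra spaces and map known variants to one uniform spelling."""
    tidy = " ".join(name.split())
    return CANONICAL_NAMES.get(tidy.lower(), tidy)


def clean(records):
    """Delete blanks, standardize names, then remove duplicate entries."""
    seen = set()
    cleaned = []
    for record in records:
        name = record.get("organization") or ""
        if not name.strip():          # deleting blanks
            continue
        row = dict(record, organization=normalize(name))
        key = tuple(sorted(row.items()))
        if key in seen:               # removing duplicate entries
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    for row in clean(RAW):
        print(row)
```

Names are standardized before duplicates are removed. Until the "BBC" and "B.B.C" rows share one spelling, they are duplicates in disguise.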
INTERVIEW THE DATA
• Do you speak the same language?
• Where do you come from?
• Who created you?
• How were you gathered?
• What are your goals?
• Do they match yours?
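Some of these questions, such as whether you and the data speak the same language, can be partly answered with a quick profile of the file before any analysis. This is a minimal sketch that assumes the data is a CSV file. The column names and figures in `SAMPLE_CSV` are invented. Who created the data and how it was gathered are questions that still need reporting.

```python
import csv
import io
from collections import Counter

# Invented extract; in practice this would be the file you were sent or downloaded.
SAMPLE_CSV = """department,job_title,salary
Parks,Gardener,41000
Police,Officer,
Parks,Gardener,41000
Fire,Captain,88000
"""


def profile(text):
    """Summarize a CSV: row count, duplicate rows, blanks and distinct values per column."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = reader.fieldnames or []
    report = {
        "rows": len(rows),
        # Rows that exactly repeat an earlier row.
        "duplicate_rows": len(rows) - len({tuple(r.values()) for r in rows}),
        "columns": {},
    }
    for column in columns:
        values = [r[column] or "" for r in rows]
        report["columns"][column] = {
            "blank": sum(1 for v in values if not v.strip()),
            "distinct": len(set(values)),
            "most_common": Counter(values).most_common(1)[0][0] if values else None,
        }
    return report


if __name__ == "__main__":
    from pprint import pprint
    pprint(profile(SAMPLE_CSV))
```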
ANALYSIS: SOME EXAMPLES
• Sort by scale: e.g., public employees ranked from highest-
to lowest-paid
• Adding it up: e.g., the total amount paid in salaries to the players
of a professional baseball team
• Average: e.g., average pay for an employee in a certain job
• Geographical groupings and distribution
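The first three analyses, plus a simple geographical grouping, come down to a few lines of Python. This is a minimal sketch over invented salary records. The `name`, `job`, `district` and `salary` fields are hypothetical. A spreadsheet's sort, SUM and AVERAGE functions do the same job.

```python
from collections import defaultdict

# Invented salary records; names, districts and figures are for illustration only.
EMPLOYEES = [
    {"name": "A. Smith", "job": "Clerk", "district": "North", "salary": 42000},
    {"name": "B. Jones", "job": "Engineer", "district": "South", "salary": 91000},
    {"name": "C. Lee", "job": "Clerk", "district": "South", "salary": 46000},
    {"name": "D. Diaz", "job": "Manager", "district": "North", "salary": 120000},
]


def highest_paid(records):
    """Sort by scale: highest- to lowest-paid."""
    return sorted(records, key=lambda r: r["salary"], reverse=True)


def total_payroll(records):
    """Adding it up: total amount paid in salaries."""
    return sum(r["salary"] for r in records)


def average_pay(records, job):
    """Average pay for employees in one job (None if nobody holds it)."""
    pay = [r["salary"] for r in records if r["job"] == job]
    return sum(pay) / len(pay) if pay else None


def totals_by_district(records):
    """Geographical grouping: total salary paid in each district."""
    totals = defaultdict(int)
    for r in records:
        totals[r["district"]] += r["salary"]
    return dict(totals)


if __name__ == "__main__":
    print([r["name"] for r in highest_paid(EMPLOYEES)])
    print(total_payroll(EMPLOYEES))
    print(average_pay(EMPLOYEES, "Clerk"))
    print(totals_by_district(EMPLOYEES))
```

With pay data, check the median (`statistics.median`) alongside the average. A few very large salaries can pull the mean well above what a typical employee earns.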
TOOLS: WHAT ARE REPORTERS USING?
• Google Fusion Tables
• Google Refine
• Social Explorer www.socialexplorer.com
• Tableau Public
• The 2012 budget, how $3.7 trillion is spent: New York Times
• Immigration trends: New York Times
• Netflix rental patterns: New York Times
• Pay patterns: Sacramento Bee
• Gas prices: Los Angeles Times
DATA TO PLAY WITH
• Earthquake data
• Survey on gun ownership vs. gun control
• Rights to own guns survey