This document provides an overview of data-driven journalism, including definitions, the basic workflow, tools used, and examples. It defines data-driven journalism as using data to find untold stories or provide new angles to existing stories. The workflow involves finding data, cleaning and analyzing it, and presenting insights in an accessible way like stories or graphics. Reporters can find data from organizations, government websites, compiled databases, or by collecting it themselves. Common tools used include Excel, Google Fusion, and Tableau Public. Examples of data-driven journalism projects examine government spending, immigration trends, and geographic patterns in salaries.
2. WHAT IS DATA-DRIVEN JOURNALISM?
• “Data-driven journalism enables reporters to
tell untold stories, find new angles or
complete stories via a workflow of finding,
processing and presenting significant
amounts of data….”
• Henk van Ess, Dutch reporter
5. FIRST: DATA OR STORY IDEA?
• “Data journalism begins in one of two ways: either
you have a question that needs data, or a dataset
that needs questioning.” –Paul Bradshaw
6. WHAT’S INVOLVED?
• Data has to be found, which may involve computer
research skills or good old reporting or FOI requests.
• Reporter has to get to know the data.
• Analysis: What story does the data tell?
• Make data accessible/understandable by readers:
Story/graphics
7. FINDING THE DATA
• Bradshaw outlines the ways you might get data. It might be:
• Supplied by an organization (“how long until we see ‘data
releases’ alongside press releases?”)
• “Found through using advanced search techniques to plough
into the depths of government websites”
• “Compiled by scraping databases hidden behind online forms
or pages of results” using specialized tools.
• Converted from documents into a form that can be analyzed
• Pulled from APIs (application programming interfaces)
• Collected by the reporter
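As a rough illustration of the "pulled from APIs" and "converted into a form that can be analyzed" routes, the sketch below turns a JSON payload into CSV text that a spreadsheet can open. The payload, its "results" key, the field names and the function name are all invented for the example; a real API will have its own structure and would be fetched over HTTP first.

```python
import csv
import io
import json

# Hypothetical API response (invented data). A real payload would come from
# an HTTP request and will almost certainly be shaped differently.
SAMPLE_RESPONSE = """
{"results": [
  {"name": "Employee A", "department": "Parks", "salary": 52000},
  {"name": "Employee B", "department": "Police", "salary": 61000}
]}
"""


def api_json_to_csv(raw_json, fields):
    """Convert the 'results' list of a JSON payload into CSV text."""
    records = json.loads(raw_json)["results"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Missing fields become empty cells instead of raising an error.
        writer.writerow({field: record.get(field, "") for field in fields})
    return buffer.getvalue()


csv_text = api_json_to_csv(SAMPLE_RESPONSE, ["name", "department", "salary"])
print(csv_text)
```

Saving the returned text to a .csv file gives a table that can be opened in Excel or any of the other tools listed later.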
8. GETTING TO KNOW THE DATA
• CLEAN IT UP:
• Removing human error:
• Removing duplicate entries
• Deleting blanks
• Converting descriptions to a uniform format/language (e.g.
BBC, B.B.C. or British Broadcasting Corporation)
• Converting the data into a format that is consistent with other
data you are using.
• TOOLS: Find and Replace in Excel, or Google Refine
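The same clean-up steps can also be scripted. This is a minimal sketch in Python, assuming a small table with an "org" and an "amount" column; the rows, the column names and the CANONICAL lookup are invented for illustration and are not taken from any real dataset.

```python
# Invented sample rows standing in for a messy spreadsheet.
RAW_ROWS = [
    {"org": "BBC", "amount": "100"},
    {"org": "B.B.C", "amount": "250"},
    {"org": "", "amount": ""},  # blank row
    {"org": "British Broadcasting Corporation", "amount": "75"},
    {"org": "BBC", "amount": "100"},  # exact duplicate of the first row
    {"org": " ITV ", "amount": "40"},  # stray whitespace
]

# Hypothetical lookup: known spelling variants mapped to one uniform label.
CANONICAL = {
    "bbc": "BBC",
    "b.b.c": "BBC",
    "b.b.c.": "BBC",
    "british broadcasting corporation": "BBC",
}


def clean_rows(rows):
    """Drop blank rows, unify organization names, remove duplicate rows."""
    cleaned = []
    seen = set()
    for row in rows:
        org = (row.get("org") or "").strip()
        amount = (row.get("amount") or "").strip()
        if not org and not amount:
            continue  # deleting blanks
        org = CANONICAL.get(org.lower(), org)  # uniform format
        key = (org, amount)
        if key in seen:
            continue  # removing duplicate entries
        seen.add(key)
        cleaned.append({"org": org, "amount": amount})
    return cleaned


cleaned_rows = clean_rows(RAW_ROWS)
for row in cleaned_rows:
    print(row)
```

Note that duplicates are only detected after the names are unified, so "BBC, 100" entered twice under different spellings would also be caught.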
9. INTERVIEW THE DATA
• Do you speak the same language?
• Where do you come from?
• Who created you?
• How were you gathered?
• What are your goals?
• Do they match yours?
10. ANALYSIS: SOME EXAMPLES
• Sort by scale: highest to lowest, e.g. highest- to lowest-paid
public employees
• Adding it up: e.g. total amount of salaries paid to players
of a professional baseball team
• Average: e.g. average pay for an employee in a certain job
category
• Geographical groupings and distribution
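The four kinds of analysis above can be sketched in a few lines of Python. The employee records, job titles and city names below are invented for illustration; a real project would load them from the cleaned dataset.

```python
from collections import defaultdict
from statistics import mean

# Invented sample payroll records.
EMPLOYEES = [
    {"name": "Employee A", "job": "Clerk", "city": "Northtown", "pay": 40000},
    {"name": "Employee B", "job": "Engineer", "city": "Southville", "pay": 90000},
    {"name": "Employee C", "job": "Clerk", "city": "Southville", "pay": 44000},
    {"name": "Employee D", "job": "Engineer", "city": "Northtown", "pay": 86000},
]

# Sort by scale: highest to lowest paid.
by_pay = sorted(EMPLOYEES, key=lambda row: row["pay"], reverse=True)

# Adding it up: total payroll.
total_pay = sum(row["pay"] for row in EMPLOYEES)

# Average: mean pay within each job category.
pay_by_job = defaultdict(list)
for row in EMPLOYEES:
    pay_by_job[row["job"]].append(row["pay"])
average_by_job = {job: mean(pays) for job, pays in pay_by_job.items()}

# Geographical grouping: total pay per city.
total_by_city = defaultdict(int)
for row in EMPLOYEES:
    total_by_city[row["city"]] += row["pay"]

print("Highest paid:", by_pay[0]["name"])
print("Total payroll:", total_pay)
print("Average by job:", average_by_job)
print("Total by city:", dict(total_by_city))
```

Each result is a candidate starting point for a story question rather than a finding in itself.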
11. TOOLS: WHAT ARE REPORTERS USING?
• Excel
• Google Fusion Tables
• SPSS
• Access
• Google Refine
• Social Explorer www.socialexplorer.com
• Python
• Tableau Public
12. VISUALIZATION: EXAMPLES
• New York Times:
The 2012 Budget, How $3.7 trillion is spent.
• Immigration trends: New York Times
• Netflix rental patterns: New York Times
• Pay patterns: Sacramento Bee
• Gas prices: Los Angeles Times
13. DATA TO PLAY WITH
• Earthquake data
• Earthquakes
• Survey on gun ownership vs. gun control
• Rights to own guns survey