This document provides tips for journalists on how to use data in their reporting. It discusses why data journalism is useful, how to get started with data skills like Excel, finding and obtaining data sources, analyzing data, and ensuring accurate reporting. Key tips include learning Excel, starting with small, related data projects, practicing skills regularly, and using data to uncover new stories and trends rather than just providing stats. Cautions include checking for errors in data and not overloading stories with numbers.
Data-driven enterprise off your beat - Todd Wallack - New England NewsTrain - Oct. 14, 2017
Data-driven enterprise off your beat
Todd Wallack | @twallack | todd.wallack@globe.com | 617-929-2069
Have other questions? Just ask me.
Why data journalism?
• Discover new stories. Using data, you can sometimes uncover trends or
issues no one knows about.
• Find great examples. By obtaining a database, you can search for the
perfect example to illustrate your piece.
• Get the perfect stat. With data, you can do more than find three
examples. You can present numbers that show the broader picture.
• Credibility. Your stories will have more authority if you can back them up
with a database.
• Make better graphics. You can use data to create charts, maps or other
illustrations for your story.
• Reference material. Once you obtain a database, you can keep it handy
for breaking news.
• Burnish your resume. Increasingly, managers want journalists who
are comfortable with data.
How to get started
• Learn Excel or Google Sheets. You can do 90% of data stories with
spreadsheets.
• Start small. Do something simple. Analyze a spreadsheet someone sent
you. Look at the payroll or budget for an agency you cover.
• Hunt for data related to your beat. It is data you can use right away.
• Copy others. Look for simple stories others have done nationally or at
other local news organizations. Then try to localize it for your market.
• Learn one skill at a time. It’s tempting to try to take on everything at
once – spreadsheets, databases, mapping, programming. But it’s more
manageable to master one thing at a time.
• Practice, practice, practice. It’s easy to forget how to use Excel or
another tool if you don’t use it for months. So, use it for anything you can
think of – even keeping track of FOIA requests.
• Consider taking a class (even a free or cheap one online). A class
is one way to force yourself to learn a little each week.
• Find someone who can help. Find someone in the newsroom who
uses data. Join NICAR-L, an email list of journalists interested in data. If
you get stuck, someone on NICAR-L can probably help you within hours.
bit.ly/subscribeNICAR-L
Finding data
• Ask sources. Ask watchdogs, government agencies, think tanks – anyone
– to point you to good data sources.
• Use Google. You can easily find interesting data by searching websites
for agencies you cover or doing a broader search of the web.
• Use a government data portal. Most states have gathered a portion of
their data sets in one place.
• Check IRE tip sheets. IRE has a library of hundreds of tip sheets, many
of which include suggestions on data.
• Examine the retention schedule for your city/state. It’s supposed
to be a guide to how long agencies must hold on to records. But it can also
be a tip sheet for records that agencies have.
• Read annual reports. See a stat? See a table? That means the agency
probably has a database that generated it. Ask for the data.
• Look at forms. Most agencies enter every box of a form into a database.
That means you can probably obtain the database with a FOIA request.
• FOIA. Sometimes data is on the web. Sometimes you can get it just by
asking. But don’t be afraid to file a public-records request for data.
• Build your own. Sometimes, you just can’t find the data you need. Or it’s
only in paper form. In that case, it might be worth the effort to enter the
data into a spreadsheet so you can analyze it.
Open data portals
The federal government and many states operate websites where they feature
some of their databases.
• U.S.: data.gov
• Connecticut: data.ct.gov
• Maine: data.maine.gov
• Massachusetts: mass.gov/opendata
• New Hampshire: nh.gov/doit/open-source/data-sets.htm
• Rhode Island: ri.gov/data
• Vermont: data.vermont.gov
More Google tricks
• Use the advanced-search page: google.com/advanced_search.
• Search by file type. Example: filetype:XLS. Some common file types are:
XLS (old Excel), XLSX (new Excel), CSV (comma-separated values – Excel
can open it), TSV (tab-separated values – you can import it into Excel).
• Search a particular website or domain. Examples: site:gov or
site:boston.gov. Even if a website has a search box, sometimes Google
works better. Try it both ways.
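The search operators above can also be assembled in code when you want to run the same search across several agencies. This is only a minimal Python sketch, not part of the original tip sheet: the function names build_query and search_url and the sample search term are illustrative assumptions.

```python
from urllib.parse import quote_plus


def build_query(terms, site=None, filetype=None):
    """Combine search terms with optional site: and filetype: operators."""
    parts = [terms]
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def search_url(query):
    """Turn a query string into a Google search URL."""
    return "https://www.google.com/search?q=" + quote_plus(query)


# Hypothetical example: spreadsheets about payroll on boston.gov.
query = build_query("payroll", site="boston.gov", filetype="xlsx")
print(query)
print(search_url(query))
```

Paste the printed query into the search box, or open the printed URL in a browser.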
Examples of databases to ask for
• Payroll/salary data
• Budgets
• Parking tickets
• Business/occupation licenses
• Census
• School test scores
• Crime reports
• Purchase data
• Campaign donations
For databases on non-governmental beats: bit.ly/otherbeats
Public-records tips
• Be positive. Assume it’s public. If you don’t ask, you won’t get it.
• Ask for the documentation. The technical documentation for
databases can be called many things: a record layout, field list or data
dictionary. But it’s helpful to ask for it. That way, you’ll know what data
the agency keeps. And you’ll notice if something is missing.
• Ask for the data in Excel, CSV or “machine-readable format”
(not PDFs). PDFs are designed to print out or look at – not analyze. You
want the data in a format a database can use.
• Ask for more than one year of data. You want to see trends.
• Talk to the data people. Sometimes, the PR people are friendly but
don’t know anything about the data and what is possible.
• Appeal if rejected. Go up the chain of command. Or follow the
appeals process (if one exists in your state).
• Be polite, but be a pest. Sometimes agencies will simply hope you
forget about the request. But if you are persistent – even going to the
offices in person – they are less likely to blow you off.
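One reason to ask for the documentation is that it lets you check what you received against what the agency says it keeps. The following Python sketch illustrates that check under stated assumptions: the field names, the sample rows and the function name missing_fields are all hypothetical.

```python
import csv
import io

# Hypothetical field list copied from an agency's record layout.
documented_fields = ["name", "department", "title", "salary", "hire_date"]

# Hypothetical CSV the agency sent back; note that hire_date is absent.
received_csv = io.StringIO(
    "name,department,title,salary\n"
    "Pat Smith,Police,Officer,85000\n"
    "Lee Jones,Fire,Captain,98000\n"
)


def missing_fields(csv_file, expected):
    """Return documented fields that do not appear in the CSV header."""
    reader = csv.DictReader(csv_file)
    present = set(reader.fieldnames or [])
    return [field for field in expected if field not in present]


print(missing_fields(received_csv, documented_fields))
```

A missing column is worth a follow-up question to the agency's data people.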
Caution!!!
• Watch out for dirty data. Typos. Mistakes. Missing data. If something
in the data seems crazy, it just might be an error. So, verify it with the
original documents or with sources.
• Double-check your calculations. It’s usually a good idea to run them
by the agency or another trusted source before publication. Or ask a
colleague to check your math.
• Still need to do reporting. Data provides great examples and powerful
numbers. But you still have to do reporting to make sure the data is
reliable and confirm what it means.
• Don’t overload stories with numbers. A temptation with data stories
is to jam in every cool stat you find. But your stories will be stronger if you
use only the numbers that matter most and instead tell the stories through
people, anecdotes, quotes and traditional storytelling.
• Beware of working with new data on deadline. Every database has
quirks. Sometimes codes don’t mean what you think they mean.
Sometimes databases are incomplete. Try to avoid working with databases
for the first time on deadline.
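A quick automated pass can point you to the rows that most need checking against original documents. This Python sketch is one possible approach, not a complete cleaning routine: the sample rows, the salary range and the function name flag_suspect_rows are assumptions for illustration.

```python
def flag_suspect_rows(rows, field, low, high):
    """Return (row_number, reason) pairs for values that need checking."""
    problems = []
    for number, row in enumerate(rows, start=1):
        raw = (row.get(field) or "").strip()
        if not raw:
            problems.append((number, "missing"))
            continue
        try:
            # Strip common formatting before converting to a number.
            value = float(raw.replace(",", "").replace("$", ""))
        except ValueError:
            problems.append((number, "not a number"))
            continue
        if not low <= value <= high:
            problems.append((number, "outside expected range"))
    return problems


# Hypothetical payroll rows with a blank, a typo and an implausible figure.
rows = [
    {"name": "Pat Smith", "salary": "85,000"},
    {"name": "Lee Jones", "salary": ""},
    {"name": "Sam Brown", "salary": "9o,000"},
    {"name": "Kim Green", "salary": "9800000"},
]

for number, reason in flag_suspect_rows(rows, "salary", low=10_000, high=500_000):
    print(number, reason)
```

A flagged row is not necessarily wrong. It is a prompt to verify the value with the agency or the original records.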
Where can you learn more
IRE/NICAR conferences/workshops/tip sheets. IRE membership costs $70 ($25 for
students) a year: bit.ly/joinire. Membership gives you access to
thousands of tip sheets and stories. Plus, you can listen to recordings of
past conferences. IRE is holding two major conferences in 2018 – one in
Chicago in March and a second in Orlando in June. It offers fellowships to
the conferences and to its workshops. ire.org/events-and-training/
This tutorial from Berkeley Advanced Media Institute is for those who’ve
never opened a spreadsheet before: bit.ly/sheetbasics
The Data Journalism Handbook from the European Journalism Centre
and Open Knowledge Foundation is free to read online: bit.ly/datajbook
ICIJ reporter Kate Willson has four short videos on how to use Excel to
sort and filter, concatenate (link together), auto fill and make pivot tables:
bit.ly/ICIJexcel
Whether you cover education or anything else, this online guide to Excel
from the Education Writers Association will teach you everything you need
to know: ewa.org/reporter-guide/reporters-guide-excel
Knight Science Journalism at MIT has a fantastic resource for data work
that covers basics to programming.
Check it out: ksj.mit.edu/data-journalism-tools/
Spreadsheet basics
SAVE INITIAL FILE
Save the initial file somewhere safe, and make a new copy to work with. That way,
no matter what you do, you can go back to the original source of the data.
Excel Windows 2007: Click on the round Office button in the upper left-hand corner,
choose “Save As” and then Excel Workbook. Excel 2016: Click “File” in the
upper left corner. Choose “Save As” and then Excel Workbook.
Google Sheets: Click on File in upper left corner, choose “Make a Copy” option.
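If you handle data files outside a spreadsheet program, the same habit applies: leave the original untouched and work on a copy. Below is a minimal Python sketch of that idea. The file name payroll.csv, the "_working" suffix and the function name make_working_copy are assumptions chosen for the example.

```python
import shutil
import tempfile
from pathlib import Path


def make_working_copy(original):
    """Copy the original file and return the path of the working copy."""
    original = Path(original)
    working = original.with_name(f"{original.stem}_working{original.suffix}")
    if working.exists():
        # Refuse to overwrite an earlier working copy by accident.
        raise FileExistsError(f"{working} already exists")
    shutil.copy2(original, working)
    return working


# Demonstration in a temporary folder with a hypothetical file.
with tempfile.TemporaryDirectory() as folder:
    source = Path(folder) / "payroll.csv"
    source.write_text("name,salary\nPat Smith,85000\n")
    copy = make_working_copy(source)
    print(copy.name)
```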
SAVE AS YOU GO
Always be saving.
Excel Windows 2007/2016: Ctrl-S or hit the disk icon in upper left-hand
corner
Google Sheets: Saves automatically.
CHECK OUT DATA FIRST
Try to find the “four corners” of the spreadsheet.
Excel Windows 2007/2016: Use the CTRL + Arrow keys to go up, down, left
and right.
Google Sheets: Use the CTRL + Arrow keys to go up, down, left and right.
UNDO
Sometimes, we all hit the wrong button. Here’s how to fix it.
Excel Windows 2007/2016: CTRL-Z (can hit more than once)
Google Sheets: CTRL-Z (can hit more than once). Or go back to an earlier
version by using the revision history (under the File menu, or hit
CTRL-ALT-SHIFT-H). Then click on the version you want on the right, then click
“restore this version.” To cancel, click on the left arrow in the upper left-hand corner.
MULTIPLE SHEETS?
Check to see whether the workbook contains multiple “sheets” or “tabs.” Look at
the bottom left-hand corner. To create a new one:
Excel Windows 2007: Click on the curled piece of paper on the lower left-hand
corner, next to the existing tabs. Excel 2016: Click the plus-sign-in-a-circle icon
on the lower left-hand corner, right of the existing tabs.
Google Sheets: Click on the plus sign on the lower left-hand corner, next to the
existing tabs.
FREEZE HEADERS
This is a handy command that lets you scroll through the data while still seeing
the headers/labels at the top.
Excel Windows 2007/2016: Hit the View tab at the top, select freeze panes
(middle right of the tool bar).
Google Sheets: Go to the View menu, select freeze.
WIDEN COLUMNS
Sometimes, columns are too narrow to read. (You will sometimes see ####s
when columns are too narrow to show a string of numbers.)
Excel Windows 2007/2016: Hover cursor between the two letters marking
the columns until the cursor changes to a cross. Press and hold down the left
mouse key and drag the mouse left and right until it is the right width. Release.
Google Sheets: Hover cursor between the two letters marking the columns
until the cursor changes to a cross. Press and hold down the left mouse key and
drag the mouse left and right until it is the right width. Release.
SORT COLUMNS
Use this command when you want to sort from high to low (or in alphabetical
order).
Excel Windows 2007/2016: Click on any cell within the column you want to
sort. (Note: Do NOT highlight the entire column.) Click the Data tab at the top,
then click on the Sort tool icon in the middle of toolbar. Make sure the right
column is selected, Sort by Values, and then pick either A to Z (low to high) or Z
to A (high to low). Note: Be sure the headers box is checked correctly.
Short cut: Instead of using the Sort tool, you can also just click on the A-
Z or Z-A buttons in the toolbar after hitting the data tab. This will usually
work, but sometimes Excel gets confused and sorts the headers along with
the rest of the data. To fix this, click on the Sort tool under the Data menu,
then make sure the headers box is checked. (Yet another option: Highlight
the area you want to sort first.)
Google Sheets: Click on the Data menu, select Sort Sheet by Column _, A→Z or
Z→A. (The blank is for the letter of the Column.)
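For readers curious how the same sort looks in code, here is a short Python sketch. It is an illustration only: the table contents are hypothetical. Like the headers box in Excel, it sets the header row aside so it is not sorted with the data.

```python
# Hypothetical payroll table; the first row is the header.
table = [
    ["name", "department", "salary"],
    ["Pat Smith", "Police", 85000],
    ["Lee Jones", "Fire", 52000],
    ["Sam Brown", "Parks", 98000],
]

# Separate the header so it stays on top.
header, data = table[0], table[1:]

# Sort high to low on the salary column (index 2).
by_salary = sorted(data, key=lambda row: row[2], reverse=True)

# Sort A to Z on the name column (index 0).
by_name = sorted(data, key=lambda row: row[0])

print([row[0] for row in by_salary])
print([row[0] for row in by_name])
```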
FILTER COLUMNS
Use this command when you want to select rows that meet certain criteria, such
as all salaries from a certain department or all voters in a certain ZIP code.
Excel Windows 2007/2016: Make sure you click on a cell somewhere in the
data you are using. Click the Data menu button, hit the funnel button on the
Tools ribbon.
Little arrows should appear next to the columns. Click the arrow next to the
column you want to filter. Then select the criteria you want to use.
Google Sheets: Click the funnel on the upper right of the tool bar.
Mini funnels should appear next to the columns. Click the funnel next to the
column you want to filter. Then select the criteria you want to use.
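Filtering works the same way in code: keep only the rows that meet a condition. The Python sketch below assumes hypothetical rows and a helper named filter_rows. ZIP codes are stored as text so leading zeros survive.

```python
# Hypothetical rows. ZIP codes are text so a leading zero is not dropped.
rows = [
    {"name": "Pat Smith", "department": "Police", "zip": "02118"},
    {"name": "Lee Jones", "department": "Fire", "zip": "02130"},
    {"name": "Sam Brown", "department": "Police", "zip": "02130"},
]


def filter_rows(rows, field, value):
    """Keep only the rows where the given field equals the given value."""
    return [row for row in rows if row[field] == value]


police = filter_rows(rows, "department", "Police")
print([row["name"] for row in police])
```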
INSERT COLUMNS, ROWS
It’s easy to add another column or row.
Excel Windows 2007/2016: Highlight row/column you want by clicking on
the letter or number that marks each row/column. Then right click, and then
click on insert.
Google Sheets: Highlight row/column you want by clicking on the letter or
number that marks each row/column. Then right click, and then click on insert.
OR
Hit the Insert menu option at the top, then choose either the column/row above
or below.
BASIC MATH
Formulas generally start with an = sign.
Addition: =SUM(cell range)
Example: =SUM(B2:B9)
Subtraction (change/difference): = New - Old
Example: =C2-B2 (new value in C2, old value in B2)
Percentage change: =(New - Old)/Old Way to remember: NOO (New minus Old, divided by Old).
Example: =(C2-B2)/B2
Then highlight the cell or column and hit the % button on the left-hand side of
the tool bar to convert to percent.
Percent of a total: = Part/Total
Example: =B2/$B$11
Note: Use the dollar signs to keep the second part of the formula from changing
when you copy the formula.
Average: =AVERAGE(cell range)
Example: =AVERAGE(B2:B10)
Median: =MEDIAN(cell range)
Example: =MEDIAN(B2:B10)
Maximum: =MAX(cell range)
Example: =MAX(B2:B10)
Minimum: =MIN(cell range)
Example: =MIN(B2:B10)
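The formulas in this section map directly onto a few lines of Python, which can be a handy way to double-check spreadsheet math. This is a sketch with made-up numbers; the salary figures and the helper names percent_change and percent_of_total are assumptions for illustration.

```python
from statistics import mean, median

# Hypothetical salaries standing in for a spreadsheet column.
values = [52000, 61000, 75000, 85000, 98000]

total = sum(values)        # like =SUM(range)
average = mean(values)     # like =AVERAGE(range)
middle = median(values)    # like =MEDIAN(range)
highest = max(values)      # like =MAX(range)
lowest = min(values)       # like =MIN(range)


def percent_change(old, new):
    """(New - Old) / Old, returned as a fraction such as 0.25."""
    return (new - old) / old


def percent_of_total(part, total):
    """Part / Total, returned as a fraction."""
    return part / total


print(total, average, middle, highest, lowest)
print(f"{percent_change(80000, 100000):.1%}")
print(f"{percent_of_total(98000, total):.1%}")
```

The `.1%` format plays the role of the % button: it multiplies the fraction by 100 and adds the percent sign.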
MORE ADVANCED FORMULAS
If/Then
=IF(comparison,"print this if true","print this if false")
Example: =IF(B2>100000,"High Earner","Low/Medium earner")
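The same if/then test can be written in Python. This sketch reuses the labels and the 100000 cutoff from the example; the function name earner_label is an assumption.

```python
def earner_label(salary, threshold=100000):
    """Label a salary the way the IF formula does: strictly above the threshold is high."""
    return "High Earner" if salary > threshold else "Low/Medium earner"


print(earner_label(150000))
print(earner_label(100000))
```

A salary of exactly 100000 gets the second label, because the comparison is "greater than," not "greater than or equal to."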
Dates
=YEAR(CELL)
=WEEKDAY(CELL)
=CHOOSE(WEEKDAY(CELL), "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
=MONTH(CELL)
Some formulas are slightly different in Excel and Google Sheets.
For instance, to find the difference between two dates:
Google: =CELL - CELL
Excel: =DATEDIF(A1,A2,"d") (for days, with the earlier date in A1). Use "m" for months or "y" for
years.
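For comparison, here is how the date formulas above look as a Python sketch using the standard datetime module. The two dates are arbitrary examples, and day_name is a hypothetical helper that mirrors the CHOOSE(WEEKDAY(...)) formula.

```python
from datetime import date

DAY_NAMES = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def day_name(d):
    """Return a short weekday name, with the list starting on Sunday."""
    # isoweekday() runs Monday=1 through Sunday=7; % 7 maps Sunday to 0.
    return DAY_NAMES[d.isoweekday() % 7]


start = date(2017, 1, 1)
end = date(2017, 10, 14)

print(end.year, end.month, day_name(end))   # like YEAR, MONTH, WEEKDAY
print((end - start).days)                   # difference in days
```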
COPY A FORMULA DOWN AN ENTIRE COLUMN
Move the mouse to the formula, position the mouse in the lower right-hand
corner of the cell until you see the cursor change to a plus sign and double-click.
This will copy the formula down until it hits a blank row.
ANOTHER WAY TO COPY A FORMULA TO AN ENTIRE COLUMN
The above method will only work if the formula is next to a column with all the
rows filled out. Otherwise, it will only copy formulas down until the data next to
the column stops. If that is a problem, you can scroll to the bottom of the column
where you want to stick the formulas, enter some text - anything will do. Then go
back to the formula you want to copy, click on that cell, hit ctrl-c to copy, then
hold down the shift key, then hit ctrl-down-arrow to highlight the column (up
until the point where you typed in your random text), then hit ctrl-V to paste.