Pulitzer Prize winner Michael J. Berens of The Seattle Times presents "Data Journalism 101," a three-hour, hands-on workshop for the Donald W. Reynolds National Center for Business Journalism at the Excellence in Journalism Conference in Nashville, Tenn., on Sept. 4, 2014.
Part 1 provides an intro to databases and their importance to reporting.
For more business journalism training opportunities and resources, please visit http://businessjournalism.org.
1. Data Journalism 101
Session One: Intro to Databases
Accessing and managing data for stories
Excellence in Journalism Conference 2014
Donald W. Reynolds National Center for Business
Journalism at ASU
Michael J. Berens – The Seattle Times
2. He said. She said.
Now I’m going to tell you
who’s telling the truth.
4. Database Options
Create your own database
— Obtain sources of
information
(paper records)
Import existing database
— Obtain existing
database
— Scrape data from
the web
8. Track the
exploitation of
vulnerable
seniors
SUNDAY, SEPTEMBER 12, 2010
A SEATTLE TIMES INVESTIGATION / PART 4
Deaths in adult homes
hidden and ignored
Abuse and neglect may have killed hundreds of residents. But with
nobody questioning the circumstances, troubled homes are staying open.
COURTESY OF JAMES RUDOLPH
A HOME’S MISTREATMENT PROVES DEADLY
Neglect at an adult family home is blamed for the 2008 death of 87-year-old Jean Rudolph, a retired nursing educator
who had Alzheimer’s disease and heart problems. Infection from severe bedsores, which developed during her stay at the
home, spread to her vital organs.
12. Follow the Information
— You’ve received an unsolicited email from a doctor who
claims that scores of pain patients have accidentally died
from methadone overdoses.
— The doctor claims that the State of Washington pushes
methadone as a “preferred drug” because it’s the least
expensive.
— The doctor claims the state fails to warn patients about the
unique risks of methadone.
13. Find the data sources
— Death certificates – Track cause of death and number of
overdose victims
— ARCOS Database – Created by U.S. Drug Enforcement
Administration to track controlled substances
— In-patient hospital database – Created by a dozen or so
states to track types of hospitalizations
— My own questions – How many patients also took
benzodiazepines? Etc.
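A question such as "How many patients also took benzodiazepines?" becomes a simple query once the records sit in a table. The sketch below is only an illustration using Python's built-in sqlite3 module. The table name, column names, and the three rows are hypothetical placeholders, not data from the investigation.

```python
import sqlite3

# Hypothetical table: one row per death certificate, plus a column
# for the reporter's own question (benzodiazepine: 1 = yes, 0 = no).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE deaths (
        cert_no TEXT PRIMARY KEY,
        year INTEGER,
        cause TEXT,
        benzodiazepine INTEGER
    )
    """
)

# Placeholder rows for illustration only; these are not real records.
rows = [
    ("A-001", 2008, "methadone overdose", 1),
    ("A-002", 2008, "methadone overdose", 0),
    ("A-003", 2009, "heart disease", 0),
]
conn.executemany("INSERT INTO deaths VALUES (?, ?, ?, ?)", rows)

# Count methadone deaths, and how many of those also involved benzodiazepines.
total, with_benzo = conn.execute(
    "SELECT COUNT(*), SUM(benzodiazepine) FROM deaths WHERE cause LIKE '%methadone%'"
).fetchone()
print(total, with_benzo)
```

Adding your own columns next to the official fields is what lets the database answer questions the agency never asked.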
15. Fields, position, type, length
Field Number  Variable  Type  Format  Label  Comment
1  SEQ_NO  Char  $10.  Sequence Number
    Unique sequence number assigned to each record within a year. First four digits are the year of discharge.
2  REC_KEY  Num  11.  Record Key
    Unique number assigned to each CHARS record. Added in 2003.
3  STAYTYPE  Char  $1  Type of Stay
    1 = Inpatient
    2 = Observation patient
4  HOSPITAL  Char  $4  Hospital Number
    DOH assigned hospital number.
    Fourth character describes the Medicare certified unit type with:
    blank = acute care
    R = Rehabilitation unit
    P = Psychiatric unit
    S = Swing bed unit
    - - - - - - - - - - - - - - - - - - - -
    A = Alcohol (discontinued after 1992)
    B = Bone marrow transplants (discontinued after 2000)
    E = Extended care (discontinued after 2001)
    H = Tacoma General & Group Health combined (discontinued after 1992)
    I = Group Health only at Tacoma General (discontinued after 1992)
5  LINENO  Num  3.  Number of Reported Revenue Items Codes
6  ZIPCODE  Char  $5  Patient's Zip Code
    99999 indicates the zip code is unknown.
    99998 indicates homelessness (some homeless patients may have a zip code for a shelter or other temporary location).
    Blanks indicate non-U.S. residence.
7  STATERES  Char  $2  State of Residence
    State abbreviation used by U.S. Postal Service.
    This is assigned from the zip code.
    Residents with zip code 99998 are assigned to Washington.
    XX = invalid zip code or a non-U.S. residence.
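A layout like the one above is what makes a fixed-length file readable. The Python sketch below shows one way to apply it. It assumes the seven fields are packed back to back in the listed order, with widths taken from the Format column. The actual file may differ, and the sample line is made up for illustration.

```python
# (field name, width in characters), taken from the layout above.
# Assumption: fields are stored back to back with no separators.
LAYOUT = [
    ("SEQ_NO", 10),
    ("REC_KEY", 11),
    ("STAYTYPE", 1),
    ("HOSPITAL", 4),
    ("LINENO", 3),
    ("ZIPCODE", 5),
    ("STATERES", 2),
]

STAY_TYPES = {"1": "Inpatient", "2": "Observation patient"}


def parse_record(line):
    """Slice one fixed-length line into a dict keyed by field name.

    Values are kept raw (not stripped) because blanks carry meaning,
    for example a blank fourth character in HOSPITAL means acute care.
    """
    record = {}
    start = 0
    for name, width in LAYOUT:
        record[name] = line[start:start + width]
        start += width
    return record


def describe_zip(zipcode):
    """Translate the special zip code values listed in the layout."""
    value = zipcode.strip()
    if value == "":
        return "non-U.S. residence"
    if value == "99999":
        return "unknown"
    if value == "99998":
        return "homeless"
    return value


# A made-up record for illustration only; it is not real patient data.
sample = "2010000001" + "00000012345" + "1" + "123R" + "  7" + "99998" + "WA"
rec = parse_record(sample)
print(rec["SEQ_NO"][:4])            # first four digits: year of discharge
print(STAY_TYPES[rec["STAYTYPE"]])  # decoded type of stay
print(rec["HOSPITAL"][3])           # fourth character: unit type code
print(describe_zip(rec["ZIPCODE"]))
```

Keeping the code tables (stay type, unit type, special zip codes) next to the parser means the decoded values can be checked against the layout document line by line.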
19. Fixed length vs. delimited
— Fixed Length
— The data fields measure a
specific number of characters
— Field 1 = 10 characters long
— File layout is critical
— Delimited
— The data fields are separated by
a common character or mark
— Like a comma or tab
— Always ask for “text delimited
data,” which is easier to import
than fixed length
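The difference shows up as soon as you try to load the data. The Python sketch below reads the same two made-up rows in both forms. The field names, widths, and values are assumptions chosen for illustration. The delimited version carries its own header, while the fixed-length version cannot be read without the file layout.

```python
import csv
import io

# Made-up rows for illustration; field names and widths are assumptions.
delimited_text = (
    "SEQ_NO,ZIPCODE,STATERES\n"
    "2010000001,98101,WA\n"
    "2010000002,99999,WA\n"
)
fixed_text = (
    "201000000198101WA\n"
    "201000000299999WA\n"
)

# Delimited: the header row names the fields, so no layout is needed.
# For tab-delimited files, pass delimiter="\t" to csv.DictReader.
delimited_rows = [dict(row) for row in csv.DictReader(io.StringIO(delimited_text))]

# Fixed length: every field must be sliced by position using the layout.
widths = [("SEQ_NO", 10), ("ZIPCODE", 5), ("STATERES", 2)]
fixed_rows = []
for line in fixed_text.splitlines():
    row, start = {}, 0
    for name, width in widths:
        row[name] = line[start:start + width]
        start += width
    fixed_rows.append(row)

# Both routes should produce the same records.
print(delimited_rows == fixed_rows)
```

If a width in the layout is off by even one character, every later field in the fixed-length row shifts, which is why the delimited form is the safer request.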