Data Journalism 101 
Session One: Intro to Databases 
Accessing and managing data for stories 
Excellence in Journalism Conference 2014 
Donald W. Reynolds National Center for Business Journalism at ASU
Michael J. Berens – The Seattle Times
He said. She said. 
Now I’m going to tell you 
who’s telling the truth.
Cells, fields and headers – oh my!
Database Options 
Create your own database
— Obtain sources of information (paper records)
Import existing database
— Obtain existing database
— Scrape data from the web
Finding a serial killer
Track the exploitation of vulnerable seniors
SUNDAY, SEPTEMBER 12, 2010 
A SEATTLE TIMES INVESTIGATION / PART 4 
Deaths in adult homes hidden and ignored
Abuse and neglect may have killed hundreds of residents. But with nobody questioning the circumstances, troubled homes are staying open.
COURTESY OF JAMES RUDOLPH
A HOME’S MISTREATMENT PROVES DEADLY
Neglect at an adult family home is blamed for the 2008 death of 87-year-old Jean Rudolph, a retired nursing educator who had Alzheimer’s disease and heart problems. Infection from severe bedsores, which developed during her stay at the home, spread to her vital organs.
Tracking fraudulent medical devices and profiteers
Follow the Information 
— You’ve received an unsolicited email from a doctor who claims that scores of pain patients have accidentally died from methadone overdoses.
— The doctor claims that the State of Washington pushes methadone as a “preferred drug” because it’s the least expensive.
— The doctor claims the state fails to warn patients about the unique risks of methadone.
Find the data sources 
— Death certificates – Track cause of death and number of overdose victims
— ARCOS database – Created by the U.S. Drug Enforcement Administration to track controlled substances
— Inpatient hospital database – Created by a dozen or so states to track types of hospitalizations
— My own questions – How many patients also took benzodiazepines? Etc.
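One way to begin on that last question is to count overdose records that list methadone together with any benzodiazepine. The following is a minimal Python sketch. It assumes each record has already been reduced to a set of lowercase substance names. That record structure and the short drug list are hypothetical illustrations, not fields from the databases above.

```python
# Hypothetical and deliberately incomplete list of benzodiazepines, for illustration only.
BENZODIAZEPINES = {"alprazolam", "clonazepam", "diazepam", "lorazepam"}


def count_methadone_with_benzos(records):
    """Return (records listing methadone, of those, records also listing a benzodiazepine).

    Each record is assumed to be a set of lowercase substance names
    (a hypothetical structure, not an actual database field).
    """
    methadone = [r for r in records if "methadone" in r]
    # Set intersection is non-empty when at least one benzodiazepine is listed.
    combined = [r for r in methadone if r & BENZODIAZEPINES]
    return len(methadone), len(combined)
```

For example, `count_methadone_with_benzos([{"methadone", "diazepam"}, {"methadone"}, {"oxycodone"}])` returns `(2, 1)`.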
Step 1 
Request the file layout
Fields, position, type, length 
Field  Variable  Type  Format  Label / Comment
1      SEQ_NO    Char  $10.    Sequence Number
                               Unique sequence number assigned to each record within a year.
                               First four digits are the year of discharge.
2      REC_KEY   Num   11.     Record Key
                               Unique number assigned to each CHARS record. Added in 2003.
3      STAYTYPE  Char  $1      Type of Stay
                                 1 = Inpatient
                                 2 = Observation patient
4      HOSPITAL  Char  $4      Hospital Number
                               DOH assigned hospital number.
                               Fourth character describes the Medicare certified unit type:
                                 blank = acute care
                                 R = Rehabilitation unit
                                 P = Psychiatric unit
                                 S = Swing bed unit
                               Discontinued codes:
                                 A = Alcohol (discontinued after 1992)
                                 B = Bone marrow transplants (discontinued after 2000)
                                 E = Extended care (discontinued after 2001)
                                 H = Tacoma General & Group Health combined (discontinued after 1992)
                                 I = Group Health only at Tacoma General (discontinued after 1992)
5      LINENO    Num   3.      Number of Reported Revenue Items Codes
6      ZIPCODE   Char  $5      Patient's Zip Code
                                 99999 indicates the zip code is unknown.
                                 99998 indicates homelessness (some homeless patients may have a zip
                                 code for a shelter or other temporary location).
                                 Blanks indicate non-U.S. residence.
7      STATERES  Char  $2      State of Residence
                               State abbreviation used by U.S. Postal Service.
                               This is assigned from the zip code.
                               Residents with zip code 99998 are assigned to Washington.
                                 XX = invalid zip code or a non-U.S. residence.
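The comments column is where the codes are explained, so it can pay to turn them into lookup tables before counting anything. The following is a minimal Python sketch that uses only the codes and special values printed in the layout above. The function names are hypothetical. It assumes each value arrives as an unstripped string, exactly as stored in the file.

```python
# Fourth character of HOSPITAL -> unit type, copied from the layout above.
UNIT_TYPES = {
    " ": "acute care",
    "R": "Rehabilitation unit",
    "P": "Psychiatric unit",
    "S": "Swing bed unit",
    "A": "Alcohol (discontinued after 1992)",
    "B": "Bone marrow transplants (discontinued after 2000)",
    "E": "Extended care (discontinued after 2001)",
    "H": "Tacoma General & Group Health combined (discontinued after 1992)",
    "I": "Group Health only at Tacoma General (discontinued after 1992)",
}

STAY_TYPES = {"1": "Inpatient", "2": "Observation patient"}


def discharge_year(seq_no):
    """SEQ_NO: the first four digits are the year of discharge."""
    return int(seq_no[:4])


def decode_zipcode(zipcode):
    """Interpret ZIPCODE using the special values documented in the layout."""
    if zipcode.strip() == "":
        return "non-U.S. residence"
    if zipcode == "99999":
        return "unknown"
    if zipcode == "99998":
        return "homeless"
    return zipcode


def decode_unit_type(hospital):
    """Read the unit type from the fourth character of the 4-character HOSPITAL field."""
    padded = hospital.ljust(4)  # restore a trailing blank if a tool stripped it
    return UNIT_TYPES.get(padded[3], "unrecognized code")
```

Keeping the lookups in one place means a discontinued code such as `A` gets labeled instead of being silently miscounted.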
Fixed length vs. delimited 
— Fixed length
    — Each data field occupies a specific number of characters
    — Field 1 = 10 characters long
    — File layout is critical
— Delimited
    — The data fields are separated by a common character or mark
    — Like a comma or tab
    — Always ask for “text delimited data,” which is easier to import than fixed-length data
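The following minimal Python sketch shows both approaches side by side. It makes a hypothetical assumption: the seven fields from the layout excerpt sit next to each other in that order, with the widths from the Format column. The excerpt does not show start positions, so a real import should use the positions in the agency's file layout. The delimited version uses Python's standard `csv` module.

```python
import csv
import io

# Hypothetical layout: (field name, width) pairs. Widths follow the Format
# column above; start positions are ASSUMED to be contiguous.
LAYOUT = [
    ("SEQ_NO", 10),
    ("REC_KEY", 11),
    ("STAYTYPE", 1),
    ("HOSPITAL", 4),
    ("LINENO", 3),
    ("ZIPCODE", 5),
    ("STATERES", 2),
]


def parse_fixed_width(line, layout=LAYOUT):
    """Slice one fixed-length record into fields using the layout's widths.

    Values are left unstripped because blanks carry meaning in this layout
    (e.g. a blank ZIPCODE means non-U.S. residence).
    """
    record = {}
    start = 0
    for name, width in layout:
        record[name] = line[start:start + width]
        start += width
    return record


def parse_delimited(text, delimiter=","):
    """Parse delimited text whose first row holds the field names."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
```

One wrong width in `LAYOUT` shifts every field that follows it, which is why the file layout is critical for fixed-length data. The delimited parser needs no widths at all.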
Make a master copy
Keep a log
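The following minimal Python sketch covers both habits and assumes the raw file sits on local disk. It copies the original into a separate folder, marks the copy read-only, records a SHA-256 checksum, and appends timestamped lines to a plain-text log. The function names, folder layout, and log format are hypothetical choices, not a prescribed workflow.

```python
import datetime
import hashlib
import os
import shutil
import stat


def log_step(log_path, message):
    """Append one timestamped, tab-separated line to a plain-text log."""
    timestamp = datetime.datetime.now().isoformat(timespec="seconds")
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{timestamp}\t{message}\n")


def make_master_copy(source, master_dir, log_path):
    """Copy the original data file into master_dir, make the copy read-only,
    log its SHA-256 checksum, and return the path of the master copy."""
    os.makedirs(master_dir, exist_ok=True)
    master = os.path.join(master_dir, os.path.basename(source))
    if os.path.exists(master):
        # Never overwrite an existing master copy.
        raise FileExistsError(master)
    shutil.copy2(source, master)  # copy2 also preserves file timestamps
    os.chmod(master, stat.S_IREAD)  # owner read-only, to discourage accidental edits

    # The checksum lets you confirm later that the master has not changed.
    with open(master, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    log_step(log_path, f"master copy {source} -> {master} sha256={digest}")
    return master
```

Work on a second copy, never the master. Later `log_step` calls can record each filter, join, or correction, so the results can be retraced.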
Delimited file
Hands On - Hunting Database
Fixed-width file
Data Journalism 101 - Part 1 by Michael J. Berens