Data Journalism 101 - Part 1 by Michael J. Berens

Pulitzer Prize winner Michael J. Berens of The Seattle Times presents "Data Journalism 101," a three-hour, hands-on workshop for the Donald W. Reynolds National Center for Business Journalism at the Excellence in Journalism Conference in Nashville, Tenn., on Sept. 4, 2014.

Part 1 provides an intro to databases and their importance to reporting.

For more business journalism training opportunities and resources, please visit http://businessjournalism.org.

  1. 1. Data Journalism 101. Session One: Intro to Databases. Accessing and managing data for stories. Excellence in Journalism Conference 2014. Donald W. Reynolds National Center for Business Journalism at ASU. Michael J. Berens – The Seattle Times
  2. 2. He said. She said. Now I’m going to tell you who’s telling the truth.
  3. 3. Cells, fields and headers – oh my!
  4. 4. Database Options. Create your own database: obtain sources of information (paper records). Import existing database: obtain an existing database, or scrape data from the web.
  5. 5. Finding a serial killer
  6. 6. Track the exploitation of vulnerable seniors. Newspaper clipping: Sunday, September 12, 2010. A Seattle Times Investigation / Part 4: “Deaths in adult homes hidden and ignored.” Abuse and neglect may have killed hundreds of residents. But with nobody questioning the circumstances, troubled homes are staying open. Photo (courtesy of James Rudolph): “A home’s mistreatment proves deadly.” Neglect at an adult family home is blamed for the 2008 death of 87-year-old Jean Rudolph, a retired nursing educator who had Alzheimer’s disease and heart problems. Infection from severe bedsores, which developed during her stay at the home, spread to her vital organs.
  7. 7. Tracking fraudulent medical devices and profiteers
  8. 8. Follow the Information. You’ve received an unsolicited email from a doctor who claims that scores of pain patients have accidentally died from methadone overdoses. The doctor claims that the State of Washington pushes methadone as a “preferred drug” because it’s the least expensive. The doctor claims the state fails to warn patients about the unique risks of methadone.
  9. 9. Find the data sources. Death certificates: track cause of death and number of overdose victims. ARCOS database: created by the U.S. Drug Enforcement Administration to track controlled substances. In-patient hospital database: created by a dozen or so states to track types of hospitalizations. My own questions: How many patients also took benzodiazepines? Etc.
  10. 10. Step 1: Request the file layout
  11. 11. Fields, position, type, length
      Field Number | Variable | Type | Format | Label | Comment
      1 | SEQ_NO | Char | $10. | Sequence Number | Unique sequence number assigned to each record within a year. First four digits are the year of discharge.
      2 | REC_KEY | Num | 11. | Record Key | Unique number assigned to each CHARS record. Added in 2003.
      3 | STAYTYPE | Char | $1 | Type of Stay | 1 = Inpatient; 2 = Observation patient
      4 | HOSPITAL | Char | $4 | Hospital Number | DOH assigned hospital number. Fourth character describes the Medicare certified unit type with: blank = acute care; R = Rehabilitation unit; P = Psychiatric unit; S = Swing bed unit. Discontinued codes: A = Alcohol (discontinued after 1992); B = Bone marrow transplants (discontinued after 2000); E = Extended care (discontinued after 2001); H = Tacoma General & Group Health combined (discontinued after 1992); I = Group Health only at Tacoma General (discontinued after 1992)
      5 | LINENO | Num | 3. | Number of Reported Revenue Items Codes |
      6 | ZIPCODE | Char | $5 | Patient's Zip Code | 99999 indicates the zip code is unknown. 99998 indicates homelessness (some homeless patients may have a zip code for a shelter or other temporary location). Blanks indicate non-U.S. residence.
      7 | STATERES | Char | $2 | State of Residence | State abbreviation used by U.S. Postal Service. This is assigned from the zip code. Residents with zip code 99998 are assigned to Washington. XX = invalid zip code or a non-U.S. residence.
  12. 12. Fixed length vs. delimited. Fixed length: the data fields measure a specific number of characters (Field 1 = 10 characters long), so the file layout is critical. Delimited: the data fields are separated by a common character or mark, like a comma or tab. Always ask for “text delimited data,” which is easier to import than fixed length.
  13. 13. Make a master copy
  14. 14. Keep a log
  15. 15. Delimited file
  16. 16. Hands On - Hunting Database
  17. 17. Fixed-width file
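
The difference between the fixed-length and delimited files described in slides 11 and 12 can be sketched in a few lines of Python. This is an illustration, not the workshop's own material: the field names and widths come from the file layout on slide 11, but the sketch assumes the seven fields sit back to back in that order with no padding between them (the slide lists widths, not start positions), and the sample record is made up.

```python
import csv
import io

# Field names and widths taken from the file layout on slide 11.
# Assumption: fields are stored back to back in this order, with no
# padding between them; a real layout would give exact start positions.
LAYOUT = [
    ("SEQ_NO", 10),
    ("REC_KEY", 11),
    ("STAYTYPE", 1),
    ("HOSPITAL", 4),
    ("LINENO", 3),
    ("ZIPCODE", 5),
    ("STATERES", 2),
]


def parse_fixed_width(line, layout=LAYOUT):
    """Slice one fixed-width record into a dict of stripped strings."""
    record = {}
    start = 0
    for name, width in layout:
        record[name] = line[start:start + width].strip()
        start += width
    return record


def parse_delimited(text, delimiter=","):
    """Read delimited text whose first row holds the field names."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Made-up sample record, for illustration only (not real patient data).
fixed_line = "2014000001" + "00000000042" + "1" + "123R" + "007" + "98101" + "WA"
fixed_record = parse_fixed_width(fixed_line)

# The same made-up record as comma-delimited text with a header row.
delimited_text = (
    "SEQ_NO,REC_KEY,STAYTYPE,HOSPITAL,LINENO,ZIPCODE,STATERES\n"
    "2014000001,42,1,123R,7,98101,WA\n"
)
delimited_records = parse_delimited(delimited_text)

print(fixed_record["ZIPCODE"], delimited_records[0]["ZIPCODE"])
```

The fixed-width parser cannot find a single field without the widths in LAYOUT, which is why the file layout is critical for that format. The delimited parser needs only the header row and the delimiter, although the layout is still what explains codes such as 99998 in ZIPCODE.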
