Data Journalism 101 - Day 1 by Michael J. Berens

Michael J. Berens presents the first part of the free, two-day webinar, "Data Journalism 101," hosted by the Donald W. Reynolds National Center for Business Journalism.

For access to the webinar materials, visit http://bit.ly/datajourn101.

For more information about training for business journalists, please visit http://businessjournalism.org


Usage Rights

© All Rights Reserved

Presentation Transcript

  • Data Journalism 101. Donald W. Reynolds National Center for Business Journalism at ASU. Michael J. Berens – The Seattle Times
  • Skills – rooted in past
  • Skills – lost in space
  • He said. She said. Now I’m going to tell you who’s telling the truth.
  • Poll Question: Have you ever been denied public data? 1) Yes 2) No
  • Finding a serial killer
  • Finding deadly germs and dirty hospitals
  • Tracking elephant deaths inside America’s zoos
  • Tracking fraudulent medical devices and profiteers
  • Tracking the exploitation of vulnerable seniors
  • Cops who own crack houses; Secret release of fugitives; Sexual misconduct in health care; Jailing the poor; Nursing errors; Unsanitary hospitals
  • “Quantitative”: Most dangerous highway; Most dangerous intersection; Number of deadly police chases; Most dangerous area for crime; Most unsanitary restaurants
  • Poll Question: Why were you denied data? 1) Too expensive 2) Agency claimed info was not a public record 3) Agency claimed the request was a burden
  • Negotiating for data: Delay – we’re working on it; Deny – it’s proprietary software; Divert – yours for just $12,000
  • “If you don’t know who I am, then maybe your best course of action would be to tread lightly.” – Walter White in "Breaking Bad"
  • Step One: File layout (secret weapon for finding stories)
  • Fields, position, type, length
    Field Number | Variable | Type | Format | Label | Comment
    1 | SEQ_NO | Char | $10. | Sequence Number | Unique sequence number assigned to each record within a year. First four digits are the year of discharge.
    2 | REC_KEY | Num | 11. | Record Key | Unique number assigned to each CHARS record. Added in 2003.
    3 | STAYTYPE | Char | $1 | Type of Stay | 1 = Inpatient; 2 = Observation patient
    4 | HOSPITAL | Char | $4 | Hospital Number | DOH assigned hospital number. Fourth character describes the Medicare certified unit type with: blank = acute care; R = Rehabilitation unit; P = Psychiatric unit; S = Swing bed unit; A = Alcohol (discontinued after 1992); B = Bone marrow transplants (discontinued after 2000); E = Extended care (discontinued after 2001); H = Tacoma General & Group Health combined (discontinued after 1992); I = Group Health only at Tacoma General (discontinued after 1992)
    5 | LINENO | Num | 3. | Number of Reported Revenue Items Codes |
    6 | ZIPCODE | Char | $5 | Patient's Zip Code | 99999 indicates the zip code is unknown. 99998 indicates homelessness (some homeless patients may have a zip code for a shelter or other temporary location). Blanks indicate non-U.S. residence.
    7 | STATERES | Char | $2 | State of Residence | State abbreviation used by U.S. Postal Service. This is assigned from the zip code. Residents with zip code 99998 are assigned to Washington. XX = invalid zip code or a non-U.S. residence.
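A file layout like the one above is enough to pull fields out of a raw record. The following is a minimal Python sketch, not the presenter's method: the field names and widths are taken from the Format column on the slide, while the assumption that the seven fields sit back to back in this order, the helper names `parse_record` and `describe_zip`, and the sample record are all invented for illustration.

```python
# Field names and widths come from the layout on the slide. Treating the
# fields as contiguous, in this order, is an assumption made for this sketch.
LAYOUT = [
    ("SEQ_NO", 10),
    ("REC_KEY", 11),
    ("STAYTYPE", 1),
    ("HOSPITAL", 4),
    ("LINENO", 3),
    ("ZIPCODE", 5),
    ("STATERES", 2),
]


def parse_record(line):
    """Slice one fixed-width line into a dict keyed by variable name."""
    record = {}
    start = 0
    for name, width in LAYOUT:
        record[name] = line[start:start + width].strip()
        start += width
    return record


def describe_zip(zipcode):
    """Apply the ZIPCODE rules spelled out in the layout's Comment column."""
    if zipcode == "":
        return "non-U.S. residence"
    if zipcode == "99999":
        return "unknown"
    if zipcode == "99998":
        return "homeless"
    return zipcode


# Made-up record, for illustration only.
sample = "2012000001" + "00000012345" + "1" + "123R" + "007" + "99998" + "WA"
rec = parse_record(sample)
print(rec["HOSPITAL"], describe_zip(rec["ZIPCODE"]))
```

The point of the sketch is that the Comment column carries rules (special zip codes, the meaning of the hospital number's fourth character) that a bare import would silently miss.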
  • Code keys
  • Finding stories that lurk in code keys
  • Stories that hide in plain sight
    E9220 | HANDGUN ACCIDENT
    E9221 | SHOTGUN ACCIDENT
    E9222 | HUNTING RIFLE ACCIDENT
    E9223 | MILITARY FIREARM ACCID
    E9224 | ACCIDENT - AIR GUN
    E9225 | ACCIDENT-PAINTBALL GUN
    E9228 | FIREARM ACCIDENT NEC
    E9229 | FIREARM ACCIDENT NOS
    E9230 | FIREWORKS ACCIDENT
    E9231 | BLASTING MATERIALS ACCID
    E9232 | EXPLOSIVE GASES ACCIDENT
    E9238 | EXPLOSIVES ACCIDENT NEC
    E9239 | EXPLOSIVES ACCIDENT NOS
    E9240 | ACC-HOT LIQUID & STEAM
    E9241 | ACCID-CAUSTIC SUBSTANCE
  • Secret release of fugitives – code in court data
    Rising tide of innocent people killed in police chases – code in NHTSA data
    How many people contracted a hospital-acquired infection during heart surgery – code in hospital data
    Power of two – combining data: Death certificates – list of adult family homes
  • Tips: Know the rules of the data. No detail is too small.
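Both ideas on these slides, reading a code key and combining two data sets, can be sketched in a few lines of Python. Everything below is illustrative: the four code-key entries are copied from the firearm codes on the earlier slide, but the patient records, the death-certificate and adult-family-home rows, and the choice of street address as the matching field are assumptions made for this example, not details from the presentation.

```python
from collections import Counter

# Code key entries copied from the slide listing firearm accident codes.
CODE_KEY = {
    "E9220": "HANDGUN ACCIDENT",
    "E9221": "SHOTGUN ACCIDENT",
    "E9222": "HUNTING RIFLE ACCIDENT",
    "E9225": "ACCIDENT-PAINTBALL GUN",
}

# Hypothetical hospital records: (patient id, external-cause code).
records = [("p1", "E9220"), ("p2", "E9225"), ("p3", "E9220"), ("p4", "E9999")]


def label(code):
    """Translate a raw code into its label, flagging codes missing from the key."""
    return CODE_KEY.get(code, "NOT IN KEY")


counts = Counter(label(code) for _, code in records)

# "Power of two": match one data set against another.
# Both lists are invented, and matching on address is an assumption.
deaths = [
    {"name": "A. Doe", "address": "12 Elm St"},
    {"name": "B. Roe", "address": "9 Oak Ave"},
]
homes = [{"home": "Elm Care Home", "address": "12  ELM ST"}]


def norm(address):
    """Normalize case and spacing so trivially different addresses still match."""
    return " ".join(address.upper().split())


homes_by_address = {norm(h["address"]): h["home"] for h in homes}
matches = [
    (d["name"], homes_by_address[norm(d["address"])])
    for d in deaths
    if norm(d["address"]) in homes_by_address
]

print(counts.most_common())
print(matches)
```

Flagging codes that are missing from the key, rather than dropping them, keeps unexplained values visible so they can be raised with the agency.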
  • Step Two: File format
  • Every computer file has an extension:
    .txt   Text file
    .csv   Comma-separated values
    .dbf   Database format
    .html  Hyper-text mark-up language
    .mdb   Microsoft database (Access file)
    .pdf   Portable Document Format
    Rule of thumb: Always request comma-delimited text if Excel format is unavailable.
  • Two database structures: 1) Fixed length 2) Delimited
  • Fixed-length file
    Berens    2312     Columbus  blue
    Anderson  4563625  Seattle   violet
    Becker    45453    New York  light brown
  • Delimited file: berens,272464,Seattle,blue
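The difference between the two structures shows up as soon as a file is read by code. Below is a rough Python sketch using the sample rows from the slides. The column widths (10, 9 and 10 characters, with the last field running to the end of the line) are an assumption chosen to fit the sample, since a real file's positions would come from its file layout; `parse_fixed` and `parse_delimited` are names made up for this example.

```python
import csv
import io

# Sample rows from the fixed-length slide.
rows = [
    ("Berens", "2312", "Columbus", "blue"),
    ("Anderson", "4563625", "Seattle", "violet"),
    ("Becker", "45453", "New York", "light brown"),
]

# Build the fixed-length text by padding each field to an assumed width.
fixed_text = "\n".join(f"{n:<10}{i:<9}{c:<10}{col}" for n, i, c, col in rows)

# (start, end) positions matching the assumed widths above.
COLUMNS = [(0, 10), (10, 19), (19, 29), (29, None)]


def parse_fixed(text):
    """Cut each line at fixed positions; spaces inside a field are preserved."""
    return [[line[a:b].strip() for a, b in COLUMNS] for line in text.splitlines()]


def parse_delimited(text):
    """Split each line on commas using the standard csv module."""
    return list(csv.reader(io.StringIO(text)))


print(parse_fixed(fixed_text))
print(parse_delimited("berens,272464,Seattle,blue\n"))
```

Splitting the fixed-length rows on whitespace would break "New York" and "light brown" into separate pieces, which is why a fixed-length file needs its column positions and a delimited file does not.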
  • Poll Question: In general, how long do you wait for public data? 1) Quickly - within a few weeks at most 2) Slowly – often takes a month or more 3) Never – there’s always some issue
  • Tip: Talk first. File a request last.
  • Blank canvas - importing
  • Go to the “Data” tab, then look for the “Text” icon
  • CASE | DATE | TIME | COUNTY | AREA | WOUND | INJURY | CAUSE
    1 | 11/21/87 | 645 | Sauk | south | neck | minor | victim in car-stray bullet
    2 | 11/21/87 | 730 | Marathon | centrl | arm | major | loaded firearm in vehicle
    3 | 11/21/87 | 930 | Oneida | north | chest | fatal | careless handling-tree involvd
    4 | 11/21/87 | 945 | Juneau | south | chest | major | victim in line of fire
    5 | 11/21/87 | 950 | Buffalo | centrl | leg | major | victim out of sight of shooter
    6 | 11/21/87 | 1000 | Portage | centrl | foot | major | careless handling-tree involvd
    7 | 11/21/87 | 1000 | Portage | centrl | chest | major | careless handling-tree invovld
    8 | 11/21/87 | 1135 | Rock | south | head | fatal | victim in line of fire
    9 | 11/21/87 | 1235 | Columbia | south | head | major | careless handling-tree involvd
    10 | 11/21/87 | 1300 | Columbia | south | abdomn | fatal | victim fell from tree
    11 | 11/21/87 | 1440 | Shawano | centrl | chest | fatal | victim out of sight of shooter
    12 | 11/21/87 | 1445 | Trempealeau | centrl | neck | major | ricochet-off gun
    13 | 11/21/87 | 1445 | Columbia | south | leg | major | gun hammer struck an object
    14 | 11/21/87 | 1630 | Langlade | north | arm | minor | victim out of sight of shooter
    15 | 11/22/87 | 815 | Trempealeau | centrl | head | major | ricochet-bullet thru deer
    16 | 11/22/87 | 900 | Oconto | centrl | toe | major | careless handling-tree involvd
    17 | 11/22/87 | 900 | Trempealeau | centrl | leg | major | victim in line of fire
    18 | 11/22/87 | 1130 | Buffalo | centrl | head | minor | victim out of sight of shooter
    19 | 11/22/87 | 1143 | Door | north | hand | major | unloading firearm-defective
    TYPE (12 values; row alignment not recoverable): sp si sp si sp si si sp si sp sp si
  • Tip: Make a copy of the database. Call it “master file” and never touch it. Always work from a copy. Hint: Keep a log of everything.
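The copy-and-log habit can also be scripted. This is one possible Python sketch, assuming the data arrives as a single file: the file names, the `_working` suffix, the tab-separated log format and the helper names `make_working_copy` and `log_step` are all invented for illustration, and the demo runs in a temporary directory so it touches no real files.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def log_step(log_path, note):
    """Append one timestamped line to a plain-text log."""
    stamp = datetime.now().isoformat(timespec="seconds")
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{stamp}\t{note}\n")


def make_working_copy(master, log_path):
    """Copy the master file next to itself and record the copy in the log."""
    master = Path(master)
    working = master.with_name(f"{master.stem}_working{master.suffix}")
    shutil.copy2(master, working)  # the master itself is only read, never written
    log_step(log_path, f"copied {master.name} to {working.name}")
    return working


# Demo in a throwaway directory with a made-up one-line data file.
with tempfile.TemporaryDirectory() as tmp:
    master = Path(tmp) / "master_file.csv"
    master.write_text("berens,272464,Seattle,blue\n", encoding="utf-8")
    log_path = Path(tmp) / "data_log.txt"

    working = make_working_copy(master, log_path)
    log_step(log_path, "sorted working copy by city")

    log_lines = log_path.read_text(encoding="utf-8").splitlines()
    same = working.read_text(encoding="utf-8") == master.read_text(encoding="utf-8")

print(working.name, len(log_lines), same)
```

A plain-text log is a deliberate choice here: it can be opened anywhere and later handed to an editor or a source who asks how a number was produced.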
  • Importing a fixed-length file
  • Tip: Always show your results to the sources in your story. Remember: You’re one keystroke away from a career-ending error.
  • Answer in the chat box: What (and where) is your favorite source of Web-based data?
  • https://www.fpds.gov/
  • Searching for Microsoft
  • Instant database – 17,583 records
  • http://www.fda.gov/
  • Look for the entire download
  • https://oig.hhs.gov/exclusions/
  • Code key
  • http://ire.org/nicar
  • Don’t be obsolete.
  • Unleash your inner watchdog