
Data Journalism 101 - Day 1 by Michael J. Berens

Michael J. Berens presents the first part of the free, two-day webinar, "Data Journalism 101," hosted by the Donald W. Reynolds National Center for Business Journalism.

For access to the webinar materials, visit http://bit.ly/datajourn101.

For more information about training for business journalists, please visit http://businessjournalism.org

Transcript of "Data Journalism 101 - Day 1 by Michael J. Berens"

  1. Data Journalism 101. Donald W. Reynolds National Center for Business Journalism at ASU. Michael J. Berens – The Seattle Times
  2. Skills – rooted in past
  3. Skills – lost in space
  4. He said. She said. Now I’m going to tell you who’s telling the truth.
  5. Poll Question: Have you ever been denied public data?
      1) Yes
      2) No
  6. Finding a serial killer
  7. Finding deadly germs and dirty hospitals
  8. Tracking elephant deaths inside America’s zoos
  9. Tracking fraudulent medical devices and profiteers
  10. Tracking the exploitation of vulnerable seniors
  11. • Cops who own crack houses
      • Secret release of fugitives
      • Sexual misconduct in health care
      • Jailing the poor
      • Nursing errors
      • Unsanitary hospitals
  12. “Quantitative”
      • Most dangerous highway
      • Most dangerous intersection
      • Number of deadly police chases
      • Most dangerous area for crime
      • Most unsanitary restaurants
  13. Poll Question: Why were you denied data?
      • Too expensive
      • Agency claimed info was not a public record.
      • Agency claimed the request was a burden.
  14. Negotiating for data
      • Delay – we’re working on it.
      • Deny – it’s proprietary software
      • Divert – yours for just $12,000
  15. “If you don’t know who I am, then maybe your best course of action would be to tread lightly.” Walter White in "Breaking Bad"
  16. Step One: File layout (secret weapon to finding stories)
  17. Fields, position, type, length

      Field Number  Variable  Type  Format  Label
      1             SEQ_NO    Char  $10.    Sequence Number
      2             REC_KEY   Num   11.     Record Key
      3             STAYTYPE  Char  $1      Type of Stay
      4             HOSPITAL  Char  $4      Hospital Number
      5             LINENO    Num   3.      Number of Reported Revenue Items Codes
      6             ZIPCODE   Char  $5      Patient's Zip Code
      7             STATERES  Char  $2      State of Residence

      Comments:
      SEQ_NO: Unique sequence number assigned to each record within a year. First four digits are the year of discharge.
      REC_KEY: Unique number assigned to each CHARS record. Added in 2003.
      STAYTYPE: 1 = Inpatient; 2 = Observation patient.
      HOSPITAL: DOH assigned hospital number. Fourth character describes the Medicare certified unit type with:
          blank = acute care
          R = Rehabilitation unit
          P = Psychiatric unit
          S = Swing bed unit
          ----------------------------------------------------
          A = Alcohol (discontinued after 1992)
          B = Bone marrow transplants (discontinued after 2000)
          E = Extended care (discontinued after 2001)
          H = Tacoma General & Group Health combined (discontinued after 1992)
          I = Group Health only at Tacoma General (discontinued after 1992)
      ZIPCODE: 99999 indicates the zip code is unknown. 99998 indicates homelessness (some homeless patients may have a zip code for a shelter or other temporary location). Blanks indicate non-U.S. residence.
      STATERES: State abbreviation used by U.S. Postal Service. This is assigned from the zip code. Residents with zip code 99998 are assigned to Washington. XX = invalid zip code or a non-U.S. residence.
  18. Code keys
  19. Finding stories that lurk in code keys
  20. Stories that hide in plain sight

      E9220  HANDGUN ACCIDENT
      E9221  SHOTGUN ACCIDENT
      E9222  HUNTING RIFLE ACCIDENT
      E9223  MILITARY FIREARM ACCID
      E9224  ACCIDENT - AIR GUN
      E9225  ACCIDENT-PAINTBALL GUN
      E9228  FIREARM ACCIDENT NEC
      E9229  FIREARM ACCIDENT NOS
      E9230  FIREWORKS ACCIDENT
      E9231  BLASTING MATERIALS ACCID
      E9232  EXPLOSIVE GASES ACCIDENT
      E9238  EXPLOSIVES ACCIDENT NEC
      E9239  EXPLOSIVES ACCIDENT NOS
      E9240  ACC-HOT LIQUID & STEAM
      E9241  ACCID-CAUSTIC SUBSTANCE
  21. Secret release of fugitives – code in court data
      Rising tide of innocent people killed in police chases – code in NHTSA data
      How many people contracted a hospital-acquired infection during heart surgery – code in hospital data
      ---------------------
      Power of two – combining data
      Death certificates – list of adult family homes
  22. Tips: Know the rules of the data. No detail is too small.
  23. Step Two: File format
  24. Every computer file has an extension:
      .txt   Text file
      .csv   Comma-separated value
      .dbf   Database format
      .html  Hyper-text mark-up language
      .mdb   Microsoft database (Access file)
      .pdf   Portable Document Format
      Rule of thumb: Always request comma-delimited text if Excel format is unavailable
  25. Two database structures: 1) Fixed length 2) Delimited
  26. Fixed-length file

      Berens    2312     Columbus  blue
      Anderson  4563625  Seattle   violet
      Becker    45453    New York  light brown
  27. Delimited file

      berens,272464,Seattle,blue
  28. Poll Question: In general, how long do you wait for public data?
      1) Quickly – within a few weeks at most
      2) Slowly – often takes a month or more
      3) Never – there’s always some issue
  29. Tip: Talk first. File a request last.
  30. Blank canvas – importing
  31. Go to “Data” tab, then look for “Text” icon
  32. CASE  DATE      TIME  COUNTY       AREA    WOUND   INJURY  CAUSE
      1     11/21/87  645   Sauk         south   neck    minor   victim in car-stray bullet
      2     11/21/87  730   Marathon     centrl  arm     major   loaded firearm in vehicle
      3     11/21/87  930   Oneida       north   chest   fatal   careless handling-tree involvd
      4     11/21/87  945   Juneau       south   chest   major   victim in line of fire
      5     11/21/87  950   Buffalo      centrl  leg     major   victim out of sight of shooter
      6     11/21/87  1000  Portage      centrl  foot    major   careless handling-tree involvd
      7     11/21/87  1000  Portage      centrl  chest   major   careless handling-tree invovld
      8     11/21/87  1135  Rock         south   head    fatal   victim in line of fire
      9     11/21/87  1235  Columbia     south   head    major   careless handling-tree involvd
      10    11/21/87  1300  Columbia     south   abdomn  fatal   victim fell from tree
      11    11/21/87  1440  Shawano      centrl  chest   fatal   victim out of sight of shooter
      12    11/21/87  1445  Trempealeau  centrl  neck    major   ricochet-off gun
      13    11/21/87  1445  Columbia     south   leg     major   gun hammer struck an object
      14    11/21/87  1630  Langlade     north   arm     minor   victim out of sight of shooter
      15    11/22/87  815   Trempealeau  centrl  head    major   ricochet-bullet thru deer
      16    11/22/87  900   Oconto       centrl  toe     major   careless handling-tree involvd
      17    11/22/87  900   Trempealeau  centrl  leg     major   victim in line of fire
      18    11/22/87  1130  Buffalo      centrl  head    minor   victim out of sight of shooter
      19    11/22/87  1143  Door         north   hand    major   unloading firearm-defective

      TYPE (only 12 values survive, so they cannot be matched to rows): sp si sp si sp si si sp si sp sp si
  33. Tip: Make a copy of the database. Call it “master file” and never touch it. Always work from a copy. Hint: Keep a log of everything.
  34. Importing a fixed-length file
  35. Tip: Always show your results to the sources in your story. Remember: You’re one keystroke away from a career-ending error.
  36. Answer in the chat box: What (and where) is your favorite source of Web-based data?
  37. https://www.fpds.gov/
  38. Searching for Microsoft
  39. Instant database – 17,583 records
  40. http://www.fda.gov/
  41. Look for the entire download
  42. https://oig.hhs.gov/exclusions/
  43. Code key
  44. http://ire.org/nicar
  45. Don’t be obsolete.
  46. Unleash your inner watchdog
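
The file layout on slide 17 is enough to cut a fixed-length record into named fields. The sketch below is one possible way to do that in Python, not the presenter's method. It assumes the seven fields sit back to back in the order listed, with widths read from the Format column; the sample record is invented for illustration.

```python
from itertools import accumulate

# (variable, width) pairs read from the Format column on slide 17.
# Assumption: fields are contiguous, in this order, with no filler bytes.
LAYOUT = [
    ("SEQ_NO", 10),
    ("REC_KEY", 11),
    ("STAYTYPE", 1),
    ("HOSPITAL", 4),
    ("LINENO", 3),
    ("ZIPCODE", 5),
    ("STATERES", 2),
]


def field_slices(layout):
    """Turn a list of widths into (name, start, end) character offsets."""
    ends = list(accumulate(width for _, width in layout))
    starts = [0] + ends[:-1]
    return [(name, s, e) for (name, _), s, e in zip(layout, starts, ends)]


def parse_record(line, layout=LAYOUT):
    """Cut one fixed-length line into a dict of stripped strings."""
    return {name: line[s:e].strip() for name, s, e in field_slices(layout)}


# Invented record for illustration only (not real patient data).
sample = "2012000001" + "00000012345" + "1" + "123R" + "  7" + "99998" + "WA"
record = parse_record(sample)

print(record["SEQ_NO"][:4])   # year of discharge, per the layout comment
print(record["HOSPITAL"][3])  # unit type character, per the layout comment
print(record["ZIPCODE"])      # 99998 is the layout's code for homelessness
```

Everything is kept as text on purpose: a numeric conversion would drop the leading zeros in fields such as REC_KEY.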
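
A code key like the one on slide 20 becomes useful once it is attached to the records. As a minimal sketch, the codes and labels below are copied from that slide, while the list of record codes is hypothetical, standing in for a column pulled from hospital data.

```python
from collections import Counter

# Code key copied from slide 20.
CODE_KEY = {
    "E9220": "HANDGUN ACCIDENT",
    "E9221": "SHOTGUN ACCIDENT",
    "E9222": "HUNTING RIFLE ACCIDENT",
    "E9223": "MILITARY FIREARM ACCID",
    "E9224": "ACCIDENT - AIR GUN",
    "E9225": "ACCIDENT-PAINTBALL GUN",
    "E9228": "FIREARM ACCIDENT NEC",
    "E9229": "FIREARM ACCIDENT NOS",
    "E9230": "FIREWORKS ACCIDENT",
    "E9231": "BLASTING MATERIALS ACCID",
    "E9232": "EXPLOSIVE GASES ACCIDENT",
    "E9238": "EXPLOSIVES ACCIDENT NEC",
    "E9239": "EXPLOSIVES ACCIDENT NOS",
    "E9240": "ACC-HOT LIQUID & STEAM",
    "E9241": "ACCID-CAUSTIC SUBSTANCE",
}

# Hypothetical codes, as they might appear in one column of a data file.
record_codes = ["E9220", "E9225", "E9220", "E9230", "E9999"]


def label(code):
    """Look up a code; flag anything the key does not explain."""
    return CODE_KEY.get(code, "UNKNOWN CODE")


counts = Counter(label(code) for code in record_codes)
for description, n in counts.most_common():
    print(n, description)
```

Codes that fall outside the key are counted rather than silently dropped, since an unexplained code is a question to take back to the agency.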
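
Slide 21's "power of two" pairs death certificates with a list of adult family homes. The sketch below shows one rough way such a match could work: joining on a normalized address. The field names, the people, the homes, and the idea of matching on address are all assumptions made for illustration; the slides do not say how the two sets were combined.

```python
def normalize(address):
    """Crude address key: uppercase, drop punctuation, collapse spaces."""
    kept = "".join(ch for ch in address.upper() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


# Hypothetical rows; a real file would have many more fields.
deaths = [
    {"name": "Person A", "place_of_death": "12 Oak St."},
    {"name": "Person B", "place_of_death": "400 Main Street"},
    {"name": "Person C", "place_of_death": "12  oak st"},
]
homes = [
    {"home": "Home X", "address": "12 Oak St"},
    {"home": "Home Y", "address": "9 Pine Ave"},
]

# Index the smaller list by its normalized address, then probe it.
homes_by_address = {normalize(h["address"]): h["home"] for h in homes}

matches = []
for death in deaths:
    key = normalize(death["place_of_death"])
    if key in homes_by_address:
        matches.append((death["name"], homes_by_address[key]))

for name, home in matches:
    print(name, "died at the address of", home)
```

Exact matching after light cleanup will miss variants such as "Street" versus "St", so every match and near-miss still needs checking by hand.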
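
The two structures on slides 25 to 27 call for different parsing. The following sketch reads the slide 26 rows by character position and the slide 27 row with Python's csv module. The field names and column widths are assumptions, since the slides give neither.

```python
import csv
import io

FIELDS = ["name", "id", "city", "color"]  # hypothetical field names
WIDTHS = [10, 9, 10, 12]                  # assumed column widths

# Values from slide 26, padded out to the assumed widths.
slide_rows = [
    ("Berens", "2312", "Columbus", "blue"),
    ("Anderson", "4563625", "Seattle", "violet"),
    ("Becker", "45453", "New York", "light brown"),
]
fixed_lines = [
    "".join(value.ljust(width) for value, width in zip(row, WIDTHS))
    for row in slide_rows
]


def parse_fixed(line):
    """Read each field from its character position, then trim padding."""
    record, pos = {}, 0
    for name, width in zip(FIELDS, WIDTHS):
        record[name] = line[pos:pos + width].strip()
        pos += width
    return record


fixed_records = [parse_fixed(line) for line in fixed_lines]

# Splitting on spaces breaks values such as "New York" and "light brown".
naive_pieces = " ".join(slide_rows[2]).split()

# The delimited row from slide 27: the commas carry the structure.
delimited_text = "berens,272464,Seattle,blue\n"
delimited_records = [
    dict(zip(FIELDS, row)) for row in csv.reader(io.StringIO(delimited_text))
]

print(fixed_records[2])
print(len(naive_pieces), "pieces from a naive split of the same row")
print(delimited_records[0])
```

This is why the file layout matters for fixed-length data: without the positions, there is no reliable way to tell where one field ends and the next begins.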
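
The tip on slide 33 (keep a master file, work from a copy, log everything) can be partly automated. This is a minimal sketch under assumed file names: it copies the master, marks the master read-only, and appends a checksum line to a log. The helper names `sha256` and `make_working_copy` are hypothetical, and the demo runs in a temporary folder.

```python
import hashlib
import shutil
import stat
import tempfile
from datetime import datetime
from pathlib import Path


def sha256(path):
    """Fingerprint a file so later changes can be detected."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def make_working_copy(master, workdir, logfile):
    """Copy the master, lock the master, and log what was done."""
    master = Path(master)
    working = Path(workdir) / f"working_{master.name}"
    shutil.copy2(master, working)
    # Keep the working copy writable and make the master read-only.
    working.chmod(stat.S_IREAD | stat.S_IWRITE)
    master.chmod(stat.S_IREAD)
    stamp = datetime.now().isoformat(timespec="seconds")
    with open(logfile, "a", encoding="utf-8") as log:
        log.write(f"{stamp} copied {master.name} to {working.name} "
                  f"sha256={sha256(master)}\n")
    return working


# Demo in a throwaway folder; the file names are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    master_path = Path(tmp) / "master_file.csv"
    master_path.write_text("berens,272464,Seattle,blue\n", encoding="utf-8")
    log_path = Path(tmp) / "data_log.txt"
    working_path = make_working_copy(master_path, tmp, log_path)
    copies_match = sha256(master_path) == sha256(working_path)
    log_lines = log_path.read_text(encoding="utf-8").splitlines()

print(copies_match)
print(log_lines[0])
```

Comparing the master's checksum against the logged value at any later point shows whether the master has been altered.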