Nr14: Ten tips for data journalists


Published in: Technology, Education

  1. 1. 10 things every data journalist should know. NR14, Hamburg. Jennifer LaFleur, Center for Investigative Reporting
  2. 2. A bit about CIR: Nonprofit investigative newsroom. Public interest investigative journalism. Based near San Francisco. About 80 staff. Print, web, radio and TV.
  3. 3. A little data journalism history 1952 1967 1980s …
  4. 4. #1 data is a powerful reporting tool It takes you beyond the anecdote
  5. 5. And it’s easier than dealing with this
  6. 6. #1 data is a powerful reporting tool Contrasts are in the data
  7. 7. Caution: This slide contains extreme nerdiness
  8. 8. #1 data is a powerful reporting tool Contrasts are in the data Your most powerful figures are in the data
  9. 9. Source: California Health Dept. data, Medicare billing data Findings: Some hospitals had “alarming rates of a Third World nutritional disorder among its Medicare patients.”
  10. 10. #1 data is a powerful reporting tool Contrasts are in the data Your most powerful figures are in the data You can make connections you might not be able to make otherwise
  11. 11. Data: Youth prison workers, criminal convictions and grievance data Findings: Employees with criminal backgrounds were more likely to be accused of abusing inmates.
  12. 12. Data: Federal bridge inspections and stimulus funding. Findings: Some of the nation’s worst bridges did not get stimulus funds.
  13. 13. #1 data is a powerful reporting tool Contrasts are in the data Your most powerful figures are in the data You can make connections you might not be able to make otherwise You can test assumptions
  14. 14. Source: NHTSA complaint data Findings: “…unintended acceleration has been a problem across the auto industry.”
  15. 15. #2 data comes from many places
  16. 16. Where’s the data? If something is inspected, licensed, enforced or purchased … there probably is a database.
  17. 17. Where’s the data? If there is a report or a form, there probably is a database.
  18. 18. Where’s the data? Sometimes data is readily available online for download.
  19. 19. Where’s the data? Sometimes you have to scrape it. That usually involves programs that automate searching tasks on Web sites.
  20. 20. Where’s the data? More often you need to go to an agency or source to get the data.
  21. 21. Source: School district credit card purchases Findings: District card holders made questionable purchases with their cards.
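The scraping mentioned on slide 19 can be illustrated with a minimal sketch. This is not the speaker's code; it assumes the data sits in an HTML table, and the class name `TableScraper`, the helper `scrape_table` and the sample page are hypothetical. It uses only Python's standard `html.parser` module and works on an inline string so it runs without a network connection.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table rows found in an HTML string."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up page content, standing in for a downloaded agency web page.
SAMPLE_PAGE = """
<table>
  <tr><th>Facility</th><th>Inspected</th></tr>
  <tr><td>Example Cafe</td><td>2014-01-15</td></tr>
  <tr><td>Sample Diner</td><td>2014-02-03</td></tr>
</table>
"""

rows = scrape_table(SAMPLE_PAGE)
for row in rows:
    print(row)
```

In practice the HTML would come from the site itself, for example via `urllib.request.urlopen`, and a real page usually needs more handling (pagination, search forms, nested tables) than this sketch covers.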
  22. 22. #3 people who keep data don’t always want to give it up
  23. 23. Getting electronic information: Know the law. Know what information you want. Do your homework. Know what the appropriate cost should be. Know who does the data entry. Get to know the computer people.
  24. 24. Just another way of saying no: Huge costs. Delay tactics. “Oh you silly little journalist.” Sending you the wrong thing. “Your request was unclear.” HIPAA. Privacy. Privatization.
  25. 25. #4 Sometimes holes in data can be a story
  26. 26. #5 Even when there is no data, you can use techniques for sampling and building a database: Sampling. Physical surveys (go look at one). Testing. Questionnaires, polls and surveys. Building from documents.
  27. 27. We built a database of 500 people who had been granted or denied pardons during the Bush administration. We started with a list of nearly 2,000 people. From that, we pulled a random sample. Then we spent months researching the individuals. We found that even after controlling for other factors, whites were more likely to get a pardon.
  28. 28. To examine food safety, the Center for Investigative Reporting in Bosnia sampled food, literally, and had it tested in labs.
  29. 29. SVT surveyed 355 counties and districts about drug control – all replied (Courtesy Helena Bengtsson)
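Pulling a random sample from a list, as described on slide 27, can be sketched in a few lines. The population and sample sizes below are round numbers borrowed from that slide for illustration; the identifiers, the seed and the helper name `draw_sample` are hypothetical and not from the original project.

```python
import random


def draw_sample(population, sample_size, seed):
    """Draw a simple random sample without replacement.

    A fixed seed makes the draw reproducible, so the same sample
    can be regenerated later when the methodology is questioned.
    """
    rng = random.Random(seed)
    return rng.sample(population, sample_size)


# Placeholder identifiers standing in for the names on the full list.
case_ids = [f"case-{n:04d}" for n in range(1, 2001)]

sample = draw_sample(case_ids, 500, seed=2014)
print(len(sample), "cases selected for research")
```

Sampling without replacement means no person is picked twice, and recording the seed lets a colleague reproduce the exact selection.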
  30. 30. #6 Sometimes the crowd can help you
  31. 31. #7 There are many data tools – choose the right one: spreadsheets, databases, mapping, statistics, programming
  32. 32. Source: Salary data and other charter school records Findings: Reporters found nepotism in charter schools and administrators earning six-figure salaries to run schools with only a few hundred or a couple of thousand students.
  33. 33. Source: Washington Health Department data Findings: “MRSA has been quietly killing in hospitals for decades.” But no one had tracked it until this story.
  34. 34. Source: City Budget Findings: Some neighborhoods suffer more than others as mayor cuts budgets
  35. 35. Source: Local health department inspection reports Findings: At 28% of the venues, more than half of the concession stands or restaurants had been cited for at least one "critical" or "major" health violation.
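A figure like the one on slide 35 (the share of venues where more than half of the stands were cited) is a two-step calculation: first a rate per venue, then a share across venues. The sketch below shows one way to do it. The venue names and records are invented for illustration, and `share_of_venues_over_half` is a hypothetical helper, not the method used for the original story.

```python
from collections import defaultdict


def share_of_venues_over_half(stands):
    """stands: iterable of (venue, has_serious_violation) pairs, one per stand.

    Returns the share of venues where more than half of the stands
    were cited, plus the sorted names of those venues.
    """
    totals = defaultdict(int)
    cited = defaultdict(int)
    for venue, has_violation in stands:
        totals[venue] += 1
        if has_violation:
            cited[venue] += 1
    if not totals:
        return 0.0, []
    flagged = sorted(v for v in totals if cited[v] / totals[v] > 0.5)
    return len(flagged) / len(totals), flagged


# Invented inspection records: one row per concession stand.
records = [
    ("Arena A", True), ("Arena A", True), ("Arena A", False),
    ("Arena B", True), ("Arena B", True), ("Arena B", False), ("Arena B", False),
    ("Arena C", False), ("Arena C", False),
    ("Arena D", True),
]

share, flagged_venues = share_of_venues_over_half(records)
print(f"{share:.0%} of venues: {flagged_venues}")
```

Note that "more than half" is a strict comparison here, so a venue with exactly half of its stands cited (Arena B in the invented data) is not counted. Decisions like that belong in the story's methodology note.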
  36. 36. #8 Sharing data is good, but give it context and be sure it is right
  37. 37. Source: EPA and state data on hazardous chemical locations Findings: Dallas County has 900+ sites that store hazardous chemicals
  38. 38. Source: Medicaid outcomes data for dialysis facilities Findings: A CMS online tool did not tell the whole story about facilities. In some counties, the gaps in measures such as survival rate were vast.
  39. 39. Source: Dam inspection data from Texas and federal government Findings: Dam records had not been updated to account for population growth
  40. 40. #9 Data intended for one purpose can be used in other ways
  41. 41. Source: 311 calls for downed trees Findings: After a tornado swept across New York City, 311 calls for downed trees helped trace its path
  42. 42. Disparities in water usage “Water use highest in poor areas of the city” Mapping and statistical analysis
  43. 43. #10: No data is perfect
  44. 44. Check your data • Read the documentation. Understand the contents of every field. • Know how many records you should have. • Check counts and totals against reports. • Are all possibilities included? All states, all counties, correct ranges? • Check for missing data, duplicates, internal problems
  45. 45. Jennifer LaFleur @j_la28
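Several of the checks on the "Check your data" slide (record counts, duplicates, missing values, values outside the expected set) can be automated. The following is a minimal sketch, assuming the data arrives as a CSV file with a header row. The field names, the sample rows and the helper `check_records` are hypothetical, and the script does not replace reading the documentation or comparing totals against published reports.

```python
import csv
import io
from collections import Counter


def check_records(rows, expected_count, key_field, allowed_values):
    """Run basic integrity checks on a list of dict records.

    allowed_values maps a field name to the set of values it may take.
    Returns a list of human-readable problem descriptions.
    """
    problems = []

    # Know how many records you should have.
    if len(rows) != expected_count:
        problems.append(f"expected {expected_count} records, found {len(rows)}")

    # Check for duplicates on the field that should be unique.
    key_counts = Counter(row[key_field] for row in rows)
    for key, count in key_counts.items():
        if count > 1:
            problems.append(f"duplicate {key_field} {key!r} appears {count} times")

    for number, row in enumerate(rows, start=1):
        # Check for missing data.
        for field, value in row.items():
            if value is None or value.strip() == "":
                problems.append(f"record {number}: missing {field}")
        # Are all values within the possibilities you expect?
        for field, allowed in allowed_values.items():
            value = (row.get(field) or "").strip()
            if value and value not in allowed:
                problems.append(f"record {number}: unexpected {field} {value!r}")

    return problems


# Invented sample data with deliberate problems.
SAMPLE_CSV = """id,state,amount
1,TX,100
2,CA,
2,NY,50
4,ZZ,75
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
problems = check_records(
    rows,
    expected_count=5,  # the count the agency said it sent (assumed)
    key_field="id",
    allowed_values={"state": {"TX", "CA", "NY"}},
)
for problem in problems:
    print(problem)
```

Every item the script reports is a question to take back to the agency or the documentation, not an automatic correction.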