Mind the Gap NICAR14 (holes in data)

  1. Mind the Gap: How holes in your data can lead to stories
     Thomas Hargrove, Scripps News Washington Bureau
     Jennifer LaFleur, Center for Investigative Reporting
     NICAR Baltimore: 2 p.m. March 1, 2014, Salon DEF
  2. Never assume data are whole: check!
     • Simple techniques like sorting
     • Many of these are the same checks we use to test data integrity
     • Graphing over time
     • Matching to other data sets
     • Statistical tools
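A minimal Python sketch of the sorting and graphing-over-time checks: count records per agency per year, then list the years in which an agency that reports at other times has nothing at all. The function name and the (agency, year) record layout are hypothetical. The check cannot flag an agency that is absent from every year; comparing against an outside list of agencies (matching to another data set) covers that case.

```python
from collections import Counter

def find_reporting_gaps(records, years):
    """Return {agency: [missing years]} for every agency that appears in
    `records` but has zero records in one or more of `years`.

    `records` is an iterable of (agency, year) pairs, one pair per incident
    (a hypothetical layout; real files need their own extraction step).
    """
    counts = Counter(records)  # (agency, year) -> number of incidents
    agencies = sorted({agency for agency, _ in counts})
    gaps = {}
    for agency in agencies:
        # Counter returns 0 for unseen keys, so a silent year shows up as 0
        missing = [year for year in years if counts[(agency, year)] == 0]
        if missing:
            gaps[agency] = missing
    return gaps

# Hypothetical records: Agency B has nothing for 2011
records = [("Agency A", 2010), ("Agency A", 2011), ("Agency A", 2012),
           ("Agency B", 2010), ("Agency B", 2012), ("Agency B", 2012)]
print(find_reporting_gaps(records, range(2010, 2013)))  # {'Agency B': [2011]}
```

Printing the full agency-by-year counts and charting them is the natural next step; a sudden drop to zero is often easier to spot on a graph than in a table.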
  3. • Look for research already done on the topic
     • Find experts
     • Talk to reporters who have done similar stories
     • If possible, talk to the records personnel who assembled the data
     • Follow data to their source: usually people
  4. Finding stories in the holes:
     • Agencies' failure to report
     • Varying reporting rules across geography or agency
     • Government computer system failures
     • Find patterns among missing records
     • Find the reasons behind missing records
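One way to look for patterns among missing records is to measure, group by group, how often a field is left blank. This is a sketch under assumptions: rows are dicts, and the field names "agency" and "cause" are hypothetical placeholders for whatever the real file uses.

```python
from collections import defaultdict

def missing_share_by_group(rows, group_field, check_field):
    """Fraction of rows in each group whose `check_field` is absent or blank."""
    totals = defaultdict(int)
    blanks = defaultdict(int)
    for row in rows:
        group = row[group_field]
        totals[group] += 1
        value = row.get(check_field)
        # Treat None, empty strings and whitespace-only strings as missing
        if value is None or str(value).strip() == "":
            blanks[group] += 1
    return {group: blanks[group] / totals[group] for group in sorted(totals)}

rows = [
    {"agency": "A", "cause": ""},
    {"agency": "A", "cause": "arson"},
    {"agency": "B", "cause": "cooking"},
    {"agency": "C"},
]
print(missing_share_by_group(rows, "agency", "cause"))
# {'A': 0.5, 'B': 0.0, 'C': 1.0}
```

A group that leaves a field blank far more often than its peers is a lead, not a finding; the reasons behind it still have to come from the people who keep the records.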
  5. How This Project Started
     Dr. David Icove, researcher, University of Tennessee
     Retired member of the FBI Behavioral Science Unit
  6. For many years, the National Fire Incident Reporting System (NFIRS) reported that only 5% of U.S. building fires were intentionally set.
  7. The Impossible Variance of America’s Rate of Arson: 2006 to 2011

     Department          State  Fires    Arson rate (%)
     Indianapolis        IN     1,207     0
     San Diego           CA     1,022     0
     New York City       NY    18,988     1
     Gwinnett County     GA     1,678     2
     Houston             TX     7,740     2
     Arlington           TX     1,511     3
     Chicago             IL     5,075     4
     Los Angeles City    CA     7,975    10
     Phoenix             AZ     5,359    12
     Memphis             TN     5,331    16
     Tulsa               OK     3,076    22
     Gary                IN       424    28
     Cleveland           OH     5,742    28
     Toledo              OH     2,544    28
     Saginaw             MI     1,377    32
     Dayton              OH     1,930    33
     Buffalo             NY     1,606    33
     Youngstown          OH     2,125    36
     Highland Park       MI       748    45
     North Las Vegas     NV       435    49
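The slide shows only rates, not raw arson counts, so the sketch below uses made-up counts to show how such rates can be recomputed and how departments with implausibly low rates might be flagged. The function names, the input layout, and the 5% floor are all assumptions; the floor is a reporter's judgment call, not an established standard.

```python
from statistics import median

def arson_rates(counts):
    """counts maps department -> (total_fires, fires_coded_arson).
    Returns department -> percent of fires coded as arson."""
    return {
        dept: 100.0 * arson / total
        for dept, (total, arson) in counts.items()
        if total > 0  # skip departments with no fires to avoid dividing by zero
    }

def flag_low(rates, floor_pct):
    """Departments whose arson rate is below an assumed plausibility floor."""
    return sorted(dept for dept, rate in rates.items() if rate < floor_pct)

# Hypothetical counts, not taken from the table above
counts = {"Dept A": (1000, 5), "Dept B": (800, 240), "Dept C": (500, 150)}
rates = arson_rates(counts)
print(rates)  # {'Dept A': 0.5, 'Dept B': 30.0, 'Dept C': 30.0}
print("median:", median(rates.values()), "spread:", max(rates.values()) - min(rates.values()))
print(flag_low(rates, floor_pct=5.0))  # ['Dept A']
```

A wide spread between similar cities, as in the table, suggests that departments are coding causes differently rather than that arson is truly that rare in some places.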
  8. How Rare Is Arson?
  9. But They Should Have Reported:
  10. “Arson is grossly under reported. The true rate, I believe, is 40% to 50% -- in that range.” --Bill Degnan, President, National Association of State Fire Marshals
  11. “There isn’t a day that goes by that I don’t think: ‘Man, I was a monster.’ I’m just thankful no one was hurt.” --Kenneth Allen, Muncie, Indiana
  12. The Allen Conspiracy: 46 people set 73 home and vehicle fires to collect $3.8 million from insurance
  13. Lessons Learned from 1 million fires:
     • 54,860 fires at ‘unlucky’ buildings that, like Allen’s home, had multiple fires, none of which were reported as arson
     • 42,434 fires at buildings that went through foreclosure, according to the national mortgage monitoring firm RealtyTrac
     • 3,561 fires that had multiple points of ignition, suggesting someone set several fires at once
     • 77,596 fires in unoccupied or vacant buildings
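A minimal sketch of the "unlucky buildings" test: group fires by address and keep addresses with two or more fires, none coded as arson. The 'address' and 'cause' keys and the plain-text "arson" value are hypothetical; real fire-incident files use their own cause codes, and real address matching needs more cleanup ("St" vs. "Street", unit numbers) than the simple normalization here.

```python
from collections import defaultdict

def unlucky_buildings(fires, arson_code="arson"):
    """Addresses with two or more fires, none of them coded as arson.

    `fires` is an iterable of dicts with hypothetical 'address' and 'cause'
    keys; `arson_code` is whatever value the file uses for arson.
    """
    causes_by_address = defaultdict(list)
    for fire in fires:
        # Uppercase and collapse whitespace so '12 Main St' and '12 MAIN ST '
        # are grouped as one building
        key = " ".join(fire["address"].upper().split())
        causes_by_address[key].append(fire["cause"])
    return sorted(
        address
        for address, causes in causes_by_address.items()
        if len(causes) >= 2 and arson_code not in causes
    )

fires = [
    {"address": "12 Main St", "cause": "electrical"},
    {"address": "12 MAIN ST ", "cause": "undetermined"},
    {"address": "9 Oak Ave", "cause": "arson"},
    {"address": "9 Oak Ave", "cause": "undetermined"},
    {"address": "5 Elm St", "cause": "cooking"},
]
print(unlucky_buildings(fires))  # ['12 MAIN ST']
```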
  14. What’s Next?
     • Collecting data on 4.8 million fires
     • Calculating geographic rates by merging aggregated fire counts with Census Bureau tract data
     • Correlating rates of suspicious fires with tracts that have unusually high numbers of fires
     • Contacting local fire and police authorities to determine whether serial arson is suspected or should be investigated
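The tract-level step can be sketched as a join on tract ID followed by a per-capita rate. Everything here is an assumption for illustration: the dict inputs, the tract IDs, rates per 1,000 residents (housing units may be a better base for building fires), and "twice the median tract rate" as the cutoff for "unusually high." Fire-count tracts that find no population match are returned separately, since a failed merge is itself a possible hole.

```python
from statistics import median

def tract_fire_rates(fire_counts, tract_population):
    """Join aggregated fire counts to tract population by tract ID.

    Returns (fires per 1,000 residents by tract, fire-count tract IDs that
    found no population match).
    """
    rates = {
        tract: 1000.0 * fire_counts.get(tract, 0) / pop
        for tract, pop in tract_population.items()
        if pop > 0
    }
    unmatched = sorted(set(fire_counts) - set(tract_population))
    return rates, unmatched

def unusually_high(rates, multiple=2.0):
    """Tracts with a nonzero rate at least `multiple` times the median tract rate."""
    typical = median(rates.values())
    return sorted(t for t, r in rates.items() if r > 0 and r >= multiple * typical)

# Hypothetical tract IDs and counts
fire_counts = {"T1": 10, "T2": 2, "T3": 3, "T9": 4}
tract_population = {"T1": 1000, "T2": 1000, "T3": 1500}
rates, unmatched = tract_fire_rates(fire_counts, tract_population)
print(unusually_high(rates))  # ['T1']
print(unmatched)              # ['T9']
```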
  15. Local gap-mining stories
  16. Here’s FBI data you were never supposed to see
  17. Truck accidents by year and agency
  18. Sometimes you find piles
  19. Sometimes you find piles
  20. Statistical tools
     • Time-series correlation: are your ups and downs real?
     • Project or predict data and compare the projections with actual results. What causes the differences?
     • Population counts are fairly accurate. Use them to determine reporting rates.
     • Regression with dummy variables
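Using population to judge reporting rates can be sketched as "expected vs. actual": assume every agency shares the pooled per-capita rate, compute the count each agency would then report, and divide the actual count by it. The agency names, counts, and the pooled-rate assumption are illustrative only, and the sketch assumes the populations sum to a positive number.

```python
def reporting_ratios(reported, population):
    """Actual / expected counts, where 'expected' assumes every agency shares
    the pooled per-capita rate of all agencies in `population`.

    A ratio far below 1 may mean under-reporting, or a real difference
    worth checking with the agency.
    """
    total_reported = sum(reported.get(agency, 0) for agency in population)
    total_pop = sum(population.values())
    per_capita = total_reported / total_pop
    ratios = {}
    for agency, pop in population.items():
        expected = per_capita * pop
        ratios[agency] = reported.get(agency, 0) / expected if expected else float("nan")
    return ratios

# Hypothetical agencies with equal populations
reported = {"A": 300, "B": 100, "C": 0}
population = {"A": 100_000, "B": 100_000, "C": 100_000}
for agency, ratio in reporting_ratios(reported, population).items():
    print(agency, round(ratio, 2))
# A 2.25
# B 0.75
# C 0.0
```

An agency reporting zero against a nonzero expectation, like C here, is exactly the kind of hole worth a phone call.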
  21. Make sure the holes are real: EE000132 might actually be the same as EE-000-132
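Before calling a record missing, it can help to compare IDs in a normalized form. A minimal sketch, assuming that dashes, spaces, and letter case never carry meaning in the ID scheme (check that with the agency, since stripping characters could merge IDs that really differ). The function names are hypothetical.

```python
import re

def normalize_id(raw):
    """Keep only letters and digits, uppercased, so formatting differences
    such as dashes or spaces don't look like separate records."""
    return re.sub(r"[^0-9A-Za-z]", "", raw).upper()

def true_holes(expected_ids, found_ids):
    """IDs in `expected_ids` with no match in `found_ids` after normalizing."""
    found = {normalize_id(i) for i in found_ids}
    return sorted(i for i in expected_ids if normalize_id(i) not in found)

print(normalize_id("EE-000-132"))                            # EE000132
print(true_holes(["EE000132", "EE000133"], ["EE-000-132"]))  # ['EE000133']
```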
  22. A word of caution
     • Do spot checks to make sure what you found is real
     • Run your findings by experts
     • If possible, engage government sources of data early. They may not be the enemy.
     • Challenge your assumptions. Data are only a clue, never an end result.
  23. Questions?
     Jennifer LaFleur, jlafleur@cironline.org, @j_la28
     Thomas Hargrove, hargrovet@scripps.com, 202-408-2703
     Arson Project syntax files: https://www.dropbox.com/l/LPB7l3kpz7wxvGsHSdTOy9
     A copy of this presentation will be at www.jenster.com/2014
