Getting it the rightest

NICAR - Getting it the rightest - Janet Roberts, John Perry, Tom Hargrove

  1. Getting it the rightest you can. Thomas Hargrove, Scripps News; John Perry, Atlanta Journal-Constitution; Janet Roberts, Reuters; Jennifer LaFleur, Reveal | The Center for Investigative Reporting. IRE 2015 CAR Conference, Atlanta.
  2. Do integrity checks from your desk: Beware duplicates. Every time Saint Paul, Minn., housing inspectors made follow-up visits to check on violations, all of the data entries from the previous visit were logged again, so every violation was listed in the database multiple times.
  3. Do integrity checks from your desk: Beware dates. Did 592,000 people in Ohio really vote before they registered?
  4. Do integrity checks from your desk: Does it make sense? “We select things for publication just to make available a wide scope of data to the public ... There is some burden on the public to decide whether or not to use the material.” (Kathleen McGuire, Sourcebook of Criminal Justice Statistics) (a/k/a: The case of the disappearing lifers)
  5. Do integrity checks from your desk: Do the data conform to the real world? Are half of the records male, half female? In a national data set, are about 13 percent of the records from California? Are racial minorities adequately represented?
  6. Do integrity checks from your desk: Check for patterns in missing data. Do those patterns render estimates inaccurate?
  7. Do integrity checks from your desk: Think like a statistician (a/k/a: How George Will became the darling of statistics teachers). "In 1992-93, none of the five states with the highest teachers' salaries were among the 15 states with the highest SAT scores. And the 10 states with the lowest per pupil spending included four . . . among the 10 states with the highest SAT scores." (George Will, 1993)
  8. Do integrity checks from your desk: Statistical checks, from the simple to the sophisticated. Descriptive statistics: frequency, average, mode. A fitted regression line: ss2 = 43 + 0.95(ss1), with R-squared = .82.
  9. Do integrity checks with other sources: Beware the documentation. “Yes, that’s Harold Spaeth’s view and mostly I think he’s right, though I’d substitute the word more ‘efficient’ for more ‘accurate.’” (Lee Epstein) Find a power user, and compare notes.
  10. Do integrity checks with other sources: What’s missing? An estimated 30 percent of felony convictions are missing from the Minnesota public convictions file. (Ask the keepers of the data.)
  11. Do integrity checks with other sources: Check those codes (a/k/a: The codes are not what they seem). The data spanned six years. Sometime in those six years, the violation codes changed. No one in the Housing Violations Bureau knew when the switch was made, and no one had definitions for the previous codes. (a/k/a: Why to pull some paper records)
  12. Do integrity checks with other sources: Beware elements of change. In the Saint Paul Housing Bureau’s code violations database, the “feename” field (the name of the property owner) is pulled in from property tax rolls. It shows the current owner, who may not have owned the property at the time of the violation. (a/k/a: Why to pull some paper records)
  13. Do integrity checks with other sources: Summarize cases by institution, then spot-check the results. Is it true that only 6 percent of hospital emergency cases are transferred from other hospitals?
  14. Technology bites: Beware nulls! Null scariness from the FDA’s MAUDE database: http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfmaude/search.cfm
  15. Technology bites: Beware nulls! We want to explore reports involving Promus heart stents, but NOT the Promus Element devices. First, let’s see what’s in there for Promus.
  16. Technology bites: Beware nulls! There are 50 records that mention Promus. By scrolling, we can see that four of them are the Promus Element devices we wish to exclude.
  17. Technology bites: Beware nulls! Let’s get rid of those Elements.
  18. Technology bites: Beware nulls! 50 - 4 = ?
  19. Technology bites: Beware nulls! You’re supposed to have 46 records, but you got 30. What are the missing 16 records?
  20. Technology bites: Beware nulls! Right: (slide image) Wrong: (slide image)
  21. Technology bites: Beware false joins in “encrypted” data. Medicare 5 percent sample: Doctor IDs were encrypted in some files but not in others.
  22. As you report and just before you publish: Don’t alter original data. Make a copy of the original data file. Put it somewhere and don’t touch it again. Don’t edit an original column or field; make a copy and edit that.
  23. As you report and just before you publish: Document as you go. Keep track of all of your queries so you can retrace your steps or find where you went wrong. As you integrity-check your data, annotate the queries to record what you learned.
  24. As you report and just before you publish: Cross check. If you summed data in SQL, can you reproduce the results in a pivot table? If you’re summing, also run a list of the records. Make sure there’s nothing wacky in that list that would throw off your count, such as duplicates. If you have several data sources that should yield the same conclusions, do they?
  25. As you report and just before you publish: Beware the single case. Never report on one data record without pulling the paper report or talking to the person in question. What if it was a data entry error? What if there are circumstances you don’t understand?
  26. As you report and just before you publish: Recreate the wheel. For every fact, number or finding in your story, write an original query or formula to support it. Go back to your original data. Try to arrive at the same conclusion in different ways.
  27. Fear is your friend
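Slide 2's Saint Paul problem, with violations re-logged at every follow-up visit, can be caught from your desk by counting how often the same combination of fields repeats. A minimal Python sketch, assuming the records have been loaded as dictionaries; the field names `property`, `code` and `visit` and the sample rows are hypothetical, not the Housing Bureau's schema:

```python
from collections import Counter

def find_repeats(records, key_fields):
    """Count records sharing the same values in key_fields; keep keys seen more than once."""
    counts = Counter(tuple(rec[f] for f in key_fields) for rec in records)
    return {key: n for key, n in counts.items() if n > 1}

# Hypothetical sample: the second visit re-logs the first visit's violation.
records = [
    {"property": "123 Main", "code": "V12", "visit": "2014-01-05"},
    {"property": "123 Main", "code": "V12", "visit": "2014-02-10"},  # re-logged
    {"property": "123 Main", "code": "V40", "visit": "2014-02-10"},
    {"property": "9 Elm", "code": "V12", "visit": "2014-03-01"},
]

# Keying on (property, code) ignores the visit date, which is where re-logged rows hide.
print(find_repeats(records, ["property", "code"]))  # {('123 Main', 'V12'): 2}
```

Choosing the key is the judgment call: counting on every field, visit date included, finds nothing here, because each re-logged row carries a new visit date.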
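For slide 3, the desk check is to count the records in which the order of two dates is impossible. A sketch with hypothetical field names and made-up rows, not the Ohio file:

```python
from collections import Counter
from datetime import date

def voted_before_registering(voters):
    """Return the records whose vote date falls before their registration date."""
    return [v for v in voters if v["voted"] < v["registered"]]

# Hypothetical records.
voters = [
    {"id": "A1", "registered": date(2004, 3, 1), "voted": date(2004, 11, 2)},
    {"id": "B2", "registered": date(2006, 1, 1), "voted": date(2004, 11, 2)},
    {"id": "C3", "registered": date(2006, 1, 1), "voted": date(2005, 11, 8)},
]

flagged = voted_before_registering(voters)
print([v["id"] for v in flagged])  # ['B2', 'C3']

# If many flagged rows share one registration date, that date may be a
# system-assigned value rather than the voter's original registration.
print(Counter(v["registered"] for v in flagged).most_common(1))
```

A count of impossible pairs is a question for the people who keep the data, not a finding; grouping the flagged rows by registration date is one way to see whether a single date explains most of them.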
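Slide 5's questions can be turned into a routine comparison of observed shares against benchmarks. In this sketch the 50 percent and roughly 13 percent benchmarks come from the slide, while the `tolerance` threshold and the sample data are assumptions chosen for illustration:

```python
from collections import Counter

def share_check(values, expected, tolerance):
    """Return {category: (observed_share, expected_share)} for every category
    whose observed share misses its expected share by more than `tolerance`."""
    counts = Counter(values)
    total = len(values)
    flagged = {}
    for category, exp_share in expected.items():
        obs_share = counts.get(category, 0) / total
        if abs(obs_share - exp_share) > tolerance:
            flagged[category] = (round(obs_share, 3), exp_share)
    return flagged

# Hypothetical national file: 30 of 100 records from California.
states = ["CA"] * 30 + ["TX"] * 10 + ["NY"] * 60
print(share_check(states, {"CA": 0.13}, tolerance=0.03))  # {'CA': (0.3, 0.13)}

# Hypothetical sex field: 52 male, 48 female.
sexes = ["M"] * 52 + ["F"] * 48
print(share_check(sexes, {"M": 0.5, "F": 0.5}, tolerance=0.05))  # {}
```

A flagged category is not automatically an error (some populations really are mostly male), but it should have an explanation before publication.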
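For slide 6, the check is to compute the missing-value rate within each group rather than for the file as a whole. A sketch, assuming `None` or an empty string marks a missing value; the `county` and `income` fields and their values are hypothetical:

```python
from collections import defaultdict

def missing_rate_by_group(rows, group_field, value_field):
    """Share of rows in each group where value_field is missing (None or empty string)."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        group = row[group_field]
        totals[group] += 1
        if row.get(value_field) in (None, ""):
            missing[group] += 1
    return {g: missing[g] / totals[g] for g in totals}

# Hypothetical: income is missing far more often in county B.
rows = (
    [{"county": "A", "income": 50_000}] * 9 + [{"county": "A", "income": None}]
    + [{"county": "B", "income": 40_000}] * 5 + [{"county": "B", "income": ""}] * 5
)
print(missing_rate_by_group(rows, "county", "income"))  # {'A': 0.1, 'B': 0.5}
```

When one group is missing half its values and another a tenth, an average computed over the non-missing rows describes the second group far better than the first.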
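Slide 8 lists descriptive statistics (frequency, average, mode) alongside a fitted line with an R-squared; the slide does not say what `ss1` and `ss2` measure. This sketch computes the descriptive checks with Python's standard `statistics` and `collections` modules and fits an ordinary least-squares line by hand. All data are hypothetical, including the 999 that stands in for a possible missing-value code:

```python
from collections import Counter
from statistics import mean, mode

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot

# Descriptive checks on hypothetical ages; 999 may be a missing-value code.
ages = [34, 35, 35, 36, 999]
print(Counter(ages).most_common())  # frequency of each value
print(mean(ages))  # 227.8
print(mode(ages))  # 35

# A fitted line on hypothetical paired values.
a, b, r2 = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y = {a:.2f} + {b:.2f}x, R-squared = {r2:.2f}")  # y = 2.20 + 0.60x, R-squared = 0.60
```

A mean and a mode that disagree this sharply is itself a signal: a handful of extreme values, or a code masquerading as a value, is often pulling the average.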
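Slide 11's undocumented code switch can at least be dated from your desk by recording the first and last year each code appears. A sketch with hypothetical codes and years:

```python
def code_year_ranges(rows):
    """Map each code to the first and last year in which it appears."""
    ranges = {}
    for year, code in rows:
        first, last = ranges.get(code, (year, year))
        ranges[code] = (min(first, year), max(last, year))
    return ranges

# Hypothetical (year, violation_code) pairs spanning six years.
rows = [
    (2008, "A1"), (2009, "A1"), (2010, "A1"),
    (2008, "B7"), (2010, "B7"),
    (2011, "HV-3"), (2012, "HV-3"), (2013, "HV-9"),
]

# List codes in order of when they were in use.
for code, (first, last) in sorted(code_year_ranges(rows).items(), key=lambda kv: kv[1]):
    print(f"{code}: {first}-{last}")
```

Codes that stop in one year alongside codes that first appear the next suggest when the switch happened; the paper records, per the slide, are where you find out what the old codes meant.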
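For slide 13, summarizing by institution shows whether a low overall rate is real or an average over some hospitals that never record the field. A sketch with hypothetical hospital names and transfer flags:

```python
from collections import defaultdict

def transfer_share_by_hospital(cases):
    """Return {hospital: share of its emergency cases flagged as transfers}."""
    total = defaultdict(int)
    transfers = defaultdict(int)
    for hospital, is_transfer in cases:
        total[hospital] += 1
        transfers[hospital] += int(is_transfer)
    return {h: transfers[h] / total[h] for h in total}

# Hypothetical cases: (hospital, transferred from another hospital?)
cases = (
    [("Hospital A", True)] * 3 + [("Hospital A", False)] * 7
    + [("Hospital B", False)] * 10
)
shares = transfer_share_by_hospital(cases)
print(shares)  # {'Hospital A': 0.3, 'Hospital B': 0.0}

overall = sum(int(t) for _, t in cases) / len(cases)
print(overall)  # 0.15

# Hospitals reporting no transfers at all are the first to spot-check.
print([h for h, s in shares.items() if s == 0])  # ['Hospital B']
```

The overall 15 percent here blends one hospital at 30 percent with one at zero; a hospital reporting no transfers is a candidate for a phone call, not necessarily an error.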
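Slides 14 through 20 lose 16 records when the Promus Element reports are excluded, and the slides do not preserve the queries used. One common way this happens in SQL, offered here as an assumption about the mechanism rather than a reconstruction of the FDA search, is a `NOT LIKE` filter on a field that is sometimes NULL. The comparison then returns NULL rather than true, so those rows drop out silently. The sketch uses Python's built-in `sqlite3` with a made-up `reports` table; the column names `event_text` and `brand` are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (id INTEGER, event_text TEXT, brand TEXT)")
conn.executemany(
    "INSERT INTO reports VALUES (?, ?, ?)",
    [
        (1, "Promus stent thrombosis", "PROMUS"),
        (2, "Promus Element fracture", "PROMUS ELEMENT"),
        (3, "Promus stent migration", None),  # brand never filled in
        (4, "Promus stent restenosis", None),
    ],
)

def count(where):
    """Count reports matching a fixed, hand-written WHERE clause."""
    return conn.execute(f"SELECT COUNT(*) FROM reports WHERE {where}").fetchone()[0]

mentions = count("event_text LIKE '%Promus%'")
elements = count("event_text LIKE '%Promus%' AND brand LIKE '%ELEMENT%'")

# NULL NOT LIKE '...' evaluates to NULL, not TRUE, so rows 3 and 4 vanish.
wrong = count("event_text LIKE '%Promus%' AND brand NOT LIKE '%ELEMENT%'")
# Keep NULL brands explicitly, then review them by hand.
right = count("event_text LIKE '%Promus%' AND (brand IS NULL OR brand NOT LIKE '%ELEMENT%')")

print(mentions, elements, wrong, right)  # 4 1 1 3
print(mentions - elements == right)  # True
```

The last line turns the slide 18 arithmetic (50 - 4 should equal 46) into a routine reconciliation. The NULL-brand rows that survive the corrected query still need a human look, since some of them could be Element devices.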
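For slide 21, one desk check before joining is to compare the shape of the ID values on each side and the share of IDs that find a match. A sketch with hypothetical ID values; reducing each ID to a digit/letter pattern is an illustrative choice, not a Medicare convention:

```python
from collections import Counter

def id_shape(value):
    """Reduce an ID to a pattern: digits become 9, letters become A, other characters stay."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in value)

def join_report(left_ids, right_ids):
    """Share of left IDs found among right IDs, plus the ID patterns on each side."""
    right = set(right_ids)
    matched = sum(1 for i in left_ids if i in right)
    return {
        "match_rate": matched / len(left_ids),
        "left_shapes": Counter(id_shape(i) for i in left_ids),
        "right_shapes": Counter(id_shape(i) for i in right_ids),
    }

# Hypothetical: plain 10-digit IDs in one file, mostly scrambled strings in the other.
claims_ids = ["1234567890", "2345678901", "3456789012"]
doctor_ids = ["X9F2K7Q1LP", "1234567890", "B8D1Z0M3RT"]

report = join_report(claims_ids, doctor_ids)
print(round(report["match_rate"], 2))  # 0.33
print(report["left_shapes"])
print(report["right_shapes"])
```

A low match rate combined with mixed ID shapes on one side is consistent with partly encrypted keys; the single match here may be real or coincidental, and it should not be trusted until the ID formats are understood.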
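Slide 22's rules can be built into the start of every project: copy the file, lock the original, fingerprint it, and put edits in new columns. A sketch using only Python's standard library; the file names and the `owner` field are hypothetical:

```python
import hashlib
import os
import shutil
import stat
import tempfile

def sha256_of(path):
    """Fingerprint a file so any later change to it can be detected."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def protect_original(original, working):
    """Copy the original to a working file, mark the original read-only,
    and return the original's fingerprint."""
    shutil.copy2(original, working)
    os.chmod(original, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return sha256_of(original)

def with_cleaned_copy(rows, field, clean):
    """Return new rows carrying field + '_clean'; the original field is left as-is."""
    return [{**row, field + "_clean": clean(row[field])} for row in rows]

# Usage on a throwaway directory with hypothetical file names.
folder = tempfile.mkdtemp()
original = os.path.join(folder, "violations_original.csv")
working = os.path.join(folder, "violations_working.csv")
with open(original, "w", encoding="utf-8") as f:
    f.write("owner\n  SMITH, JOHN \n")

digest = protect_original(original, working)
rows = with_cleaned_copy([{"owner": "  SMITH, JOHN "}], "owner", lambda s: s.strip().title())
print(rows)  # [{'owner': '  SMITH, JOHN ', 'owner_clean': 'Smith, John'}]
print(sha256_of(original) == digest)  # True
```

Read-only permissions guard against accidental edits, not deliberate ones, so the SHA-256 fingerprint is the check to rerun just before publishing.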
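Slide 24's cross check, summing the same data by two independent routes and then listing the rows, can be sketched with `sqlite3` standing in for your database and a plain Python loop standing in for the pivot table. The `fines` table and its values are hypothetical:

```python
import sqlite3
from collections import defaultdict

# Hypothetical rows: (district, amount).
rows = [("North", 120.0), ("North", 80.0), ("South", 50.0), ("South", 50.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fines (district TEXT, amount REAL)")
conn.executemany("INSERT INTO fines VALUES (?, ?)", rows)

# Route 1: aggregate in SQL.
sql_totals = dict(
    conn.execute("SELECT district, SUM(amount) FROM fines GROUP BY district").fetchall()
)

# Route 2: aggregate the same rows independently (a stand-in for a pivot table).
loop_totals = defaultdict(float)
for district, amount in rows:
    loop_totals[district] += amount

assert sql_totals == dict(loop_totals), "the two routes disagree"

# "If you're summing, do a list": surface repeated rows before trusting a total.
repeats = conn.execute(
    "SELECT district, amount, COUNT(*) FROM fines "
    "GROUP BY district, amount HAVING COUNT(*) > 1"
).fetchall()
print(sql_totals)
print(repeats)  # [('South', 50.0, 2)]
```

The two identical South rows are the kind of wackiness the slide warns about: they may be two real fines or one fine entered twice, and only the records can say which.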
