34. More often you need to go to an
agency to get the data
This can be tricky if an agency
doesn’t want to release it. (Stay
tuned for more on that…)
Where’s the data?
36. SOURCE: Local health department
inspection reports
FINDINGS: At 28% of the venues,
more than half of the concession
stands or restaurants had been cited
for at least one "critical" or "major"
health violation.
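A finding like the one above is a two-step calculation: first the share of stands cited at each venue, then the share of venues where that figure tops one half. The sketch below shows the arithmetic on made-up rows. The venue names, stand names and field layout are all hypothetical and are not the actual inspection data.

```python
from collections import defaultdict

# Hypothetical rows: (venue, stand, has_critical_or_major_violation)
inspections = [
    ("Arena A", "Stand 1", True),
    ("Arena A", "Stand 2", True),
    ("Arena A", "Stand 3", False),
    ("Stadium B", "Stand 1", False),
    ("Stadium B", "Stand 2", False),
    ("Park C", "Stand 1", True),
    ("Park C", "Stand 2", False),
    ("Field D", "Stand 1", False),
]

def share_of_venues_over_half(rows):
    """Return the fraction of venues where more than half the stands were cited."""
    stands = defaultdict(dict)
    for venue, stand, cited in rows:
        # A stand counts as cited if any of its inspections found a violation.
        stands[venue][stand] = stands[venue].get(stand, False) or cited
    flagged = [
        venue for venue, by_stand in stands.items()
        if sum(by_stand.values()) / len(by_stand) > 0.5
    ]
    return len(flagged) / len(stands)

venue_share = share_of_venues_over_half(inspections)
print(f"{venue_share:.0%} of venues had more than half their stands cited")
```

Note that a venue with exactly half its stands cited (Park C here) does not count, because the finding says "more than half."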
38. Request records and data early
Get out and talk to real people
Keep track of your work and stay
organized
Understand the process of what
you’re covering
Tips for digging
39. Students are getting sick from eating
in the student center cafeteria.
• Who inspects the cafeteria?
• Has it had problems in the past?
• When/what did the students eat?
Did any of them file complaints?
Understand the process
40. Request records and data early
Get out and talk to real people
Keep track of your work and stay
organized
Understand the process of what
you’re covering
Confirm and corroborate
Make it something worth reading,
listening to, watching
Tips for digging
42. Sometimes, there is no data.
But it’s okay because there are
techniques for sampling and building
a database.
45. ProPublica pulled a random
sample of 500 names from a
list of individuals who had
been granted or denied
pardons (around 2,000). We
created a database from
months of researching
individuals: their crime, age,
sentence…
We found that even after
controlling for other factors,
whites were more likely to get
a pardon.
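Drawing a random sample of 500 from a list of about 2,000 can be sketched in a few lines. This is only an illustration of simple random sampling without replacement. The slides do not say how the sample was actually drawn, and the applicant list and seed value here are placeholders.

```python
import random

# Hypothetical stand-in for the full list of roughly 2,000 names.
applicants = [f"applicant_{i:04d}" for i in range(2000)]

def draw_sample(population, size, seed):
    """Draw a simple random sample without replacement, reproducibly."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn
    return rng.sample(population, size)

sample = draw_sample(applicants, 500, seed=2011)
print(len(sample), "names drawn,", len(set(sample)), "unique")
```

Recording the seed in your notes file means an editor or colleague can reproduce exactly the same 500 names later.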
48. Stories don’t end at the
records. We must find people
to tell the stories
50. Source: School district
credit card purchases
Findings: District card
holders made
questionable
purchases with their
cards.
53. Source: 311 calls for downed trees
Findings: After a tornado swept across New York City, 311
calls for downed trees helped trace its path
57. Bulletproof your data
Before ever reporting data or building an app
Do integrity checks to find the flaws
Add caveats where necessary
Do your own analysis rather than relying on an
agency’s analysis
58. External checks
Read the documentation. Understand the
contents of every field.
Know how many records you should
have.
Check counts and totals against reports.
Are all possibilities included?
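The external checks above can be partly automated. The sketch below compares a small extract against figures assumed to come from an agency's summary report: the record count, the dollar total, and the list of categories the documentation says should appear. The data, field names and expected figures are all invented for illustration.

```python
import csv
import io

# Hypothetical extract; in practice this would be the file the agency sent.
raw = """id,category,amount
1,A,100.00
2,B,250.50
3,A,75.25
"""

# Figures assumed to come from the agency's published summary report
# and record layout.
expected_rows = 3
expected_total = 425.75
expected_categories = {"A", "B", "C"}

rows = list(csv.DictReader(io.StringIO(raw)))
actual_total = round(sum(float(r["amount"]) for r in rows), 2)
missing_categories = expected_categories - {r["category"] for r in rows}

print("row count matches:", len(rows) == expected_rows)
print("total matches:", actual_total == expected_total)
print("categories never seen:", sorted(missing_categories))
```

A category that never appears (C in this toy example) is not necessarily an error, but it is a question to ask the agency.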
59. Internal checks
Compare fields to check for red flags
• More teachers than students
• More money going to vendors
than to contractors
• Anything else that just doesn’t make
sense
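Field-against-field comparisons like these are easy to script. A minimal sketch, assuming hypothetical school records with `students` and `teachers` fields. The second rule is an extra example of the same idea and is not from the slides.

```python
# Hypothetical school records; the field names are assumptions.
schools = [
    {"school": "North", "students": 420, "teachers": 28},
    {"school": "South", "students": 15, "teachers": 40},
    {"school": "East", "students": 310, "teachers": 0},
]

def red_flags(record):
    """Return a list of reasons a record deserves a second look."""
    flags = []
    if record["teachers"] > record["students"]:
        flags.append("more teachers than students")
    if record["students"] > 0 and record["teachers"] == 0:
        flags.append("students but no teachers")
    return flags

flagged = {s["school"]: red_flags(s) for s in schools if red_flags(s)}
for school, reasons in flagged.items():
    print(school, "->", "; ".join(reasons))
```

A flag is a prompt to go back to the source, not proof of an error.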
64. Integrity checks for every data set
Check for missing data, misplaced data or blank
fields
Check for duplicates
Check for outliers and extreme ups and downs
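These three checks can be run on any table before analysis starts. The sketch below uses made-up records. The field names are hypothetical, and the outlier rule (more than 10 times the median) is an arbitrary threshold chosen for illustration, not a standard.

```python
from collections import Counter
from statistics import median

# Hypothetical records; None or "" stands for a blank field.
records = [
    {"id": "001", "name": "Ann Lee", "amount": 120.0},
    {"id": "002", "name": "", "amount": 95.0},
    {"id": "003", "name": "Raj Patel", "amount": None},
    {"id": "001", "name": "Ann Lee", "amount": 120.0},
    {"id": "004", "name": "Sam Cruz", "amount": 110.0},
    {"id": "005", "name": "Mia Wong", "amount": 9800.0},
]

# Blank fields: collect (id, field) for every empty value.
blanks = [(r["id"], k) for r in records for k, v in r.items() if v in (None, "")]

# Duplicates: ids that appear more than once.
dupes = [i for i, n in Counter(r["id"] for r in records).items() if n > 1]

# Outliers: amounts more than 10 times the median (an arbitrary threshold).
amounts = [r["amount"] for r in records if r["amount"] is not None]
typical = median(amounts)
outliers = [r["id"] for r in records
            if r["amount"] is not None and r["amount"] > 10 * typical]

print("blank fields:", blanks)
print("duplicate ids:", dupes)
print("possible outliers:", outliers)
```

An outlier may be a typo or it may be the story, so each one is worth checking against the original record.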
66. Beyond the basics
Keep a notes file/git
Don’t work off your original data/documents
Know the source
Check against summary reports
67. Beyond the basics
Keep a notes file
Don’t work off your original database
Know the source
Check against summary reports
Use the right tool
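One way to avoid working off the original is to copy it before touching anything and record a checksum of the untouched file in your notes. The sketch below is one possible approach, not a prescribed workflow. The file names and the `make_working_copy` helper are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def make_working_copy(original, workdir):
    """Copy the original into workdir; return (copy path, SHA-256 of original)."""
    digest = hashlib.sha256(original.read_bytes()).hexdigest()
    workdir.mkdir(parents=True, exist_ok=True)
    copy = workdir / f"working_{original.name}"
    shutil.copy2(original, copy)
    return copy, digest

# Demonstration in a temporary folder with a made-up file.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "agency_export.csv"
    original.write_text("id,amount\n1,100\n")
    copy, digest = make_working_copy(original, Path(tmp) / "work")
    copy.write_text("id,amount\n1,100\n2,50\n")  # edits touch only the copy
    unchanged = hashlib.sha256(original.read_bytes()).hexdigest() == digest
    print("original untouched:", unchanged)
```

Re-computing the checksum later shows whether the original has been altered since it arrived.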
69. Beyond the basics
Check with experts
Know the standards
Find out what others have done
Gut check – does it just seem wrong?
70. Beyond the basics
Check with experts
Know the standards
Find out what others have done
Gut check
Go physically see a record or spot check against
documents
72. Checks when you’re matching data
A name is not enough. Lots of people have the same name
Get dates of birth and
other information to
make sure you have
the correct person.
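The difference between matching on name alone and matching on name plus date of birth shows up clearly in a small example. The two lists below are invented, and the normalization (lowercasing and collapsing whitespace) is a simple assumption. Real matching usually needs more care with nicknames, middle names and typos.

```python
from datetime import date

# Hypothetical lists; names and dates are made up.
payroll = [
    {"name": "John Smith", "dob": date(1975, 3, 2)},
    {"name": "Maria Garcia", "dob": date(1982, 11, 19)},
]
court_records = [
    {"name": "JOHN SMITH", "dob": date(1990, 7, 14)},
    {"name": "maria garcia ", "dob": date(1982, 11, 19)},
]

def key(person):
    """Normalize the name and pair it with the date of birth."""
    return (" ".join(person["name"].lower().split()), person["dob"])

record_keys = {key(p) for p in court_records}
record_names = {key(p)[0] for p in court_records}

name_only_hits = [p["name"] for p in payroll if key(p)[0] in record_names]
confirmed = [p["name"] for p in payroll if key(p) in record_keys]

print("name-only matches:", name_only_hits)
print("name + date of birth matches:", confirmed)
```

Here the name-only match wrongly links a John Smith born in 1975 to a different John Smith born in 1990. Even a name and date of birth match is a lead to confirm, not a confirmation.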