This document provides an overview of techniques for collecting, cleaning, and manipulating data for investigative reporting. It discusses finding and obtaining data from sources such as government databases and agencies. It also covers integrity checks, data cleaning, and evaluating outside studies. Examples are given of reports produced by analyzing data from sources such as Medicaid and the EPA. The document emphasizes being thorough in understanding data and accounting for limitations and errors.
Doug Caruso, assistant metro editor at The Columbus Dispatch, prepared this eight-page handout on producing data-driven enterprise stories off your beat for Columbus, Ohio, NewsTrain on Oct. 21, 2017. It includes why do data journalism, how to get started, finding data, links to several data sets for Ohio, where you can learn more, and spreadsheet basics. It accompanies slides for a presentation of the same name. NewsTrain is a training initiative of Associated Press Media Editors (APME). More info: http://bit.ly/NewsTrain
How do you fit enterprise stories around the many other demands you face to write dailies, file web updates, tweet and shoot video? This session focuses on taking advantage of the plethora of local data available online to spot and develop quick-turnaround stories. Learn how to develop a data state of mind, find newsworthy data and begin to analyze data sets. Spot the enterprise stories in the numbers. Trainer Dave Umhoefer directs the O’Brien Fellowship in Public Service Journalism at Marquette University, where he teaches investigative reporting. @GovWatcher
This handout on data-driven enterprise reporting was prepared by Steve Doig, professor of journalism, specializing in data reporting, at the Walter Cronkite School of Journalism and Mass Communication at Arizona State University. It includes why study data journalism; how to get started; how to find data, including selected data sets to get started; how to avoid errors; and where to get more training. He compiled it for Seattle NewsTrain on Nov. 11, 2017. It accompanies a presentation of the same name. NewsTrain is a training initiative of Associated Press Media Editors (APME). More info: http://bit.ly/NewsTrain
This handout on data-driven enterprise reporting was prepared by Steve Doig, professor of journalism, specializing in data reporting, at the Walter Cronkite School of Journalism and Mass Communication at Arizona State University. It was distributed by Sarah Cohen, Knight Chair specializing in data journalism at Cronkite, at Phoenix NewsTrain on April 6-7, 2018. It includes why study data journalism; how to get started; how to find data, including selected data sets to get started; how to avoid errors; and where to get more training. It accompanies Cohen's presentation, "Data-driven enterprise." NewsTrain is a training initiative of Associated Press Media Editors (APME). More info: http://bit.ly/NewsTrain
Jaimi Dowdell presents "Data Journalism for Business Reporting" at the free business journalism workshop, "Be a Better Business Watchdog -- CAR for Business Journalists," hosted by the Donald W. Reynolds National Center for Business Journalism, The Seattle Times and the University of Washington.
A session that challenges professional and student journalists to dig deeper, deliver more accountability and bring an enterprising/investigative mindset to their work. Training will include examples of using records, documents, data and experiments to bring more impactful reporting. No matter what the size of your team, your journalism can go deeper. Bring your laptop for the exercises. No previous data experience is required. Trainer Aaron Mendelson is the data reporter at KPCC, the NPR affiliate in Los Angeles.
Burt Hubbard is a data journalist who has worked with the Rocky Mountain News, Denver Post, Rocky Mountain PBS and 9News on investigative projects and documentaries. His numerous awards include two prestigious Best of The West awards, a national education award for investigative reporting, and Reporter of the Year in Colorado. Burt has taught computer-assisted reporting and internet research to graduate students for 20 years at the University of Colorado.
Jaimi Dowdell, training director for the Investigative Reporters and Editors (IRE), and Mark Horvit, executive director of IRE, offer a guide to using data in business reporting for the free investigative workshop, "Accountability in Indian Country - Be a Better Business Watchdog," on July 18, 2013.
Presented by the Donald W. Reynolds National Center for Business Journalism, this workshop was part of the Native American Journalists Association's annual conference in Phoenix.
For more information about free training for business journalists, please visit businessjournalism.org.
For additional resources on using data to empower your coverage, please visit the training archive page at http://businessjournalism.org/2013/07/17/accountability-in-indian-country-be-a-better-business-watchdog-self-guided-training/.
Assess the Constituent Data. What is included? Omitted? What are the data based on? What assumptions are being made? Different retirement calculators give widely different estimates of how much savings is needed for retirement because of the factors they include or omit (such as entertainment) and the assumptions they make (such as the inflation rate or the healthiness of annuities and mutual funds) in the calculations.
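To make the point concrete, here is a minimal sketch, with invented numbers and a simplified 4%-withdrawal rule, of how a single assumption, the inflation rate, swings a retirement estimate:

```python
# Sketch: the same spending target produces very different "savings needed"
# figures depending on the inflation assumption. Figures are illustrative only.

def savings_needed(annual_spending, years_to_retirement, inflation):
    """Nest egg needed at retirement under a simplified 4% withdrawal rule,
    after inflating today's spending into retirement-year dollars."""
    future_spending = annual_spending * (1 + inflation) ** years_to_retirement
    return future_spending / 0.04  # 4% rule: nest egg = 25x annual spending

for inflation in (0.02, 0.03, 0.04):
    needed = savings_needed(annual_spending=50_000,
                            years_to_retirement=25,
                            inflation=inflation)
    print(f"{inflation:.0%} inflation -> ${needed:,.0f}")
```

Shifting the assumed inflation rate by just two percentage points changes the answer by well over a million dollars, which is why two reputable calculators rarely agree.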
Two reputable sources can give different figures because they take their data from different places. Suppose you wanted to know employment figures. The Labor Department's monthly estimate of nonfarm payroll jobs is the most popular, but some economists like Automatic Data Processing's monthly estimate, which is based on the roughly 20 million paychecks it processes for clients. Both survey approximately 400,000 workplaces, but the Labor Department selects employers to mirror the U.S. economy, while ADP's sample is skewed, with too many construction firms and too few of the largest employers. On the other hand, the government has trouble counting jobs at businesses that are opening and closing, and some employers do not return the survey. (Both organizations do attempt to adjust their numbers to compensate accurately.)
Check the Currency of the Data. Population figures should be from the 2010 census, not the 2000 one. Technology figures in particular need to be current. Remember, however, that some large data sets are one to two years behind in being analyzed. The same is true of some government figures. If you are doing a report in 2014 that requires national education data from the Department of Education, for instance, the 2013 data may not even be fully collected, and the 2012 data may not be fully analyzed, so the 2011 data may be the most current available.
Hard to Quantify: Sports Participation
How many people participate in sports, and which sports do they choose? Governments and equipment makers want to know, but the data are fuzzy. Multiple questions contribute to the lack of clarity.
What is a sport? One survey includes bird-watching.
Who should be counted? Do young children count?
How often do you have to participate in a sport to be counted? Is once a year enough?
How was the count made? Because younger and more active people tend to have only cell phones, a survey made through landlines probably won't be accurate.
In case you are curious, the National Sporting Goods Association survey says hiking is the most popular participation sport in the United States, with over 40 million people.
Adapted from Carl Bialik, "Sports Results That Leave Final Score Unclear," Wall Street Journal, June 9, 2012, A2.
Choosing the Best Data
Sometimes even good sources and authorities can differ on the numbers they offer, or on the
This handout accompanies a presentation, "Data-Driven Enterprise off Any Beat," by Manuel Torres, enterprise editor at The Times-Picayune | Nola.com. It details what data journalism can do for a journalist, how to get started with data journalism, how to find data and how to learn more about data journalism. It also lists links to many data sets by beat. NewsTrain is a training initiative of Associated Press Media Editors: http://bit.ly/NewsTrain
Difference between Crime and Deviance
Theories offer an explanation as to why someone commits a crime or behaves in a certain manner. Your role as a learner is to understand the difference between crime and deviance while keeping some form of objectivity. To have objectivity in the field of criminal justice, one must be able to obtain data that isn't flawed or biased. Gathering official data can be cumbersome and challenging. For example, if you are trying to find out the number of rapes that have occurred in a particular neighborhood, you will rely on information that has been reported to local law enforcement. If information hasn't been reported, the data can be flawed.
Prepare a 4- to 5-page report in a Microsoft Word document analyzing different data collection methods (survey research, participant observation, and official data). In addition, determine the following:
Are these methods successful at capturing the information they're intended to capture?
Which method would you consider has had the greatest success? Which has had the least? Provide specific examples to support your answer.
What are some of the ethical concerns associated with the selected measures?
Should we consider other data collection methods in the field of criminal justice that have rarely been used (such as telephone surveys, online surveys, and so on)? Provide reasons to support your answer.
Support your responses with examples and credible resources.
Cite any sources in APA format.
Must be plagiarism-free.
Thanks to computers, it is easier to collect and obtain data for a grant proposal than ever before. There's so much data, though, that it can be difficult to determine which data to present, especially when grant applications only allow concise answers.
This webinar explains which data grantmakers are looking for, and where to present statistics and other information within the grant proposal. Multiple examples from actual, funded grant proposals show how data solves grant writers' most knotty problems: creating need statements, formulating project objectives, devising evaluation measures, and more.
Discovering and mapping your community needs - HealthLandscape
Presented at the 2013 Community Connections Pre-Application Workshops for The HealthPath Foundation of Ohio
This presentation explains a predictive data modeling project that segmented Colorado's 3.5 million voters into 11 groups. The segmentation gives campaign decision makers the ability to understand and target voters beyond the data available on a Colorado voter file.
9. Source: California Health Dept. data, Medicare billing data
Findings: Some hospitals had "alarming rates of a Third World nutritional disorder among its Medicare patients."
11. Why data?
Contrasts are in the data.
Your most powerful figures are in the data.
You can make connections you might not be able to make otherwise.
14. Why data?
Contrasts are in the data.
Your most powerful figures are in the data.
You can make connections you might not be able to make otherwise.
You can test assumptions.
23. Source: Medicaid nursing home survey data and finance data, housing data
Findings: "…a shortage of places for the disabled to live outside a nursing home and regulations that critics say make it hard to qualify for home services mean many who want out continue to receive expensive nursing care."
24. Where's the data? Sometimes you have to scrape it. That usually involves programs that automate searching tasks on Web sites.
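A scraper of the kind the slide describes can be sketched in a few lines. This is not the presenter's code; the HTML snippet and the commented-out agency URL are invented for illustration, and a real scraper would fetch live pages (for example with urllib) and respect the site's terms of use:

```python
# Minimal sketch: pull the text of every table cell out of an HTML page,
# the usual first step when an agency publishes records as a web table.
from html.parser import HTMLParser

class TableGrabber(HTMLParser):
    """Collect the text of every <td> cell on the page, in order."""
    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.cells.append(data.strip())

# In practice you would fetch the page, e.g.:
# html = urllib.request.urlopen("https://example.gov/inspections").read().decode()
html = "<table><tr><td>Acme Cafe</td><td>Critical violation</td></tr></table>"
parser = TableGrabber()
parser.feed(html)
print(parser.cells)  # each cell becomes one field of a future database row
```

From here, each row of cells can be written to a CSV and loaded into a spreadsheet or database for analysis.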
26. Where's the data? More often you need to go to an agency to get the data. This can be tricky if an agency doesn't want to release it. (Stay tuned for more on that…)
27. Source: School district credit card purchases
Findings: District card holders made questionable purchases with their cards.
30. Sometimes, there is no data. But it's okay, because there are techniques for sampling and building a database.
31. ProPublica pulled a random sample of 500 names from a list of individuals who had been granted or denied pardons (around 2,000). We created a database from months of researching individuals: their crime, age, sentence… We found that even after controlling for other factors, whites were more likely to get a pardon.
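A sampling step in the spirit of the ProPublica example can be sketched with the standard library; the case identifiers, population size and seed below are invented for illustration, and the real analysis involved months of manual research plus statistical controls:

```python
# Sketch: draw a reproducible random sample of 500 cases from ~2,000,
# the starting point for hand-building a database of case details.
import random

population = [f"case_{i:04d}" for i in range(2000)]  # ~2,000 pardon decisions

random.seed(42)  # fix the seed so colleagues can reproduce the same sample
sample = random.sample(population, 500)  # 500 cases, drawn without replacement

print(len(sample), len(set(sample)))  # 500 cases, all unique
```

Drawing the sample randomly (and documenting the seed) is what lets findings from the 500 researched cases be generalized to the full list.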
33. Source: Loan details, foreclosure information and bankruptcy filings
Findings: Loans leading to foreclosure didn't always follow conventional wisdom.
34. When you have to ask for the data
Before filing a request: ask for it.
If they require a formal request, find out who it should go to and what you should ask for.
The letter should describe what you're asking for.
Note that you're willing to negotiate.
Ask for a cost estimate.
35. Dear Records Administrator:
I’m writing to request under the Texas Public Information Act an electronic copy of the current health-
related services registry database for the state of Texas. I also am requesting electronic copies or a
database of all complaints filed against health-related service registry members since Jan. 1, 2000.
I frequently deal with large raw databases, so I would be able to accept information in several formats
including ASCII, dbf, xls, etc… and can accept the data on a variety of media (computer tape, CD-
ROM, FTP, email attachment, etc...). Please include record layouts, code sheets or any other
documentation necessary to interpret the data.
I am requesting all data fields. If there are any fields that you must withhold by law, please let me
know what those fields are, so I can amend my request.
In the interest of expediency, and to minimize the research and/or duplication burden on your staff, I
would be happy to speak with your database administrator to figure out a method that is easiest for
you.
If you have questions or need more information, please contact me by telephone or email. My
telephone number is: 214-977-8509. My email address is jlafleur@dallasnews.com.
If you will be charging processing fees, please send me an itemized estimate explaining how the
costs were calculated.
36. Getting electronic information
Know the law. Know how your state treats (or
doesn’t) the records you need.
Know what information you want.
Do your homework
Know what the appropriate cost should be.
Know who does the data entry.
Get to know Leon
When something may not clearly be public, use
your sourcing
37. Just another way of saying no
Huge costs
Delay tactics
“Oh you silly little journalist”
Sending you the wrong thing
“Your request was unclear”
HIPAA
Privacy
Privatization
45. We have processed your request. The
labor cost for the request is as
follows.
Item # of hours
RESEARCH 20
CREATING FILES 6
CODING 24
TESTING 4
Total (54 X$72) = $3,888.00
46. From Texas Public Information Act:
111.67. Estimates and Waivers of Public Information Charges
(a) A governmental body is required to provide a requestor
with an itemized statement of estimated charges if charges for
copies of public information will exceed $40, or if a charge in
accordance with §111.65 of this title (relating to Access to
Information Where Copies Are Not Requested) will exceed
$40 for making public information available for inspection. A
governmental body that fails to provide the required
statement may not collect more than $40. The itemized
statement must be provided free of charge and must contain the
following information:
61. It doesn’t mean you can’t use it…
Do integrity checks to find the flaws
Add caveats where necessary
Do your own analysis rather than relying on an
agency’s analysis of bad data
62. Integrity checks for every data set
Read the documentation. Understand the
contents of every field.
Know how many records you should have.
Check counts and totals against reports.
Are all possibilities included? All states, all
counties, correct ranges?
63. Integrity checks for every data set
Internal data checks:
Is there more money going to sub-contractors than went to
the prime contractor?
Are there more teachers than students?
Do people have birth dates in the future or so long ago they
would be long gone?
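The internal checks on this slide translate directly into a few lines of code. This sketch runs them on invented records (every field name and value here is hypothetical): it flags contracts where subcontractor payments exceed the prime award, and birth dates that are in the future or implausibly old.

```python
# Sketch of internal data checks on hypothetical records.
from datetime import date

contracts = [
    {"prime": "Acme Corp", "prime_amount": 500_000, "sub_amount": 120_000},
    {"prime": "BuildCo",   "prime_amount": 200_000, "sub_amount": 350_000},  # red flag
]

people = [
    {"name": "J. Smith", "birth_date": date(1962, 5, 1)},
    {"name": "R. Jones", "birth_date": date(2045, 1, 1)},  # future date: bad entry
    {"name": "A. Brown", "birth_date": date(1880, 3, 9)},  # implausibly old
]

# More money going to subcontractors than to the prime contractor?
flagged = [c for c in contracts if c["sub_amount"] > c["prime_amount"]]

# Birth dates in the future, or so long ago the person would be long gone?
today = date.today()
bad_dates = [p for p in people
             if p["birth_date"] > today
             or today.year - p["birth_date"].year > 110]

print(len(flagged), "contract(s) flagged;", len(bad_dates), "suspect birth date(s)")
```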
66. If your data is in
Excel, use the filter
function to see what
the values are in
individual fields.
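Excel's filter dropdown has a direct code equivalent: a frequency count of every distinct value in a field. A sketch on made-up records, useful for spotting typos, case inconsistencies, and stray codes:

```python
# Equivalent of Excel's filter dropdown: list every distinct value
# in a field, with counts. Records here are invented.
from collections import Counter

records = [
    {"county": "Franklin"}, {"county": "Franklin"},
    {"county": "Delaware"}, {"county": "franklin"},  # case typo worth catching
]

counts = Counter(r["county"] for r in records)
for value, n in counts.most_common():
    print(value, n)
```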
67. Integrity checks for every data set
Check for missing data, misplaced data or blank
fields
Use a standard naming convention for files and
tables (I wouldn’t recommend “final”)
Check for duplicates
Take margins of error into account if necessary
(important if you’re using Census data).
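One way to run the duplicate check above in code: count how often each record appears, keyed on the fields that should be unique, and report any key seen more than once. The payment records below are invented:

```python
# Quick duplicate check: flag any record that appears more than once.
from collections import Counter

payments = [
    ("2017-03-01", "Acme Corp", 1200.00),
    ("2017-03-01", "Acme Corp", 1200.00),  # possible double entry
    ("2017-03-02", "BuildCo", 800.00),
]

dupes = [key for key, n in Counter(payments).items() if n > 1]
print(dupes)
```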
70. Beyond the basics
Keep a notes file
Don’t work off your original database
Know the source
Check against summary reports
Use the right tool
Check for outliers when it comes to ups and
downs
72. Beyond the basics
Check with experts
Are there standards? (ex: a drop of more than
10 percentage points is a red flag)
Find out what others have done
Gut check
Go physically see a record or spot check
against documents
73. Voter Fraud
Dozens of St. Louis voters are being wrongly accused
of casting ballots from fraudulent addresses in last
year's Nov. 7 election.
They are among thousands of registered voters who,
based on city property records, appear to live on
vacant lots.
75. Texas test score data: official
results versus district reports
Duncanville district reported
4th grade writing
Official report for Duncanville
4th grade writing
Courtesy Holly Hacker, The Dallas Morning News
76. Three rounds of analysis
after bouncing off subjects
and experts
Demographically based
Voir dire
Socioeconomics
77. Checks when you’re matching data
A name is not enough. Lots of people have the same name
Get dates of birth and
other information to
make sure you have
the correct person.
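The point on this slide can be demonstrated in a few lines: matching on name alone over-matches, while adding date of birth narrows to the right person. All names and records here are made up:

```python
# A name is not enough: key matches on name plus date of birth.
voters = [
    {"name": "John Smith", "dob": "1970-02-14", "address": "12 Oak St"},
    {"name": "John Smith", "dob": "1988-09-03", "address": "4 Elm Ave"},
]
felony_records = [
    {"name": "John Smith", "dob": "1988-09-03"},
]

# Name-only match: falsely hits both John Smiths.
by_name = [v for v in voters
           if any(f["name"] == v["name"] for f in felony_records)]

# Name + DOB match: narrows to the correct person.
by_name_dob = [v for v in voters
               if any(f["name"] == v["name"] and f["dob"] == v["dob"]
                      for f in felony_records)]

print(len(by_name), "name-only matches;", len(by_name_dob), "name+DOB match")
```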
78. Source: Illinois health data, police data
Findings: Systemic failures left elderly patients unprotected in
Illinois nursing homes that also house mentally ill younger residents,
including murderers, sex offenders, and armed robbers.
80. Evaluating outside studies
Get the questionnaire and methodology
Beware of nonscientific methods: Web surveys,
man on the street
Know the sample size and the sampling error
Account for margin of error and non-response
when drawing conclusions
Run statistical tests on the data if possible
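For a simple random sample, the rough 95-percent sampling error mentioned above can be computed directly from the sample size. A sketch using the standard normal approximation for a proportion (worst case is p = 0.5):

```python
# Rough 95% margin of error for a proportion from a simple random
# sample: 1.96 * sqrt(p * (1 - p) / n).
import math

def margin_of_error(p: float, n: int) -> float:
    """Return the 95% margin of error, as a proportion."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

# A 500-person sample (like the pardons sample earlier) at p = 0.5:
moe = margin_of_error(0.5, 500)
print(f"+/- {moe:.1%}")  # about +/- 4.4 points
```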
82. Reporting data
Consider reporting rates, not raw numbers
Avoid false precision: "53.14 percent said…" in a poll
with a 5 percentage point margin of error
Avoid number overload. "About half" is usually just as
useful as "51 percent"
Adjust money for inflation
When analyzing income, use median rather than
average (the Bill Gates factor)
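The "Bill Gates factor" is easy to show: one extreme value drags the average far above anything typical, while the median stays put. The incomes below are invented:

```python
# One outlier income distorts the average; the median stays
# representative of the typical household. Data is made up.
from statistics import mean, median

incomes = [32_000, 41_000, 45_000, 52_000, 60_000, 1_000_000_000]

print(f"average: ${mean(incomes):,.0f}")    # distorted by the outlier
print(f"median:  ${median(incomes):,.0f}")  # typical household
```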
83. When the data is the problem, you might still
have a story
Erroneous government databases are often
a story in themselves
87. Know which tool to use
• Reporting individual records
• Counting/summing
• Mapping
• Statistics
88. Source: Medicaid
outcomes data for
dialysis facilities
Findings: A CMS
online tool did not
tell the whole story
about facilities. In
some counties, the
gaps in measures
such as survival
rate were vast.
90. Source: Washington Health Department data
Findings: “MRSA has been quietly killing in hospitals for decades.” But no
one had tracked it until this story.
91. Source: Dept. of Ed data and surveys of campus crisis clinics
Findings: Many campuses had lax enforcement, and reporting loopholes
meant problems went unchecked.
93. Source: EPA and state data on hazardous chemical locations
Findings: Dallas County has 900+ sites that store hazardous chemicals
94. Source: Dam
inspection data
from Texas and
federal government
Findings: Dam
records had not
been updated to
account for
population growth
95. Source: 311 calls for downed trees
Findings: After a tornado swept across New York City, 311
calls for downed trees helped trace its path
100. Disparities in water
usage
“Water use highest in
poor areas of the city”
Mapping and statistical
analysis
101. Presenting the data
Include a methodology explaining what you did and
what you don’t know.
For really complicated analyses, consider a super
nerdy white paper explaining all of your findings
If you make data downloadable – include field
descriptions and anything users should watch for