Ask a data question at lunch,
answer it in the afternoon.
Build a Health Data
Infrastructure that shares
every public file everywhere.
Eric Busboom & Michael Samuel, San Diego Regional Data Library
I want to start not with an idea but with a vision: that anyone in the health fields who has basic skills with Excel and the internet can ask a
data question at lunch and answer it in the afternoon. If you’ve tried to answer simple questions, such as “What is the vaccination rate by
community, and what are the correlates of low rates?”, you know that if the data or a report isn’t already on your desk, the question can be
difficult or impossible to answer without a lot of effort or professional help.
1 Health Data Ideathon Presentation.key - June 10, 2014
•Every public health file.
•From every CA County and
Nonprofit, State & National
•On every county’s website
and every analyst’s desktop.
•Cleaned and prepared, right
A lot of data
From a lot of sources
In a familiar place
Ready to analyze
To the question of what data: all of it. We really want to incorporate every public health data file in the state. And
we’ll make it usable by ensuring the data is converted to the file formats people most need and already use,
such as CSV, STATA, SAS, or SQL databases. That data, in the right formats, can be sent directly to analysts’ websites,
or even desktops, the places where they already know to get files.
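As a rough illustration of the format conversion described above, here is a minimal sketch that loads a CSV file into a SQL database using only the Python standard library. It is not how Ambry does it; the function name `csv_to_sqlite`, the table name, and the sample rows are all hypothetical placeholders invented for this example.

```python
import csv
import io
import sqlite3

# Placeholder rows invented purely for illustration; not real health data.
SAMPLE_CSV = """community,vaccination_rate
Community A,0.91
Community B,0.84
"""


def csv_to_sqlite(csv_text, conn, table):
    """Load CSV text into a new SQLite table and return the number of rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # the first line holds the column names
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    rows = list(reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
count = csv_to_sqlite(SAMPLE_CSV, conn, "vaccination")
print(count, "rows loaded")
for row in conn.execute('SELECT community, vaccination_rate FROM "vaccination"'):
    print(row)
```

The same source file could be written out to other formats in the same pass, so each analyst receives the data in the form their tools already read.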
How?: Industrial Process For Data
Details?: Ambry, Business Model
Ambry.io, Data Management
Convert Dataset, $100 per
$1000 / Mo / County, 25 Counties
This is an audacious goal, but it is based on sound principles, and it already works. We’ve developed an industrial
process for data, with an Open Source data management system, Ambry.io. Using Ambry, we can build a business
around deploying public data, at a break-even cost of about $1000 per month per California county.
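The figures on this slide can be combined into a quick back-of-the-envelope calculation. The sketch below uses only the three numbers given here ($1000 per month per county, 25 counties, $100 per dataset). The dataset count is an upper bound under the assumption, not stated in the talk, that the entire monthly budget goes to conversion.

```python
MONTHLY_FEE_PER_COUNTY = 1000  # dollars, from the slide
COUNTIES = 25                  # from the slide
COST_PER_DATASET = 100         # dollars, from the slide

monthly_budget = MONTHLY_FEE_PER_COUNTY * COUNTIES
annual_budget = monthly_budget * 12

# Assumption for illustration only: every dollar is spent on conversion,
# so this is a ceiling rather than a projection.
max_datasets_per_month = monthly_budget // COST_PER_DATASET

print(f"Monthly budget: ${monthly_budget:,}")
print(f"Annual budget: ${annual_budget:,}")
print(f"Upper bound on datasets converted per month: {max_datasets_per_month}")
```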
Inputs Library Repos Use
Our system involves the distributed conversion of datasets, at a cost of about $100 per dataset, by data wranglers
who have common programming skills. An organization’s systems administrators can select datasets from a library of
all sets, so analysts only have to see the files they use most, and those data are already cleaned and ready to use.
•Data becomes comprehensible,
•More analysis, less talking.
•Syndicate kidsdata.org, 50% the
•Already exists, and works.
sandiegodata.org/hdi & ambry.io
This health data infrastructure makes data easier to find, understand, and use, allowing counties to share data with
less effort and communication. It makes it easy to send data not only to analysts but also to databases, so data-driven indicator
sites can be built much less expensively. Best of all, the system is already working, and is ready to be tested in a pilot