Ask a data question at lunch,
answer it in the afternoon.
The Vision
Build a Health Data
Infrastructure that shares
every public file everywhere.
The Idea
Eric Busboom & Michael Samuel, San Diego Regional Data Library 
I want to start not with an idea but with a vision: that anyone in the health fields who has basic skills with Excel and the internet can
ask a data question at lunch and answer it in the afternoon. If you’ve tried to answer simple questions, such as “What is the vaccination rate
by community, and what are the correlates of low rates?”, you know that if the data or a report isn’t already on your desk, the question can be
difficult or impossible to answer without a lot of effort or professional help.
1 Health Data Ideathon Presentation.key - June 10, 2014
•Every public health file.
•From every CA County and
Nonprofit, State & National
•On every county’s website
and every analyst’s desktop.
•Cleaned and prepared, right
format
A lot of data
From a lot of sources
In a familiar place
Ready to analyze
To the question of what data? All of it. We really want to incorporate every public health data file in the state. And
we’ll make it usable by ensuring the data is converted to the formats people most need and already use,
such as CSV, STATA, SAS, or SQL databases. That data, in the right formats, can be sent directly to analysts’ websites,
or even desktops, the places where they already know to get files.
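The format conversion described above can be sketched with the Python standard library alone. This is a minimal illustration, not how Ambry does it: the function name `load_csv_into_sqlite`, the table name, and the sample rows are all hypothetical, and the rates shown are made-up placeholder values rather than real data.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(csv_text, conn, table):
    """Load CSV text (with a header row) into a new SQLite table.

    Hypothetical helper for illustration. Every column is stored as TEXT;
    returns the number of data rows inserted.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)

    def quote(name):
        # Quote identifiers so names with spaces or quotes still work.
        return '"' + name.replace('"', '""') + '"'

    columns = ", ".join(quote(name) + " TEXT" for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute("CREATE TABLE {} ({})".format(quote(table), columns))

    rows = [row for row in reader if row]  # skip blank lines
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quote(table), placeholders), rows
    )
    conn.commit()
    return len(rows)


# Made-up placeholder rows, for illustration only.
SAMPLE_CSV = "community,vaccination_rate\nNorth,0.91\nSouth,0.84\n"

conn = sqlite3.connect(":memory:")
loaded = load_csv_into_sqlite(SAMPLE_CSV, conn, "vaccination")

# Once the file is in SQL, a simple question becomes a one-line query.
low = conn.execute(
    "SELECT community FROM vaccination "
    "WHERE CAST(vaccination_rate AS REAL) < 0.9"
).fetchall()
```

Storing every column as TEXT keeps the sketch short; a real conversion would also assign column types and document them.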
Specialization
Standard Packages
Mass Distribution
How?: Industrial Process For Data
Details?: Ambry, Business Model
Ambry.io, Data Management
Convert Dataset, $100 per
$1000 / Mo / County, 25 Counties
This is an audacious goal, but it is based on sound principles, and it already works. We’ve developed an industrial
process for data, with an open source data management system, Ambry.io. Using Ambry, we can build a business
around deploying public data, at a break-even cost of about $1000 per month per California county.
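The slide figures ($1000 per month per county, 25 counties, $100 per converted dataset) can be combined into a rough budget. The sketch below is only arithmetic on those three numbers; the annual total and the conversion ceiling are derived here and are not stated in the presentation. The ceiling assumes, unrealistically, that all revenue goes to dataset conversion, so it is an upper bound.

```python
# Figures taken from the slide.
MONTHLY_FEE_PER_COUNTY = 1000   # dollars per month per county
COUNTIES = 25
COST_PER_DATASET = 100          # dollars per converted dataset

# Derived values (not stated in the presentation).
monthly_revenue = MONTHLY_FEE_PER_COUNTY * COUNTIES
annual_revenue = monthly_revenue * 12

# Upper bound only: assumes every dollar is spent on conversion,
# with nothing for hosting, support, or administration.
max_conversions_per_month = monthly_revenue // COST_PER_DATASET

print("Monthly revenue: $%d" % monthly_revenue)
print("Annual revenue: $%d" % annual_revenue)
print("Max conversions per month: %d" % max_conversions_per_month)
```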
[Diagram: Inputs, Library, Repos, Use, with a Database; roles: Data Wranglers, Sys Admins, Data Analysts]
Our system involves the distributed conversion of datasets, at a cost of about $100 per dataset, by data wranglers
who have common programming skills. An organization’s systems administrators can select datasets from a library of
all sets, so analysts only have to see the files they use most, and those data are already cleaned and ready to use.
•Data becomes comprehensible,
comparable.
•More analysis, less talking.
Coordinate without
communication.
•Syndicate kidsdata.org, 50% the
cost.
•Already exists, and works.
sandiegodata.org/hdi & ambry.io
This health data infrastructure makes data easier to find, understand, and use, allowing counties to share data with
less effort and communication. It makes it easy to send data not only to analysts but also to databases, so data-driven
indicator sites can be built much less expensively. Best of all, the system is already working and is ready to be tested
in a pilot project.

San Diego Data Library HCDS Idea-thon
