Managing and Analyzing Global Health Data

Published in: Data & Analytics, Technology

  1. UNIVERSITY OF WASHINGTON. Managing and Analyzing Global Health Data. Seattle, August 30, 2011. Peter Speyer, Director of Data Development
  2. IHME Background
     • Global institute dedicated to providing independent, rigorous, and scientific measurements and evaluations to accelerate progress on global health
     • Part of the Department of Global Health at the University of Washington
     • Funded by the Bill & Melinda Gates Foundation and the state of Washington (“core funding”), and by other funders through specific research grants
     • Created in 2007
     • 70 researchers, 30 staff
  3. IHME Mission: Our goal is to improve the health of the world’s populations by providing the best information on population health
  5. Health data
     Health-related data
       • Social determinants
       • Risk factors
     Population-based data
       • Household/facility surveys
       • Census
       • Vital registration
       • Registries (provider, disease)
     Facility-based data
       • Health records
       • Administrative data (financial, operational)
       • Research data (DSS, clinical trials, etc.)
     Individual-based data
       • Personal health records
       • “Quantified self”
       • Disease-based social networks
     Health Data Innovation
       • Patient engagement
       • Open data
       • Health apps
  6. Key health data challenges
     • Find & access data
     • Disseminate data
     • Use data
  7. Key health data challenges
     • Lack of transparency
     • Timeliness of data
     • Lack of documentation
     • Access vs. privacy
  8. Key health data challenges
     • Sheer quantity of data files (30 TB, 20K+ source datasets, 40M files)
     • Diverse source data types and formats (PDF, CSV, SPSS, CSPro, …)
     • Data quality issues
  9. Key health data challenges
     • Make results data engaging
     • Accountability: share results, code, source data
     • Accommodate diverse audiences (expertise, geographies)
  10. Example: Global Burden of Disease (300 diseases, 40 risk factors, 21 regions; 1990, 2005, 2010)
     Mortality & causes of death
       • Sources: census, surveys, vital registration, verbal autopsy
       • Estimates: covariate models, spatial-temporal regressions; weighted combination of models
     Morbidity
       • Sources: literature reviews, surveys, registries, hospital data
       • Disease modeling: compartmental Bayesian model
       • Health severity weights
     Burden of disease
       • DALYnator
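The transcript does not describe how the DALYnator works. As a point of reference, burden of disease is conventionally summarized in disability-adjusted life years (DALYs), the sum of years of life lost (YLL) and years lived with disability (YLD). The Python sketch below uses that standard definition with a simplified, prevalence-based YLD. The function name, its parameters, and the input numbers are illustrative assumptions, not IHME code; the disability weight presumably corresponds to the "health severity weights" named on the slide.

```python
def dalys(deaths, remaining_life_expectancy, prevalent_cases, disability_weight):
    """Return (YLL, YLD, DALY) for one cause in one population.

    Simplified assumptions (not taken from the talk):
    - YLL = deaths * standard remaining life expectancy at age of death
    - YLD = prevalent cases * disability weight (a value between 0 and 1)
    """
    if not 0.0 <= disability_weight <= 1.0:
        raise ValueError("disability_weight must be between 0 and 1")
    yll = deaths * remaining_life_expectancy
    yld = prevalent_cases * disability_weight
    return yll, yld, yll + yld


# Made-up inputs, for illustration only.
yll, yld, total = dalys(deaths=100, remaining_life_expectancy=30.0,
                        prevalent_cases=1000, disability_weight=0.25)
print(yll, yld, total)
```

A real burden calculation would run this per cause, age group, sex, region, and year, and would carry uncertainty through each step; the sketch only shows how the two components combine.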
  11. GBD Country Years, Causes of Death 1950-2009
  12. GBD Country Years, Causes of Death 1950-2009

      Data source            Countries   Site-years    # of Deaths
      VR                           128        4,190    722,267,710
      Household surveys            136        2,827     10,132,976
      Surveillance systems          12          126        717,698
      National VA                   21           71        301,855
      Subnational VA                59          442      2,606,815
      Mortuary registries            6           25         54,316
      TOTAL                                   7,680    735,564,116
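As a quick consistency check on the slide 12 figures, the per-source rows can be tallied and compared with the TOTAL line. The Python sketch below is only an illustration: the numbers are copied from the slide, while the variable and function names are hypothetical. It reports both the tallied and the stated values instead of assuming that they agree.

```python
# Figures copied from slide 12; all names are illustrative, not from the talk.
# Each value is (countries, site-years, deaths).
ROWS = {
    "VR": (128, 4190, 722_267_710),
    "Household surveys": (136, 2827, 10_132_976),
    "Surveillance systems": (12, 126, 717_698),
    "National VA": (21, 71, 301_855),
    "Subnational VA": (59, 442, 2_606_815),
    "Mortuary registries": (6, 25, 54_316),
}
STATED_TOTAL = (7680, 735_564_116)  # (site-years, deaths) as printed on the slide


def tally(rows):
    """Sum site-years and deaths across all data sources."""
    site_years = sum(r[1] for r in rows.values())
    deaths = sum(r[2] for r in rows.values())
    return site_years, deaths


def death_share(rows, source):
    """Fraction of all tallied deaths contributed by one source."""
    _, total_deaths = tally(rows)
    return rows[source][2] / total_deaths


if __name__ == "__main__":
    site_years, deaths = tally(ROWS)
    print("site-years tallied:", site_years, "stated:", STATED_TOTAL[0])
    print("deaths tallied:", deaths, "stated:", STATED_TOTAL[1])
    print(f"VR share of tallied deaths: {death_share(ROWS, 'VR'):.1%}")
```

With the figures as transcribed, the tallied sums (7,681 site-years and 736,081,370 deaths) differ slightly from the printed totals; the transcript gives no explanation for the gap. Either way, vital registration accounts for roughly 98% of the deaths.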
  13. Solutions: computing infrastructure
     • Analysis with statistical packages
       – Projects with 100K+ lines of code
     • File system
       – 60 TB disk space
       – Redundant backup
     • Cluster with 63 nodes (+300% in 2011), ~2,000 cores
       – Runs 24x7, very little downtime
     • Virtual environments to test new applications, serve them to collaborators, etc.
  14. Solutions: Global Health Data Exchange
     Objectives
       • Transparency => data catalog
       • Access => data repository
       • Information => data community (future)
     Approach
       • One record per dataset
       • Standardized metadata
       • Internal users (10K records): files on file server
       • External users (5K records): files for download
     Implementation
       • CMS: Drupal
       • Search: Solr
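To make the approach on this slide concrete, here is a minimal Python sketch of "one record per dataset" with standardized metadata and separate internal and external access. It is not the GHDx schema or its Drupal/Solr implementation: every field name, the `Access` enum, the `search` helper, and the sample records are hypothetical, and the substring match merely stands in for a real search index.

```python
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    INTERNAL = "internal"  # files stay on the file server
    EXTERNAL = "external"  # files are offered for download


@dataclass(frozen=True)
class DatasetRecord:
    """One catalog record per dataset (hypothetical metadata fields)."""
    title: str
    country: str
    year: int
    data_type: str
    access: Access
    keywords: tuple = field(default_factory=tuple)


def search(catalog, term, include_internal=False):
    """Naive case-insensitive match on title and keywords.

    External users only see EXTERNAL records; internal users pass
    include_internal=True to see everything.
    """
    term = term.lower()
    hits = []
    for rec in catalog:
        if rec.access is Access.INTERNAL and not include_internal:
            continue
        haystack = " ".join((rec.title, *rec.keywords)).lower()
        if term in haystack:
            hits.append(rec)
    return hits


# Made-up records, for illustration only.
CATALOG = [
    DatasetRecord("Example Household Survey", "Exampleland", 2005,
                  "survey", Access.EXTERNAL, ("household", "health")),
    DatasetRecord("Example Vital Registration", "Exampleland", 2008,
                  "vital registration", Access.INTERNAL, ("deaths",)),
]
```

Calling `search(CATALOG, "example")` returns only the external survey record, while `search(CATALOG, "example", include_internal=True)` returns both. Keeping every dataset in the same record shape is what lets one catalog serve both audiences with a simple access filter.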
  16. UNIVERSITY OF WASHINGTON. Thank you! Peter Speyer, Director of Data Development. speyer@uw.edu, @peterspeyer, www.ghdx.org
