Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

NSF Northeast Hub Big Data Workshop

270 views

Published on

Overview of the Health projects

Published in: Health & Medicine
  • Be the first to comment

  • Be the first to like this

NSF Northeast Hub Big Data Workshop

  1. 1. Building a search engine to find environmental and phenotypic factors associated with disease and health Chirag J Patel Northeast Big Data Innovation Hub Workshop 02/24/17 chirag@hms.harvard.edu @chiragjp www.chiragjpgroup.org
  2. 2. P = G + EType 2 Diabetes Cancer Alzheimer’s Gene expression Phenotype Genome Variants Environment Infectious agents Nutrients Pollutants Drugs
  3. 3. We are great at G investigation! over 2400 Genome-wide Association Studies (GWAS) https://www.ebi.ac.uk/gwas/ G
  4. 4. Nothing comparable to elucidate E influence! E: ??? We lack high-throughput methods and data to discover new E in P… until now!
  5. 5. σ2 G σ2P H2 = Heritability (H2) is the range of phenotypic variability attributed to genetic variability in a population Indicator of the proportion of phenotypic differences attributed to G.
  6. 6. Eye color Hair curliness Type-1 diabetes Height Schizophrenia Epilepsy Graves' disease Celiac disease Polycystic ovary syndrome Attention deficit hyperactivity disorder Bipolar disorder Obesity Alzheimer's disease Anorexia nervosa Psoriasis Bone mineral density Menarche, age at Nicotine dependence Sexual orientation Alcoholism Lupus Rheumatoid arthritis Crohn's disease Migraine Thyroid cancer Autism Blood pressure, diastolic Body mass index Depression Coronary artery disease Insomnia Menopause, age at Heart disease Prostate cancer QT interval Breast cancer Ovarian cancer Hangover Stroke Asthma Blood pressure, systolic Hypertension Osteoarthritis Parkinson's disease Longevity Type-2 diabetes Gallstone disease Testicular cancer Cervical cancer Sciatica Bladder cancer Colon cancer Lung cancer Leukemia Stomach cancer 0 25 50 75 100 Heritability: Var(G)/Var(Phenotype) Source: SNPedia.com G estimates for burdensome diseases are low and variable: massive opportunity for E discovery Type 2 Diabetes Heart Disease Asthma
  7. 7. Eye color Hair curliness Type-1 diabetes Height Schizophrenia Epilepsy Graves' disease Celiac disease Polycystic ovary syndrome Attention deficit hyperactivity disorder Bipolar disorder Obesity Alzheimer's disease Anorexia nervosa Psoriasis Bone mineral density Menarche, age at Nicotine dependence Sexual orientation Alcoholism Lupus Rheumatoid arthritis Crohn's disease Migraine Thyroid cancer Autism Blood pressure, diastolic Body mass index Depression Coronary artery disease Insomnia Menopause, age at Heart disease Prostate cancer QT interval Breast cancer Ovarian cancer Hangover Stroke Asthma Blood pressure, systolic Hypertension Osteoarthritis Parkinson's disease Longevity Type-2 diabetes Gallstone disease Testicular cancer Cervical cancer Sciatica Bladder cancer Colon cancer Lung cancer Leukemia Stomach cancer 0 25 50 75 100 Heritability: Var(G)/Var(Phenotype) Source: SNPedia.com G estimates for complex traits are low and variable: massive opportunity for high-throughput E discovery σ2 E : Exposome!
  8. 8. Enhance accessibility of clinical and environmental data, and analytic artificial intelligence tools! How can we drive discovery of environmental factors (E) in disease phenotypes (P)?
  9. 9. Enhance accessibility of large open data and tools to drive discovery of environmental factors (E) in disease phenotypes (P) Greg Cooper, MD, PhD Pittsburgh Vasant Honavar, PhD Penn State Noémie Elhadad, PhD Columbia Chirag Patel, PhD Harvard
  10. 10. Where do we get disease (P) data?
  11. 11. wearable.com
  12. 12. Where do we get disease P data? Health record data from your doctor! • Longitudinal data on millions of patients • diagnoses, prescriptions, lab reports, notes • Sitting there in institutional IT infrastructure • OHDSI provides a unified model to access data across institutions, enhancing the scientific process!
  13. 13. Noémie Elhadad, PhD Columbia
  14. 14. Capitalize on digitalized health record data (from around the world)! High-powered dataset(s) for discovery.
  15. 15. Where do we get environmental (E) data?
  16. 16. Examples of sources of disparate external exposome datasets available in the Exposome Data Warehouse • Geological • NASA - Cloud and Atmosphere Profiles • NOAA Climate Data • Pollution • EPA Air Quality Surveillance Data Mart, or AirData, • Socio-Economic • US Census American Community Survey (ACS) • Epidemiological • CDC Wonder, USDA Food Atlas
  17. 17. A key challenge: mashing up Exposome Data Warehouse with patient data from OHDSI f(location, time) PM2.5 income Pollen count EPA AirData American Community Survey NOAA Climate home zipcode encounter time
  18. 18. 0.50 gini index 0.20 0.1 hh income (K) 100 50 3030 15 Wind 0 individualn … .. .. .. .. 3/5/1998yes 10 E?age no 95376 20 sex M Temp individual2 PM2.5 individual1 75 55 02215 1/1/2016 23 Pb individual3 21 70 F M zip 35 Time(E) 2312/11/2015 0 yes 50 302124 OHDSI EPA NOAA Census millionsofpatients
  19. 19. Will it work? yep!
  20. 20. Does temperature (and weather) influence asthma-related pediatric ER visits? • Children <= 17 y/o with >=1 ICD9 code corresponding to 493.* • N=56K, >84K ER visits • Weather station data • (daily temperature, wind, humidity) • Case-crossover design (only investigated cases) Yeran Li, PhD (MS, HSPH) Chirag Lakhani, PhD (HMS) Yun Wang, PhD (Post-doc, HSPH)
  21. 21. Prevalence of asthma attack varies across the US
  22. 22. Temperature(F) 20 40 60 80 Lag 0 1 2 3 4 5 6 Relativerisk 0.95 1.00 1.05 20 40 60 80 0.91.11.3 Overall temperature effect Mean temperature (F) Relativerisk ● ● ● ● ● ● ● 0 1 2 3 4 5 6 0.901.001.10 Lag effect at T=60F Lag(day) RelativeRisk 20 40 60 80 0.901.001.10 Temperature effect at Lag=0 Mean temperature (F) RelativeRisk Does temperature influence asthma ER visits?: yes! Relative risk of asthma attack by mean temperature
  23. 23. Rates of asthma attacks depend on season?: yes! 0.9 1.0 1.1 20 40 60 80 season Fall Spring Summer Winter
  24. 24. Temperature)effects)vary)at)different)regions) (use)NOAA)defini9on,)ER)kids)) h@ps://www.ncdc.noaa.gov/monitoringEreferences/maps/usEclimateEregions.php) south northeast southeast central southwest wn_central en_central west northwest Region counts by NOAA Numofobs 02000060000 51052 78126 57238 68719 15256 18050 44551 22923 13895 south northeast southeast central southwest wn_central en_central west northwest 0.5 1.0 1.5 2.0 20 40 60 80 20 40 60 80 20 40 60 80 20 40 60 80 20 40 60 80 20 40 60 80 20 40 60 80 20 40 60 80 20 40 60 80 Temperature Risk weather effect: different weather zones by NOAA Rates of asthma attacks dependent on region?: yes!
  25. 25. Does temperature (and weather) influence asthma- related ER visits in kids?: the tip of the iceberg! Lag 0 1 2 20 40 60 80 0.91.11.3 Overall temperature effect Mean temperature (F) Relativerisk ● ● 5 6 0F 20 40 60 80 0.901.001.10 Temperature effect at Lag=0 Mean temperature (F) RelativeRisk What other scientific questions? •what is influence of pollen? •what is the influence of air pollution? •what about adults? Can we replicate the analysis? •different populations •using different data •with different analysts
  26. 26. ExposomeDB Environmental Protection Agency AirData US Census Socioeconomic data National Oceanic and Atmospheric Administration (NOAA) Climate Data Observational Health Data Sciences and Informatics (OHDSI) Harvard Medical School Columbia University P&S Geotemporal database Network of observational data Causal Analytics A2 A3 A1 AX Aim X Analytics Workshop Hands-on workshops for leveraging work in Aims 1-3 Training Resources Jupyter notebooks Student internships at Pitt, Harvard, PSU, and Columbia P&S OHDSI Node A4 University of Pittsburgh Penn State University Integrating the ExposomeData Warehouse with OHDSI and causal modeling tools to drive and demonstrate discovery.
  27. 27. Many hypotheses are possible to address: useful for public health & planning! What is the effect of air pollution levels in disease? Do adverse weather conditions influence hospital use? What pharmaceutical drugs lead to adverse health outcomes? How does socioeconomic context influence hospital use, disease rates, and recovery?
  28. 28. We will harness tools in machine learning extract signal from noise! Greg Cooper, MD, PhD Pittsburgh Vasant Honavar, PhD Penn State SES PM2.5 asthma oF bayesian networks socio-economic case-crossover weather air pollution Sci Rep 2016
  29. 29. http://www.ccd.pitt.edu/tools/
  30. 30. Integrating the ExposomeDB with OHDSI and analytics to drive and demonstrate discovery by the community! (trainees especially welcome) • 2-day hands-on workshop in New York or Boston • remote “exchange” internship program and 2-week immersion • dissemination of electronic training resources
  31. 31. Many hypotheses are possible to address: useful for can we build a machine learning predictor to estimate E? Eye color Hair curliness Type-1 diabetes Height Schizophrenia Epilepsy Graves' disease Celiac disease Polycystic ovary syndrome Attention deficit hyperactivity disorder Bipolar disorder Obesity Alzheimer's disease Anorexia nervosa Psoriasis Bone mineral density Menarche, age at Nicotine dependence Sexual orientation Alcoholism Lupus Rheumatoid arthritis Crohn's disease Migraine Thyroid cancer Autism Blood pressure, diastolic Body mass index Depression Coronary artery disease Insomnia Menopause, age at Heart disease Prostate cancer QT interval Breast cancer Ovarian cancer Hangover Stroke Asthma Blood pressure, systolic Hypertension Osteoarthritis Parkinson's disease Longevity Type-2 diabetes Gallstone disease Testicular cancer Cervical cancer Sciatica Bladder cancer Colon cancer Lung cancer Leukemia Stomach cancer 0 25 50 75 100 Heritability: Var(G)/Var(Phenotype) Source: SNPedia.com σ2 E : Exposome!
  32. 32. Harvard DBMI Isaac Kohane Susanne Churchill Nathan Palmer Sunny Alvear Michal Preminger Chirag J Patel chirag@hms.harvard.edu @chiragjp www.chiragjpgroup.org Thanks RagGroup Chirag Lakhani Yeran Li Shreyas Bhave Rolando Acosta Noémie Elhadad (Columbia) Vasant Honavar (PSU) Greg Cooper (Pitt) George Hripcsak (Columbia) René Baston Katie Naum Kathleen McKeown NIH Common Fund Big Data to Knowledge

×