Atul Butte NIPS 2017 ML4H

My talk at NIPS 2017 ML4H on December 8, 2017. I had to remove some of the unpublished slides, but all the rest is here. Enjoy! -- Atul Butte

Published in: Data & Analytics

  1. 1. Translating a Trillion Points of Data into Diagnostics, Therapies and New Insights in Health and Disease • Atul Butte, MD, PhD • Director, Institute for Computational Health Sciences, University of California, San Francisco • atul.butte@ucsf.edu • @atulbutte
  2. 2. Conflicts of Interest
      • Scientific founder and advisory board membership: Genstruct, NuMedii, Personalis, Carmenta
      • Honoraria for talks: Lilly, Pfizer, Siemens, Bristol Myers Squibb, AstraZeneca, Roche, Genentech, Warburg Pincus
      • Past or present consultancy: Lilly, Johnson and Johnson, Roche, NuMedii, Genstruct, Tercica, Ecoeos, Helix, Ansh Labs, Prevendia, Samsung, Assay Depot, Regeneron, Verinata, Pathway Diagnostics, Geisinger Health, Covance, Wilson Sonsini Goodrich & Rosati, Orrick, 10X Genomics, Medgenics, GNS Healthcare, Gerson Lehrman Group, Coatue Management
      • Corporate Relationships: Northrop Grumman, Aptalis, Allergan, Astellas, Thomson Reuters, Intel, SAP, SV Angel, Progenity, Illumina
      • Speakers’ bureau: None
      • Companies started by students: Carmenta, Serendipity, Stimulomics, NunaHealth, Praedicat, MyTime, Flipora, Tumbl.in
  3. 3. @affymetrix
  4. 4. bit.ly/genedata
  5. 5. The Cancer Genome Atlas • 14 thousand cases • 39 types of cancers • 13 types of data: molecular, clinical, sequencing
  6. 6. 227 million substances x 1.3 million assays • More than a billion measurements within a grid of 300 trillion cells • 71 million meet Lipinski's rule of five • 1.2 million active substances
  7. 7. http://www.nap.edu/catalog.php?record_id=13284
  8. 8. Marina Sirota
  9. 9. Preeclampsia: large cause of maternal and fetal death
      • Incidence
        – 5-8% of all pregnancies in the U.S. and worldwide
        – 4.1 million births in the U.S. in 2009
        – Up to 300K cases of preeclampsia annually in the U.S.
      • Mortality
        – Responsible for 18% of all maternal deaths in the U.S.
        – Maternal death in 56 out of every 100,000 live births in the U.S.
        – Neonatal death in 71 out of every 100,000 live births in the U.S.
      • Cost
        – $20 billion in direct costs in the U.S. annually
        – Average hospital stay of 3.5 days
      • Linda Liu, Bruce Ling, Matt Cooper
  10. 10. New blood markers for preeclampsia • Linda Liu, Bruce Ling, Matt Cooper • @MarchofDimes • bit.ly/preeclamp
  11. 11. Need a diagnostic for preeclampsia • Public big data available • March of Dimes Center for Prematurity Research • Data analyzed, diagnostic designed • SPARK grant ($50k) • Life Science Angels, other seed investors ($2 million) • @CarmentaBio • progenity.com • bit.ly/carm_prog
  12. 12. @MatthewHerper bit.ly/newdrug1
  13. 13. Psychiatric Drug Imipramine Shows Significant Activity Against Small Cell Lung Cancer (Cancer Discovery 2013, 3:1) • Vehicle control • Imipramine • p53/Rb/p130 triple knockout model of SCLC • Mice dosed after tumor formation • Joel Dudley, Nadine Jahchan, Julien Sage, Alejandro Sweet-Cordero, Joel Neal • @NuMedii
  14. 14. Bin Chen, Wei Wei, Li Ma, Bin Yang, Mei-Sze Chua, Samuel So • Gastroenterology, 2017
  15. 15. Need more drugs for more diseases • Public big data available • NIH funding • Data analyzed, method designed • Company launched, ARRA, StartX, Stanford license, first deal • Claremont Creek, Lightspeed ($3.5 million) • @NuMedii
  16. 16. The next big open data: clinical trials • Download 100+ studies today • Drug repositioning, new patient subsets, digital comparative effectiveness, more! • immport.org • Sanchita Bhattacharya, Elizabeth Thomson
  17. 17. 22
  18. 18. Clinical Data Warehouse: A Big UC Healthcare Data Analytics Platform • Combining healthcare data from across the six University of California medical schools and systems
  19. 19. The next big data: clinical data
  20. 20. ML lessons I’ve learned over 20 years
      • Get the question right
        – Solve the problems that health care professionals need solved: don't just guess
          • And verify good questions and good unmet needs with more than one doc
        – Build a great diagnostic vs. understanding the biology
          • Perfectly lassoed variables may miss the big-picture biology
        – Biologists and medical professionals really love explanations over black boxes
      • Watch out for input-limiting models
        – Patients might not type in the right codes for their symptoms
        – They barely enter their own race/ethnicity
        – And docs?
      • Learn what IRB, HIPAA, BAA are. Learn what ICD-10 and CPT codes are. CLIA and CAP.
        – And learn patience.
        – Not all of us are cloud allergic.
      • Not everything needs deep learning
      • Having all data on everyone is super rare: genomics, images, and longitudinal EHR data?
      • Health care inefficiency is not about friction
      • Data integration and harmonization can happen if there is a business reason for it
      • Platforms and their companies are seemingly commoditized
        – Come to us with more medical knowledge and background. Convince us you care about this vertical.
        – Show us that we are going to learn more from you than we are going to have to teach you.
  21. 21. Open challenges
      • Can’t teach a computer with half the game. Need the start and finish of medical “stories”
        – In medical care, need primary and tertiary care in the same database
        – Compound data might be available, but trials data is not
        – Means data integration and harmonization is a rate limiter
      • Need the right diversity in data
        – Otherwise might be extrapolating beyond what was learned
        – Need enough data, big amounts of data, true-positive cases
      • Need methods to handle complicated multi-modal data
      • What does validation mean?
        – Drug discovery: pre-clinical success? Or Phase 2 success?
        – Clinical accuracy? Or clinical utility?
      • Career fear, uncertainty, and doubt (FUD) in some circles
        – Tech recruitment
        – Too many startups?
  22. 22. UC Clinical Data Warehouse Team
      • Executive Team: Atul Butte, Joe Bengfort, Michael Pfeffer, Tom Andriola, Chris Longhurst
      • Steering Committee: Irfan Chaudhry, Mohammed Mahbouba, Lisa Dahm, David Dobbs, Kent Andersen, Ralph James, Jennifer Holland, Eugene Lee
      • ETL Team: Albert Dugan, Tony Choe, Michael Sweeney, Timothy Satterwhite, Ayan Patel, Niranjan Wagle, Ralph James, Joseph Dalton
      • Data Harmonization: Dana Ludwig, Daniella Meeker
      • Data Quality: Momeena Ali, Jodie Nygaard
      • Epic: Kevin Ames, Ben Jenkins, Steve Gesualdo
      • Business Analyst: Ankeeta Shukla
      • Hardware: Sandeep Chandra, Jeff Love, Scott Bailey, Kwong Law, Pallav Saxena
      • Support: Jack Stobo, Michael Blum, Sam Hawgood
  23. 23. Support Admin and Tech Staff • Mary Lyall • Mounira Kenaani • Kevin Kaier • Boris Oskotsky • Mae Moredo • Ada Chen • University of California, San Francisco • Priscilla Chan and Mark Zuckerberg • NIH: NIAID, NLM, NIGMS, NCI, NHLBI, OD; NIDDK, NHGRI, NIA, NCATS • March of Dimes • Juvenile Diabetes Research Foundation • Hewlett Packard • Howard Hughes Medical Institute • California Institute for Regenerative Medicine • Luke Evnin and Deann Wright (Scleroderma Research Foundation) • Clayville Research Fund • PhRMA Foundation • Stanford Cancer Center, Bio-X, SPARK • Tarangini Deshpande • Kimayani Butte • Sam Hawgood and Keith Yamamoto • Isaac Kohane
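
The counts quoted on slide 6 can be sanity-checked with a few lines of arithmetic. The Python sketch below recomputes the grid size and some ratios from those four numbers. The variable names are illustrative, the measurement count is only the lower bound stated on the slide ("more than a billion"), and nothing here comes from the talk beyond the quoted figures.

```python
# Figures as quoted on slide 6; everything else here is an assumption
# made for illustration.
substances = 227_000_000            # "227 million substances"
assays = 1_300_000                  # "1.3 million assays"
measurements_min = 1_000_000_000    # "more than a billion" (lower bound only)
lipinski_ok = 71_000_000            # "71 million meet Lipinski 5"
active = 1_200_000                  # "1.2 million active substances"

# Size of the full substance-by-assay grid.
grid_cells = substances * assays

# Fraction of grid cells holding a measurement, at the lower bound.
fill_fraction = measurements_min / grid_cells

# Shares of the substance collection.
lipinski_share = lipinski_ok / substances
active_share = active / substances

print(f"grid cells:      {grid_cells:.3e}")
print(f"fill fraction:   {fill_fraction:.2e}")
print(f"Lipinski share:  {lipinski_share:.1%}")
print(f"active share:    {active_share:.2%}")
```

Under these figures the grid comes to about 295 trillion cells, consistent with the slide's rounded 300 trillion, and at the stated lower bound only about 3 in every million cells hold a measurement.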