2015-04-28 Atul Butte's presentation to the NIH Precision Medicine Initiative working group meeting
1. Precision Medicine Workshop:
Big Data Aspects
Atul Butte, MD, PhD
Director, Institute for
Computational Health Science
University of California, San Francisco
atul.butte@ucsf.edu
@atulbutte
@ImmPortDB
2. Disclosures
• Scientific founder and
advisory board membership
– Genstruct
– NuMedii
– Personalis
– Carmenta
• Honoraria for talks
– Lilly
– Pfizer
– Siemens
– Bristol Myers Squibb
– AstraZeneca
– Roche
– Genentech
– Warburg Pincus
• Past or present consultancy
– Lilly
– Johnson and Johnson
– Roche
– NuMedii
– Genstruct
– Tercica
– Ecoeos
– Ansh Labs
– Prevendia
– Samsung
– Assay Depot
– Regeneron
– Verinata
– Pathway Diagnostics
– Geisinger Health
– Covance
– Wilson Sonsini Goodrich & Rosati
– 10X Genomics
– Medgenics
– GNS Healthcare
– Gerson Lehrman Group
– Coatue Management
• Corporate Relationships
– Northrop Grumman
– Aptalis
– Thomson Reuters
– Intel
– SAP
– SV Angel
• Speakers’ bureau
– None
• Companies started by students
– Carmenta
– Serendipity
– NuMedii
– Stimulomics
– NunaHealth
– Praedicat
– MyTime
– Flipora
5. 1. Major potential for disparities
• Will you capture any from the 2.2 million
incarcerated? Nearly half black?
• The 43 million over age 65? Only 16%
over age 65 with income under $50k
have a smartphone.
• The 14 million disabled?
• The 4 million just born last year?
• The 2.6 million that died last year?
6. 2. Start with a million,
or end with a million?
• Keeping it sticky and useful?
7. 3. Active participants
• If data returned to participants, will
they alter their behavior and
exposures?
• Can we tell they are doing this?
8. 4. Not enough power
• So must think early about
downstream validation studies.
• Leave-one-sub-cohort-out cross-validation?
• Or are you testing whether every
individual gets something out of the
approach?
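A minimal sketch of the leave-one-sub-cohort-out idea from the slide above, using only the Python standard library. The record layout `(subcohort_label, x, y)`, the site names, and the toy "predict the training mean" model are assumptions made for illustration; they are not from the talk. Each sub-cohort is held out in turn, a model is fit on the rest, and the error is measured on the held-out group.

```python
from collections import defaultdict
from statistics import mean


def leave_one_subcohort_out(records):
    """Yield (held_out_label, train, test) once per sub-cohort.

    `records` is a list of (subcohort_label, x, y) tuples; this layout
    is hypothetical and chosen only for the sketch.
    """
    by_cohort = defaultdict(list)
    for label, x, y in records:
        by_cohort[label].append((x, y))
    for held_out in sorted(by_cohort):
        # Training data is everything outside the held-out sub-cohort.
        train = [pair for label, pairs in by_cohort.items()
                 if label != held_out for pair in pairs]
        yield held_out, train, by_cohort[held_out]


def fit_mean_model(train):
    """Toy stand-in for a real model: predict the training mean of y."""
    return mean(y for _, y in train)


def evaluate(records):
    """Return the mean absolute error on each held-out sub-cohort."""
    errors = {}
    for held_out, train, test in leave_one_subcohort_out(records):
        prediction = fit_mean_model(train)
        errors[held_out] = mean(abs(y - prediction) for _, y in test)
    return errors


if __name__ == "__main__":
    # Hypothetical toy data: three recruitment sites acting as sub-cohorts.
    toy = [("site_a", 0, 1.0), ("site_a", 1, 3.0),
           ("site_b", 0, 2.0), ("site_b", 1, 4.0),
           ("site_c", 0, 6.0)]
    print(evaluate(toy))
```

A large error on one held-out sub-cohort relative to the others would suggest that a finding does not transfer to that group, which is the kind of downstream validation question the slide raises.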
10. 6. Exploit the network effect
• Connect cohort and data to others, to gain synergy
• Need methods to connect data sets,
keep confidentiality
• Not just academic cohorts, also pharma trials?
• Maybe recruit at the end of a trial, and gather
starting data from pharma and contract research
organizations.
• Maybe start from the 35 million discharged from
a hospital last year?
• Maybe work with Quest and LabCorp to get
existing lab data on patients
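One possible way to "connect data sets, keep confidentiality" is to join records on a keyed hash of normalized identifiers rather than on the identifiers themselves. The sketch below assumes that both data holders share a secret key and that name plus birth date is an adequate match key; both are simplifying assumptions, and every function name and field here is hypothetical. Real record linkage would also need fuzzy matching, key management, and governance that this example does not attempt.

```python
import hashlib
import hmac


def pseudonym(secret_key: bytes, name: str, birth_date: str) -> str:
    """Keyed SHA-256 hash of normalized identifiers (hypothetical match key)."""
    normalized = f"{name.strip().lower()}|{birth_date.strip()}"
    return hmac.new(secret_key, normalized.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def tokenize(secret_key, rows):
    """Turn (name, birth_date, payload) rows into {pseudonym: payload}."""
    return {pseudonym(secret_key, name, dob): payload
            for name, dob, payload in rows}


def link(tokens_a, tokens_b):
    """Join two tokenized data sets on the pseudonyms they share."""
    shared = tokens_a.keys() & tokens_b.keys()
    return {token: (tokens_a[token], tokens_b[token]) for token in shared}


if __name__ == "__main__":
    key = b"shared-secret-for-illustration-only"
    # Each holder tokenizes locally, so names never leave the source.
    cohort = tokenize(key, [("Ada Lovelace", "1815-12-10", {"bp": 120}),
                            ("Alan Turing", "1912-06-23", {"bp": 130})])
    labs = tokenize(key, [(" ada lovelace ", "1815-12-10", {"ldl": 90})])
    print(len(link(cohort, labs)), "record(s) linked")
```

The design choice is that only pseudonyms and payloads are exchanged; without the key, the tokens cannot be recomputed from a list of candidate names.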
11. 7. Success of the effort depends
on 3rd party usage
• Needs to be easy to access and understand data
without you.
• Easy to build useful tools and mashup data.
• Shouldn’t have to hire an insider or expert to
understand the data.
• Of course, the cloud and all modern commercial
tools and services should be allowed.
• Put real money into dissemination, do not
assume this will happen correctly.
• Beyond data sharing agreements
• Difference between Genome and ENCODE
12. 8. Perfection is the enemy
of the good
• Perfection delays data release.
• You won’t always make the right choices.
• Keep simple things simple (e.g. API), but
complex things possible (e.g.
downloading).
• Let others in, access, and build tools,
alternative representations.
13. 9. Data gets stale
• 1500 papers at Nucleic Acids Research on
open databases!
• Even reference data sets get stale.
• Will soon be a struggle to get eyes on this
data set.
• Shelf life from technologies, from
measurements. Freshen data.
• Framingham Heart Study has great data
on dbGaP. Why aren’t you using it now?
15. In August, I unveiled the Cancer Genome
Anatomy Project -- the comprehensive
clearinghouse of information about tens of
thousands of cancer genes, which will enable
scientists and researchers around the world to
work together through a website available on
the Internet, to bring us closer to a cure.
-- Al Gore, 1998
18. 10. Leave some interesting
questions open for others
• Don’t shoot for a whole issue of Science or
Nature that tries to answer everything about a
million people.
• Leave some of the Nature papers for others.
• The real value of this data set will be in the
questions others can see being asked and
answered
• Great success stories already with Geisinger,
Million Veterans, and many more.
• Create something here that cannot be done by
the academic, medical, and private world.