Data Science at NIH and its
Relationship to Social Computing,
Behavioral-Cultural Modeling, &
Prediction
Philip E. Bourne, PhD, FACMI
Associate Director for Data Science
National Institutes of Health
SBP15, Washington DC
April 2, 2015
[Thanks to Patty Mabry]
Disclaimer
I know little about methods and
developments in Social Computing,
Behavioral-Cultural Modeling, &
Prediction
However, I do believe it is critical to
NIH’s mission and am here to learn
how we can help
Office of Biomedical
Data Science
Mission Statement
To foster an open ecosystem that
enables biomedical research to be
conducted as a digital enterprise that
enhances health, lengthens life and
reduces illness and disability & to
train the next generation of data
scientists
Goals expanded from recommendations in the June 2012 DIWG and
BRWWG reports.
Let Me Give You 4 Examples of What
Drives Us …
1. We are at a Point of Disruption
 Evidence:
– Google car
– 3D printers
– Waze
– Robotics
– Sensors
From: The Second Machine Age: Work, Progress,
and Prosperity in a Time of Brilliant Technologies
by Erik Brynjolfsson & Andrew McAfee
2. Democratization
The Story of Meredith
http://fora.tv/2012/04/20/Congress_Unplugged_
Phil_Bourne
Stephen Friend
47/53 “landmark” publications
could not be replicated
[Begley, Ellis Nature,
483, 2012] [Carole Goble]
3. Reproducability
4. New Opportunities
“And that’s why we’re here today. Because something
called precision medicine … gives us one of the greatest
opportunities for new medical breakthroughs that we
have ever seen.”
President Barack Obama
January 30, 2015
Precision Medicine Initiative
Vision: Build a broad research program to encourage
creative approaches to precision medicine, test them
rigorously, and, ultimately, use them to build the
evidence base needed to guide clinical practice.
Near Term: apply the tenets of precision medicine to a
major health threat – cancer
Longer Term: generate the knowledge base necessary
to move precision medicine into virtually all areas of
health and disease
Precision Medicine Initiative
 National Research Cohort
– >1 million U.S. volunteers
– Numerous existing cohorts (many funded by NIH)
– New volunteers
 Participants will be centrally involved in design and
implementation of the cohort
 They will be able to share genomic data, lifestyle
information, biological samples – all linked to their
electronic health records
National Research Cohort:
What Early Success Might Look Like
 A real test of pharmacogenomics—right drug at the right
dose for the right patient
 New therapeutic targets by identifying loss-of-function
mutations protective against common diseases
– PCSK9 for cardiovascular disease
– SLC30A8 for type 2 diabetes
 Resilience – finding individuals who should be ill but aren’t
 New ways to evaluate mHealth technologies for
prevention/management of chronic diseases
Precision Medicine:
What Success Might Look Like
50-year-old woman with type 2 diabetes visits her
doctor
Now
– Though woman’s glucose control has been suboptimal,
doctor renews her prescription for drug often used for type 2
diabetes
– Continues to monitor blood glucose with fingersticks and
glucometer, despite dissatisfaction with these methods
Precision Medicine:
What Success Might Look Like
50-year-old woman with type 2 diabetes visits her
doctor
Future: + 2 years
– Volunteers for new national research network
• Sample of her DNA, along with her health information,
sent to researchers for sequencing/analysis
• Can view her health/research data via smartphone
– Agrees to researchers’ request to track her glucose levels
via tiny implantable chip that sends wireless signals to her
watch, researchers’ computers
• Using these data, she changes diet,
medicine dose schedule
Other Diseases:
What Success Might Look Like
50-year-old woman with type 2 diabetes visits her
doctor
Future: + 5 years
– Receives word from her doctor about a new drug based
upon improved molecular understanding of type 2 diabetes
– When she enters drug’s name into her smartphone’s Rx
app, her genomic data show she’ll metabolize the drug
slowly
• Her doctor alters the dose accordingly
Other Diseases:
What Success Might Look Like
50-year-old woman with type 2 diabetes visits her
doctor
Future: + 10 years
– Celebrates her 60th birthday and reflects with her family
about how proud she is to be part of cohort study
– Her glucose levels remain well controlled; she’s suffered no
diabetes-related complications
– Her children decide to volunteer for cohort study
EHRsPatient Partnerships
Data Science
GenomicsTechnologies
Example Relevant to SBP
 Just because you have all these data does not mean
it is useful:
– What span and temporal sequence do we need to make
establish meaningful outcomes?
– What range of factors so we need to consider?
 NIH BSSR-Systems Science Listserv contact Patty
Mabry, NIH Office of Disease Prevention, to join:
mabryp@od.nih.gov
The BD2K Program is Central
to the Mission
Planned – Black; Available- Green
Elements of The Digital Enterprise
Communities Policies
Infrastructure
• Intersection:
• Sustainability
• Efficiency
• Collaboration
• Training
Elements of The Digital Enterprise
Communities Policies
Infrastructure
• Intersection:
• Sustainability
• Efficiency
• Collaboration
• Training
Virtuous
Research
Cycle
Consider an example…
 Big Data: The study involved
MRI images & GWAS data
from over 30,000 people
 Collaboration: Data came
from many different sights
affiliated with the ENIGMA
consortium
 Methods: To homogenize
data from different sites, the
group designed standardized
protocols for image analysis,
quality assessment, genetic
imputation, and association
 Found five novel genetic
variants
 Results provided insight into
the variability of brain
development, and may be
applied to study of
neuropsychiatric dysfunction
 Community – Enigma, BD2K
 Policy
– Improved consent methods
– Cloud accessibility for human subjects data
– Trusted partners
– Data sharing
 Infrastructure
– Standards, compute resources, software
Communities: 2014 Summary
 Visioning workshop convened 9/3/14
 Launched BD2K ($32M)
– 12 Centers of data excellence
– Data Discovery Index Coordination Consortium
(DDICC)
– Training awards
 First successful consortia meeting 11/3-4
 Workshops to inform future funding
– Software indexing and discoverability
– Gaming
Communities: 2015 Activities
 New FOAs with outreach to new
communities – math, stats, comp science etc.
 Work with e.g GA4GH, RDA, FORCE11,
NDS ….
 IDEAS lab with NSF
 Competition with international funders
 Software carpentry, hackathons, Pi Day
Communities: Questions?
 Societies of the modern age?
 How to enable these groups?
 How to marry the funding of individuals with
the funding of communities?
Policies: Now & Forthcoming
 Data Sharing
– Genomic data sharing announced
– Data sharing plans on all research awards
– Data sharing plan enforcement
• Machine readable plan
• Repository requirements to include grant numbers
http://www.nih.gov/news/health/aug2014/od-27.htm
Policies - Forthcoming
 Data Citation
– Goal: legitimize data as a form of scholarship
– Process:
• Machine readable standard for data citation (done)
• Endorsement of data citation for inclusion in NIH bib
sketch, grants, reports, etc.
• Example formats for human readable data citations
• Slowly work into NLM/NCBI workflow
 dbGaP in the cloud (done!)
BD2K
Center
BD2K
Center
BD2K
Center
BD2K
Center
BD2K
Center
BD2K
Center
DDICC
Software
Standard
s
Infrastructure - The
Commons
Labs
Labs
Labs
Labs
The Commons
Digital Objects
(with UIDs)
Search
(indexed metadata)
Computing
Platform
TheCommons
Vivien Bonazzi
George Komatsoulis
The Commons: Compute Platforms
The Commons
Conceptual Framework
Public Cloud
Platforms
Super Computing
(HPC) Platforms
Other
Platforms ?
 Google, AWS (Amazon)
 Microsoft (Azure), IBM,
other?
 In house compute
solutions
 Private clouds, HPC
– Pharma
– The Broad
– Bionimbus
 Traditionally low access
by NIH
The Commons:
Business Model
[George Komatsoulis]
Infrastructure: Standards
 2013 Workshop on Frameworks for Community-
Based Standards
 August 2014 Input on Information Resources for
Data-Related Standards Widely Used in Biomedical
Science – 30 responses
 Feb 2015 Workshop Community-based Data and
Metadata Standards
 Internal CDE Registry project
Elements of The Digital Enterprise
Communities Policies
Infrastructure
• Intersection:
• Sustainability
• Efficiency
• Collaboration
• Training
Elements of The Digital Enterprise
Communities Policies
Infrastructure
• Intersection:
• Sustainability
• Efficiency
• Collaboration
• Training
Workforce Training
Problem: Lack of
Biomedical Data Science Specialists
BD2K T32/T15
BD2K K01
Career path workshops – eg AAU
Challenge
model of
funding
Problem: Limited Access to
Data Science Training
BD2K R25
Metadata
for training
materials
Community-sourced
cataloging and indexing
of training opportunities
Measure utility
NIH Workforce Development Center
RFA-ES-15-004
I not only use all the brains
I have, but all I can borrow.
– Woodrow Wilson
Associate Director for Data Science
Commons BD2K Efficiency
Sustainability Education Innovation Process
• Cloud – Data &
Compute
• Search
• Security
• Reproducibility
Standards
• App Store
• Coordinate
• Hands-on
• Syllabus
• MOOCs
• Community
• Centers
• Training Grants
• Catalogs
• Standards
• Analysis
• Data
Resource
Support
• Metrics
• Best
Practices
• Evaluation
• Portfolio
Analysis
The Biomedical Research Digital Enterprise
Partnerships
Collaboration
rogrammatic Theme
Deliverable
Example Features • IC’s
• Researchers
• Federal
Agencies
• International
Partners
• Computer
Scientists
Scientific Data Council External Advisory Board
Training
NIHNIH……
Turning Discovery Into HealthTurning Discovery Into Health
philip.bourne@nih.gov

Data Science at NIH and its Relationship to Social Computing, Behavioral-Cultural Modeling, & Prediction

  • 1.
    Data Science atNIH and its Relationship to Social Computing, Behavioral-Cultural Modeling, & Prediction Philip E. Bourne, PhD, FACMI Associate Director for Data Science National Institutes of Health SBP15, Washington DC April 2, 2015 [Thanks to Patty Mabry]
  • 2.
    Disclaimer I know littleabout methods and developments in Social Computing, Behavioral-Cultural Modeling, & Prediction However, I do believe it is critical to NIH’s mission and am here to learn how we can help
  • 3.
    Office of Biomedical DataScience Mission Statement To foster an open ecosystem that enables biomedical research to be conducted as a digital enterprise that enhances health, lengthens life and reduces illness and disability & to train the next generation of data scientists Goals expanded from recommendations in the June 2012 DIWG and BRWWG reports.
  • 4.
    Let Me GiveYou 4 Examples of What Drives Us …
  • 5.
    1. We areat a Point of Disruption  Evidence: – Google car – 3D printers – Waze – Robotics – Sensors From: The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson & Andrew McAfee
  • 6.
    2. Democratization The Storyof Meredith http://fora.tv/2012/04/20/Congress_Unplugged_ Phil_Bourne Stephen Friend
  • 7.
    47/53 “landmark” publications couldnot be replicated [Begley, Ellis Nature, 483, 2012] [Carole Goble] 3. Reproducability
  • 8.
  • 9.
    “And that’s whywe’re here today. Because something called precision medicine … gives us one of the greatest opportunities for new medical breakthroughs that we have ever seen.” President Barack Obama January 30, 2015
  • 10.
    Precision Medicine Initiative Vision:Build a broad research program to encourage creative approaches to precision medicine, test them rigorously, and, ultimately, use them to build the evidence base needed to guide clinical practice. Near Term: apply the tenets of precision medicine to a major health threat – cancer Longer Term: generate the knowledge base necessary to move precision medicine into virtually all areas of health and disease
  • 11.
    Precision Medicine Initiative National Research Cohort – >1 million U.S. volunteers – Numerous existing cohorts (many funded by NIH) – New volunteers  Participants will be centrally involved in design and implementation of the cohort  They will be able to share genomic data, lifestyle information, biological samples – all linked to their electronic health records
  • 12.
    National Research Cohort: WhatEarly Success Might Look Like  A real test of pharmacogenomics—right drug at the right dose for the right patient  New therapeutic targets by identifying loss-of-function mutations protective against common diseases – PCSK9 for cardiovascular disease – SLC30A8 for type 2 diabetes  Resilience – finding individuals who should be ill but aren’t  New ways to evaluate mHealth technologies for prevention/management of chronic diseases
  • 13.
    Precision Medicine: What SuccessMight Look Like 50-year-old woman with type 2 diabetes visits her doctor Now – Though woman’s glucose control has been suboptimal, doctor renews her prescription for drug often used for type 2 diabetes – Continues to monitor blood glucose with fingersticks and glucometer, despite dissatisfaction with these methods
  • 14.
    Precision Medicine: What SuccessMight Look Like 50-year-old woman with type 2 diabetes visits her doctor Future: + 2 years – Volunteers for new national research network • Sample of her DNA, along with her health information, sent to researchers for sequencing/analysis • Can view her health/research data via smartphone – Agrees to researchers’ request to track her glucose levels via tiny implantable chip that sends wireless signals to her watch, researchers’ computers • Using these data, she changes diet, medicine dose schedule
  • 15.
    Other Diseases: What SuccessMight Look Like 50-year-old woman with type 2 diabetes visits her doctor Future: + 5 years – Receives word from her doctor about a new drug based upon improved molecular understanding of type 2 diabetes – When she enters drug’s name into her smartphone’s Rx app, her genomic data show she’ll metabolize the drug slowly • Her doctor alters the dose accordingly
  • 16.
    Other Diseases: What SuccessMight Look Like 50-year-old woman with type 2 diabetes visits her doctor Future: + 10 years – Celebrates her 60th birthday and reflects with her family about how proud she is to be part of cohort study – Her glucose levels remain well controlled; she’s suffered no diabetes-related complications – Her children decide to volunteer for cohort study
  • 17.
  • 18.
    Example Relevant toSBP  Just because you have all these data does not mean it is useful: – What span and temporal sequence do we need to make establish meaningful outcomes? – What range of factors so we need to consider?  NIH BSSR-Systems Science Listserv contact Patty Mabry, NIH Office of Disease Prevention, to join: mabryp@od.nih.gov
  • 19.
    The BD2K Programis Central to the Mission Planned – Black; Available- Green
  • 20.
    Elements of TheDigital Enterprise Communities Policies Infrastructure • Intersection: • Sustainability • Efficiency • Collaboration • Training
  • 21.
    Elements of TheDigital Enterprise Communities Policies Infrastructure • Intersection: • Sustainability • Efficiency • Collaboration • Training Virtuous Research Cycle
  • 22.
  • 23.
     Big Data:The study involved MRI images & GWAS data from over 30,000 people  Collaboration: Data came from many different sights affiliated with the ENIGMA consortium  Methods: To homogenize data from different sites, the group designed standardized protocols for image analysis, quality assessment, genetic imputation, and association  Found five novel genetic variants  Results provided insight into the variability of brain development, and may be applied to study of neuropsychiatric dysfunction
  • 24.
     Community –Enigma, BD2K  Policy – Improved consent methods – Cloud accessibility for human subjects data – Trusted partners – Data sharing  Infrastructure – Standards, compute resources, software
  • 25.
    Communities: 2014 Summary Visioning workshop convened 9/3/14  Launched BD2K ($32M) – 12 Centers of data excellence – Data Discovery Index Coordination Consortium (DDICC) – Training awards  First successful consortia meeting 11/3-4  Workshops to inform future funding – Software indexing and discoverability – Gaming
  • 26.
    Communities: 2015 Activities New FOAs with outreach to new communities – math, stats, comp science etc.  Work with e.g GA4GH, RDA, FORCE11, NDS ….  IDEAS lab with NSF  Competition with international funders  Software carpentry, hackathons, Pi Day
  • 27.
    Communities: Questions?  Societiesof the modern age?  How to enable these groups?  How to marry the funding of individuals with the funding of communities?
  • 28.
    Policies: Now &Forthcoming  Data Sharing – Genomic data sharing announced – Data sharing plans on all research awards – Data sharing plan enforcement • Machine readable plan • Repository requirements to include grant numbers http://www.nih.gov/news/health/aug2014/od-27.htm
  • 29.
    Policies - Forthcoming Data Citation – Goal: legitimize data as a form of scholarship – Process: • Machine readable standard for data citation (done) • Endorsement of data citation for inclusion in NIH bib sketch, grants, reports, etc. • Example formats for human readable data citations • Slowly work into NLM/NCBI workflow  dbGaP in the cloud (done!)
  • 30.
  • 31.
    The Commons Digital Objects (withUIDs) Search (indexed metadata) Computing Platform TheCommons Vivien Bonazzi George Komatsoulis
  • 32.
    The Commons: ComputePlatforms The Commons Conceptual Framework Public Cloud Platforms Super Computing (HPC) Platforms Other Platforms ?  Google, AWS (Amazon)  Microsoft (Azure), IBM, other?  In house compute solutions  Private clouds, HPC – Pharma – The Broad – Bionimbus  Traditionally low access by NIH
  • 33.
  • 34.
    Infrastructure: Standards  2013Workshop on Frameworks for Community- Based Standards  August 2014 Input on Information Resources for Data-Related Standards Widely Used in Biomedical Science – 30 responses  Feb 2015 Workshop Community-based Data and Metadata Standards  Internal CDE Registry project
  • 35.
    Elements of TheDigital Enterprise Communities Policies Infrastructure • Intersection: • Sustainability • Efficiency • Collaboration • Training
  • 36.
    Elements of TheDigital Enterprise Communities Policies Infrastructure • Intersection: • Sustainability • Efficiency • Collaboration • Training
  • 37.
  • 38.
    Problem: Lack of BiomedicalData Science Specialists BD2K T32/T15 BD2K K01 Career path workshops – eg AAU Challenge model of funding
  • 39.
    Problem: Limited Accessto Data Science Training BD2K R25 Metadata for training materials Community-sourced cataloging and indexing of training opportunities Measure utility NIH Workforce Development Center RFA-ES-15-004
  • 40.
    I not onlyuse all the brains I have, but all I can borrow. – Woodrow Wilson
  • 41.
    Associate Director forData Science Commons BD2K Efficiency Sustainability Education Innovation Process • Cloud – Data & Compute • Search • Security • Reproducibility Standards • App Store • Coordinate • Hands-on • Syllabus • MOOCs • Community • Centers • Training Grants • Catalogs • Standards • Analysis • Data Resource Support • Metrics • Best Practices • Evaluation • Portfolio Analysis The Biomedical Research Digital Enterprise Partnerships Collaboration rogrammatic Theme Deliverable Example Features • IC’s • Researchers • Federal Agencies • International Partners • Computer Scientists Scientific Data Council External Advisory Board Training
  • 42.
    NIHNIH…… Turning Discovery IntoHealthTurning Discovery Into Health philip.bourne@nih.gov

Editor's Notes

  • #8 Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8): e124. doi:10.1371/journal.pmed.0020124 http://www.reuters.com/article/2012/03/28/us-science-cancer-idUSBRE82R12P20120328
  • #10 Photos: FC tweet; RK screen grab
  • #11 Draft vision statement taken from Collins/Varmus NEJM article, pg 1, colum2, last sentence Images that reflect both cancer and cohort study
  • #12 Images of people from Infographic (NOTE: Image is just a placeholder—Jill will tweak) Detailed Notes: National Research Cohort <<OR name of study>> >1 million U.S. volunteers committed to participating in research Will combine a number of existing cohorts Will include Dept of Veterans Affairs Million Veteran Program—note Veteran is singular per http://www.research.va.gov/MVP/
  • #13 Images--pharmacogenomics—helix on pill, bottle, prescription pad, Drug targets—DNA helix, whatever we might have used for PCSK9, unidentifiable pills/capsules; smart phone, the “fall prevention” watch from gerontology meeting, other mobile health images Molecular structure structure of PCSK9 from NIBIB database.
  • #14 Images: Overweight woman, someone testing blood sugar, pills for metformin or other diabetes drug?
  • #16 Images: DNA sequencing, readouts, pills, maybe the DNA on pill bottle that we’ve used for pharmacogenomics.
  • #39 Green: already done Yellow: under consideration
  • #40 Green: already done Yellow: under consideration