Big Data as a Catalyst for
Collaboration & Innovation
Philip E. Bourne Ph.D., FACMI
Associate Director for Data Science
National Institutes of Health
philip.bourne@nih.gov
SRP Annual Meeting
Puerto Rico, November 18, 2015
Thesis…
We are entering a period of disruption
in biomedical research and we should
all be thinking about what this means
http://i1.wp.com/chisconsult.com/wp-content/uploads/2013/05/disruption-is-a-process.jpg
http://cdn2.hubspot.net/hubfs/418817/disruption1.jpg
Evidence of Disruption …
▪ Evidence:
– Google car
– 3D printers
– Waze
– Robotics
– Sensors
From: The Second Machine Age: Work, Progress,
and Prosperity in a Time of Brilliant Technologies
by Erik Brynjolfsson & Andrew McAfee
Disruption: Example - Photography
Digitization, Deception, Disruption, Demonetization, Dematerialization, Democratization
Axes: Time; Volume, Velocity, Variety
Digital camera invented by Kodak but shelved
Megapixels & quality improve slowly; Kodak slow to react
Film market collapses; Kodak goes bankrupt
Phones replace cameras
Instagram, Flickr become the value proposition
Digital media becomes bona fide form of communication
Disruption: Biomedical Research
Digitization of Basic & Clinical Research & EHRs
Deception
We Are Here
Disruption
Demonetization
Dematerialization
Democratization
Open science
Patient centered health care
Disruptive Features: Sustainability
Source: Michael Bell, http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830
Disruptive Features:
Reproducibility
Changing Value of Scholarship (?)
“And that’s why we’re here today. Because something
called precision medicine … gives us one of the greatest
opportunities for new medical breakthroughs that we
have ever seen.”
President Barack Obama
January 30, 2015
Disruptive Features – New Science
Precision Medicine Initiative
▪ National Research Cohort
– >1 million U.S. volunteers
– Numerous existing cohorts (many funded by NIH)
– New volunteers
▪ Participants will be centrally involved in design and implementation of the cohort
▪ They will be able to share genomic data, lifestyle information, biological samples – all linked to their electronic health records
Big Data in Biomedicine…
This speaks to something more
fundamental than more data …
It speaks to new methodologies, new
skills, new emphasis, new cultures,
new modes of discovery …
Open Educational Resources
Software Discovery and Sustainability
Community-Based Standards
Clinical Data Challenges
Database Sustainability
About the BD2K Program
• Initial FY 2014 funding of $32M.
• Proposed investment of $656M through 2020.
• Funded by ALL NIH Institutes and Centers.
Software and Tool Development
Centers of Excellence for Big Data Computing
Data Discovery Index Coordination Consortium
Scientist Training Awards
Data Science Courses
Data Sharing
BD2K FY14 Awards
supported by all NIH Institutes
▪ Big Data: MRI images & GWAS data from over 30,000 people.
▪ Collaboration: Data came from 190 worldwide sites across 33 countries.
▪ Methods: To homogenize data from different sites, the group designed standardized protocols for image analysis, quality assessment, genetic imputation, and association.
▪ Found five novel genetic variants influencing volume of brain regions.
▪ Results provided insight into the variability of brain development, and may be applied to mechanisms of neuropsychiatric dysfunction.
(Nature, 2015)
Detect Predict Adapt
MD2K Applications – CHF and Smoking
Elements of The Digital Enterprise
Community
Policy
Infrastructure
• Sustainability
• Collaboration
• Training
A Culture of Sharing
Timeline years: 1999, 2003, 2004, 2007, 2008, 2012, 2014
Research Tools Policy
NIH Data Sharing Policy
Model Organism Policy
Genome-wide Association (GWAS) Policy
NIH Public Access Policy (Publications)
Big Data to Knowledge (BD2K) Initiative
Genomic Data Sharing (GDS) Policy
Modernization of NIH Clinical Trials
White House Initiative (2013 “Holdren Memo”)
Policies – Now & Forthcoming
▪ Data Sharing
– Goal: legitimize data as a form of scholarship
– Now: Genomic data sharing announced
– Coming: Data sharing plans on all research awards
– Data sharing plan enforcement
• Machine readable plan
• Repository requirements to include grant numbers
– Data citation
http://www.nih.gov/news/health/aug2014/od-27.htm
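To make the "machine readable plan" and "grant numbers" points concrete, here is a minimal Python sketch of what checking such a plan might look like. The slides do not define a plan format, so the JSON layout, the field names in `REQUIRED_FIELDS`, and the `validate_plan` helper are all assumptions made for illustration, not an NIH specification.

```python
import json
from typing import List

# Hypothetical required fields; the slides do not specify a schema.
REQUIRED_FIELDS = ("grant_number", "repository", "data_types", "release_date")


def validate_plan(plan_json: str) -> List[str]:
    """Return a list of problems; an empty list means every field is present."""
    plan = json.loads(plan_json)
    # A field counts as missing if it is absent or empty.
    return [f"missing field: {name}" for name in REQUIRED_FIELDS
            if not plan.get(name)]


# Placeholder values only; "EXAMPLE-0001" is not a real award number.
example_plan = json.dumps({
    "grant_number": "EXAMPLE-0001",
    "repository": "example-repository",
    "data_types": ["genomic"],
    "release_date": "2016-01-01",
})

if __name__ == "__main__":
    print(validate_plan(example_plan))
```

A plan in this form could in principle be checked automatically at submission time, and a repository could record the same grant number alongside each deposit, which is one way the enforcement idea on the slide might be approached.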
Elements of The Digital Enterprise
Community
Policy
Infrastructure
• Sustainability
• Collaboration
• Training
▪ The Commons is a shared virtual space which is FAIR:
– Find
– Access (use effectively)
– Interoperate
– Reuse
▪ An environment to find and catalyze the use of shared digital research objects
The Commons Concept
The Developer or User Defines the
Environment from the Appropriate
Building Blocks
Infrastructure - The Commons: Conceptual Framework
Research Objects (with UIDs)
Discoverability (Search & Find)
The Commons
Open APIs
Data and tools
Computing Platform(s)
Containers
Packaging software
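As a rough illustration of two layers in the framework above, research objects with UIDs and discoverability, the following Python sketch keeps a small in-memory index of objects and looks them up by keyword. It is a toy under stated assumptions: the `ResearchObject` and `Registry` names, the UUID-based identifier scheme, and the keyword search are hypothetical and do not describe the actual Commons design or any NIH API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set
from uuid import uuid4


@dataclass
class ResearchObject:
    """A shared digital research object, e.g. a dataset or a software tool."""
    title: str
    kind: str  # e.g. "dataset" or "software"
    keywords: Set[str] = field(default_factory=set)
    # Assumed UID scheme: a random UUID in hex form.
    uid: str = field(default_factory=lambda: uuid4().hex)


class Registry:
    """Toy in-memory index standing in for a discoverability service."""

    def __init__(self) -> None:
        self._objects: Dict[str, ResearchObject] = {}

    def register(self, obj: ResearchObject) -> str:
        """Store the object under its UID and return that UID."""
        self._objects[obj.uid] = obj
        return obj.uid

    def get(self, uid: str) -> ResearchObject:
        """Access: resolve a UID to its object."""
        return self._objects[uid]

    def find(self, keyword: str) -> List[ResearchObject]:
        """Find: case-insensitive exact match against each object's keywords."""
        wanted = keyword.lower()
        return [obj for obj in self._objects.values()
                if wanted in {k.lower() for k in obj.keywords}]
```

In this sketch the UID is what makes an object citable and retrievable, while the keyword index is the simplest possible stand-in for search; a shared environment would presumably expose both through open APIs rather than a local class.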
Infrastructure - The Commons
[Diagram labels: BD2K Center (six), DDICC, Software, Standards, Labs (four)]
Commons - Pilots
▪ The Cloud Credits - business model
▪ BD2K Centers
▪ MODs (Model Organism Databases)
▪ HMP Data and tools available in the cloud
▪ NCI Cloud Pilots & Genomic Data Commons
Elements of The Digital Enterprise
Community
Policy
Infrastructure
• Sustainability
• Collaboration
• Training
SRP & BD2K
Possible Interactions? (1/3)
▪ Innovation
– Join IDEAS Labs
– Propose new EHS programs
▪ Collaboration
– Participation in standards efforts
• EHS vocabulary
• CDE harmonization
• BD2K standards coordination center
▪ Training
– BD2K training coordination center
SRP & BD2K
Possible Interactions? (2/3)
▪ Open Science - FAIR
– Participate in the Open Science Competitions
• With EPA, a challenge to produce a device to measure atmospheric pollutants and biologic response (2013).
• Toxicogenomics challenge with NTP/NIEHS, UNC, and Sage/DREAM to predict response to a given chemical based upon individual genetic susceptibility (2014).
• With HHS, a challenge to generate a visualization tool demonstrating the impact of climate change on health risk (2016).
– CHEAR data science center.
SRP & BD2K
Possible Interactions? (3/3)
▪ Leadership and coordination
– Trans-NIH, BD2K
– Interagency Big Data Senior Steering Group
– Sustainability Working Group
– Research Data Alliance Interest Group, Toxicogenomics Databases Interoperability
ADDS Team
BD2K Representatives
NIH…
Turning Discovery Into Health
philip.bourne@nih.gov
https://datascience.nih.gov/
http://www.ncbi.nlm.nih.gov/research/staff/bourne/
