Open Data in a Global Ecosystem
Philip E. Bourne Ph.D., FACMI
Associate Director for Data Science
National Institutes of Health
philip.bourne@nih.gov
BioMedBridges, EBI, November 17, 2015
http://www.slideshare.net/pebourne
Not a talking head….
An on-going conversation
Some context to start that
conversation …
Perspective
• Structural bioinformatics researcher
• Former custodian of the RCSB PDB
• Obsessive about open science, e.g., PLOS
• NIH-wide responsibility for developments in data science
Consider this change from my own
career experience ….
The History of Computational
Biomedicine According to Bourne
1980s | 1990s | 2000s | 2010s | 2020
Discipline: Unknown | Expt. Driven | Emergent | Over-sold | A Service | A Partner | A Driver
The Raw Material: Non-existent | Limited/Poor | More/Ontologies | Big Data/Siloed | Open/Integrated
The People: No name | Technicians | Industry recognition | Data scientists | Academics
Searls (ed.), The Roots of Bioinformatics series, PLOS Comp Biol
It Follows …
We are entering a period of disruption
in biomedical research and we should
all be thinking about what this means
to bioinformatics & biomedicine
http://i1.wp.com/chisconsult.com/wp-content/uploads/2013/05/disruption-is-a-process.jpg
http://cdn2.hubspot.net/hubfs/418817/disruption1.jpg
Big Data in Biomedicine…
This speaks to something more fundamental than more data …
It speaks to new methodologies, new skills, new emphasis, new cultures, new modes of discovery …
We are at a Point of Deception …
• Evidence:
– Google car
– 3D printers
– Waze
– Robotics
– Sensors
From: The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson & Andrew McAfee
Disruption: Example - Photography
Digitization
Deception
Disruption
Demonetization
Dematerialization
Democratization
Time
Volume, Velocity, Variety
– Digital camera invented by Kodak but shelved
– Megapixels & quality improve slowly; Kodak slow to react
– Film market collapses; Kodak goes bankrupt
– Phones replace cameras
– Instagram, Flickr become the value proposition
– Digital media becomes bona fide form of communication
Disruption: Biomedical Research
Digitization of Basic & Clinical Research & EHRs
Deception
We Are Here
Disruption
Demonetization
Dematerialization
Democratization
Open science
Patient centered health care
Disruptive Features: Sustainability
Source: Michael Bell, http://homepages.cs.ncl.ac.uk/m.j.bell1/blog/?p=830
Disruptive Features:
Reproducibility
Changing Value of Scholarship (?)
“And that’s why we’re here today. Because something
called precision medicine … gives us one of the greatest
opportunities for new medical breakthroughs that we
have ever seen.”
President Barack Obama
January 30, 2015
Disruptive Features – New Science
Precision Medicine Initiative
• National Research Cohort
– >1 million U.S. volunteers
– Numerous existing cohorts (many funded by NIH)
– New volunteers
• Participants will be centrally involved in design and implementation of the cohort
• They will be able to share genomic data, lifestyle information, biological samples – all linked to their electronic health records
What Are Some General Implications
of Such a Future?
• Open collaborative science becomes increasingly important nationally and internationally
• Data and associated analytics become increasingly valuable to scholarship
• Opportunities exist to improve the efficiency of the research enterprise and hence fund more research
• Global cooperation between funders will be needed to sustain the emergent digital enterprise
• Current training content and modalities will not match supply to demand
• Balancing accessibility vs security becomes more important yet more complex
How Should We Respond as Funders?
• Community:
– Encourage wherever possible a global cultural shift towards open science
– Encourage global exchanges
– Encourage global projects
• Policies:
– Understand and map data sharing policies, standards etc.
– Understand ethical, legal and societal differences
• Infrastructure:
– Share the burden and the reward
https://www.openscienceprize.org/
A Culture of Sharing
Timeline: 1999, 2003, 2004, 2007, 2008, 2012, 2014
– Research Tools Policy
– NIH Data Sharing Policy
– Model Organism Policy
– Genome-wide Association (GWAS) Policy
– NIH Public Access Policy (Publications)
– Big Data to Knowledge (BD2K) Initiative
– Genomic Data Sharing (GDS) Policy
– Modernization of NIH Clinical Trials
– White House Initiative (2013 “Holdren Memo”)
The BD2K Program
BD2K Budget
BD2K FY14 Awards supported by all NIH Institutes
MD2K Applications – CHF and Smoking
The Commons Concept
• The Commons is a shared virtual space which is FAIR:
– Find
– Access (use effectively)
– Interoperate
– Reuse
• An environment to find and catalyze the use of shared digital research objects
The Developer or User Defines the Environment from the Appropriate Building Blocks
The Commons Components
[Diagram: BD2K Centers, DDICC, Software, Standards, Infrastructure - The Commons, Labs]
Public Beacons
Host: Content
AMPLab: 1000 Genomes Project
Broad Institute: ExAC
Curoverse: PGP, GA4GH Example Data
EBI: 1000 Genomes Project, UK10K, GoNL, EVS, GEUVADIS, UMCG Cardio GenePanel
Google: 1000 Genomes Project, Phase III, Illumina Platinum Genomes
ISB: Known VARiants
NCBI: NHLBI Exome Sequence Project
OICR: 55 cancer datasets
SolveBio: 56 public datasets
UCSC: ClinVar, LOVD, UniProt
University of Leicester: Cafe CardioKit, Cafe Variome Central
WTSI: IBD, Native American, Egyptian, UK10K
Over 120 public datasets beaconized across 21 institutions
Tens of thousands of individuals
Commons - Pilots
• The Cloud Credits - business model
• BD2K Centers
• MODs (Model Organism Databases)
• HMP Data and tools available in the cloud
• NCI Cloud Pilots & Genomic Data Commons
I not only use all the brains
I have, but all I can borrow.
– Woodrow Wilson
What Can We Do Now?
• Extend the research pilots concept
• Have TCC & TeSS work together
• Global hackathons, competitions
• Closer ties between NLM and EBI / Elixir
• Student exchanges
• Engage foundations, charities in more global initiatives
http://wwwdev.ebi.ac.uk/Tools/ddi/
ADDS Team
BD2K Representatives
NIH…
Turning Discovery Into Health
philip.bourne@nih.gov
https://datascience.nih.gov/
http://www.ncbi.nlm.nih.gov/research/staff/bourne/

Editor's Notes

  • #15 Photos: FC tweet; RK screen grab
  • #16 Images of people from Infographic (NOTE: Image is just a placeholder—Jill will tweak) Detailed Notes: National Research Cohort <<OR name of study>> >1 million U.S. volunteers committed to participating in research Will combine a number of existing cohorts Will include Dept of Veterans Affairs Million Veteran Program—note Veteran is singular per http://www.research.va.gov/MVP/
  • #31 On this slide we have a list of Beacon providers and the content that they're serving. To date we have over 120 public datasets that have been made available via Beacons at 12 different institutions. This represents data from tens of thousands of individuals, and these metrics, the numbers of datasets and individuals that they represent