Keynote at the IEEE International Conference on Bioinformatics and Biomedicine, Washington DC, November 10, 2015.
https://cci.drexel.edu/ieeebibm/bibm2015/
Big Data in Biomedicine – An NIH Perspective
1. Big Data in Biomedicine – An NIH Perspective
Philip E. Bourne Ph.D., FACMI
Associate Director for Data Science
National Institutes of Health
philip.bourne@nih.gov
IEEE BIBM Nov 10 2015, Washington DC
http://www.slideshare.net/pebourne
2. Perspective
Structural bioinformatics researcher
Former custodian of the PDB
Obsessive about open science e.g., PLOS
NIH-wide responsibility for developments in
data science – responding to the disruption
4. Big Data in Biomedicine…
This speaks to something more
fundamental than more data …
It speaks to new methodologies, new
skills, new emphasis, new cultures,
new modes of discovery …
6. The History of Computational Biomedicine According to Bourne
Timeline: 1980s, 1990s, 2000s, 2010s, 2020
Discipline: Unknown; Expt. Driven; Emergent; Over-sold; A Service; A Partner; A Driver
The Raw Material: Non-existent; Limited/Poor; More/Ontologies; Big Data/Siloed; Open/Integrated
The People: No name; Technicians; Industry recognition; Data scientists; Academics
Searls (ed.), The Roots of Bioinformatics series, PLOS Comp Biol
7. Premise:
We are entering a period of disruption
in biomedical research and we should
all be thinking about what this means
to bioinformatics & biomedicine
http://i1.wp.com/chisconsult.com/wp-content/uploads/2013/05/disruption-is-a-process.jpg
http://cdn2.hubspot.net/hubfs/418817/disruption1.jpg
8. We are at a Point of Deception …
Evidence:
– Google car
– 3D printers
– Waze
– Robotics
– Sensors
From: The Second Machine Age: Work, Progress,
and Prosperity in a Time of Brilliant Technologies
by Erik Brynjolfsson & Andrew McAfee
9. Disruption: Example - Photography
[Figure: volume, velocity and variety over time through six stages: Digitization, Deception, Disruption, Demonetization, Dematerialization, Democratization]
– Digital camera invented by Kodak but shelved
– Megapixels & quality improve slowly; Kodak slow to react
– Film market collapses; Kodak goes bankrupt
– Phones replace cameras
– Instagram, Flickr become the value proposition
– Digital media becomes a bona fide form of communication
10. Disruption: Biomedical Research
[Figure: the same six stages applied to biomedical research]
– Digitization of basic & clinical research & EHRs
– Deception: we are here
– Later stages (Disruption, Demonetization, Dematerialization, Democratization) annotated with: open science; patient-centered health care
13. “And that’s why we’re here today. Because something
called precision medicine … gives us one of the greatest
opportunities for new medical breakthroughs that we
have ever seen.”
President Barack Obama
January 30, 2015
Disruptive Features – New Science
14. An Example of That Promise:
Comorbidity Network for 6.2M Danes
Over 14.9 Years
Jensen et al. 2014 Nat Commun 5:4022
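A comorbidity network links diagnoses that occur together in the same patients more often than chance would predict. The sketch below shows one common way to weight such an edge, the relative risk RR = C_ab · N / (P_a · P_b), where C_ab is the number of patients with both diagnoses, P_a and P_b are the prevalences, and N is the population size. This is an illustrative choice, not necessarily the exact statistic Jensen et al. used (their analysis also considers the temporal order of diagnoses); the function name and the patient records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def comorbidity_edges(patients, min_rr=1.0):
    """Return {(a, b): relative_risk} for diagnosis pairs with RR above min_rr.

    patients: list of sets of diagnosis codes, one set per patient (hypothetical input).
    """
    n = len(patients)
    prevalence = Counter()
    pair_counts = Counter()
    for diagnoses in patients:
        prevalence.update(diagnoses)
        # sorted() gives each unordered pair a single canonical key
        pair_counts.update(combinations(sorted(diagnoses), 2))
    edges = {}
    for (a, b), c_ab in pair_counts.items():
        # observed co-occurrence divided by the count expected under independence
        rr = c_ab * n / (prevalence[a] * prevalence[b])
        if rr > min_rr:
            edges[(a, b)] = rr
    return edges


# Four hypothetical patients with illustrative diagnosis codes
patients = [{"E11", "I10"}, {"E11", "I10"}, {"J45"}, {"I10"}]
print(comorbidity_edges(patients))
```

At national scale the same counting runs over millions of records, which is why pair counts are accumulated in a single pass rather than by comparing every pair of patients.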
15. What Are Some General Implications of Such a Future?
Open collaborative science becomes increasingly important
Data and associated analytics become increasingly valuable to scholarship
Opportunities exist to improve the efficiency of the research enterprise and hence fund more research
Cooperation between funders will be needed to sustain the emergent digital enterprise
Current training content and modalities will not match the supply of trained people to demand
Balancing accessibility vs security becomes more important yet more complex
16. How Should We Respond?
Funders: Encourage change and facilitate an orderly
transition
Academic Leaders: Respond and facilitate a cultural
shift
Developers: Develop working environments that are more adaptive and capable of answering questions in a more efficient and hopefully more accurate way
Users: Use the above environments
Publishers: Move beyond papers
17. Take an Example That is Central to What We Do
Molecular Graphics
Is It Optimal for Today’s Science?
http://upload.wikimedia.org/wikipedia/commons/2/2e/Molecular-Graphics-GRIP-75-Console.jpg
18. Good News/Bad News
Good News:
– It is hard to think of a more powerful way to comprehend complex data
– It has excited generations to the promise of science
– It has adapted to changing technologies
Bad News:
– It is not an adaptive/extensible environment
– It is not a collaborative environment
– It is not an integrative environment
– It is the curse of the ribbon
BMC Bioinformatics 2005, 6:21
19. The Knowledge and Data Cycle
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB, which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
Is a database really different than a biological journal? PLoS Comp Biol 2005 1(3) e34
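Steps 0 to 2 of the cycle depend on spotting database identifiers in stored paper text and turning them into links. The following is a minimal sketch of that step, assuming (hypothetically) that authors cite structures as "PDB 4HHB" or "PDB ID: 4HHB" and that a PDB ID is four characters: a digit from 1 to 9 followed by three letters or digits. It links to RCSB structure pages; the original 2005 system may have worked quite differently.

```python
import re

# Assumed citation convention: "PDB 4HHB" or "PDB ID: 4HHB" (hypothetical).
# A PDB ID is a digit 1-9 followed by three letters or digits.
PDB_MENTION = re.compile(r"\bPDB(?:\s+ID)?[:\s]\s*([1-9][A-Za-z0-9]{3})\b")


def link_pdb_mentions(text):
    """Map each PDB ID cited in the text to an RCSB structure page URL."""
    ids = []
    for match in PDB_MENTION.finditer(text):
        pdb_id = match.group(1).upper()  # IDs are case-insensitive; normalize
        if pdb_id not in ids:  # keep first-seen order, drop repeats
            ids.append(pdb_id)
    return {pdb_id: f"https://www.rcsb.org/structure/{pdb_id}" for pdb_id in ids}


# Hypothetical figure caption
caption = "Figure 2. Hemoglobin (PDB ID: 4hhb) superposed on myoglobin (PDB 1MBN)."
print(link_pdb_mentions(caption))
```

Requiring the "PDB" prefix trades recall for precision: a bare four-character pattern would also match years such as "2005".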
20. Take Another Example: The Raw Material of Structural Bioinformatics
Is this the optimal starting point anymore?
21. Do data resources, including the PDB, best serve the needs of the user at this point?
22. Good News/Bad News for the PDB in this Changing Landscape
Bad News:
– Interface complex and uni-data oriented
– Data accessible; methods accessible (sort of); but not together
– Significant redundancy in services offered
Good News:
– Annotation!
– Demand is increasing
– Integrated with other data types
– RESTful services
23. General Problem Statement:
How to ensure a high-quality annotated data source that provides the optimal environment for accessibility, integration and analysis by a broad community of diverse users?
24. The Commons is a shared virtual space which is
FAIR:
– Find
– Access (use effectively)
– Interoperate
– Reuse
An environment to find and catalyze the use of
shared digital research objects
The Commons
Concept
25. The Developer or User Defines the Environment from the Appropriate Building Blocks
26. Public Beacons
Host: Content
AMPLab: 1000 Genomes Project
Broad Institute: ExAC
Curoverse: PGP, GA4GH Example Data
EBI: 1000 Genomes Project, UK10K, GoNL, EVS, GEUVADIS, UMCG Cardio GenePanel
Google: 1000 Genomes Project, Phase III, Illumina Platinum Genomes
ISB: Known VARiants
NCBI: NHLBI Exome Sequence Project
OICR: 55 cancer datasets
SolveBio: 56 public datasets
UCSC: ClinVar, LOVD, UniProt
University of Leicester: Cafe CardioKit, Cafe Variome Central
WTSI: IBD, Native American, Egyptian, UK10K
Over ?? public datasets beaconized across 21 institutions
Tens of thousands of individuals
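A beacon answers a single yes/no question, roughly "does any genome in this collection carry allele X at position Y on chromosome Z?", without revealing which individual carries it. That is why many hosts can expose sensitive collections through it. The sketch below models that behavior in memory. It is not the GA4GH Beacon API, which is an HTTP protocol. The class names, the host names and the coordinate convention are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chromosome: str
    position: int  # coordinate convention (0- vs 1-based) is an assumption here
    alternate: str


class ToyBeacon:
    """In-memory stand-in for one host's beacon: it answers yes/no, never who."""

    def __init__(self, variants):
        # Only the set of observed alleles is kept; no sample identifiers.
        self._variants = frozenset(variants)

    def query(self, chromosome, position, alternate):
        return Variant(chromosome, position, alternate.upper()) in self._variants


def query_network(beacons, chromosome, position, alternate):
    """Ask every beacon the same question; report which hosts answer yes."""
    return {name: b.query(chromosome, position, alternate) for name, b in beacons.items()}


# Two hypothetical hosts, each holding a different allele
beacons = {
    "host_a": ToyBeacon([Variant("1", 12345, "T")]),
    "host_b": ToyBeacon([Variant("2", 500, "G")]),
}
print(query_network(beacons, "1", 12345, "t"))
```

Because the answer is only a boolean per host, aggregating a network of beacons costs one identical query per institution, which is what makes "beaconizing" many datasets across many hosts practical.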
28. The Commons
Components
Computing environment
– cloud or HPC (High Performance Computing)
– supports access, utilization, sharing and storage of digital objects
Methods for Interoperability
– enable connectivity, shareability and interoperability between digital objects
Digital object compliance model
– describes the properties of digital objects that enable them to be discoverable and shareable
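A compliance model can be read as a checklist that a digital object's metadata must satisfy before it joins the Commons. The sketch below maps each FAIR property (Find, Access, Interoperate, Reuse) to required metadata fields and reports what is missing. The field names and the mapping are hypothetical illustrations, not an NIH-defined schema.

```python
# Hypothetical compliance model: metadata fields required for each FAIR property.
# Field names are illustrative only, not an official Commons schema.
REQUIRED_FIELDS = {
    "Find": ["identifier", "title"],
    "Access": ["access_url"],
    "Interoperate": ["format"],
    "Reuse": ["license"],
}


def compliance_report(metadata):
    """Return missing fields per FAIR property (an empty list means satisfied)."""
    return {
        prop: [field for field in fields if not metadata.get(field)]
        for prop, fields in REQUIRED_FIELDS.items()
    }


def is_compliant(metadata):
    """True only when every FAIR property has all of its required fields."""
    return all(not missing for missing in compliance_report(metadata).values())


# Hypothetical digital object that lacks a license
record = {
    "identifier": "example-object-0001",
    "title": "Example dataset",
    "access_url": "https://example.org/data/0001",
    "format": "text/csv",
}
print(compliance_report(record))
print(is_compliant(record))
```

Reporting missing fields per property, rather than a single pass/fail, tells a data provider which part of FAIR an object falls short on.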
31. The ability to store, share and compute on digital research objects
Especially useful for large data sets that are not easily computed on locally
Scalable and Elastic
Pay per use - Cost effective
An environment that fosters
collaboration
The Commons
Computing Environment: Cloud
32. Commons - Pilots
The Cloud Credits - business model
BD2K Centers
MODs (Model Organism Databases)
HMP Data and tools available in the cloud
NCI Cloud Pilots & Genomic Data Commons
33. The PDB in the Commons
Components:
– Annotated collection of data files
– APIs to access these data files
– Example methods using these APIs
Potential outcomes
– Nothing happens?
– A new breed of developer starts to use PDB data in new ways?
– The casual user has a broader set of services than previously?
– Quality declines/increases?
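As one hedged illustration of an "example method" over PDB data files, the sketch below reads coordinates directly from PDB-format text using the format's fixed columns (atom name in columns 13-16, chain ID in column 22, x/y/z in columns 31-54, written here as 0-based slices) and averages the Cα positions of each chain. The function names and sample lines are hypothetical; a production pipeline would more likely use an established parser such as Biopython's Bio.PDB, or the mmCIF format.

```python
from collections import defaultdict


def parse_ca_atoms(pdb_lines):
    """Extract alpha-carbon (CA) coordinates from PDB-format ATOM records."""
    atoms = []
    for line in pdb_lines:
        # Skips HETATM (ligands, water, a calcium ion also named "CA") and other records
        if line[0:6].strip() != "ATOM":
            continue
        if line[12:16].strip() != "CA":
            continue
        chain = line[21]
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((chain, xyz))
    return atoms


def chain_centroids(atoms):
    """Average the alpha-carbon coordinates of each chain."""
    grouped = defaultdict(list)
    for chain, xyz in atoms:
        grouped[chain].append(xyz)
    return {
        chain: tuple(sum(c[i] for c in coords) / len(coords) for i in range(3))
        for chain, coords in grouped.items()
    }


# Hypothetical fragment in PDB format (not a real entry)
sample = [
    "ATOM      1  N   MET A   1      10.000   0.000   0.000",
    "ATOM      2  CA  MET A   1      11.000   0.000   0.000",
    "ATOM      3  CA  GLY A   2      13.000   2.000   0.000",
    "ATOM      4  CA  ALA B   1       0.000   0.000   5.000",
    "HETATM    5  O   HOH A 101       1.000   1.000   1.000",
]
print(chain_centroids(parse_ca_atoms(sample)))
```

Fixed-column slicing matters because whitespace splitting breaks when adjacent numeric fields run together, which happens with large or negative coordinates.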
34. I not only use all the brains
I have, but all I can borrow.
– Woodrow Wilson
36. NIH…
Turning Discovery Into Health
philip.bourne@nih.gov
https://datascience.nih.gov/
http://www.ncbi.nlm.nih.gov/research/staff/bourne/
Editor's Notes
Photos: FC tweet; RK screen grab
16 million hospital inpatient events (24.5% of total), 35 million outpatient clinic events (53.6% of total) and 14 million emergency department events (21.9% of total)
On this slide we have a list of Beacon providers and the content that they're serving. To date we have over 120 public datasets that have been made available via Beacons at 12 different institutions. This represents data from tens of thousands of individuals, and these metrics, the numbers of datasets and individuals that they represent …