Biomedical Research as Part of the Digital
Enterprise
Philip E. Bourne Ph.D.
Associate Director for Data Science
National ...
Disclaimer: I only started March 3,
2014
…but I had been thinking about this prior to my
appointment
Let me start with a few factoids to get
the ball rolling…
The Story of Meredith
http://fora.tv/2012/04/20/Congress_Unplugged_Phil_Bourne
1. The Era of Open Has The Potential
to Deinstitutionalize & Democratize
Daniel Hulshizer/Associated Press
1. The Era of Open Has The Potential
to Deinstitutionalize & Democratize
Daniel Hulshizer/Associated Press
2. I can’t reproduce research from my
own laboratory?
Daniel Garijo et al. 2013 Quantifying Reproducibility in Computation...
47/53 “landmark” publications
could not be replicated
[Begley, Ellis Nature,
483, 2012] [Carole Goble]
Characteristics of the Original and
Current Experiment
• Original and Current:
– Purely in silico
– Uses a combination of ...
Considered the Ability to Reproduce
by Four Classes of User
• REP-AUTHOR – original author of the work
• REP-EXPERT – doma...
A Conceptual Overview of the Method
Should Be Mandatory
Garijo et al 2013 PLOS ONE 8(11): e80278
Time to Reproduce the Method
Garijo et al 2013 PLOS ONE 8(11): e80278
2. It's not that we could not reproduce
the work, but the effort involved was
substantial
Any graduate student could tell y...
3. Data are accumulating!
4. We don’t know
enough about how
existing data are used
* http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm
Jan. 20...
We Need to Learn from Industries Whose
Livelihood Addresses the Question of Use
5. Some would argue we are at an
inflexion point for change
• Evidence:
– Google car
– 3D printers
– Waze
– Robotics
From the Second Machine Age
From: The Second Machine Age: Work, Progress, and
Prosperity in a Time of Brilliant Technologi...
6. Scholarship is broken
• I have a paper with 16,000 citations that no one has
ever read
• I have papers in PLOS ONE that...
7. The reward system is in need of
repair
Okay… enough of the problems
What are some solutions?
I cast the solutions in a vision …
something I call the digital enterprise
Any institution is a candidate as a digital
ent...
Components of The Academic Digital
Enterprise
• Consists of digital assets
– E.g. datasets, papers, software, lab notes
• ...
Life in the Academic Digital Enterprise
• Jane scores extremely well in parts of her graduate on-line neurology class.
Neu...
Solution: Break Down the Silos
• New policies, regulations, e.g. data sharing
• Economic drivers
• The promise of shared
da...
Solution: Sustainability
The How of Data Sharing
• More credit to the data scientists
• Change to funding models
• Public/...
Solution: Discoverability
• Calls for data and software registries (e.g., DDI)
• Data commons (NIH drive?)
• More clinical...
Solution: Training
• Calls out for training grants – new and as
supplements to existing training efforts
• Regional traini...
These problems and potential
solutions have been around a
long time
The good news is that “Big Data”
has brought more atten...
What Are Big Data?
• Large datasets from high throughput experiments
• Large numbers of small datasets
• Data which are “i...
The NIH is Starting to Think About the
Digital Enterprise, Witness…
• You will hear all about
BD2K from:
– Jennie Larkin
–...
This is great, but what will the end
product look like?
1. A link brings up figures
from the paper
0. Full text of PLoS papers stored
in a database
2. Clicking the paper figure r...
To get to that end point we have to
consider the complete research lifecycle
The Research Life Cycle will Persist
IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION
Tools and Resources Will Continue To
Be Developed
IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DIS...
Those Elements of the Research Life Cycle will
Become More Interconnected Around a Common
Framework
IDEAS – HYPOTHESES – E...
New/Extended Support Structures Will
Emerge
IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINA...
We Have a Ways to Go
IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION
Authoring
Tools
La...
Thank You!
Questions?
philip.bourne@nih.gov
NIH…
Turning Discovery Into Health
Presented at the Association of Biomedical Resource Facilities Annual Meeting in Albuquerque NM March 25, 2014.

Published in: Education
  • Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8): e124. doi:10.1371/journal.pmed.0020124
    http://www.reuters.com/article/2012/03/28/us-science-cancer-idUSBRE82R12P20120328
  • Biomedical Research as Part of the Digital Enterprise

    1. 1. Biomedical Research as Part of the Digital Enterprise Philip E. Bourne Ph.D. Associate Director for Data Science National Institutes of Health
    2. 2. Disclaimer: I only started March 3, 2014 …but I had been thinking about this prior to my appointment
    3. 3. Let me start with a few factoids to get the ball rolling…
    4. 4. The Story of Meredith http://fora.tv/2012/04/20/Congress_Unplugged_Phil_Bourne
    5. 5. 1. The Era of Open Has The Potential to Deinstitutionalize & Democratize Daniel Hulshizer/Associated Press
    6. 6. 1. The Era of Open Has The Potential to Deinstitutionalize & Democratize Daniel Hulshizer/Associated Press
    7. 7. 2. I can’t reproduce research from my own laboratory? Daniel Garijo et al. 2013 Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome PLOS ONE 8(11) e80278.
    8. 8. 47/53 “landmark” publications could not be replicated [Begley, Ellis Nature, 483, 2012] [Carole Goble]
    9. 9. Characteristics of the Original and Current Experiment • Original and Current: – Purely in silico – Uses a combination of public databases and open source software by us and others • Original: – http://funsite.sdsc.edu/drugome/TB/ • Current: – Recast in the Wings workflow system Daniel Garijo et al. 2013 Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome PLOS ONE 8(11) e80278.
    10. 10. Considered the Ability to Reproduce by Four Classes of User • REP-AUTHOR – original author of the work • REP-EXPERT – domain expert – can reproduce even with incomplete methods described • REP-NOVICE – basic domain (bioinformatics) expertise • REP-MINIMAL – researcher with no domain expertise Garijo et al 2013 PLOS ONE 8(11): e80278
    11. 11. A Conceptual Overview of the Method Should Be Mandatory Garijo et al 2013 PLOS ONE 8(11): e80278
    12. 12. Time to Reproduce the Method Garijo et al 2013 PLOS ONE 8(11): e80278
    13. 13. 2. It's not that we could not reproduce the work, but the effort involved was substantial. Any graduate student could tell you this, and little has changed in 40 years. Perhaps it is time we did better?
    14. 14. 3. Data are accumulating!
    15. 15. 4. We don’t know enough about how existing data are used [Figure: Structure Summary page activity for H1N1 Influenza related structures, Jan. 2008 to Jul. 2010. 1RUZ: 1918 H1 Hemagglutinin. 3B7E: Neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir] * http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm [Andreas Prlic]
    16. 16. We Need to Learn from Industries Whose Livelihood Addresses the Question of Use
    17. 17. 5. Some would argue we are at an inflexion point for change • Evidence: – Google car – 3D printers – Waze – Robotics
    18. 18. From the Second Machine Age From: The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson & Andrew McAfee
    19. 19. 6. Scholarship is broken • I have a paper with 16,000 citations that no one has ever read • I have papers in PLOS ONE that have more citations than ones in PNAS • I have data sets I am proud of but few places to put them • I edited a journal but it did not count for much
    20. 20. 7. The reward system is in need of repair
    21. 21. Okay… enough of the problems What are some solutions?
    22. 22. I cast the solutions in a vision … something I call the digital enterprise. Any institution is a candidate as a digital enterprise, but let's explore it in the context of the academic medical center
    23. 23. Components of The Academic Digital Enterprise • Consists of digital assets – E.g. datasets, papers, software, lab notes • Each asset is uniquely identified and has provenance, including access control – E.g. publishing simply involves changing the access control • Digital assets are interoperable across the enterprise
    24. 24. Life in the Academic Digital Enterprise • Jane scores extremely well in parts of her graduate on-line neurology class. Neurology professors, whose research profiles are on-line and well described, are automatically notified of Jane’s potential based on a computer analysis of her scores against the background interests of the neuroscience professors. Consequently, professor Smith interviews Jane and offers her a research rotation. During the rotation she enters details of her experiments related to understanding a widespread neurodegenerative disease in an on-line laboratory notebook kept in a shared on-line research space – an institutional resource where stakeholders provide metadata, including access rights and provenance beyond that available in a commercial offering. According to Jane’s preferences, the underlying computer system may automatically bring to Jane’s attention Jack, a graduate student in the chemistry department whose notebook reveals he is working on using bacteria for purposes of toxic waste cleanup. Why the connection? They reference the same gene a number of times in their notes, which is of interest to two very different disciplines – neurology and environmental sciences. In the analog academic health center they would never have discovered each other, but thanks to the Digital Enterprise, pooled knowledge can lead to a distinct advantage. The collaboration results in the discovery of a homologous human gene product as a putative target in treating the neurodegenerative disorder. A new chemical entity is developed and patented. Accordingly, by automatically matching details of the innovation with biotech companies worldwide that might have potential interest, a licensee is found. The licensee hires Jack to continue working on the project. Jane joins Joe’s laboratory, and he hires another student using the revenue from the license. The research continues and leads to a federal grant award. The students are employed, further research is supported and in time societal benefit arises from the technology. From: What Big Data Means to Me, JAMIA 2014 21:194
    25. 25. Solution: Break Down the Silos • New policies, regulations e.g. data sharing • Economic drivers • The promise of shared data
    26. 26. Solution: Sustainability The How of Data Sharing • More credit to the data scientists • Change to funding models • Public/Private partnerships • Interagency cooperation • International cooperation • Better evaluation and more informed decisions about existing and proposed resources – How are current data being used? • Role of institutional repositories – reward institutions rather than PIs
    27. 27. Solution: Discoverability • Calls for data and software registries (e.g., DDI) • Data commons (NIH drive?) • More clinical trial data in the public domain • Facilitate accessibility and hence access to clinical data
    28. 28. Solution: Training • Calls out for training grants – new and as supplements to existing training efforts • Regional training centers (cf Cold Spring Harbor)?
    29. 29. These problems and potential solutions have been around a long time. The good news is that “Big Data” has brought more attention to the problem
    30. 30. What Are Big Data? • Large datasets from high throughput experiments • Large numbers of small datasets • Data which are “ill-formed” • The why (causality) is replaced by the what • A signal that a fundamental change is taking place – a tipping point?
    31. 31. The NIH is Starting to Think About the Digital Enterprise, Witness… • You will hear all about BD2K from: – Jennie Larkin – Warren Kibbe – Dawei Lin bd2k.nih.gov
    32. 32. This is great, but what will the end product look like?
    33. 33. One Possible End Point. Data flow: (0) Full text of PLoS papers stored in a database; (1) A link brings up figures from the paper; (2) Clicking the paper figure retrieves data from the PDB which is analyzed; (3) A composite view of journal and database content results; (4) The composite view has links to pertinent blocks of literature text and back to the PDB. User view: (1) User clicks on thumbnail; (2) Metadata and a webservices call provide a renderable image that can be annotated; (3) Selecting a feature provides a database/literature mashup; (4) That leads to new papers. PLoS Comp. Biol. 2005 1(3) e34
    34. 34. To get to that end point we have to consider the complete research lifecycle
    35. 35. The Research Life Cycle will Persist IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION
    36. 36. Tools and Resources Will Continue To Be Developed IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION Authoring Tools Lab Notebooks Data Capture Software Analysis Tools Visualization Scholarly Communication
    37. 37. Those Elements of the Research Life Cycle will Become More Interconnected Around a Common Framework IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION Authoring Tools Lab Notebooks Data Capture Software Analysis Tools Visualization Scholarly Communication
    38. 38. New/Extended Support Structures Will Emerge IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION Authoring Tools Lab Notebooks Data Capture Software Analysis Tools Visualization Scholarly Communication Commercial & Public Tools Git-like Resources By Discipline Data Journals Discipline-Based Metadata Standards Community Portals Institutional Repositories New Reward Systems Commercial Repositories Training
    39. 39. We Have a Ways to Go IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION Authoring Tools Lab Notebooks Data Capture Software Analysis Tools Visualization Scholarly Communication Commercial & Public Tools Git-like Resources By Discipline Data Journals Discipline-Based Metadata Standards Community Portals Institutional Repositories New Reward Systems Commercial Repositories Training
    40. 40. Thank You! Questions? philip.bourne@nih.gov
    41. 41. NIH… Turning Discovery Into Health
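
Slides 23 and 24 describe digital assets that are uniquely identified, carry provenance and access control, and can be connected when two notebooks reference the same gene. A minimal sketch of that idea in Python follows. It is an illustration under stated assumptions, not a description of any existing NIH or institutional system: the `DigitalAsset` class, the three access levels, the `shared_gene_links` function and the gene symbols `GENE_A`, `GENE_B`, `GENE_C` are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from itertools import combinations
import uuid


@dataclass
class DigitalAsset:
    """One digital asset (dataset, paper, software, lab note). Hypothetical model."""
    owner: str
    kind: str
    genes: frozenset                # gene symbols mentioned in the asset (assumed metadata)
    access: str = "private"         # assumed levels: "private", "enterprise", "public"
    asset_id: str = field(default_factory=lambda: uuid.uuid4().hex)  # unique identifier
    provenance: list = field(default_factory=list)                   # log of recorded changes

    def set_access(self, level: str) -> None:
        """Change the access level and record the change as provenance."""
        self.provenance.append(f"access: {self.access} -> {level}")
        self.access = level

    def publish(self) -> None:
        """Publishing is modeled as nothing more than an access-control change."""
        self.set_access("public")


def shared_gene_links(assets, min_shared=1):
    """Map each pair of distinct owners to the genes their non-private assets share."""
    visible = [a for a in assets if a.access != "private"]
    links = {}
    for a, b in combinations(visible, 2):
        if a.owner == b.owner:
            continue
        shared = a.genes & b.genes
        if len(shared) >= min_shared:
            pair = tuple(sorted((a.owner, b.owner)))
            links[pair] = links.get(pair, frozenset()) | shared
    return links


# Placeholder notebooks for the Jane and Jack scenario; gene names are made up.
jane = DigitalAsset("Jane", "lab_note", frozenset({"GENE_A", "GENE_B"}), access="enterprise")
jack = DigitalAsset("Jack", "lab_note", frozenset({"GENE_A", "GENE_C"}), access="enterprise")
draft = DigitalAsset("Jack", "paper", frozenset({"GENE_B"}))  # private, so not matched

links = shared_gene_links([jane, jack, draft])
draft.publish()  # only the access level changes; the change is logged as provenance
```

In this sketch the private draft is ignored until `publish()` changes its access level, which mirrors the slide's point that publishing simply involves changing the access control. A real system would need proper identifiers, richer provenance and a less naive matching rule than set intersection.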