1. BD2K @ NIH – A Vision Through 2020
Philip E. Bourne, PhD, FACMI
Associate Director for Data Science
philip.bourne@nih.gov
2. First and foremost, you should see this meeting as a celebration of the hard work of the past two years
Yes, these are uncertain times, but …
There is a commitment to the BD2K program through 2020
3. BD2K cannot be viewed in isolation, but rather as part of a broader view of data science @ NIH …
Particularly as funding increasingly comes from the ICs
4. A View Which Includes:
• A vibrant research program of:
– Fundamental developments in data science
– Application of those fundamental developments
– Flagship projects to which developments are applied:
• PMI, BRAIN, Moonshot, ECHO
• A sustainable data ecosystem
– Adoption of the Commons and the FAIR Principles
– Cross-cutting activities
• Increased workforce training
• A changing governance model
8. A Strategic Response
Research
• Fundamental
– Machine learning
– Data mining
– Indexing
– Predictive modeling …
• Applied
– Sustainability, governance, economics of data
– Privacy and security
– Effective use of clouds …
Resources
• Standards
• Commons
– APIs
– Reference data sets
– Workflows
– Access & Authentication
• Workforce
Outcomes
• Evaluated pilots
• FAIR data
• Trained workforce
• Best practices
• Policies
• Effective use of clouds
• On-ramps for all ICs
10. The Current Situation
• NIH-Funded Data
– Total data from NIH-funded research is currently estimated at 650 PB*
– 20 PB of that (3%) is in NCBI/NLM, and it is expected to grow by 10 PB this year
• Dark Data
– Only 12% of data described in published papers is in recognized archives; the remaining 88% is dark data^
• Cost
– 2007–2014: NIH spent ~$1.2Bn extramurally on maintaining data archives
* In 2012, the Library of Congress held 3 PB
^ http://www.ncbi.nlm.nih.gov/pubmed/26207759
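The headline figures on this slide can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the numbers stated above; the per-year average assumes the ~$1.2Bn was spread evenly over the eight years 2007–2014, and the Library of Congress multiple compares a 2012 figure with a current estimate, so both are rough illustrations rather than reported results. The constant names and the `summarize` function are hypothetical.

```python
"""Back-of-envelope checks on the slide's figures (illustrative only)."""

# Figures as stated on the slide.
TOTAL_NIH_DATA_PB = 650      # total data from NIH-funded research
NCBI_NLM_DATA_PB = 20        # portion held at NCBI/NLM
NCBI_NLM_GROWTH_PB = 10      # expected growth this year
ARCHIVED_SHARE_PCT = 12      # share of published data in recognized archives
ARCHIVE_SPEND_BN_USD = 1.2   # extramural spend on maintaining archives
FIRST_YEAR, LAST_YEAR = 2007, 2014
LIBRARY_OF_CONGRESS_PB = 3   # 2012 figure from the footnote


def summarize() -> dict[str, float]:
    """Derive the ratios implied by the slide's headline numbers."""
    years = LAST_YEAR - FIRST_YEAR + 1  # inclusive range -> 8 years
    return {
        "nlm_share_pct": 100 * NCBI_NLM_DATA_PB / TOTAL_NIH_DATA_PB,
        "dark_share_pct": 100 - ARCHIVED_SHARE_PCT,
        "nlm_growth_pct": 100 * NCBI_NLM_GROWTH_PB / NCBI_NLM_DATA_PB,
        "years": years,
        # Assumes spending was spread evenly across the period.
        "avg_spend_musd_per_year": ARCHIVE_SPEND_BN_USD * 1000 / years,
        # Loose comparison: a 2012 figure against a current estimate.
        "loc_multiple": TOTAL_NIH_DATA_PB / LIBRARY_OF_CONGRESS_PB,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.1f}")
```

Run as-is, this reports the NCBI/NLM share as about 3.1% (consistent with the slide's 3%), the 88% dark-data complement, growth of 50% for NCBI/NLM this year, and an average of roughly $150M per year on archives.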
11. The Commons – Status
• Commons and FAIR principles* adopted across NIH
• Development and public release of a prototype Data Discovery Index
– DataMed
• Feb.: v1.0
• Nov.: v1.5
• Cloud credits being issued for work in the Commons
• FOAs for the Commons Framework being issued
• Commons pilots under way
* https://www.ncbi.nlm.nih.gov/pubmed/26978244
12. Sustainability – Sample Other Activities
• Request for Information: Metrics to Assess Value of Biomedical Digital Repositories (NOT-OD-16-133)
– To be discussed at the Sustainability Session, Wed 1pm
• RFA to support community-based standards work was released in the fall for a May 2017 award; session today at 1pm
• Funding opportunity announcement: (BD2K) Enhancing the Efficiency and Effectiveness of Digital Curation for Biomedical Big Data (RFA-LM-17-001)
– Applications due Dec 15
13. Sustainability – Looking Forward
• International collaboration on business models for sustainable data repositories
– Sustainable Business Models for Data Repositories (OECD Global Science Forum)
– Future of Life Sciences and Biomedical Databases (International Human Frontier Science Program)
• NIH long-term data repository support
– Federal Interagency Workshop on Measuring the Impact of Data Repositories, 2017
– Recommend mechanism(s), review criteria, and an implementation plan
14. Example Cross-cutting Activities
• International partnerships
• Count everything – Secure count query framework
• California centers regional meetings
• GA4GH – Beacon project
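Two of the activities above share a privacy-aware query style: the GA4GH Beacon project lets a data holder answer only whether an allele is present, and a secure count query framework presumably returns how many records match without exposing the records themselves. The Python sketch below is a hypothetical, in-memory illustration of those two ideas, not the GA4GH Beacon API or NIH's framework; the `ToyBeacon` class, the `Variant` record, and the small-count suppression threshold (`min_cell`) are all assumptions introduced for illustration.

```python
"""Toy illustration of yes/no and suppressed-count queries (hypothetical)."""
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    """A single allele; coordinates follow whatever convention the data uses."""
    chrom: str
    pos: int
    ref: str
    alt: str


class ToyBeacon:
    """Hypothetical in-memory service; NOT the GA4GH Beacon API."""

    def __init__(self, samples: dict[str, set[Variant]], min_cell: int = 5):
        # samples maps a sample ID to the set of variants it carries.
        self._samples = samples
        # Nonzero counts below this threshold are withheld (an assumption).
        self._min_cell = min_cell

    def _carriers(self, variant: Variant) -> int:
        """Internal helper: raw number of samples carrying the variant."""
        return sum(1 for variants in self._samples.values() if variant in variants)

    def allele_exists(self, chrom: str, pos: int, ref: str, alt: str) -> bool:
        """Beacon-style answer: only yes or no, no sample identities."""
        return self._carriers(Variant(chrom, pos, ref, alt)) > 0

    def count(self, chrom: str, pos: int, ref: str, alt: str) -> Optional[int]:
        """Aggregate count; nonzero counts below min_cell return None."""
        n = self._carriers(Variant(chrom, pos, ref, alt))
        if 0 < n < self._min_cell:
            return None
        return n


if __name__ == "__main__":
    demo = ToyBeacon(
        {
            "s1": {Variant("1", 100, "A", "G")},
            "s2": {Variant("1", 100, "A", "G"), Variant("2", 5, "C", "T")},
        },
        min_cell=2,
    )
    print(demo.allele_exists("1", 100, "A", "G"))  # True
    print(demo.count("1", 100, "A", "G"))          # 2
    print(demo.count("2", 5, "C", "T"))            # None (suppressed)
```

Suppressing small counts is one common way to reduce re-identification risk; a real service would also need authentication, rate limiting, and defenses against repeated differencing queries, none of which this sketch attempts.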
16. NLM
• Working Group Report
– http://acd.od.nih.gov/reports/Report-NLM-06112015-ACD.pdf
– Recommendation – NLM should become the programmatic epicenter for data science at NIH …
• Patti Brennan – New NLM director
17. What We Hope to See in 2020
• Innovations brought about by large and complex data
• Evidence of translation, i.e., real application at the point of care
• Broad Commons adoption leading to
– Improved sharing and reuse, and hence cost-effectiveness and reproducibility
• A balance between what is spent on data vs. what is gained from that data
• Policies that support the above
18. … for your hard work and to the NIH staff from the ADDS office and from across the ICs who have toiled to make BD2K a success