An Informal Discussion About Big Data
Better Stated as

A Vision for Biomedical
Research
Digitally enabling the length and
quality of life
Philip E. Bourne
pbourne@ucsd.edu
http://pebourne.wordpress.com/2013/12/21/taking-on-the-role-of-associate-director-for-data-science-at-the-nih-my-original-vision-statement/
The Context for This Discussion
• On March 3, 2014 I will begin as the first
Associate Director of the NIH devoted to data
science
• I am giving up tenure and the sun because I
believe this is the right time for change
• The change that I will try to instill at NIH and
beyond is that of a Digital Enterprise

http://www.nih.gov/news/health/dec2013/od-09.htm
What Do I Mean By the Digital
Enterprise?
An organization that succeeds by
maximizing the use of its digital assets
to achieve its goals
Why the Digital Enterprise Now?
• Biomedical research is increasingly digital –
the talk of “Big Data” is one manifestation
• Fulfillment of the NIH mission (among others)
will increasingly be tied to actions taken on
digital data across boundaries

• History already has lessons to teach us to
make the job easier
Actions on Data Implies:
• Ensuring data quality and hence trust
• Making data sustainable
• Making data open and accessible
• Making data findable
• Providing suitable metadata and annotation
• Making data queryable
• Making data analyzable
• Presenting data so as to maximize its value
• Rewarding good data practices
Boundaries on Data Implies:
• Working across biological scales
• Working across biomedical disciplines
• Working across basic and clinical research and
practice
• Working across institutional boundaries
• Working across public and private sectors
• Working across national and international
borders
• Working across funding agencies
Where to Start?

An external advisory group provided a
valuable blueprint for what should be
done
http://acd.od.nih.gov/Data%20and%20Informatics%20Working%20Group%20Report.pdf
Blueprint Recommendations
• Promote central and federated catalogs
– Establish minimal metadata framework
– Tools to facilitate data sharing
– Elaborate on existing data sharing policies

• Support methods and applications
– Fund all phases of software development
– Leverage lessons from National Centers

• Training
– More funding
– Enhance review of training applications
– Quantitative component to all awards

• On-campus IT strategic plan
– Catalog of existing tools
– Informatics laboratory
– Ditto big data

• Sustainable funding commitment
What is Under Way?
• Now:
– Data centers (under review)
– Data science training grants (call Q1 14)
– Pilot data catalog consortium (call out)
– Genomic Research Data Alliance (being finalized)
– Piloting “NIH-drive”

• In Year One:
– Extended public-private programs specifically for data science activities
– Interagency activities
– International exchange programs
– Programs for better data descriptions
– Reward institutions/communities
– Policies to get clinical trial data into the public domain
Longer Term Strategy: Support for
The Research Lifecycle
[Diagram: the research lifecycle]
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
• Diagram labels: Authoring Tools, Data Capture, Lab Notebooks, Software Repositories, Analysis Tools, Scholarly Communication, Visualization
• Diagram labels: Commercial & Public Tools, Discipline-Based Metadata Standards, Community Portals, Git-like Resources By Discipline, Training, Institutional Repositories, Commercial Repositories, Data Journals, New Reward Systems
References
• http://bd2k.nih.gov/
• http://pebourne.wordpress.com/2013/12/21/taking-on-the-role-of-associate-director-for-data-science-at-the-nih-my-original-vision-statement/
• http://rd-alliance.org/
• http://www.genomeinformaticsalliance.org/
• http://www.force11.org/
pbourne@ucsd.edu

Discussion
Back Pocket Slides
The Role of Associate Director for Data
Science
1. provide broad trans-NIH programmatic leadership in the area of
data science;
2. lead long-term NIH strategic planning in areas of data science;
3. provide oversight of the BD2K Initiative;
4. establish and nurture a trans-NIH intellectual and programmatic
‘hub’ for coordinating and enhancing data science activities;
5. coordinate with data science activities beyond NIH (e.g., other
government agencies, other funding agencies, and the private
sector);
6. play a major role in data sharing policy development and oversight
at NIH; and
7. interact with the Chief Information Officer, NIH to generate
synergy between BD2K and the Infrastructure Plus program.
Strategy
• Use the Blueprint as a starting point
• Work with ICs to determine science drivers
• Define developments needed for these drivers
• Look for commonalities across ICs – make those
a priority
• Manage and enable emergent developments
– data catalog – used to define the minimal data
description and a home for domain definitions
– Centers of excellence – test beds and exemplars for
best practices
Ways to Sell the NIH Data Science
Vision
• Developed in response to well-recognized scientific needs
• Support for the complete research lifecycle – this is more
than just data
• Simple and well understood by all stakeholders (i.e.,
branded)
• A shared vision
• As ubiquitous as TCP/IP is to the Internet – a backbone for
the digital enterprise
• To data what PLOS is to knowledge – a movement that
people believe in and get behind
• An app store for the research enterprise
General Features of NIH Data Science
• Lightweight metadata standards
• Data & software registries
• Expanded policies on data sharing, open
source software
• Training programs & reward systems
• Institutional incentives
• Private sector incentives
• Data centers serving community needs

PSB2014 A Vision for Biomedical Research
