NIH Data Initiatives: Harnessing Big (and small) Data to Improve Health
Presentation at the Internet2 Global Forum, April 28, 2015
Session: NIH Perspectives
The document proposes the creation of a federated cloud computing platform called "The Commons" to support biomedical data sharing and analysis across multiple cloud providers. Key points:
- The Commons would index metadata and digital objects across conformant public and private cloud providers.
- It would be funded by providing credits to investigators for storage and computing, creating competition among providers to offer better services at lower costs.
- A phased implementation is outlined to initially involve experienced users and later expand to all NIH grantees.
NIH Data Commons (note: presentation has animations) - Vivien Bonazzi
Presented at the Data Commons & Data Science Workshop (University of Chicago - Center for Data Intensive Science).
NB: there are animations in these slides, so static slides might not render well.
The NIH Data Commons - BD2K All Hands Meeting 2015 - Vivien Bonazzi
Presentation given at the BD2K All Hands meeting in Bethesda, MD, USA in November 2015
https://datascience.nih.gov/bd2k/events/NOV2015-AllHands
Videocast of this presentation:
http://videocast.nih.gov/summary.asp?Live=17480&bhcp=1
The talk starts at 2 hr 40 min (it's about 55 minutes long) and includes video.
Document describing the Commons: https://datascience.nih.gov/commons
The document discusses the need for an NIH Data Commons to address challenges with data sharing and storage. It describes how factors like increasing data volumes, availability of cloud technologies, and emphasis on FAIR data principles are driving the need for a centralized data platform. The proposed NIH Data Commons would provide findable, accessible, interoperable and reusable data through cloud-based services and tools. It would enable data-driven science by facilitating discovery, access and analysis of biomedical data across different sources. Plans are outlined to develop and test an initial Data Commons pilot using existing genomic and other biomedical datasets.
EMBL Australian Bioinformatics Resource AHM - Data Commons - Vivien Bonazzi
This document discusses the development of the NIH Data Commons, which aims to create a shared framework and infrastructure for biomedical data. It notes the increasing amounts of data being generated and the need for data sharing and interoperability. The Data Commons framework treats data, tools, and publications as digital objects that are findable, accessible, interoperable and reusable. Current pilots include deploying reference datasets in the cloud, indexing data and tools, and a credits system for cloud resources. Challenges discussed include metrics, costs, standards, incentives and sustainability. The framework's relevance for supporting open data in Australia is also addressed.
The Commons: Leveraging the Power of the Cloud for Big Data - Philip Bourne
The document discusses the need for a Commons framework to leverage cloud computing for big data in biomedicine. It describes key principles of the Commons, including supporting a digital ecosystem, treating research outputs as digital objects, and ensuring objects are FAIR (findable, accessible, interoperable, and reusable). The Commons framework exploits cloud technologies to provide access to data and tools through APIs and containers. Current pilots applying this framework include the Cloud Credits Model, BD2K Centers, model organism databases, the Human Microbiome Project, and NCI cancer genomics data. The goal is to make large biomedical datasets and associated tools broadly available for research in a standardized, interoperable manner.
The document provides an overview of the development of the NIH Data Commons. It discusses factors driving the need for a data commons, including large amounts of data being generated and increased support for data sharing. It outlines the goals of making data findable, accessible, interoperable and reusable. Several pilots are exploring the feasibility of the commons framework, including placing large datasets in the cloud and developing indexing methods. Considerations in fully realizing the commons are also discussed, such as standards, discoverability, policies and incentives.
Data Commons - NHGRI Council, Feb 2017 - Vivien Bonazzi
The NIH is developing a Data Commons to enable data-driven biomedical research. The Data Commons will treat research data, methods, and papers as digital objects stored in a shared virtual space according to FAIR principles. It will provide tools and infrastructure for users to find, deposit, manage, share, and reuse these digital objects at scale. The goal is to accelerate discoveries, therapies, and cures by enabling researchers to leverage all available data and analysis tools. The Data Commons is being designed as an interoperable platform that can integrate with other data commons through common APIs, container technologies, metadata standards, and authentication.
Big Data as a Catalyst for Collaboration & Innovation - Philip Bourne
Big data is disrupting biomedical research through digitization of data sources. The National Institutes of Health (NIH) launched the Big Data to Knowledge (BD2K) initiative to support this disruption. BD2K funds various programs including data sharing policies, data science training, and the development of shared infrastructure and standards. This infrastructure includes the "Commons" which would provide discoverable, accessible, interoperable and reusable research objects to catalyze collaboration using open APIs and computing platforms. SRP could interact with BD2K through initiatives like open science competitions, data standards development, and leadership in trans-NIH big data efforts.
Data Commons - BD2K Fundamentals of Science, Feb 2017 - Vivien Bonazzi
Vivien Bonazzi leads the Data Commons efforts within NIH. She discussed how big data is characterized by volume, velocity, variety and veracity. She explained that data is becoming the central currency of a new digital economy and organizations must leverage their digital assets through platforms like the Data Commons to transform into digital enterprises. The Data Commons platform fosters development of a digital ecosystem by enabling interactions between producers and consumers of FAIR digital objects like data, software and publications.
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Shared - Robert Grossman
Data commons are emerging as a solution to challenges in analyzing and sharing large biomedical datasets. A data commons co-locates data with cloud computing infrastructure and software tools to create an interoperable resource for the research community. Examples include the NCI Genomic Data Commons and the Open Commons Consortium. The open source Gen3 platform supports building disease- or project-specific data commons to facilitate open data sharing while protecting patient privacy. Developing interoperable data commons can accelerate research through increased access to data.
The document discusses the proposed phases of the NIH Cloud Credits Model project. Phase 1 would focus on finalizing requirements, arranging initial cloud providers, and developing a basic investigator portal. Phase 2 would involve opening the credits model to experienced cloud users and distributing an initial batch of credits. Phase 3 would include a larger scale opening and distribution of credits, along with infrastructure improvements. Phase 4 would broadly scale up credit distribution and distribution analysis.
The document discusses a meeting agenda between GBIF (Global Biodiversity Information Facility) and Elsevier to discuss opportunities for collaboration around data publishing and sharing biodiversity data. Some key points discussed in the agenda include GBIF's role in facilitating open access to biodiversity data, its data publishing framework to encourage data mobilization and sharing, and potential areas of collaboration around simultaneous publishing of data and scholarly articles.
What is a Data Commons and How Can Your Organization Build One? - Robert Grossman
1. Data commons co-locate large biomedical datasets with cloud computing infrastructure and analysis tools to create shared resources for the research community.
2. The NCI Genomic Data Commons is an example of a data commons that makes over 2.5 petabytes of cancer genomics data available through web portals, APIs, and harmonized analysis pipelines (see the sketch after this list).
3. The Gen3 platform is an open source software stack for building data commons that can interoperate through common APIs and data models to support reproducible, collaborative research across projects.
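For a concrete sense of what access "through web portals, APIs" means here, below is a minimal sketch of querying the public GDC REST API from Python. The endpoint and response shape follow the public GDC documentation; the exact field names are assumptions and may have drifted.

import requests

GDC_PROJECTS = "https://api.gdc.cancer.gov/projects"

def list_projects(n=5):
    """Fetch the first n GDC projects with a few descriptive fields."""
    params = {
        "size": n,
        "fields": "project_id,name,primary_site",
        "format": "JSON",
    }
    resp = requests.get(GDC_PROJECTS, params=params, timeout=30)
    resp.raise_for_status()
    # The GDC API wraps results as {"data": {"hits": [...], ...}}.
    return resp.json()["data"]["hits"]

for hit in list_projects():
    print(hit.get("project_id"), "-", hit.get("name"))

Run against the live API, this returns a handful of project identifiers and names; the same pattern extends to the /files and /cases endpoints with JSON filters.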
The document discusses GBIF's (Global Biodiversity Information Facility) goals of facilitating open access to biodiversity data worldwide to support scientific research. GBIF shares over 200 million biodiversity records through data publishers and resources. The document proposes a Data Publishing Framework to improve data mobilization and cultural acceptance of open data sharing. It describes challenges to the framework and its potential impacts, such as increased data usage and quality through incentives like data papers and a Data Usage Index.
This is a talk that I gave at BioIT World West on March 12, 2019. The talk was called: A Gen3 Perspective of Disparate Data: From Pipelines in Data Commons to AI in Data Ecosystems.
Smith RDAP11 NSF Data Management Plan Case Studies - ASIS&T
MacKenzie Smith, MIT; NSF Data Management Plan Case Studies; RDAP11 Summit
The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011, Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html
Big Data in Biomedicine – An NIH Perspective - Philip Bourne
Keynote at the IEEE International Conference on Bioinformatics and Biomedicine, Washington DC, November 10, 2015.
https://cci.drexel.edu/ieeebibm/bibm2015/
The document discusses a global initiative to facilitate open access to scholarly resources and research data across boundaries by building a federation of registries. It provides use cases of how such a system could help postgraduate students, research project leaders, administrators, and ICT specialists discover and monitor globally accessible data relevant to their work. The proposed strategy is to create a "Register of Registries" that would enable consistent discovery services for finding data in collections through a standardized, interoperable model. An initial scoping meeting was held in 2007 and annual meetings since to develop the strategy.
Micah Altman, Harvard; Policy-based Data Management
The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011, Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html
Integration of research literature and data (InFoLiS) - Philipp Zumstein
Talk at CNI 2015 Spring Membership Meeting in Seattle on April 14th, 2015, see http://www.cni.org/events/membership-meetings/upcoming-meeting/spring-2015/
Abstract: The goal of the InFoLiS project is to connect research data and publications. Links between data and literature are created automatically by means of text mining and made available as Linked Open Data (LOD) for seamless integration into different retrieval systems. This enables scientists to directly access information about corresponding research data in a literature information system, and, vice versa, it is possible to directly find different interpretations and analyses in the literature of the same research data. In our talk, we will describe our methods for generating the links and give insight into the Linked Data infrastructure including the services we are currently building. Most importantly, we will detail how our solutions can be used by other institutions and invite all interested participants to discuss with us their ideas and thoughts on the requirements for these services to ensure broad interoperability with existing systems and infrastructures. InFoLiS is a joint project by the GESIS – Leibniz Institute for the Social Sciences, Cologne, Mannheim University Library, and Mannheim University supported by a grant from the DFG – German Research Foundation.
The document describes the DATS (Data Tag Suite) model, which aims to provide a standardized way to index datasets through a community effort. The DATS model was developed by combining use cases and existing data schemas, and represents datasets and their metadata in a scalable way. It focuses on key descriptors like authors, datasets, publications, and funding to enable discoverability. The DATS model is serialized in JSON and JSON-LD using schema.org to increase visibility, accessibility, and search engine ranking. It is being adopted by databases and aligned with other bioscience metadata efforts.
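To make the serialization concrete, here is a minimal, illustrative schema.org "Dataset" record in JSON-LD, built from Python. This is a sketch in the spirit of DATS, not the actual DATS schema; every field value below is an assumption for illustration only.

import json

record = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example biomedical dataset",             # hypothetical
    "identifier": "https://doi.org/10.xxxx/example",  # hypothetical DOI
    "creator": [{"@type": "Person", "name": "A. Researcher"}],
    "funder": {"@type": "Organization", "name": "NIH"},
    "citation": "Example et al. (2016)",              # linked publication
}

# Serialize to JSON-LD, the form search engines can index.
print(json.dumps(record, indent=2))

Embedding a record like this in a dataset landing page is what makes the dataset visible to search engines, per the discoverability goal described above.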
The NSF DataNet Program aims to create exemplar data infrastructure organizations called DataNet Partners to provide researchers with access to data and advance research. SEAD is one such DataNet Partner that provides lightweight data services for sustainability science. It acts as an active content repository and curation service, and is developing tools for community exploration of data. The current focus is on an end-user workshop, conference demonstrations, and interface redesign to refine models for supporting the full lifecycle of research data objects.
Crossing the Analytics Chasm and Getting the Models You Developed Deployed - Robert Grossman
There are two cultures in data science and analytics: those that develop analytic models and those that deploy analytic models into operational systems. In this talk, we review the life cycle of analytic models and provide an overview of some of the approaches that have been developed for managing analytic models and workflows and for deploying them, including using analytic engines and analytic containers. We give a quick overview of languages for analytic models (PMML) and analytic workflows (PFA). We also describe the emerging discipline of AnalyticOps, which has borrowed some of the techniques of DevOps.
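A toy illustration of the develop/deploy split described above: the modeling side exports a model as a declarative JSON document, and a tiny "analytic engine" scores it without any training code. This mimics the role PMML and PFA play but is deliberately not either format, and the model and numbers are made up.

import json
import math

# Development side: export a (hypothetical) logistic-regression model.
model_doc = json.dumps({
    "type": "logistic_regression",
    "intercept": -1.2,
    "coefficients": {"age": 0.03, "biomarker": 0.9},
})

# Deployment side: a minimal scoring engine that only reads the document.
def score(doc, record):
    model = json.loads(doc)
    z = model["intercept"] + sum(
        w * record.get(name, 0.0) for name, w in model["coefficients"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))  # logistic link

print(score(model_doc, {"age": 50, "biomarker": 1.1}))  # ~0.78

The point is the contract: anything that can read the document can score the model, independently of the tooling that produced it.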
This document summarizes Philip Bourne's presentation on data science at NIH with an emphasis on health policy and management. It discusses how digitization, deception, disruption, and other trends are transforming biomedical research and healthcare. It outlines NIH's Precision Medicine Initiative to build a national research cohort of over 1 million volunteers. It also describes NIH's Office of Biomedical Data Science, whose mission is to accelerate biomedical research through an open digital ecosystem using data science. Key goals and programs discussed include the Big Data to Knowledge initiative and the Mobile Sensor Data-to-Knowledge Center of Excellence.
The Transformation of Systems Biology Into A Large Data Science - Robert Grossman
Systems biology is becoming a data-intensive science due to the exponential growth of genomic and biological data. Large projects now produce petabytes of data that require new computational infrastructure to store, manage, and analyze. Cloud computing provides elastic resources that can scale to support the increasing data needs of systems biology. Case studies show how clouds are used for large-scale data integration and analysis, running combinatorial analysis over genomic marks, and enabling reanalysis of biological data through elastic virtual machines. The Open Cloud Consortium is working to provide open cloud resources for biological and biomedical research through testbeds and proposed bioclouds.
Talk given at the "Cloud Computing for Systems Biology" workshop - Deepak Singh
This document discusses the role of cloud computing in biology and big data. It describes how cloud platforms like Amazon Web Services provide scalable, cost-effective and reliable infrastructure for storing and analyzing large biological datasets. The document outlines how researchers are using cloud computing to collaborate on projects, run computationally intensive analyses, and develop new tools and applications for processing life sciences data in the cloud.
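As a minimal sketch of this pattern, the snippet below lists objects in a public genomics bucket on S3 with anonymous access. The bucket name "1000genomes" refers to the AWS Open Data mirror of the 1000 Genomes Project and should be treated as an assumption to verify against the registry.

import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Anonymous (unsigned) access works for public Open Data buckets.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

resp = s3.list_objects_v2(Bucket="1000genomes", MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])

The same anonymous-access pattern applies to other registry-listed public datasets, which is part of what makes cloud-hosted reference data convenient for collaboration.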
Big Data, Computational Biology & the Future of Strategic Planning for Research - NBBJDesign
The advent of computational biology in the era of "big data" is triggering a dramatic change in the strategic capital planning process and in metrics for space allocation and utilization for translational science. In this presentation, Andy Snyder, Principal and NBBJ's Science & Education Practice leader, and Bruce Stevenson, VP of Research Operations at Nationwide Children's Hospital, chart new relationships between strategic planning, programming, facility planning, and scientific workplace features for biomedical research and translational medicine. The presentation sets out new best practices for navigating limited funding resources while preparing for new science directions and workforce needs, research space requirements, and advancements in scientific equipment, and the presenters identify new ways to leverage data, metrics, analytical processes, and tools for improved program/infrastructure alignment.
The Future of Research (Science and Technology) - Duncan Hull
This document summarizes the key trends in modern scientific research, including the rise of data-intensive science, collaborative and distributed research, and open science. It discusses how research is becoming more data-driven and dependent on large datasets. It also notes the growth of virtual and distributed collaboration between researchers. Finally, it outlines some of the implications for libraries and services to support reproducible, open, and data-driven scientific research.
The document discusses how information technology has profoundly impacted how we live and work over the past 20-30 years. Research has accelerated due to these changes with implications for both healthcare providers and patients. We are now entering an era where the world's knowledge is freely available at our fingertips online. This era of open data has the potential to decentralize and democratize information. It is predicted that institutions will become seamless digital enterprises and healthcare will become more personalized and focused on preventative care as big data and technologies like 3D printing continue to disrupt many industries.
The Philosophy of Big Data is the branch of philosophy concerned with the foundations, methods, and implications of big data: the definitions, meaning, conceptualization, knowledge possibilities, truth standards, and practices in situations involving very large data sets that are big in volume, velocity, variety, veracity, and variability.
Building a Business on Hadoop, HBase, and Open Source Distributed Computing - Bradford Stephens
This is a talk on a fundamental approach to thinking about scalability, and how Hadoop, HBase, and Lucene are enabling companies to process amazing amounts of data. It's also about how Social Media is making the traditional RDBMS irrelevant.
The Crypto Enlightenment: Social Theory of Blockchains - Melanie Swan
Text Write-up: http://futurememes.blogspot.com/2015/10/crypto-enlightenment-social-theory-of.html
Introduction
What is Bitcoin, blockchain, decentralization?
Stakes: Transition from labor economy to actualization economy
Crypto Enlightenment
Rethinking Authority (Self, Society)
Philosophy of Immanence (open-ended upside)
Theory of Crypto Flourishing
Scarcity as a social pathology
Abundance theory of Flourishing
Practicalities and extensive blockchain applications
The Future of Research - Data and the Rise of Digital Scholarship presents the trends that stand to have a significant impact on the changing face of academic publishing and scholarly research. As millions of connected devices come online and an unprecedented volume of information moves into digitized formats, it is estimated that less than 1 percent of this data has been analyzed. This report presents strategic insights for how researchers can get the most out of their data while keeping a human perspective at heart, and how to concisely and effectively present insights to an information-overloaded reader.
This document summarizes an update on the Big Data to Knowledge (BD2K) initiative at the National Institutes of Health (NIH). It discusses progress made in the first year of BD2K funding in three key areas: advancing data science research through centers and targeted awards; sharing data and software through the development of indexing tools and standards; and expanding training programs. It outlines funding amounts and recipient numbers for fiscal year 2015. Future plans are outlined through 2021 with the goals of further developing tools and applications, expanding the data sharing commons, and increasing training and sustainability efforts.
Presentation at the Department of Health and Human Services on October 17, 2014, to introduce agencies outside of NIH to the development of the Commons concept.
Data Harmonization for a Molecularly Driven Health System - Warren Kibbe
Seminar for Dr. Min Zhang's Purdue Bioinformatics Seminar Series. Touched on learning health systems, the Gen3 Data Commons, the NCI Genomic Data Commons, Data Harmonization, FAIR, and open science.
The document discusses the NIH's efforts to create a modernized, integrated, and FAIR biomedical data ecosystem. It outlines the NIH Office of Data Science Strategy's goals of optimizing data infrastructure and management, developing tools and the workforce, and ensuring stewardship and sustainability. It describes specific initiatives like STRIDES, the Generalist Repository Ecosystem Initiative, and AIM-AHEAD which aim to improve data sharing, train researchers, and address health disparities through AI. The overall goal is to make biomedical data more accessible, interoperable, and useful to advance biomedical research.
- The document discusses challenges related to biomedical data including that data is growing rapidly, stored across silos, and expensive to maintain while demands for sharing are increasing. It also notes a lack of data science skills.
- Solutions explored include developing the NIH Commons, which would integrate disparate cloud initiatives using BD2K standards to make data findable, accessible, interoperable and reusable. This could enable new insights from aggregate analysis across datasets.
- A 3-year BD2K-sponsored pilot of the Commons is underway to address questions around discoveries, productivity, reproducibility and cost-effectiveness compared to current approaches. The pilot involves moving model organism databases to the Commons as a test case.
Data Harmonization for a Molecularly Driven Health System - Warren Kibbe
Maximizing the value of data, computing, and data science in an academic medical center, or 'towards a molecularly informed Learning Health System'. Given in October at the University of Florida in Gainesville.
The document summarizes NIH's approach to data science and the ADDS mission. It discusses establishing a data ecosystem through community, policy, and infrastructure. The goals are to foster sustainability, efficiency, collaboration, reproducibility, and accessibility. NIH plans to seed the ecosystem through existing resources and funding. Example initiatives include establishing a data commons, standards, and training programs to develop a diverse data science workforce. The overall aim is to support a "digital enterprise" that enhances biomedical research and health outcomes.
NDS Relevant Update from the NIH Data Science (ADDS) Office - Philip Bourne
This document summarizes a presentation given by Dr. Phil Bourne on the National Data Service (NDS) initiative and the NIH Office of the Associate Director for Data Science (ADDS). The presentation discusses how NDS can succeed by defining clear problems, starting with pilots, and developing sustainable applications. It then outlines ADDS's mission to accelerate biomedical research through an open data ecosystem. ADDS's strategy focuses on discovery, workforce development, policy, leadership, and sustainability through developing a shared "Commons" of digital research objects in the cloud. Pilot projects are evaluating this Commons framework and populating it with datasets and tools.
Merritt’s micro-services-based architecture provides a number of options for easy integration with diverse external discovery services with specific disciplinary focus on scientific data sharing. By removing many of the barriers faced by researchers interested in data publication, the integrations of Merritt with DataShare and Research Hub exemplify a new service model for cooperative and distributed data sharing. The widespread adoption of such sharing is critical to open scientific inquiry and advancement.
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear... - dkNET
dkNET provides a single portal for discovering over 3,500 biomedical research resources and datasets. It aims to make these resources findable, accessible, interoperable, and reusable in accordance with the FAIR principles. The portal contains three main sections for browsing community resources, additional resources, and literature. It utilizes faceted searching and provides analytics and notifications to help users track changes to resources over time.
Building an Intelligent Biobank to Power Research Decision-Making - Denodo
This presentation belongs to the workshop: "Building an Intelligent Biobank to Power Research Decision-Making", from ISBER 2015 Annual Meeting by Lori A. Ball (Chief Operating Officer, President of Integrated Client Solutions at BioStorage Technologies, Inc), Brian Brunner (Senior Manager, Clinical Practice at LabAnswer) and Suresh Chandrasekaran (Senior Vice President at Denodo).
The workshop covers three topic areas:
- Research sample intelligence: the growing need for Global Data Integration (Biobank Sample and Data Stakeholders).
- Building a research data integration plan and cloud sourcing strategy (data integration).
- How data virtualization works and the value it delivers (a data virtualization introduction, solution portfolio and current customers in Life Sciences industry).
The biomedical R&D environment is increasingly dependent on data meta-analysis and bioinformatics to support research advancements. The integration of biorepository sample inventory data with biomarker and clinical research information has become a priority to R&D organizations. Therefore, a flexible IT system for managing sample collections, integrating sample data with clinical data and providing a data virtualization platform will enable the advancement of research studies. This workshop provides an overview of how sample data integration, virtualization and analytics can lead to more streamlined and unified sample intelligence to support global biobanking for future research.
This is an overview of the Data Biosphere Project, its goals, its architecture, and the three core projects that form its foundation. We also discuss data commons.
The NIH as a Digital Enterprise: Implications for PAG - Philip Bourne
The document discusses the NIH's vision of becoming a digital enterprise to enhance biomedical research. It outlines how research is becoming more digital and data-driven. The NIH aims to foster open sharing of data and tools through its Commons platform to facilitate collaboration and reproducibility. It also stresses the importance of training the next generation of data scientists to enable the digital enterprise. The end goal is to accelerate discovery and improve health outcomes through more integrated and data-driven research.
Philip Bourne presented on the NIH's Big Data to Knowledge (BD2K) initiative and the Associate Director for Data Science (ADDS) office. The goals of BD2K are to use data science to accelerate biomedical research and enhance health outcomes. BD2K supports various centers, projects, and training programs related to data discovery, standards, cloud computing, sustainability, and workforce development. The ADDS office oversees BD2K and aims to establish a sustainable data science ecosystem and well-trained workforce to enable major scientific discoveries through data-driven research.
NCI Cancer Research Data Commons - Overview - imgcommcall
The NCI Cancer Research Data Commons aims to enable sharing of diverse cancer research data across institutions by providing easy access to data stored in domain-specific repositories through a common authentication and authorization mechanism. It utilizes a framework of reusable components including data nodes, a cancer data aggregator, and cloud resources to integrate genomic, imaging, proteomic, and other data types while controlling access. The goals are to facilitate discovery and analysis tools as well as sustainably sharing data publicly to advance cancer research.
STI 2022 - Generating large-scale network analyses of scientific landscapes i... - Michele Pasin
The growth of large, programmatically accessible bibliometrics databases presents new opportunities for complex analyses of publication metadata. In addition to providing a wealth of information about authors and institutions, databases such as those provided by Dimensions also provide conceptual information and links to entities such as grants, funders, and patents. However, data is not the only challenge in evaluating patterns in scholarly work: these large datasets can be challenging to integrate, particularly for those unfamiliar with the complex schemas necessary for accommodating such heterogeneous information, and those most comfortable with data mining may not be as experienced in data visualisation. Here, we present an open-source Python library that streamlines the process of accessing and diagramming subsets of the Dimensions on Google BigQuery database and demonstrate its use on the freely available Dimensions COVID-19 dataset. We are optimistic that this tool will expand access to this valuable information by streamlining what would otherwise be multiple complex technical tasks, enabling more researchers to examine patterns in research focus and collaboration over time.
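For flavor, the kind of query such a library streamlines can also be issued directly with the google-cloud-bigquery client. The table path for the public Dimensions COVID-19 dataset is an assumption to check against current documentation, and running it requires GCP credentials and a billing project.

from google.cloud import bigquery

client = bigquery.Client()  # uses your default GCP project/credentials

sql = """
SELECT year, COUNT(*) AS n_publications
FROM `covid-19-dimensions-ai.data.publications`  -- assumed dataset path
GROUP BY year
ORDER BY year
"""

# Submit the query and iterate over result rows.
for row in client.query(sql).result():
    print(row.year, row.n_publications)

The library described above wraps this boilerplate, along with the subsequent network-diagramming steps.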
The document discusses recommendations from a workshop on peer review of research data. It focuses on three key areas:
1. Connecting data review with data management planning by requiring data sharing plans, ensuring adequate funding for data management, and refusing publication without clear data access.
2. Connecting scientific and technical review with data curation by linking articles and data with versioning, avoiding duplicate review efforts, and addressing issues found in data.
3. Connecting data review with article review by requiring methods/software information, providing review checklists, ensuring data access for reviewers, and permanent dataset identifiers from repositories.
1. George A. Komatsoulis, Ph.D.
National Center for Biotechnology Information (NCBI)
National Library of Medicine
National Institutes of Health
U.S. Department of Health and Human Services
2. Mission: "To seek fundamental knowledge about the nature and behavior of living systems and the application of that knowledge to enhance health, lengthen life, and reduce illness and disability."
Composed of 27 Institutes and Centers
Annual budget = $30.3B
80% of the NIH budget goes to about 50,000 grants
5. Data-scale comparisons (slide figures): a sensor stream of 500 EB/day, of which 69 TB/day is stored; a collection of 14 EB/day, of which 1 PB/day is stored; and a total dataset of 14 PB, i.e., storing an average of 3.3 TB/day for 10 years.
6.
7. Launched to support biomedical data science research
Support for multiple facets of data science:
BD2K Centers
Data and Software Discovery
Standards and Interoperability
Training and Workforce Development
The Commons
Led by Dr. Phil Bourne, NIH Associate Director for Data Science
8. Diagram of the current model: each university maintains its own local data, locally developed software, publicly available software, and local storage and compute resources, alongside public data repositories.
9. Is scalable and exploits new computing models
Is more cost effective given digital growth
Simplifies sharing digital research objects such as data, software, metadata and workflows
Makes digital research objects more FAIR: Findable, Accessible, Interoperable and Reusable
DOES NOT replace existing, well-curated databases (Phil Bourne, 2014)
14. The Commons: Business Model (diagram). The Commons is implemented as a federation of 'conformant' cloud providers (A, B, and C) and HPC environments, and is funded primarily by providing credits to investigators. NIH provides credits to the researcher; the researcher uses those credits with a provider, provides digital objects to the Commons, and retrieves/uses digital objects from it. A Discovery Index indexes the Commons so that researchers can find objects. As an option, NIH can also fund providers directly to support NIH-directed resources.
15. Cost effective - only pay for IT support used
Drives competition - better services at lower cost
Supports data sharing by driving science into the Commons
Facilitates public-private partnership
Scalable to most categories of data expected in the next 5 years
16. Novelty: never been tried, so we don't have data about the likelihood of success.
Cost models: predicated on stable or declining prices among providers. True for the last several years, but we can't guarantee that it will continue, particularly if there is significant consolidation in the industry.
Service providers: predicated on service providers willing to make the investment to become conformant. Market research suggests 3-5 providers within 2-3 months of program launch.
Persistence: the model is 'Pay As You Go', which means if you stop paying it stops going, giving investigators an unprecedented level of control over what lives (or dies) in the Commons.
17. Commons credit workflow (diagram, steps 1-7): the investigator requests credits; the request is reviewed and NIH approves the credit request; NIH directs a reseller of cloud services to distribute credits to the investigator; the reseller instructs the cloud provider to put the credits on the investigator's account; and the investigator, through their institution, uses the credits with conformant providers (A, B, or C) in the Commons.
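A toy walk-through of that credit flow, to make the moving parts explicit. All names, amounts, and the approval rule are hypothetical; this models only the bookkeeping implied by the diagram, not any real NIH system.

from dataclasses import dataclass

@dataclass
class Account:
    owner: str
    credits: float = 0.0

nih_pool = Account("NIH credit pool", credits=1_000_000)
investigator = Account("Investigator at University X")

# Steps 1-3: request and approval (modeled as a trivial budget check).
requested = 50_000
approved = requested <= nih_pool.credits

# Steps 4-6: NIH directs the reseller, which credits the investigator's
# account at a conformant cloud provider.
if approved:
    nih_pool.credits -= requested
    investigator.credits += requested

# Step 7: the investigator spends credits on storage and compute.
investigator.credits -= 12_500
print(nih_pool.credits, investigator.credits)  # 950000 37500.0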
18. Minimum set of requirements for:
Business relationships (reseller, investigators)
Interfaces (upload, download, manage, compute)
Capacity (storage, compute)
Networking and connectivity
Information assurance
Authentication and authorization
Still need to work out details of how to manage approval of conformance
A conformant cloud ≠ an IaaS provider
Draft specification out for comment among vendors
19. Phase 0: Build the plumbing
Phase 1: Pilot the model on a small number of investigators experienced with cloud computing, probably within the context of BD2K awards
Phase 2: Open the Commons credit process to grantees from a subset of NIH Institutes and Centers
Phase 3: Open the process to all NIH grantees
21. Diagram: three cloud environments, each providing secure computational capacity with pre-loaded data, serving the NCI Genomics Consortium and backed by the NCI genomic data repositories.
22. NIH Office of ADDS
Vivien Bonazzi, Ph.D.
Philip Bourne, Ph.D.
Michelle Dunn, Ph.D.
Mark Guyer, Ph.D.
Jennie Larkin, Ph.D.
Leigh Finnegan
Beth Russell
NCBI
Dennis Benson, Ph.D.
Alan Graeff
David Lipman, MD
Jim Ostell, Ph.D.
Don Preuss
Steve Sherry
Editor's Notes
1965 – Generation capacity < 100 aa's/year/person => Dayhoff creates the 1-letter amino acid code to simplify computing in the punch-card era
1977 – Sanger and Maxam-Gilbert sequencing invented. By the mid-1980s, a two-order-of-magnitude increase in production (maybe 10-20K bases total, 2-3K finished/year)
1986 – Development of dye-based sequencing; the ABI 370A reaches 2,000 bases/day/instrument by the mid-1990s
1996 – Development of DNA microarrays: 2-dye 100K chips => 200K/chip/day
2000s – Next-gen sequencing: 100M's/day
This has worked well for a long time, but:
Every investigator has their own copy of the data!
Every investigator needs the computational resources to do whatever calculation they want to do.
Making locally developed software work outside of the local institution is often a challenge. Everyone likes the Broad Firehose, but only Broad has made it work!
Consider the TCGA Data Set (2.5 PB)
Storage and Data Protection cost approximately $2,000,000 per year per copy
Constant network updates at universities
2.5 PB = 20,000,000 Gb = 23 days at 10 Gb/sec (see the sanity check after this note)
Redundant computing environments
Most HPC environments are either drastically over or under utilized
This is an issue with more ‘normal’ sized data sets as well
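A quick sanity check of the transfer arithmetic in the note above, assuming decimal units and a fully saturated, dedicated 10 Gb/s link (real-world transfers would be slower):

data_gigabits = 2.5 * 1_000_000 * 8   # 2.5 PB -> GB -> gigabits
link_gbps = 10

seconds = data_gigabits / link_gbps
days = seconds / 86_400               # seconds per day
print(f"{data_gigabits:,.0f} Gb ~ {days:.1f} days at {link_gbps} Gb/s")
# -> 20,000,000 Gb ~ 23.1 days at 10 Gb/s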
Minimum Requirements:
The business relationship is to allow distribution and billing of credits and to ensure that liability issues are resolved. The investigator that puts a digital object in the Commons is the one that retains the liability associated with its use.
Interfaces – would need to be open, but not necessarily open-source. Requires support for basic operations. In addition, the environment has to be open to all, so a private environment behind a university firewall won't work.
Identifiers and metadata: tied together, and together they enable researchers to search for and find resources.
Networking and connectivity: make sure that content is accessible; require connection to the commodity internet and Internet2, but the key element from the investigator's point of view is a free egress tier for academics.
The environment is secure.
A&A: must support InCommon because most NIH investigators have it. Minimizes the hassle of granting access to collaborators across multiple platforms.
Approval of clouds: self-certify vs. NIH-certify vs. third-party certify. In early test cases, we may simply say 'FedRAMPed'.
Cloud vs. IaaS: some IaaS providers (AWS comes to mind) may be uninterested in providing the 'conformant' layer but support other companies that provide these services using an AWS backend. There are already exemplars of this: Seven Bridges Genomics and the Cancer Genomics Cloud Pilots are all software layers over an IaaS provider.