Komatsoulis internet2 executive track

George A. Komatsoulis, Ph.D.
National Center for Biotechnology Information
National Library of Medicine
National Institutes of Health
U.S. Department of Health and Human Services

The Commons
Digital Objects (with identifiers)
Search (Indexed Metadata and API)
Computing Platform
Open APIs
Software Encapsulation

The Commons
Digital Objects (with identifiers)
Search (Indexed Metadata and API)
Computing Platform
Commons Federation (Infrastructure)
BD2K Centers
DDICC (Search)
Existing Resources
Indexes, Methods, Content

Commons Federation (Infrastructure)
BD2K Centers
DDICC (Search)
Existing Resources
Indexes, Methods, Content
Investigator
Works In
Searches

Commons Federation (Infrastructure)
Conformant Provider A
Conformant Provider B
Conformant Provider C

The Commons: Business Model
Researcher
Discovery Index
The Commons
Cloud Provider C
Cloud Provider B
Cloud Provider A
NIH
Provides Digital Objects
Retrieves/Uses Digital Objects
Option: Fund Providers to Support NIH Directed Resources
Indexes Commons
Provide Credits
Uses Credits
Finds Objects
• Commons implemented as a federation of ‘conformant’ cloud providers and HPC environments
• Funded primarily by providing credits to investigators
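
The credit flow in the diagram above (NIH provides credits, the researcher uses them at a cloud provider) can be sketched in a few lines of Python. This is only an illustration: the class names, the per-unit prices, and the "pick the cheapest provider" rule are assumptions made for the example, not part of the Commons design.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    """Hypothetical conformant cloud provider that accepts Commons credits."""
    name: str
    price_per_unit: float  # credits per unit of IT support (illustrative value)
    earned: float = 0.0


@dataclass
class Investigator:
    """Researcher holding credits issued by the funder."""
    name: str
    credits: float = 0.0

    def spend(self, provider: Provider, units: float) -> float:
        """Buy `units` of service from `provider`; return the cost in credits."""
        cost = units * provider.price_per_unit
        if cost > self.credits:
            raise ValueError("insufficient credits")
        self.credits -= cost
        provider.earned += cost
        return cost


def cheapest(providers):
    """Assumed selection rule: the investigator picks the lowest price."""
    return min(providers, key=lambda p: p.price_per_unit)


# Made-up prices for providers A, B and C.
providers = [Provider("A", 1.2), Provider("B", 1.0), Provider("C", 1.5)]
researcher = Investigator("researcher")

researcher.credits += 100.0  # the funder provides credits
cost = researcher.spend(cheapest(providers), units=30)  # the researcher uses credits
```

Because the investigator, not the funder, chooses where the credits go, providers in this toy model earn only what they are actually used for.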

• Cost effective: only pay for the IT support used
• Drives competition: better services at lower cost
• Supports data sharing by driving science into the Commons
• Facilitates public-private partnership
• Scalable to most categories of data expected in the next 5 years

• Novelty:
  • Never been tried, so we don’t have data about the likelihood of success
• Cost Models:
  • Predicated on stable or declining prices among providers
  • True for the last several years, but we can’t guarantee that it will continue, particularly if there is significant consolidation in the industry
• Service Providers:
  • Predicated on service providers being willing to make the investment to become conformant
  • Market research suggests 3-5 providers within 2-3 months of program launch
• Persistence:
  • The model is ‘Pay As You Go’, which means if you stop paying, it stops going
  • This gives investigators an unprecedented level of control over what lives (or dies) in the Commons

• Minimum set of requirements for:
  • Business relationships (reseller, investigators)
  • Interfaces (upload, download, manage, compute)
  • Capacity (storage, compute)
  • Networking and connectivity
  • Information assurance
  • Authentication and authorization
• Conformance is likely to be established by reviewed self-certification in the pilot phase
• A conformant cloud ≠ an IaaS provider

• Likely to evolve into multiple ‘Levels of Compliance’ corresponding to increasing degrees of making data/software meet ‘FAIR’ criteria.
• Some of our current thinking for basic compliance:
  • Objects are physically or logically available in the Commons
  • Objects are indexed with a usable identifier
  • Objects have basic search metadata attached to index entries
  • Objects have clear access rules
  • Objects have basic semantic metadata available
• Higher levels could include:
  • Objects indexed with standards-based identifiers (ORCID, DOI, etc.)
  • Objects are open to the public (or as open as reasonable given the data type)
  • Objects conform to agreed-upon standards (CDISC, DICOM, etc.)
  • Data objects are accessible via standard APIs
  • Software is encapsulated (containers, other technology) for easier usage
• We want and need your feedback on these matters!
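
One way to picture the compliance checklist above is as a set of properties recorded for each digital object. The sketch below is hypothetical: the property names and the three-value scoring (0 = not compliant, 1 = basic, 2 = basic plus every higher-level criterion) are assumptions made for illustration, since the actual levels were still under discussion.

```python
# Criteria paraphrased from the slide; the identifiers are invented for this sketch.
BASIC = {
    "available_in_commons",
    "indexed_with_identifier",
    "search_metadata",
    "clear_access_rules",
    "semantic_metadata",
}
HIGHER = {
    "standards_based_identifier",
    "open_to_public",
    "conforms_to_standards",
    "standard_api_access",
    "software_encapsulated",
}


def compliance_level(properties):
    """Return 0, 1 or 2 under the assumed scoring described above."""
    if not BASIC <= properties:
        return 0
    if HIGHER <= properties:
        return 2
    return 1


def missing_for_basic(properties):
    """List the basic criteria an object still lacks, in sorted order."""
    return sorted(BASIC - properties)


# An object that is indexed and searchable but has no access rules or semantic metadata.
example = {"available_in_commons", "indexed_with_identifier", "search_metadata"}
level = compliance_level(example)
gaps = missing_for_basic(example)
```

A real scheme would probably define more than two levels; the set comparison simply shows how a checklist of this kind could be evaluated mechanically.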

• Phase 0: Build the plumbing
• Phase 1: Pilot the model on a small number of investigators experienced with cloud computing, probably within the context of BD2K awards
• Phase 2: Open the Commons credit process to grantees from a subset of NIH Institutes and Centers
• Phase 3: Open the process to all NIH grantees

• Approved March 23, 2015
• “In light of the advances made in security protocols for cloud computing in the past several years and given the expansion in the volume and complexity of genomic data generated by the research community, the National Institutes of Health (NIH) is now allowing investigators to request permission to transfer controlled-access genomic and associated phenotypic data obtained from NIH-designated data repositories under the auspices of the NIH Genomic Data Sharing (GDS) Policy to public or private cloud systems for data storage and analysis.”
• Responsibility for ensuring the security and integrity of the data remains with the institution.

Timeline (figure): 1960 to 2020

Sensor Stream = 500 EB/day
Stores 69 TB/day
Collection = 14 EB/day
Store 1 PB/day
Total Data = 14 PB
Store an average of 3.3 TB/day for 10 years!
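
The figures above can be put side by side with a little arithmetic. The sketch below assumes that each "store" figure belongs with the line directly above it, that units are decimal (1 TB = 10^12 bytes), and that a year has 365 days; none of this is stated on the slide.

```python
TB = 10**12
PB = 10**15
EB = 10**18


def stored_fraction(stored_per_day, generated_per_day):
    """Fraction of the generated data that is actually kept."""
    return stored_per_day / generated_per_day


# Assumed pairing: 69 TB/day stored out of a 500 EB/day sensor stream.
sensor = stored_fraction(69 * TB, 500 * EB)      # roughly 1.4e-7

# Assumed pairing: 1 PB/day stored out of a 14 EB/day collection.
collection = stored_fraction(1 * PB, 14 * EB)    # roughly 7.1e-5

# Volume accumulated at an average of 3.3 TB/day over 10 years, in PB.
days = 10 * 365
accumulated_pb = 3.3 * TB * days / PB            # roughly 12 PB
```

Under these assumptions, 3.3 TB/day for ten years accumulates about 12 PB, the same order of magnitude as the 14 PB total quoted, while the other two examples keep only a tiny fraction of what is generated.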

• NIH Office of ADDS
  • Vivien Bonazzi, Ph.D.
  • Philip Bourne, Ph.D.
  • Michelle Dunn, Ph.D.
  • Mark Guyer, Ph.D.
  • Jennie Larkin, Ph.D.
  • Leigh Finnegan
  • Beth Russell
• NCBI
  • Dennis Benson, Ph.D.
  • Alan Graeff
  • David Lipman, MD
  • Jim Ostell, Ph.D.
  • Don Preuss
  • Steve Sherry
