George A. Komatsoulis, Ph.D.
National Center for Biotechnology Information
National Library of Medicine
National Institutes of Health
U.S. Department of Health and Human Services
The Commons
Digital Objects (with identifiers)
Search (Indexed Metadata and API)
Computing Platform
Open APIs
Software Encapsulation
Commons Federation (Infrastructure)
BD2K Centers
DDICC (Search)
Existing Resources
Indexes, Methods, Content
Investigator (works in the Commons; searches via DDICC)
Commons Federation (Infrastructure)
Conformant Provider A
Conformant Provider B
Conformant Provider C
The Commons: Business Model
[Diagram: NIH provides credits to the Researcher, who uses those credits with the cloud providers (A, B, C) that make up The Commons. The Researcher provides digital objects to the Commons and retrieves/uses digital objects from it. The Discovery Index indexes the Commons, and the Researcher finds objects through it. Option: NIH funds providers to support NIH-directed resources.]
• Commons implemented as a federation of ‘conformant’ cloud providers and HPC environments
• Funded primarily by providing credits to investigators
• Cost effective: only pay for IT support used
• Drives competition: better services at lower cost
• Supports data sharing by driving science into the Commons
• Facilitates public-private partnership
• Scalable to most categories of data expected in the next 5 years
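The credit flow behind this business model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, provider names, and prices are hypothetical and do not come from the slides or from any NIH system.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """A hypothetical conformant cloud provider with a flat price per unit of service."""
    name: str
    price_per_unit: float  # credits charged per unit of storage/compute (assumed)


@dataclass
class Investigator:
    """An investigator who holds credits granted by the funder."""
    name: str
    credits: float = 0.0
    usage: dict = field(default_factory=dict)  # provider name -> credits spent

    def receive_credits(self, amount: float) -> None:
        """The funder provides credits to the investigator."""
        if amount < 0:
            raise ValueError("credit grants must be non-negative")
        self.credits += amount

    def spend(self, provider: Provider, units: float) -> float:
        """Pay a provider only for the service actually used."""
        cost = provider.price_per_unit * units
        if cost > self.credits:
            raise ValueError("insufficient credits")
        self.credits -= cost
        self.usage[provider.name] = self.usage.get(provider.name, 0.0) + cost
        return cost


def cheapest(providers: list) -> Provider:
    """Competition in the model: the investigator is free to pick the lowest price."""
    return min(providers, key=lambda p: p.price_per_unit)
```

In this sketch the funder never pays a provider directly; money reaches a provider only when an investigator chooses it, which is the mechanism the slide relies on to drive competition.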
• Novelty:
  ◦ Never been tried, so we don’t have data about the likelihood of success
• Cost Models:
  ◦ Predicated on stable or declining prices among providers
  ◦ True for the last several years, but we can’t guarantee that it will continue, particularly if there is significant consolidation in the industry
• Service Providers:
  ◦ Predicated on service providers being willing to make the investment to become conformant
  ◦ Market research suggests 3-5 providers within 2-3 months of program launch
• Persistence:
  ◦ The model is ‘Pay As You Go’, which means if you stop paying it stops going
  ◦ Gives investigators an unprecedented level of control over what lives (or dies) in the Commons
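The cost-model and persistence risks can be made concrete with a small calculation. The sketch below assumes a fixed yearly percentage change in price, which is a simplification introduced here; the function names and all numbers are hypothetical.

```python
def storage_cost(years: int, first_year_cost: float, annual_price_change: float) -> float:
    """Total cost of keeping an object for `years`, with the yearly price changing
    by a fixed fraction (negative = declining prices, positive = rising prices)."""
    return sum(first_year_cost * (1 + annual_price_change) ** t for t in range(years))


def years_funded(credits: float, first_year_cost: float,
                 annual_price_change: float, horizon: int = 100) -> int:
    """Whole years an object persists under pay-as-you-go before credits run out.

    `horizon` caps the answer, since steadily declining prices could otherwise
    keep an object funded indefinitely.
    """
    years = 0
    cost = first_year_cost
    while years < horizon and credits >= cost:
        credits -= cost          # pay for this year
        years += 1
        cost *= 1 + annual_price_change  # next year's price
    return years
```

With 250 credits and a first-year cost of 100, flat prices fund two years, and prices rising 50% a year also fund only two years, while prices halving each year keep the object alive up to the horizon. That asymmetry is why the model depends on stable or declining prices.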
• Minimum set of requirements for
  ◦ Business relationships (reseller, investigators)
  ◦ Interfaces (upload, download, manage, compute)
  ◦ Capacity (storage, compute)
  ◦ Networking and connectivity
  ◦ Information assurance
  ◦ Authentication and authorization
• Likely to be a reviewed self-certification in the pilot phase
• A conformant cloud ≠ an IaaS provider
• Likely to evolve into multiple ‘Levels of Compliance’ corresponding to increasing degrees of making data/software meet ‘FAIR’ criteria
• Some of our current thinking for basic compliance
  ◦ Objects are physically or logically available in the Commons
  ◦ Objects are indexed with a usable identifier
  ◦ Objects have basic search metadata attached to index entries
  ◦ Objects have clear access rules
  ◦ Objects have basic semantic metadata available
• Higher levels could include
  ◦ Objects indexed with standards-based identifiers (ORCID, DOI, etc.)
  ◦ Objects are open to the public (or as open as reasonable given the data type)
  ◦ Objects conform to agreed-upon standards (CDISC, DICOM, etc.)
  ◦ Data objects are accessible via standard APIs
  ◦ Software is encapsulated (containers, other technology) for easier usage
• We want and need your feedback on these matters!
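One way to picture the compliance criteria above is as a checklist applied to each index entry. The following is a rough sketch, not a proposed specification: the `DigitalObject` record, its field names, and the "basic+N" scoring are all assumptions made for illustration, and the software-encapsulation criterion is left out because it applies only to software objects.

```python
from dataclasses import dataclass


@dataclass
class DigitalObject:
    """Hypothetical record describing one object in the index."""
    identifier: str = ""               # any usable identifier
    available: bool = False            # physically or logically in the Commons
    search_metadata: bool = False      # basic search metadata on the index entry
    access_rules: bool = False         # clear access rules
    semantic_metadata: bool = False    # basic semantic metadata
    standard_identifier: bool = False  # standards-based identifier, e.g. a DOI
    open_access: bool = False          # open, or as open as the data type allows
    standard_format: bool = False      # conforms to an agreed-upon standard
    standard_api: bool = False         # accessible via a standard API


BASIC = ("available", "search_metadata", "access_rules", "semantic_metadata")
HIGHER = ("standard_identifier", "open_access", "standard_format", "standard_api")


def missing_basic(obj: DigitalObject) -> list:
    """Names of the basic criteria the object does not yet meet."""
    missing = [name for name in BASIC if not getattr(obj, name)]
    if not obj.identifier:
        missing.insert(0, "identifier")
    return missing


def assess(obj: DigitalObject) -> str:
    """Return 'non-compliant', 'basic', or 'basic+N' (N = higher criteria met)."""
    if missing_basic(obj):
        return "non-compliant"
    extra = sum(1 for name in HIGHER if getattr(obj, name))
    return "basic" if extra == 0 else f"basic+{extra}"
```

Reporting the missing criteria by name, rather than a bare pass/fail, fits a self-certification process in which a provider or investigator needs to know what to fix.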
• Phase 0: Build the plumbing
• Phase 1: Pilot the model on a small number of investigators experienced with cloud computing, probably within the context of BD2K awards
• Phase 2: Open the Commons credit process to grantees from a subset of NIH Institutes and Centers
• Phase 3: Open the process to all NIH grantees
• Approved March 23, 2015
• “In light of the advances made in security protocols for cloud computing in the past several years and given the expansion in the volume and complexity of genomic data generated by the research community, the National Institutes of Health (NIH) is now allowing investigators to request permission to transfer controlled-access genomic and associated phenotypic data obtained from NIH-designated data repositories under the auspices of the NIH Genomic Data Sharing (GDS) Policy to public or private cloud systems for data storage and analysis.”
• Responsibility for ensuring the security and integrity remains with the institution.
[Timeline chart, 1960 to 2020]
Sensor stream = 500 EB/day; stores 69 TB/day
Collection = 14 EB/day; stores 1 PB/day
Total data = 14 PB; stores an average of 3.3 TB/day for 10 years!
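For a sense of scale, the figures on this slide can be turned into a quick calculation. The sketch below assumes decimal units (1 PB = 1,000 TB, 1 EB = 1,000,000 TB) and 365-day years; the slide does not say which conventions it uses, and the function names are made up for this example.

```python
TB_PER_PB = 1_000        # decimal units assumed
TB_PER_EB = 1_000_000    # decimal units assumed


def total_stored_pb(tb_per_day: float, years: float, days_per_year: float = 365.0) -> float:
    """Cumulative volume, in PB, stored at a steady daily rate."""
    return tb_per_day * days_per_year * years / TB_PER_PB


def fraction_stored(stored_tb_per_day: float, stream_eb_per_day: float) -> float:
    """Share of a raw data stream that is actually kept."""
    return stored_tb_per_day / (stream_eb_per_day * TB_PER_EB)


print(f"{total_stored_pb(3.3, 10):.1f} PB")      # 3.3 TB/day sustained for 10 years
print(f"{fraction_stored(69, 500):.2e}")         # 69 TB/day kept of 500 EB/day
print(f"{fraction_stored(1_000, 14):.2e}")       # 1 PB/day kept of 14 EB/day
```

Under these assumptions, 3.3 TB/day for 10 years comes to roughly 12 PB, the same order of magnitude as the 14 PB total on the slide, and in both streaming examples only a tiny fraction of the raw stream is retained.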
• NIH Office of ADDS
  ◦ Vivien Bonazzi, Ph.D.
  ◦ Philip Bourne, Ph.D.
  ◦ Michelle Dunn, Ph.D.
  ◦ Mark Guyer, Ph.D.
  ◦ Jennie Larkin, Ph.D.
  ◦ Leigh Finnegan
  ◦ Beth Russell
• NCBI
  ◦ Dennis Benson, Ph.D.
  ◦ Alan Graeff
  ◦ David Lipman, MD
  ◦ Jim Ostell, Ph.D.
  ◦ Don Preuss
  ◦ Steve Sherry

Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
 
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
 

Komatsoulis internet2 executive track

  • 1. George A. Komatsoulis, Ph.D., National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, U.S. Department of Health and Human Services
  • 2.
  • 3. The Commons: Digital Objects (with identifiers); Search (Indexed Metadata and API); Computing Platform; Open APIs; Software Encapsulation
  • 4. The Commons: Digital Objects (with identifiers); Search (Indexed Metadata and API); Computing Platform. Commons Federation (Infrastructure); BD2K Centers; DDICC (Search); Existing Resources; Indexes; Methods; Content
  • 7. The Commons: Business Model
      [Diagram. Entities: Researcher; Discovery Index; The Commons; Cloud Providers A, B and C; NIH. Labels: Provides Digital Objects; Retrieves/Uses Digital Objects; Option: Fund Providers to Support NIH Directed Resources; Indexes Commons; Provide Credits; Uses Credits; Finds Objects]
      • Commons implemented as a federation of ‘conformant’ cloud providers and HPC environments
      • Funded primarily by providing credits to investigators
  • 8.
      • Cost effective - Only pay for IT support used
      • Drives competition - Better services at lower cost
      • Supports data sharing by driving science into the Commons
      • Facilitates public-private partnership
      • Scalable to most categories of data expected in the next 5 years
  • 9.
      • Novelty:
          • Never been tried, so we don’t have data about likelihood of success
      • Cost Models:
          • Predicated on stable or declining prices among providers
          • True for the last several years, but we can’t guarantee that it will continue, particularly if there is significant consolidation in industry
      • Service Providers:
          • Predicated on service providers willing to make the investment to become conformant
          • Market research suggests 3-5 providers within 2-3 months of program launch
      • Persistence:
          • The model is ‘Pay As You Go’, which means if you stop paying it stops going
          • Giving investigators an unprecedented level of control over what lives (or dies) in the Commons
  • 10.
      • Minimum set of requirements for
          • Business relationships (reseller, investigators)
          • Interfaces (upload, download, manage, compute)
          • Capacity (storage, compute)
          • Networking and Connectivity
          • Information Assurance
          • Authentication and authorization
      • Likely to be reviewed self-certification in pilot phase
      • A conformant cloud ≠ an IaaS provider
  • 11.
      • Likely to evolve into multiple ‘Levels of Compliance’ corresponding to increasing degrees of making data/software meet ‘FAIR’ criteria
      • Some of our current thinking for basic compliance
          • Objects are physically or logically available in the Commons
          • Objects are indexed with a usable identifier
          • Objects have basic search metadata attached to index entries
          • Objects have clear access rules
          • Objects have basic semantic metadata available
      • Higher levels could include
          • Objects indexed with standards-based identifiers (ORCID, DOI, etc.)
          • Objects are open to the public (or as open as reasonable given data type)
          • Objects conform to agreed-upon standards (CDISC, DICOM, etc.)
          • Data objects are accessible via standard APIs
          • Software is encapsulated (containers, other technology) for easier usage
      • We want and need your feedback on these matters!
  • 12.
      • Phase 0: Build the plumbing
      • Phase 1: Pilot the model on a small number of investigators experienced with cloud computing, probably within the context of BD2K awards
      • Phase 2: Open the Commons credit process to grantees from a subset of NIH Institutes and Centers
      • Phase 3: Open the process to all NIH grantees
  • 13.
  • 14.
  • 15.
      • Approved March 23, 2015
      • “In light of the advances made in security protocols for cloud computing in the past several years and given the expansion in the volume and complexity of genomic data generated by the research community, the National Institutes of Health (NIH) is now allowing investigators to request permission to transfer controlled-access genomic and associated phenotypic data obtained from NIH-designated data repositories under the auspices of the NIH Genomic Data Sharing (GDS) Policy to public or private cloud systems for data storage and analysis.”
      • Responsibility for ensuring the security and integrity remains with the institution.
  • 16.
  • 17. 1960 1970 1980 1990 2000 2010 2020
  • 18. Sensor Stream = 500 EB/day, Stores 69 TB/day; Collection = 14 EB/day, Store 1 PB/day; Total Data = 14 PB, Store an average of 3.3 TB/day for 10 years!
  • 19.
  • 20.
      • NIH Office of ADDS
          • Vivien Bonazzi, Ph.D.
          • Philip Bourne, Ph.D.
          • Michelle Dunn, Ph.D.
          • Mark Guyer, Ph.D.
          • Jennie Larkin, Ph.D.
          • Leigh Finnegan
          • Beth Russell
      • NCBI
          • Dennis Benson, Ph.D.
          • Alan Graeff
          • David Lipman, MD
          • Jim Ostell, Ph.D.
          • Don Preuss
          • Steve Sherry
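
The basic compliance criteria on slide 11 can be read as a checklist applied to each index entry. The following is a minimal sketch of that reading, not a Commons specification: the `DigitalObject` class, its field names, and the `basic_compliance_gaps` function are all hypothetical, and treating any non-empty value as satisfying a criterion is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DigitalObject:
    """Hypothetical index entry for an object in the Commons (all fields are assumptions)."""
    identifier: Optional[str] = None       # any usable identifier
    location: Optional[str] = None         # where the object is physically or logically available
    search_metadata: Dict[str, str] = field(default_factory=dict)
    access_rules: Optional[str] = None     # e.g. "public" or "controlled-access"
    semantic_metadata: Dict[str, str] = field(default_factory=dict)


def basic_compliance_gaps(obj: DigitalObject) -> List[str]:
    """Return the slide 11 basic-compliance criteria that obj does not yet meet."""
    checks = [
        ("available in the Commons", bool(obj.location)),
        ("indexed with a usable identifier", bool(obj.identifier)),
        ("basic search metadata attached", bool(obj.search_metadata)),
        ("clear access rules", bool(obj.access_rules)),
        ("basic semantic metadata available", bool(obj.semantic_metadata)),
    ]
    return [name for name, passed in checks if not passed]


# Illustrative entries; the identifiers and locations are made up.
complete = DigitalObject(
    identifier="example-id-0001",
    location="provider-a/bucket/object",
    search_metadata={"title": "Example dataset"},
    access_rules="controlled-access",
    semantic_metadata={"data_type": "genomic"},
)
partial = DigitalObject(identifier="example-id-0002")

print(basic_compliance_gaps(complete))  # no unmet criteria
print(basic_compliance_gaps(partial))   # every criterion except the identifier
```

Returning the list of unmet criteria, rather than a single pass/fail flag, would let a provider or investigator see what is missing before an object reaches basic compliance. The higher levels on slide 11 could be added as further checks in the same style.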

Editor's Notes

  1. Minimum Requirements:
     • Business relationship: Allows distribution and billing of credits and ensures that liability issues are resolved. The investigator who puts a digital object in the Commons retains the liability associated with its use.
     • Interfaces: Would need to be open, but not necessarily open-source. Requires support for basic operations. In addition, the environment has to be open to all, so a private environment behind a university firewall won’t work.
     • Identifiers and metadata: Tied together, and together they enable researchers to search for and find resources.
     • Networking and Connectivity: Make sure that resources are accessible; require connection to the commodity internet and Internet2. The key element from the investigator's point of view is a free egress tier for academics.
     • Information Assurance: Environment is secure.
     • A&A: Must support InCommon because most NIH investigators have it. This minimizes the hassle of granting access to collaborators across multiple platforms.
     • Approval of clouds: Self-certify vs. NIH certify vs. third-party certify. In early test cases, may simply say ‘FedRamped’.
     • Cloud vs. IaaS: Some IaaS providers (AWS comes to mind) may be uninterested in providing the ‘conformant’ layer but may support other companies that provide these services using an AWS backend. There are already exemplars of this: Seven Bridges Genomics and the Cancer Genomics Cloud Pilots are all software layers over an IaaS provider.
  2. 1965 – Generation capacity < 100 aa’s/year/person => Dayhoff creates 1 base code to simplify computing in punch card era
     1977 – Sanger and Maxam-Gilbert sequencing invented. By mid 1980’s, increase in production of 2 orders of magnitude (maybe 10-20K bases total, 2-3K finished/year)
     1986 – Development of dye based sequencing, ABI 370A. 2000 bases/day/instrument by mid 1990’s
     1996 – Development of DNA microarrays. 2 dye 100K chips => 200K/chip/day
     2000’s – Next gen sequencing; 100M’s/day