George A. Komatsoulis, Ph.D.
National Center for Biotechnology Information (NCBI)
National Library of Medicine
National Institutes of Health
U.S. Department of Health and Human Services
• Mission: "To seek fundamental knowledge about the nature and behavior of living systems and the application of that knowledge to enhance health, lengthen life, and reduce illness and disability."
• Composed of 27 Institutes and Centers
• Annual Budget = $30.3B
• 80% of the NIH budget goes to about 50,000 grants
[Figure: data growth on a 1960 to 2020 timeline. Sensor Stream = 500 EB/day, stores 69 TB/day; Collection = 14 EB/day, stores 1 PB/day; Total Data = 14 PB (store an average of 3.3 TB/day for 10 years).]
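The figure contrasts how much data is produced per day with how much is kept. A quick unit conversion makes the gap concrete. This is a sketch that assumes decimal (SI) units and that each "stores" figure pairs with the stream listed just before it in the figure; the function names are illustrative.

```python
# Decimal (SI) units assumed: 1 PB = 1e3 TB, 1 EB = 1e6 TB.
TB_PER_UNIT = {"TB": 1.0, "PB": 1e3, "EB": 1e6}


def to_tb(value: float, unit: str) -> float:
    """Convert a quantity expressed in TB, PB or EB to terabytes."""
    return value * TB_PER_UNIT[unit]


def retained_fraction(produced: float, produced_unit: str,
                      stored: float, stored_unit: str) -> float:
    """Fraction of a daily data stream that is actually kept."""
    return to_tb(stored, stored_unit) / to_tb(produced, produced_unit)


# Pairings follow the order of the labels in the figure (an assumption).
sensor = retained_fraction(500, "EB", 69, "TB")
collection = retained_fraction(14, "EB", 1, "PB")

if __name__ == "__main__":
    print(f"Sensor stream kept: {sensor:.2e} of the raw stream")
    print(f"Collection kept:    {collection:.2e} of the raw stream")
```

Under these assumptions roughly one byte in seven million of the sensor stream is stored, and about one byte in 14,000 of the collection stream.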
Big Data to Knowledge (BD2K)
• Launched to support biomedical data science research
• Support for multiple facets of data science:
  - BD2K Centers
  - Data and Software Discovery
  - Standards and Interoperability
  - Training and Workforce Development
  - The Commons
• Led by Dr. Phil Bourne, NIH Associate Director for Data Science
[Figure: a university with local data, locally developed software, and local storage and compute resources, alongside public data repositories and publicly available software.]
The Commons:
• Is scalable and exploits new computing models
• Is more cost effective given digital growth
• Simplifies sharing digital research objects such as data, software, metadata and workflows
• Makes digital research objects more FAIR: Findable, Accessible, Interoperable and Reusable
• DOES NOT replace existing, well-curated databases
(Phil Bourne, 2014)
[Figure: The Commons. Digital Objects (with identifiers); Search (indexed metadata and API); Computing Platform; with Open APIs and Software Encapsulation.]
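The layered picture (digital objects with identifiers, plus search over indexed metadata) can be illustrated with a small in-memory model. This is a minimal sketch, not the Commons' actual design: `DigitalObject`, `MetadataIndex`, and the `commons:` identifier prefix are hypothetical names chosen for illustration. `resolve` stands in for "Accessible" and `search` for "Findable".

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """A research object (data, software, metadata, workflow) plus an identifier."""
    kind: str
    metadata: dict[str, str]
    # Hypothetical identifier scheme; a real Commons would mint persistent IDs.
    identifier: str = field(default_factory=lambda: f"commons:{uuid.uuid4()}")


class MetadataIndex:
    """Inverted index from (key, lower-cased value) pairs to object identifiers."""

    def __init__(self) -> None:
        self._objects: dict[str, DigitalObject] = {}
        self._index: defaultdict[tuple[str, str], set[str]] = defaultdict(set)

    def register(self, obj: DigitalObject) -> str:
        """Store the object and index each of its metadata fields."""
        self._objects[obj.identifier] = obj
        for key, value in obj.metadata.items():
            self._index[(key, value.lower())].add(obj.identifier)
        return obj.identifier

    def resolve(self, identifier: str) -> DigitalObject:
        """Accessible: fetch an object by its identifier."""
        return self._objects[identifier]

    def search(self, **criteria: str) -> list[DigitalObject]:
        """Findable: objects matching every key=value criterion (values case-insensitive)."""
        if not criteria:
            return []
        hits = [self._index.get((k, v.lower()), set()) for k, v in criteria.items()]
        return [self._objects[i] for i in sorted(set.intersection(*hits))]


if __name__ == "__main__":
    index = MetadataIndex()
    oid = index.register(DigitalObject("data", {"assay": "RNA-seq", "organism": "Homo sapiens"}))
    print(index.search(assay="rna-seq")[0].identifier == oid)
```

An inverted index keeps lookups cheap as the number of objects grows; a production index would also need persistence, richer queries, and access control.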
[Figure: the Commons Federation (infrastructure) linked to the BD2K Centers, the DDICC (search), and existing resources, with links labeled Indexes, Methods, and Content.]
[Figure: an investigator works in and searches the Commons Federation (infrastructure), which is made up of Conformant Providers A, B, and C.]
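In a federation, one query has to reach every conformant provider and the results must be merged. The sketch below assumes each provider exposes a common `search` method; `ConformantProvider`, `InMemoryProvider`, and `CommonsFederation` are hypothetical names, and a real federation would call remote APIs rather than in-memory catalogs.

```python
from typing import Protocol


class ConformantProvider(Protocol):
    """Hypothetical minimal interface assumed for every conformant provider."""
    name: str

    def search(self, term: str) -> list[str]: ...


class InMemoryProvider:
    """Toy provider: maps object identifiers to free-text descriptions."""

    def __init__(self, name: str, catalog: dict[str, str]) -> None:
        self.name = name
        self._catalog = catalog

    def search(self, term: str) -> list[str]:
        term = term.lower()
        return sorted(i for i, desc in self._catalog.items() if term in desc.lower())


class CommonsFederation:
    """Fans a query out to every provider and tags each hit with its provider."""

    def __init__(self, providers: list[ConformantProvider]) -> None:
        self.providers = providers

    def search(self, term: str) -> list[tuple[str, str]]:
        results: list[tuple[str, str]] = []
        for provider in self.providers:
            results.extend((provider.name, hit) for hit in provider.search(term))
        return results


if __name__ == "__main__":
    federation = CommonsFederation([
        InMemoryProvider("A", {"obj-1": "RNA-seq of liver tissue"}),
        InMemoryProvider("B", {"obj-2": "Variant calls, exome"}),
        InMemoryProvider("C", {"obj-3": "rna-seq pipeline (software)"}),
    ])
    print(federation.search("RNA-seq"))  # [('A', 'obj-1'), ('C', 'obj-3')]
```

Because the federation depends only on the shared interface, adding a fourth conformant provider requires no change to the search code.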
The Commons: Business Model
[Figure: the researcher provides and retrieves/uses digital objects in the Commons (Cloud Providers A, B, and C) and finds objects through a Discovery Index that indexes the Commons. NIH provides credits, which the researcher uses with the providers. Option: NIH funds providers to support NIH-directed resources.]
• Commons implemented as a federation of ‘conformant’ cloud providers and HPC environments
• Funded primarily by providing credits to investigators
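The credit model can be pictured as a simple ledger: NIH places credits in an investigator's account, and the investigator spends them at whichever conformant provider they choose. This is a toy sketch under stated assumptions: credits are whole hypothetical units, `CreditLedger` and `InsufficientCredits` are illustrative names, and no real pricing or accounting rules are implied.

```python
from collections import defaultdict


class InsufficientCredits(Exception):
    """Raised when an investigator tries to spend more credits than they hold."""


class CreditLedger:
    """Toy ledger: NIH grants credits to investigators, who spend them at any
    conformant provider. Credits are whole, hypothetical units."""

    def __init__(self, conformant_providers: set[str]) -> None:
        self.conformant = set(conformant_providers)
        self.balances: defaultdict[str, int] = defaultdict(int)  # investigator -> credits held
        self.earned: defaultdict[str, int] = defaultdict(int)    # provider -> credits received

    def grant(self, investigator: str, amount: int) -> None:
        """NIH side: place credits in an investigator's account."""
        if amount <= 0:
            raise ValueError("grant amount must be positive")
        self.balances[investigator] += amount

    def spend(self, investigator: str, provider: str, amount: int) -> None:
        """Investigator side: pay a conformant provider for services used."""
        if provider not in self.conformant:
            raise ValueError(f"{provider!r} is not a conformant provider")
        if amount <= 0:
            raise ValueError("spend amount must be positive")
        if self.balances[investigator] < amount:
            raise InsufficientCredits(
                f"{investigator} holds {self.balances[investigator]}, needs {amount}")
        self.balances[investigator] -= amount
        self.earned[provider] += amount


if __name__ == "__main__":
    ledger = CreditLedger({"A", "B", "C"})
    ledger.grant("investigator-1", 100)
    ledger.spend("investigator-1", "A", 30)
    ledger.spend("investigator-1", "C", 50)
    print(ledger.balances["investigator-1"], dict(ledger.earned))  # 20 {'A': 30, 'C': 50}
```

Because the investigator, not NIH, picks where each credit is spent, providers have to compete for that spending, which is the mechanism behind the "drives competition" argument.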
• Cost effective: only pay for IT support used
• Drives competition: better services at lower cost
• Supports data sharing by driving science into the Commons
• Facilitates public-private partnership
• Scalable to most categories of data expected in the next 5 years
• Novelty:
  - Never been tried, so we don’t have data about likelihood of success
• Cost Models:
  - Predicated on stable or declining prices among providers
  - True for the last several years, but we can’t guarantee that it will continue, particularly if there is significant consolidation in the industry
• Service Providers:
  - Predicated on service providers being willing to make the investment to become conformant
  - Market research suggests 3-5 providers within 2-3 months of program launch
• Persistence:
  - The model is ‘Pay As You Go’, which means that if you stop paying, it stops going
  - Gives investigators an unprecedented level of control over what lives (or dies) in the Commons
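The persistence point ("if you stop paying it stops going") can be made concrete with a daily billing loop. This is a sketch under assumptions not in the slides: a flat hypothetical price per TB per day, no grace period, and an investigator-supplied priority order that decides which objects are dropped first. That ordering is one way to model the control over what lives or dies in the Commons.

```python
from dataclasses import dataclass


@dataclass
class StoredObject:
    identifier: str
    size_tb: float


def daily_cost(objects: list[StoredObject], price_per_tb_day: float) -> float:
    """Storage bill for one day at a flat (hypothetical) price."""
    return price_per_tb_day * sum(o.size_tb for o in objects)


def simulate_storage(balance: float, price_per_tb_day: float,
                     objects: list[StoredObject], days: int) -> tuple[float, list[str]]:
    """Charge storage once per day.

    `objects` is ordered by the investigator's priority, most important first.
    When the balance cannot cover a day's bill, the lowest-priority objects are
    dropped until it can; if nothing fits, everything is gone.
    Returns the remaining balance and the identifiers still stored.
    """
    stored = list(objects)  # copy so the caller's list is not modified
    for _ in range(days):
        while stored and daily_cost(stored, price_per_tb_day) > balance:
            stored.pop()  # lowest-priority object stops "going" first
        if not stored:
            break
        balance -= daily_cost(stored, price_per_tb_day)
    return balance, [o.identifier for o in stored]


if __name__ == "__main__":
    objs = [StoredObject("reference-set", 2.0), StoredObject("scratch", 3.0)]
    print(simulate_storage(12.0, 1.0, objs, days=3))  # (0.0, ['reference-set'])
```

In the example, two days of full storage use 10 credits; on day three the remaining 2 credits cover only the higher-priority object, so the scratch data is dropped.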
[Figure: seven-step credit flow among the Investigator, the Investigator Institution, NIH, a Reseller of Cloud Services, and Cloud Providers A, B, and C in the Commons. Step labels include: Requests Credits; Review; Approves Credit Request; Directs reseller to distribute credits; Instructs provider to put credits on investigator account; Distributes Credits To Investigator; Uses credits.]
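The credit flow is essentially a fixed sequence of steps, which suggests modelling a request as a small state machine. The ordering and actors below are one plausible reading of the figure's labels, not a confirmed process; `Step` and `CreditRequest` are hypothetical names.

```python
from enum import Enum, auto


class Step(Enum):
    # One plausible ordering of the figure's labels; actors are an assumption.
    REQUESTED = auto()            # investigator requests credits
    REVIEWED = auto()             # request is reviewed
    APPROVED = auto()             # NIH approves the credit request
    RESELLER_DIRECTED = auto()    # reseller is directed to distribute credits
    PROVIDER_INSTRUCTED = auto()  # reseller instructs a provider to credit the account
    DISTRIBUTED = auto()          # credits appear on the investigator's account
    IN_USE = auto()               # investigator uses the credits


ORDER = list(Step)  # Enum iteration preserves definition order


class CreditRequest:
    """Tracks a single credit request and enforces the step order above."""

    def __init__(self, investigator: str, amount: int) -> None:
        self.investigator = investigator
        self.amount = amount
        self.history: list[Step] = [Step.REQUESTED]

    @property
    def state(self) -> Step:
        return self.history[-1]

    def advance(self, step: Step) -> None:
        """Move to `step`, which must be the next step in ORDER."""
        position = ORDER.index(self.state)
        if position + 1 >= len(ORDER) or ORDER[position + 1] is not step:
            raise ValueError(f"cannot move from {self.state.name} to {step.name}")
        self.history.append(step)


if __name__ == "__main__":
    request = CreditRequest("investigator-1", 500)
    for step in ORDER[1:]:
        request.advance(step)
    print(request.state.name)  # IN_USE
```

Rejecting out-of-order transitions (for example, distributing credits before approval) keeps an auditable history of every request.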
• Minimum set of requirements for:
  - Business relationships (reseller, investigators)
  - Interfaces (upload, download, manage, compute)
  - Capacity (storage, compute)
  - Networking and Connectivity
  - Information Assurance
  - Authentication and authorization
• Still need to work out details of how to manage approval of conformance
• A conformant cloud ≠ an IaaS provider
• Draft specification out for comment among vendors
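A conformance check over these requirement areas could be as simple as a set difference between what the specification requires and what a provider offers. In this sketch the area names and the capabilities under business relationships, interfaces, capacity, and authentication/authorization come from the list above; the capability names under networking and information assurance are hypothetical placeholders, since the draft specification's details are not given here.

```python
# Requirement areas come from the slide; the capability names under
# "networking_connectivity" and "information_assurance" are hypothetical placeholders.
MINIMUM_REQUIREMENTS: dict[str, frozenset[str]] = {
    "business_relationships": frozenset({"reseller", "investigators"}),
    "interfaces": frozenset({"upload", "download", "manage", "compute"}),
    "capacity": frozenset({"storage", "compute"}),
    "networking_connectivity": frozenset({"network_connectivity"}),
    "information_assurance": frozenset({"information_assurance"}),
    "authn_authz": frozenset({"authentication", "authorization"}),
}


def missing_requirements(offered: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per requirement area, the capabilities a provider does not offer."""
    gaps: dict[str, set[str]] = {}
    for area, required in MINIMUM_REQUIREMENTS.items():
        lacking = set(required - offered.get(area, set()))
        if lacking:
            gaps[area] = lacking
    return gaps


def is_conformant(offered: dict[str, set[str]]) -> bool:
    """A provider is conformant only if no requirement area has gaps."""
    return not missing_requirements(offered)


if __name__ == "__main__":
    # A bare IaaS offering: raw capacity and compute, nothing else.
    iaas_only = {"capacity": {"storage", "compute"}, "interfaces": {"compute"}}
    print(is_conformant(iaas_only))  # False
    print(sorted(missing_requirements(iaas_only)))
```

The example shows why a conformant cloud is not just an IaaS provider: raw storage and compute satisfy only the capacity area, leaving business relationships, most interfaces, networking, information assurance, and authentication/authorization unmet.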
• Phase 0: Build the plumbing
• Phase 1: Pilot the model on a small number of investigators experienced with cloud computing, probably within the context of BD2K awards
• Phase 2: Open the Commons credit process to grantees from a subset of NIH Institutes and Centers
• Phase 3: Open the process to all NIH grantees
[Figure: NCI Genomic Data Commons (under development). Labels include: NCI Genomics Consortium and NCI Genomic Data Repositories; Data Coordinating Center with QA/QC, validation, and aggregation; Authoritative NCI Reference Data Set; Search/Retrieve, Download, and Analysis; High Performance Computing; NCI Clouds, each with secure computational capacity and pre-loaded data.]
• NIH Office of ADDS
  - Vivien Bonazzi, Ph.D.
  - Philip Bourne, Ph.D.
  - Michelle Dunn, Ph.D.
  - Mark Guyer, Ph.D.
  - Jennie Larkin, Ph.D.
  - Leigh Finnegan
  - Beth Russell
• NCBI
  - Dennis Benson, Ph.D.
  - Alan Graeff
  - David Lipman, MD
  - Jim Ostell, Ph.D.
  - Don Preuss
  - Steve Sherry

Komatsoulis, Internet2 Global Forum 2015

Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
 
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdfwaterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
 
Cytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptxCytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptx
 
Bob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdfBob Reedy - Nitrate in Texas Groundwater.pdf
Bob Reedy - Nitrate in Texas Groundwater.pdf
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
 
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
 

Komatsoulis Internet2 Global Forum 2015

  • 1. George A. Komatsoulis, Ph.D.; National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health, U.S. Department of Health and Human Services
  • 2.
    - Mission: “To seek fundamental knowledge about the nature and behavior of living systems and the application of that knowledge to enhance health, lengthen life, and reduce illness and disability.”
    - Composed of 27 Institutes and Centers
    - Annual budget = $30.3B
    - 80% of the NIH budget goes to about 50,000 grants
  • 3.
  • 4. [Figure: timeline, 1960 to 2020]
  • 5.
    - Sensor stream = 500 EB/day; stores 69 TB/day
    - Collection = 14 EB/day; stores 1 PB/day
    - Total data = 14 PB; stores an average of 3.3 TB/day for 10 years!
  • 6.
  • 7.
    - Launched to support biomedical data science research
    - Support for multiple facets of data science:
      - BD2K Centers
      - Data and Software Discovery
      - Standards and Interoperability
      - Training and Workforce Development
      - The Commons
    - Led by Dr. Phil Bourne, NIH Associate Director for Data Science
  • 8. [Diagram labels: Public Data Repositories; Local Data; University (shown twice); Locally Developed Software; Publicly Available Software; Local storage and compute resources]
  • 9.
    - Is scalable and exploits new computing models
    - Is more cost effective given digital growth
    - Simplifies sharing digital research objects such as data, software, metadata and workflows
    - Makes digital research objects more FAIR: Findable, Accessible, Interoperable and Reusable
    - DOES NOT replace existing, well-curated databases
    (Phil Bourne, 2014)
  • 10. [Diagram: The Commons. Labels: Digital Objects (with identifiers); Search (Indexed Metadata and API); Computing Platform; Open APIs; Software Encapsulation]
  • 11. [Diagram: The Commons. Labels: Digital Objects (with identifiers); Search (Indexed Metadata and API); Computing Platform; Commons Federation (Infrastructure); BD2K Centers; DDICC (Search); Existing Resources; Indexes; Methods; Content]
  • 14. The Commons: Business Model
    [Diagram labels: Researcher; Discovery Index; The Commons (Cloud Providers A, B and C); NIH. Arrows: Provides Digital Objects; Retrieves/Uses Digital Objects; Indexes Commons; Finds Objects; Provide Credits; Uses Credits; Option: Fund Providers to Support NIH Directed Resources]
    - Commons implemented as a federation of ‘conformant’ cloud providers and HPC environments
    - Funded primarily by providing credits to investigators
  • 15.
    - Cost effective: only pay for the IT support used
    - Drives competition: better services at lower cost
    - Supports data sharing by driving science into the Commons
    - Facilitates public-private partnership
    - Scalable to most categories of data expected in the next 5 years
  • 16.
    - Novelty:
      - Never been tried, so we don’t have data about the likelihood of success
    - Cost Models:
      - Predicated on stable or declining prices among providers
      - True for the last several years, but we can’t guarantee that it will continue, particularly if there is significant consolidation in the industry
    - Service Providers:
      - Predicated on service providers being willing to make the investment to become conformant
      - Market research suggests 3-5 providers within 2-3 months of program launch
    - Persistence:
      - The model is ‘Pay As You Go’, which means that if you stop paying, it stops going
      - Gives investigators an unprecedented level of control over what lives (or dies) in the Commons
  • 17. [Diagram: credit workflow in seven numbered steps. Participants: Investigator; Investigator Institution; NIH (Review); Reseller of Cloud Services; The Commons (Cloud Providers A, B and C). Step labels: Requests Credits; Review; Approves Credit Request; Directs reseller to distribute credits; Instructs provider to put credits on investigator account; Distributes Credits To Investigator; Uses credits]
  • 18.
    - Minimum set of requirements for:
      - Business relationships (reseller, investigators)
      - Interfaces (upload, download, manage, compute)
      - Capacity (storage, compute)
      - Networking and connectivity
      - Information assurance
      - Authentication and authorization
    - Still need to work out details of how to manage approval of conformance
    - A conformant cloud ≠ an IaaS provider
    - Draft specification out for comment among vendors
  • 19.
    - Phase 0: Build the plumbing
    - Phase 1: Pilot the model on a small number of investigators experienced with cloud computing, probably within the context of BD2K awards
    - Phase 2: Open the Commons credit process to grantees from a subset of NIH Institutes and Centers
    - Phase 3: Open the process to all NIH grantees
  • 20. [Diagram: NCI Genomic Data Commons (under development). Labels: Data Coordinating Center; QA/QC; Validation; Aggregation; Authoritative NCI Reference Data Set; NCI Clouds; High Performance Computing; Search/Retrieve; Download; Analysis]
  • 21. [Diagram labels: NCI Genomic Data Repositories; NCI Genomics Consortium; Secure Computational Capacity with Pre-loaded Data (shown three times)]
  • 22.
    - NIH Office of ADDS: Vivien Bonazzi, Ph.D.; Philip Bourne, Ph.D.; Michelle Dunn, Ph.D.; Mark Guyer, Ph.D.; Jennie Larkin, Ph.D.; Leigh Finnegan; Beth Russell
    - NCBI: Dennis Benson, Ph.D.; Alan Graeff; David Lipman, M.D.; Jim Ostell, Ph.D.; Don Preuss; Steve Sherry
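The business model on slides 14 to 17 can be sketched as a small Python toy model. It is an illustration under stated assumptions, not the NIH design: the class and method names (`Commons`, `DigitalObject`, `grant_credits`, `deposit`, `search`, `bill_month`) and the identifier `commons:0001` are hypothetical, the discovery index is assumed to be a simple keyword inverted index, and storage is assumed to be billed monthly in whole credits. It shows the three pieces the slides describe: digital objects with identifiers, a search over indexed metadata, and credit-funded, pay-as-you-go persistence in which an object is dropped once its owner's credits no longer cover it.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """A research object held by one conformant provider (hypothetical model)."""
    identifier: str           # hypothetical persistent identifier, e.g. "commons:0001"
    provider: str             # conformant cloud or HPC environment holding the bytes
    metadata: dict[str, str]  # free-text fields fed to the discovery index
    monthly_cost: int         # credits needed to keep the object stored for a month


@dataclass
class Commons:
    """Toy federation: credit balances, stored objects and a discovery index."""
    balances: dict[str, int] = field(default_factory=dict)    # investigator -> credits
    objects: dict[str, DigitalObject] = field(default_factory=dict)
    owners: dict[str, str] = field(default_factory=dict)      # object id -> investigator
    index: dict[str, set[str]] = field(default_factory=dict)  # term -> object ids

    def grant_credits(self, investigator: str, amount: int) -> None:
        """Credits land on the investigator's account after an approved request."""
        self.balances[investigator] = self.balances.get(investigator, 0) + amount

    def deposit(self, investigator: str, obj: DigitalObject) -> None:
        """Store an object and index every whitespace-separated metadata term."""
        self.objects[obj.identifier] = obj
        self.owners[obj.identifier] = investigator
        for value in obj.metadata.values():
            for term in value.lower().split():
                self.index.setdefault(term, set()).add(obj.identifier)

    def search(self, term: str) -> list[str]:
        """Return identifiers of objects whose metadata contains `term`."""
        return sorted(self.index.get(term.lower(), set()))

    def bill_month(self) -> list[str]:
        """Debit one month of storage; drop objects whose owner cannot pay."""
        dropped = []
        for oid, obj in list(self.objects.items()):
            owner = self.owners[oid]
            balance = self.balances.get(owner, 0)
            if balance >= obj.monthly_cost:
                self.balances[owner] = balance - obj.monthly_cost
            else:
                dropped.append(oid)  # "if you stop paying, it stops going"
                self._remove(oid)
        return dropped

    def _remove(self, oid: str) -> None:
        """Forget the object and strip it from every index entry."""
        del self.objects[oid]
        del self.owners[oid]
        for ids in self.index.values():
            ids.discard(oid)


if __name__ == "__main__":
    commons = Commons()
    commons.grant_credits("alice", 25)
    commons.deposit("alice", DigitalObject(
        "commons:0001", "Cloud Provider A", {"title": "tumor RNA-seq"}, 10))
    print(commons.search("RNA-seq"))                 # ['commons:0001']
    print([commons.bill_month() for _ in range(3)])  # [[], [], ['commons:0001']]
    print(commons.search("tumor"))                   # []
```

Keeping the index apart from the providers loosely mirrors slide 14, where a separate Discovery Index indexes the Commons; in this sketch both live in one object only for brevity.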

Editor's Notes

  1. 1965: Generation capacity < 100 amino acids/year/person, so Dayhoff creates the one-letter code to simplify computing in the punch-card era.
     1977: Sanger and Maxam-Gilbert sequencing invented. By the mid-1980s, production had increased by two orders of magnitude (maybe 10-20K bases total, 2-3K finished per year).
     1986: Development of dye-based sequencing (ABI 370A); 2,000 bases/day/instrument by the mid-1990s.
     1996: Development of DNA microarrays; 2 dyes × 100K chips => 200K/chip/day.
     2000s: Next-gen sequencing; hundreds of millions per day.
  2. This has worked well for a long time, but:
     - Every investigator has their own copy of the data!
     - Every investigator needs the computational resources to do whatever calculation they want to do.
     - Making locally developed software work outside the local institution is often a challenge. Everyone likes the Broad Firehose; only Broad has made it work!
     - Consider the TCGA data set (2.5 PB):
       - Storage and data protection cost approximately $2,000,000 per year per copy
       - Constant network upgrades at universities: 2.5 PB = 20,000,000 Gb (gigabits) = 23 days at 10 Gb/s
       - Redundant computing environments: most HPC environments are either drastically over- or under-utilized
     - This is an issue with more ‘normal’ sized data sets as well.
  3. Minimum requirements:
     - Business relationships: allow distribution and billing of credits and ensure that liability issues are resolved. The investigator who puts a digital object in the Commons retains the liability associated with its use.
     - Interfaces: would need to be open, but not necessarily open source, and must support basic operations. In addition, the environment has to be open to all, so a private environment behind a university firewall won’t work.
     - Identifiers and metadata: tied together, they enable researchers to search for and find resources.
     - Networking and connectivity: make sure that resources are accessible; require connection to the commodity Internet and Internet2. The key element from the investigator's point of view is a free egress tier for academics.
     - Information assurance: the environment is secure.
     - Authentication and authorization: must support InCommon, because most NIH investigators have it. This minimizes the hassle of granting access to collaborators across multiple platforms.
     - Approval of clouds: self-certify vs. NIH certify vs. third-party certify. In early test cases, may simply accept FedRAMP authorization.
     - Cloud vs. IaaS: some IaaS providers (AWS comes to mind) may be uninterested in providing the ‘conformant’ layer but may support other companies that provide these services on an AWS backend. There are already exemplars of this: Seven Bridges Genomics and the Cancer Genomics Cloud Pilots are all software layers over an IaaS provider.
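The TCGA figures in note 2 can be checked with a few lines of arithmetic. This sketch assumes decimal SI units (1 PB = 10^15 bytes, 1 Gb = 10^9 bits) and a link that sustains its full nominal 10 Gb/s with no protocol overhead, which real transfers rarely achieve; the function names are illustrative. The per-terabyte cost it prints is derived from the $2,000,000-per-year figure and is not stated in the notes themselves.

```python
BITS_PER_BYTE = 8
SECONDS_PER_DAY = 86_400


def petabytes_to_gigabits(petabytes: float) -> float:
    """Convert decimal petabytes (10**15 bytes) to gigabits (10**9 bits)."""
    return petabytes * 10**15 * BITS_PER_BYTE / 10**9


def transfer_days(petabytes: float, link_gbps: float) -> float:
    """Days to move `petabytes` over a link running flat out at `link_gbps`."""
    return petabytes_to_gigabits(petabytes) / link_gbps / SECONDS_PER_DAY


def cost_per_tb_year(annual_cost: float, petabytes: float) -> float:
    """Annual cost per terabyte for one copy (1 PB = 1,000 TB)."""
    return annual_cost / (petabytes * 1_000)


if __name__ == "__main__":
    tcga_pb = 2.5
    print(f"{petabytes_to_gigabits(tcga_pb):,.0f} Gb")           # 20,000,000 Gb
    print(f"{transfer_days(tcga_pb, 10):.1f} days at 10 Gb/s")   # 23.1 days at 10 Gb/s
    print(f"${cost_per_tb_year(2_000_000, tcga_pb):,.0f} per TB per year per copy")  # $800 ...
```

Because each additional copy of the data repeats both the transfer time and the storage cost, these per-copy numbers scale directly with the number of institutions holding their own copy.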