Federated computing on massive biomedical
data sets across multiple data centers
JONATHAN SHEFFI
CEO, CUROVERSE
Researchers are struggling
to analyze large genomic
data sets
Problem
Data are physically distributed and difficult to
move because of:
• Physical size and network constraints
• Regulatory barriers
• Privacy and competitive concerns
Federated Computing
1. Curated & indexed data stored via open platform on existing infrastructure
2. Secure, distributed queries & management
3. Seamless experience for researchers using data
[Diagram: a Researcher, with Workflows and Answers, connected to Commercial Data Aggregators, Research Institutions, and Medical Centers & Hospitals]
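To make the federated idea concrete, here is a minimal Python sketch, assuming each data holder runs the query locally and returns only an aggregate answer. The `Site` class, the `federated_count` helper, and the sample records are all hypothetical illustrations, not part of any platform named in this deck.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class Site:
    """A hypothetical data holder; its records never leave the site."""
    name: str
    records: List[Record]

    def count_matches(self, predicate: Callable[[Record], bool]) -> int:
        # The query runs where the data lives; only a count is returned.
        return sum(1 for record in self.records if predicate(record))


def federated_count(sites: List[Site],
                    predicate: Callable[[Record], bool]) -> Dict[str, int]:
    """Send the same query to every site and collect the per-site answers."""
    return {site.name: site.count_matches(predicate) for site in sites}


def has_brca1(record: Record) -> bool:
    return record["gene"] == "BRCA1"


# Made-up example data for two sites.
sites = [
    Site("hospital_a", [{"gene": "BRCA1"}, {"gene": "TP53"}]),
    Site("institute_b", [{"gene": "BRCA1"}, {"gene": "BRCA1"}]),
]
answers = federated_count(sites, has_brca1)
print(answers)
```

The researcher sees only the answers dictionary; the underlying records stay with each site, which is the property the three numbered points above rely on.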
Why not centralized proprietary SaaS?
• Data are getting larger and harder to move
• IT teams continue to choose a variety of IT
infrastructure solutions (public cloud, private cloud,
HPC) for good reasons
• Proprietary software makes standardization harder
What are the challenges to federation?
• How do you know you are getting the files you
request?
• How do you know your pipeline will run
properly in the new environment?
• How can you be sure that the pipelines sent to
you are secure?
• How do you discover what data sets are
available?
Federation challenges solved
An open source platform for managing and
processing massive data sets
Designed for building federations
• Content addressing guarantees you get the data you expect
• Common Workflow Language and Docker give you reliably
reproducible pipelines
• Multi-platform architecture lets you layer on top of existing
infrastructure
• Security and credentials that can travel with workflows
• Lightning enables complex variant-level queries, machine
learning, normalized VCF generation, GA4GH APIs & Beacon
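The content-addressing point can be illustrated with a short Python sketch. It assumes SHA-256 as the hash and uses a toy in-memory `ContentStore`; both are choices made for illustration only and do not describe the platform's actual storage format.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Derive the address from the bytes themselves (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Hypothetical in-memory store keyed by content address."""

    def __init__(self) -> None:
        self._blocks = {}

    def put(self, data: bytes) -> str:
        address = content_address(data)
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Re-hash on read: a mismatch means these are not the requested bytes.
        if content_address(data) != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
address = store.put(b"ACGTACGT")
retrieved = store.get(address)
```

Because the address is computed from the content, anyone holding the address can check that the bytes they received are the bytes they asked for, regardless of which data center served them.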
Common Workflow Language (commonwl.org)
A community-based global standard for workflow description
PROBLEM
• Difficult to use bioinformatics tools
because of poor run-time packaging
• No mechanism to easily discover the
availability and capabilities of tools
• No standard approach for creating
computational workflows
• Workflows are very difficult to reproduce
because of poor definition & design
• Workflows are not portable across
systems because of DIY approaches
SOLUTION
• Standard for packaging bioinformatics and
data science tools and algorithms into Docker
containers with clear interfaces
• Standard for defining computational
workflows built with tools packaged into
Docker containers
• Adopted by many major platforms in the
space, including Arvados, Galaxy, Taverna,
and Seven Bridges
• More than 250 bioinformaticians and data
scientists participating in creating the standard
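As a rough illustration of what a CWL tool description looks like, the Python sketch below builds one as a plain dictionary and serializes it to JSON, which CWL accepts because JSON is a subset of YAML. The field names follow CWL v1.0; the wrapped command (`wc -l`), the Docker image name, and the `make_line_count_tool` helper are hypothetical choices for the example.

```python
import json


def make_line_count_tool(docker_image: str) -> dict:
    """Describe a containerized `wc -l` step as a CWL v1.0 CommandLineTool."""
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "baseCommand": ["wc", "-l"],
        # The container image the tool should run in.
        "hints": [{"class": "DockerRequirement", "dockerPull": docker_image}],
        # A clear interface: one input file, passed as the first argument.
        "inputs": {
            "input_file": {"type": "File", "inputBinding": {"position": 1}},
        },
        # Capture standard output as the tool's single output.
        "outputs": {"line_count": {"type": "stdout"}},
        "stdout": "line_count.txt",
    }


# Hypothetical image name, used only for illustration.
tool = make_line_count_tool("debian:stable-slim")
tool_document = json.dumps(tool, indent=2)
print(tool_document)
```

The printed document could be saved to a file and handed to a CWL runner; the same description is what lets a workflow move between platforms without rewriting the tool wrapper.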
Use cases for federations
• Internal collaboration across countries
• Pharma translational research projects
• Large research consortiums
• Rare disease diagnosis search across institutions
• Clinical testing company operating in multiple geographies
• Clinical trial participant identification
What’s next?
• Wider range of platform support
• Adding a layer of brokering capabilities to coordinate a
federation
• Pushing industry adoption of CWL
• Getting more tools containerized and described with CWL
• Integrating with directory services such as Repositive
• Building a registry of tools (Dockstore.org)
Get started
Go to Arvados.org to download the code
Platform available for use under the AGPLv3 open source license
Go to Curoverse.com for commercial support options
Cluster Operations Subscriptions and Professional Services available

More Related Content

What's hot

Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...sesrdm
 
OpenAIRE-COAR conference 2014: Next generation metrics of scholarly performa...
 OpenAIRE-COAR conference 2014: Next generation metrics of scholarly performa... OpenAIRE-COAR conference 2014: Next generation metrics of scholarly performa...
OpenAIRE-COAR conference 2014: Next generation metrics of scholarly performa...OpenAIRE
 
A FAIR Data Sharing Framework for Large-Scale Human Cancer Proteogenomics
A FAIR Data Sharing Framework for Large-Scale Human Cancer ProteogenomicsA FAIR Data Sharing Framework for Large-Scale Human Cancer Proteogenomics
A FAIR Data Sharing Framework for Large-Scale Human Cancer ProteogenomicsBrett Tully
 
2010 CLARA Nijmegen - Data Seal of Approval tutorial
2010 CLARA Nijmegen - Data Seal of Approval tutorial2010 CLARA Nijmegen - Data Seal of Approval tutorial
2010 CLARA Nijmegen - Data Seal of Approval tutorialDirk Roorda
 
Moore RDAP11 Policy-based Data Management
Moore RDAP11 Policy-based Data ManagementMoore RDAP11 Policy-based Data Management
Moore RDAP11 Policy-based Data ManagementASIS&T
 
Canadensys Explorer presentation
Canadensys Explorer presentationCanadensys Explorer presentation
Canadensys Explorer presentationkristgen
 
U Maryland Connect: How Mendeley Illuminates a Broader Definition of Impact
U Maryland Connect: How Mendeley Illuminates a Broader Definition of ImpactU Maryland Connect: How Mendeley Illuminates a Broader Definition of Impact
U Maryland Connect: How Mendeley Illuminates a Broader Definition of ImpactWilliam Gunn
 
NFAIS Altmetrics Webinar 2014
NFAIS Altmetrics Webinar 2014NFAIS Altmetrics Webinar 2014
NFAIS Altmetrics Webinar 2014William Gunn
 
The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...Todd Vision
 
Identifying and tracking research resources using RRIDs: a practical approach
Identifying and tracking research resources using RRIDs:  a practical approachIdentifying and tracking research resources using RRIDs:  a practical approach
Identifying and tracking research resources using RRIDs: a practical approachdkNET
 
Data reuse and scholarly reward: understanding practice and building infrastr...
Data reuse and scholarly reward: understanding practice and building infrastr...Data reuse and scholarly reward: understanding practice and building infrastr...
Data reuse and scholarly reward: understanding practice and building infrastr...Todd Vision
 
Transforming Current Awareness Through RSS
Transforming Current Awareness Through RSSTransforming Current Awareness Through RSS
Transforming Current Awareness Through RSSazami
 
Findable, Accessible, Interoperable and Reusable (FAIR) data
Findable, Accessible, Interoperable and Reusable (FAIR) dataFindable, Accessible, Interoperable and Reusable (FAIR) data
Findable, Accessible, Interoperable and Reusable (FAIR) dataARDC
 
Hackdays and workshops 2019
Hackdays and workshops 2019Hackdays and workshops 2019
Hackdays and workshops 2019Jisc
 
Community ORCID dashboard - COrDa
Community ORCID dashboard - COrDaCommunity ORCID dashboard - COrDa
Community ORCID dashboard - COrDaJisc
 
Research data and scholarly publications: going from casual acquaintances to ...
Research data and scholarly publications: going from casual acquaintances to ...Research data and scholarly publications: going from casual acquaintances to ...
Research data and scholarly publications: going from casual acquaintances to ...Todd Vision
 
Knowledge Exchange, Nov 2011, Bonn
Knowledge Exchange, Nov 2011, BonnKnowledge Exchange, Nov 2011, Bonn
Knowledge Exchange, Nov 2011, BonnTodd Vision
 
Applications on SciVerse
Applications on SciVerseApplications on SciVerse
Applications on SciVerseRafael Sidi
 

What's hot (20)

Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
Case Study Life Sciences Data: Central for Integrative Systems Biology and Bi...
 
OpenAIRE-COAR conference 2014: Next generation metrics of scholarly performa...
 OpenAIRE-COAR conference 2014: Next generation metrics of scholarly performa... OpenAIRE-COAR conference 2014: Next generation metrics of scholarly performa...
OpenAIRE-COAR conference 2014: Next generation metrics of scholarly performa...
 
A FAIR Data Sharing Framework for Large-Scale Human Cancer Proteogenomics
A FAIR Data Sharing Framework for Large-Scale Human Cancer ProteogenomicsA FAIR Data Sharing Framework for Large-Scale Human Cancer Proteogenomics
A FAIR Data Sharing Framework for Large-Scale Human Cancer Proteogenomics
 
APHL/CDC Presentation to Vietnamese Health Officials and Stakeholders
APHL/CDC Presentation to Vietnamese Health Officials and StakeholdersAPHL/CDC Presentation to Vietnamese Health Officials and Stakeholders
APHL/CDC Presentation to Vietnamese Health Officials and Stakeholders
 
2010 CLARA Nijmegen - Data Seal of Approval tutorial
2010 CLARA Nijmegen - Data Seal of Approval tutorial2010 CLARA Nijmegen - Data Seal of Approval tutorial
2010 CLARA Nijmegen - Data Seal of Approval tutorial
 
Moore RDAP11 Policy-based Data Management
Moore RDAP11 Policy-based Data ManagementMoore RDAP11 Policy-based Data Management
Moore RDAP11 Policy-based Data Management
 
Canadensys Explorer presentation
Canadensys Explorer presentationCanadensys Explorer presentation
Canadensys Explorer presentation
 
U Maryland Connect: How Mendeley Illuminates a Broader Definition of Impact
U Maryland Connect: How Mendeley Illuminates a Broader Definition of ImpactU Maryland Connect: How Mendeley Illuminates a Broader Definition of Impact
U Maryland Connect: How Mendeley Illuminates a Broader Definition of Impact
 
NFAIS Altmetrics Webinar 2014
NFAIS Altmetrics Webinar 2014NFAIS Altmetrics Webinar 2014
NFAIS Altmetrics Webinar 2014
 
The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...The Dryad Digital Repository: Published evolutionary data as part of the gre...
The Dryad Digital Repository: Published evolutionary data as part of the gre...
 
Identifying and tracking research resources using RRIDs: a practical approach
Identifying and tracking research resources using RRIDs:  a practical approachIdentifying and tracking research resources using RRIDs:  a practical approach
Identifying and tracking research resources using RRIDs: a practical approach
 
Data reuse and scholarly reward: understanding practice and building infrastr...
Data reuse and scholarly reward: understanding practice and building infrastr...Data reuse and scholarly reward: understanding practice and building infrastr...
Data reuse and scholarly reward: understanding practice and building infrastr...
 
Transforming Current Awareness Through RSS
Transforming Current Awareness Through RSSTransforming Current Awareness Through RSS
Transforming Current Awareness Through RSS
 
Findable, Accessible, Interoperable and Reusable (FAIR) data
Findable, Accessible, Interoperable and Reusable (FAIR) dataFindable, Accessible, Interoperable and Reusable (FAIR) data
Findable, Accessible, Interoperable and Reusable (FAIR) data
 
Hackdays and workshops 2019
Hackdays and workshops 2019Hackdays and workshops 2019
Hackdays and workshops 2019
 
Community ORCID dashboard - COrDa
Community ORCID dashboard - COrDaCommunity ORCID dashboard - COrDa
Community ORCID dashboard - COrDa
 
Research data and scholarly publications: going from casual acquaintances to ...
Research data and scholarly publications: going from casual acquaintances to ...Research data and scholarly publications: going from casual acquaintances to ...
Research data and scholarly publications: going from casual acquaintances to ...
 
Knowledge Exchange, Nov 2011, Bonn
Knowledge Exchange, Nov 2011, BonnKnowledge Exchange, Nov 2011, Bonn
Knowledge Exchange, Nov 2011, Bonn
 
Course completion Certificate
Course completion CertificateCourse completion Certificate
Course completion Certificate
 
Applications on SciVerse
Applications on SciVerseApplications on SciVerse
Applications on SciVerse
 

Viewers also liked

Compact Genome Format
Compact Genome FormatCompact Genome Format
Compact Genome FormatArvados
 
Lightning Talk 2015-10-15
Lightning Talk 2015-10-15Lightning Talk 2015-10-15
Lightning Talk 2015-10-15Arvados
 
Introduction to 3rd sequencing
Introduction to 3rd sequencing Introduction to 3rd sequencing
Introduction to 3rd sequencing Eric Lee
 
Algorithm of NGS Data
Algorithm of NGS DataAlgorithm of NGS Data
Algorithm of NGS DataEric Lee
 
Genome sequences as media files
Genome sequences as media filesGenome sequences as media files
Genome sequences as media filestparidae
 
Content-Driven Apps with React
Content-Driven Apps with ReactContent-Driven Apps with React
Content-Driven Apps with ReactNetcetera
 
Towards using multimedia technology for biological data processing
Towards using multimedia technology for biological data processingTowards using multimedia technology for biological data processing
Towards using multimedia technology for biological data processingWesley De Neve
 
Netcetera Innovation Summit 2016: The Past 12 Months - What's New & Exciting
Netcetera Innovation Summit 2016: The Past 12 Months - What's New & ExcitingNetcetera Innovation Summit 2016: The Past 12 Months - What's New & Exciting
Netcetera Innovation Summit 2016: The Past 12 Months - What's New & ExcitingNetcetera
 
SwissWallet - Die digitale Währung heisst Vertrauen
SwissWallet - Die digitale Währung heisst Vertrauen SwissWallet - Die digitale Währung heisst Vertrauen
SwissWallet - Die digitale Währung heisst Vertrauen Netcetera
 
COSCUP 2016 Workshop : 快快樂樂學Neo4j
COSCUP 2016 Workshop : 快快樂樂學Neo4jCOSCUP 2016 Workshop : 快快樂樂學Neo4j
COSCUP 2016 Workshop : 快快樂樂學Neo4jEric Lee
 
Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...
Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...
Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...Wesley De Neve
 
Authentication requirements and application of PSD2 in e-Commerce - Presentat...
Authentication requirements and application of PSD2 in e-Commerce - Presentat...Authentication requirements and application of PSD2 in e-Commerce - Presentat...
Authentication requirements and application of PSD2 in e-Commerce - Presentat...Netcetera
 
Lessons from 2MM machine learning models
Lessons from 2MM machine learning modelsLessons from 2MM machine learning models
Lessons from 2MM machine learning modelsExtract Data Conference
 
Lightning
LightningLightning
LightningArvados
 
SkopjePulse: Designing a better city with IoT
SkopjePulse: Designing a better city with IoTSkopjePulse: Designing a better city with IoT
SkopjePulse: Designing a better city with IoTNetcetera
 
Die Herausforderungen in der Payment-Industrie
Die Herausforderungen in der Payment-IndustrieDie Herausforderungen in der Payment-Industrie
Die Herausforderungen in der Payment-IndustrieNetcetera
 
Lightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and ScalaLightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and ScalaAndy Petrella
 
Managers - The Missing Manual
Managers - The Missing ManualManagers - The Missing Manual
Managers - The Missing ManualNetcetera
 

Viewers also liked (20)

Compact Genome Format
Compact Genome FormatCompact Genome Format
Compact Genome Format
 
Introduzione a Ember.js
Introduzione a Ember.jsIntroduzione a Ember.js
Introduzione a Ember.js
 
Lightning Talk 2015-10-15
Lightning Talk 2015-10-15Lightning Talk 2015-10-15
Lightning Talk 2015-10-15
 
Introduction to 3rd sequencing
Introduction to 3rd sequencing Introduction to 3rd sequencing
Introduction to 3rd sequencing
 
Algorithm of NGS Data
Algorithm of NGS DataAlgorithm of NGS Data
Algorithm of NGS Data
 
Genome sequences as media files
Genome sequences as media filesGenome sequences as media files
Genome sequences as media files
 
Content-Driven Apps with React
Content-Driven Apps with ReactContent-Driven Apps with React
Content-Driven Apps with React
 
Towards using multimedia technology for biological data processing
Towards using multimedia technology for biological data processingTowards using multimedia technology for biological data processing
Towards using multimedia technology for biological data processing
 
Netcetera Innovation Summit 2016: The Past 12 Months - What's New & Exciting
Netcetera Innovation Summit 2016: The Past 12 Months - What's New & ExcitingNetcetera Innovation Summit 2016: The Past 12 Months - What's New & Exciting
Netcetera Innovation Summit 2016: The Past 12 Months - What's New & Exciting
 
SwissWallet - Die digitale Währung heisst Vertrauen
SwissWallet - Die digitale Währung heisst Vertrauen SwissWallet - Die digitale Währung heisst Vertrauen
SwissWallet - Die digitale Währung heisst Vertrauen
 
COSCUP 2016 Workshop : 快快樂樂學Neo4j
COSCUP 2016 Workshop : 快快樂樂學Neo4jCOSCUP 2016 Workshop : 快快樂樂學Neo4j
COSCUP 2016 Workshop : 快快樂樂學Neo4j
 
Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...
Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...
Deep Machine Learning for Making Sense of Biotech Data - From Clean Energy to...
 
Authentication requirements and application of PSD2 in e-Commerce - Presentat...
Authentication requirements and application of PSD2 in e-Commerce - Presentat...Authentication requirements and application of PSD2 in e-Commerce - Presentat...
Authentication requirements and application of PSD2 in e-Commerce - Presentat...
 
Lessons from 2MM machine learning models
Lessons from 2MM machine learning modelsLessons from 2MM machine learning models
Lessons from 2MM machine learning models
 
Lightning
LightningLightning
Lightning
 
SkopjePulse: Designing a better city with IoT
SkopjePulse: Designing a better city with IoTSkopjePulse: Designing a better city with IoT
SkopjePulse: Designing a better city with IoT
 
2015 genome-center
2015 genome-center2015 genome-center
2015 genome-center
 
Die Herausforderungen in der Payment-Industrie
Die Herausforderungen in der Payment-IndustrieDie Herausforderungen in der Payment-Industrie
Die Herausforderungen in der Payment-Industrie
 
Lightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and ScalaLightning fast genomics with Spark, Adam and Scala
Lightning fast genomics with Spark, Adam and Scala
 
Managers - The Missing Manual
Managers - The Missing ManualManagers - The Missing Manual
Managers - The Missing Manual
 

Similar to Curoverse Presentation at ICG-11 (November 2016)

Research methods group accelarating impact by sharing data
Research methods group  accelarating impact by sharing dataResearch methods group  accelarating impact by sharing data
Research methods group accelarating impact by sharing dataWorld Agroforestry (ICRAF)
 
Data Description Registry Interoperability WG at Research Data Alliance Third...
Data Description Registry Interoperability WG at Research Data Alliance Third...Data Description Registry Interoperability WG at Research Data Alliance Third...
Data Description Registry Interoperability WG at Research Data Alliance Third...amiraryani
 
ERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management WebinarERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management WebinarFAIRDOM
 
Revolutionising the Journal through Big Data Computational Research
Revolutionising the Journal through Big Data Computational ResearchRevolutionising the Journal through Big Data Computational Research
Revolutionising the Journal through Big Data Computational ResearchAmye Kenall
 
Driving Faster Analytics at Symphony Health
Driving Faster Analytics at Symphony HealthDriving Faster Analytics at Symphony Health
Driving Faster Analytics at Symphony HealthPrecisely
 
FAIRDOM data management support for ERACoBioTech Proposals
FAIRDOM data management support for ERACoBioTech ProposalsFAIRDOM data management support for ERACoBioTech Proposals
FAIRDOM data management support for ERACoBioTech ProposalsFAIRDOM
 
Reveal - An Enterprise Clinical Data Search Solution
Reveal - An Enterprise Clinical Data Search SolutionReveal - An Enterprise Clinical Data Search Solution
Reveal - An Enterprise Clinical Data Search Solutiond-Wise Technologies
 
Maven and google pharma r&d (1)
Maven and google pharma r&d  (1)Maven and google pharma r&d  (1)
Maven and google pharma r&d (1)Matt Barnes
 
e-infrastructural needs to support informatics
e-infrastructural needs to support informaticse-infrastructural needs to support informatics
e-infrastructural needs to support informaticsDavid Wallom
 
Data accessibilityandchallenges
Data accessibilityandchallengesData accessibilityandchallenges
Data accessibilityandchallengesjyotikhadake
 
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...The University of Edinburgh
 
Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2Vivien Bonazzi
 
ELSS use cases and strategy
ELSS use cases and strategyELSS use cases and strategy
ELSS use cases and strategyAnton Yuryev
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
 
Lecture 3.31 3.32.pptx
Lecture 3.31  3.32.pptxLecture 3.31  3.32.pptx
Lecture 3.31 3.32.pptxRATISHKUMAR32
 
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumElsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumAnita de Waard
 
Incentivising the uptake of reusable metadata in the survey production process
Incentivising the uptake of reusable metadata in the survey production processIncentivising the uptake of reusable metadata in the survey production process
Incentivising the uptake of reusable metadata in the survey production processLouise Corti
 
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Denodo
 

Similar to Curoverse Presentation at ICG-11 (November 2016) (20)

SciBite
SciBiteSciBite
SciBite
 
Research methods group accelarating impact by sharing data
Research methods group  accelarating impact by sharing dataResearch methods group  accelarating impact by sharing data
Research methods group accelarating impact by sharing data
 
Data Description Registry Interoperability WG at Research Data Alliance Third...
Data Description Registry Interoperability WG at Research Data Alliance Third...Data Description Registry Interoperability WG at Research Data Alliance Third...
Data Description Registry Interoperability WG at Research Data Alliance Third...
 
ERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management WebinarERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management Webinar
 
Revolutionising the Journal through Big Data Computational Research
Revolutionising the Journal through Big Data Computational ResearchRevolutionising the Journal through Big Data Computational Research
Revolutionising the Journal through Big Data Computational Research
 
Driving Faster Analytics at Symphony Health
Driving Faster Analytics at Symphony HealthDriving Faster Analytics at Symphony Health
Driving Faster Analytics at Symphony Health
 
FAIRDOM data management support for ERACoBioTech Proposals
FAIRDOM data management support for ERACoBioTech ProposalsFAIRDOM data management support for ERACoBioTech Proposals
FAIRDOM data management support for ERACoBioTech Proposals
 
Reveal - An Enterprise Clinical Data Search Solution
Reveal - An Enterprise Clinical Data Search SolutionReveal - An Enterprise Clinical Data Search Solution
Reveal - An Enterprise Clinical Data Search Solution
 
Maven and google pharma r&d (1)
Maven and google pharma r&d  (1)Maven and google pharma r&d  (1)
Maven and google pharma r&d (1)
 
e-infrastructural needs to support informatics
e-infrastructural needs to support informaticse-infrastructural needs to support informatics
e-infrastructural needs to support informatics
 
Data accessibilityandchallenges
Data accessibilityandchallengesData accessibilityandchallenges
Data accessibilityandchallenges
 
Irida immemxi hsiao
Irida immemxi hsiaoIrida immemxi hsiao
Irida immemxi hsiao
 
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
 
Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2
 
ELSS use cases and strategy
ELSS use cases and strategyELSS use cases and strategy
ELSS use cases and strategy
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Lecture 3.31 3.32.pptx
Lecture 3.31  3.32.pptxLecture 3.31  3.32.pptx
Lecture 3.31 3.32.pptx
 
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumElsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
 
Incentivising the uptake of reusable metadata in the survey production process
Incentivising the uptake of reusable metadata in the survey production processIncentivising the uptake of reusable metadata in the survey production process
Incentivising the uptake of reusable metadata in the survey production process
 
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
 

Recently uploaded

BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)


Curoverse Presentation at ICG-11 (November 2016)

  • 1. Federated computing on massive biomedical data sets across multiple data centers. Jonathan Sheffi, CEO, Curoverse
  • 2. Problem: Researchers are struggling to analyze large genomic data sets. Data are physically distributed and difficult to move because of:
      • Physical size and network constraints
      • Regulatory barriers
      • Privacy and competitive concerns
  • 3. Federated Computing
      • Participants: Researcher; Commercial Data Aggregators; Research Institutions; Medical Centers & Hospitals
      • 1. Curated & indexed data stored via open platform on existing infrastructure
      • 2. Secure, distributed queries & management
      • 3. Seamless experience for researchers using data
      • Diagram labels: Workflows, Answers
  • 4. Why not centralized proprietary SaaS?
      • Data is getting larger and harder to move
      • IT teams continue to choose a variety of IT infrastructure solutions (public cloud, private cloud, HPC) for good reasons
      • Proprietary software makes standardization harder
  • 5. What are the challenges to federation?
      • How do you know you are getting the files you request?
      • How do you know your pipeline will run properly in the new environment?
      • How can you be sure that the pipelines sent to you are secure?
      • How do you discover what data sets are available?
  • 6. An open source platform for managing and processing massive data sets, designed for building federations
  • 7. Federation challenges solved
      • Content addressing guarantees you get the data you expect
      • Common Workflow Language and Docker give you reliably reproducible pipelines
      • Multi-platform architecture lets you layer on top of existing infrastructure
      • Security and credentials that can travel with workflows
      • Lightning enables complex variant-level queries, machine learning, normalized VCF generation, GA4GH APIs & Beacon
  • 8. Common Workflow Language (commonwl.org): a community-based global standard for workflow description
      PROBLEM
      • Bioinformatics tools are difficult to use because of poor run-time packaging
      • No mechanism to easily discover the availability and capabilities of tools
      • No standard approach for creating computational workflows
      • Workflows are very difficult to reproduce because of poor definition & design
      • Workflows are not portable across systems because of DIY approaches
      SOLUTION
      • Standard for packaging bioinformatics and data science tools and algorithms into Docker containers with clear interfaces
      • Standard for defining computational workflows built with tools packaged into Docker containers
      • Adopted by many major platforms in the space, including Arvados, Galaxy, Taverna, and Seven Bridges
      • More than 250 bioinformaticians and data scientists participating in creating the standard
  • 9. Use cases for federations
      • Internal collaboration across countries
      • Pharma translational research projects
      • Large research consortiums
      • Rare disease diagnosis search across institutions
      • Clinical testing company operating in multiple geographies
      • Clinical trial participant identification
  • 10. What's next?
      • Wider range of platform support
      • Adding a layer of brokering capabilities to coordinate a federation
      • Pushing industry adoption of CWL
      • Getting more tools containerized and described with CWL
      • Integrating with directory services such as Repositive
      • Building a registry of tools (Dockstore.org)
  • 11. Get started
      • Go to Arvados.org to download the code. The platform is available for use under the AGPLv3 open source license.
      • Go to Curoverse.com for commercial support options. Cluster Operations Subscriptions and Professional Services are available.
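
Slide 7 states that content addressing guarantees you get the data you expect. As an illustration only, the idea can be sketched in a few lines of Python: data is stored under an address computed from its own bytes, so any reader can re-hash what it receives and detect a mismatch. The `ContentStore` class and the choice of SHA-256 are assumptions made for this sketch; the slides do not describe the platform's actual storage format or hash function.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, for illustration)."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the bytes themselves (SHA-256 is an assumption).
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Re-hash on read: a mismatch means the stored bytes no longer match the address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data
```

Because the address depends only on the content, two sites holding the same address can, under these assumptions, each check independently that they hold identical data without trusting the other.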