SlideShare a Scribd company logo
1 of 13
Experiences In Building Globus
Genomics Using Galaxy, Globus Online
and AWS
Ravi K Madduri
University of Chicago and ANL
www.globus.org/genomics
• Challenges in Sequencing Analysis
• Proposed Approach Using Globus Genomics
• Example Collaborations
• Relevance to XSEDE
• Q&A
Outline
www.globus.org/genomics
(Re)Ru
n
Script
(Re)Ru
n
Script
Ins
tal
l
Ins
tal
l
M
odi
fy
M
odi
fy
Challenges in Sequencing Analysis
Sequencing
Centers
Sequencing
Centers
Sequencing
Centers
Sequencing
Centers
Data Movement and Access Challenges
Manual Data Analysis
Public
Data
Public
Data
Storage
Local Cluster/
CloudSeq
Center
Seq
Center
Research Lab
• Data is distributed in different locations
• Research labs need access to the data for analysis
• Be able to Share data with other researchers/collaborators
• Inefficient ways of data movement
• Data needs to be available on the local and Distributed Compute
Resources
• Local Clusters, Cloud, Grid
How do we analyze this
Sequence Data
Once we have the Sequence Data
PicardPicard
GATKGATK
FastqFastq Ref GenomeRef Genome
AlignmentAlignment
Variant CallingVariant Calling
• Manually move the data to the Compute node
• Install all the tools required for the Analysis
• BWA, Picard, GATK, Filtering Scripts, etc.
• Shell scripts to sequentially execute the tools
• Manually modify the scripts for any change
• Error Prone, difficult to keep track, messy..
• Difficult to maintain and transfer the knowledge
FTP, SCP, HTTP
SCPFTP,SCP,HTTP
www.globus.org/genomics
Globus Genomics
Sequencing
Centers
Sequencing
Centers
Sequencing
Centers
Sequencing
Centers
Public
Data
Public
Data
Storage
Local Cluster/
CloudSeq
Center
Seq
Center
Research Lab
Globus Online Provides a
• High-performance
• Fault-tolerant
• Secure
file transfer Service between
all data-endpoints
Data Management Data Analysis
PicardPicard
GATKGATK
FastqFastq Ref GenomeRef Genome
AlignmentAlignment
Variant CallingVariant Calling
Galaxy
Data Libraries
• Globus Online
Integrated within Galaxy
• Web-based UI
• Drag-Drop workflow
creations
• Easily modify Workflows
with new tools
Globus Genomics on
Amazon EC2
Globus Genomics on
Amazon EC2
• Analytical tools are
automatically run on the
scalable compute
resources when possible
Galaxy Based Workflow
Management System
Globus Online Endpoints
FTP, SCP, others
FTP, SCP
SCP
Globus Genomics
FTP,SCP,HTTP
www.globus.org/genomics
• Workflows can be easily
defined and automated with
integrated Galaxy Platform
capabilities
• Data movement is streamlined
with integrated Globus file-
transfer functionality
• Resources can be provisioned
on-demand with Amazon Web
Services cloud based
infrastructure
Globus Genomics
www.globus.org/genomics
• Professionally managed and supported platform
• Best practice pipelines
• Enhanced workbench with breadth of analytic tools
• Technical support and bioinformatics consulting
• Access to pre-integrated end-points for reliable and high-
performance data transfer (e.g. Broad Institute, Perkin
Elmer, etc.)
• Cost-effective solution with subscription-based pricing
Additional Capabilities
www.globus.org/genomics
Globus Genomics – A flexible, scalable,
simplified analysis platform
Accessibility
• Unified Web-interface for obtaining genomic data and applying computational
tools to analyze the data
• Easily integrate your own tools and scripts for analysis (CLI based tools)
• Collection of tools (Tools Panel) that reflect good practices and community
insights
• Access every step of analysis and intermediate results:
 View, Download, Visualize, Reuse (History Panel)
Reproducibility
• Track provenance and ensure repeatability of each analysis step:
 input datasets, tools used, parameter values, and output datasets
• Annotate each step or collection of steps to track and reproduce results
• Intuitive Workflow Editor to create or modify complex workflows and use
them as templates – Reusable and Reproducible
Transparency
• Publish and share metadata, histories, and workflows at multiple levels
• Store public and generated datasets as Data Libraries – e.g: hg19 Ref Genome
• Shared datasets and workflows can be imported by other users for reuse
Publish
Templates
Data and Tools
Globus Online Integration
• Access GO Endpoints and transfer data from within Galaxy UI and into Galaxy workspace
• Leverage local cluster or cloud based scalable computational resources for parallelizing the
tools
www.globus.org/genomics
Example Collaborations
Dobyns Lab
Background: Investigate the nature and causes of a wide range of
human developmental brain disorders
Approach: Replaced manual analysis with Globus Genomics
Results: Achieved greater than 10X speed-up in analysis of exome data
Future Plans: Leverage scale-out capability of Globus Genomics by
running increasingly larger data sets
www.globus.org/genomics
XSEDE’s Mission Statement
accelerat[ing] open scientific discovery by enhancing the
productivity of researchers, engineers, and scholars and
making advanced digital resources easier to use.”
Relevance to XSEDE
Key XSEDE Goals That Globus Genomics
Addresses
• “Deepen and extend the impact of eScience infrastructure on research and
education; in particular, to reach communities that have not previously
made use of it; and
• Expand the environment through the integration of new capabilities and
resources such as instruments and data repositories based on the
identified needs of the community.”
www.globus.org/genomics
• Globus Genomics leverages an XSEDE service
– Globus Transfer for data movement
– Globus Nexus for identity management
– Globus Groups for group-based access management
• Integrates advanced digital resources
– sequencing centers, a commercial cloud provider, and
NGS analysis pipelines
• Reduces the cost and complexity of scientific
discovery for a new community (NGS
researchers) who have not historically made
much use of advanced eScience infrastructures.
Relevance to XSEDE (Cont..)
www.globus.org/genomics
• Globus Genomics achieves these goals without making
use of XSEDE supercomputers
• Choice to use Amazon cloud services rather than
XSEDE systems for Globus Genomics computations is
deliberate
– scales at which our target users operate today, the costs
associated with the use of Amazon cloud computers are modest,
and Amazon’s on-demand, pay-as-you-go storage and computing
capabilities match user needs better than the proposal- and queue-
based access policies provided by XSEDE computers.
• We plan to explore using XSEDE resources to execute
Globus Genomics pipelines
XSEDE Vs AWS
www.globus.org/genomics
• This work was supported in part by the NIH
through the NHLBI grant: The Cardiovascular
Research Grid (R24HL085343) and by the
U.S. Department of Energy under contract
DE-AC02-06CH11357. We are grateful to
Amazon, Inc., for an award of Amazon Web
Services time that facilitated early
experiments.
• The Globus Genomics and Globus Online
teams at University of Chicago and Argonne
National Laboratory
Acknowledgments
www.globus.org/genomics
• More information on Globus Genomics and
to sign up: www.globus.org/genomics
• More information on Globus Online:
www.globusonline.org
• Questions?
• Thank you!
For more information

More Related Content

What's hot

Working with Instrument Data (GlobusWorld Tour - UMich)
Working with Instrument Data (GlobusWorld Tour - UMich)Working with Instrument Data (GlobusWorld Tour - UMich)
Working with Instrument Data (GlobusWorld Tour - UMich)Globus
 
Materials Data Facility: Streamlined and automated data sharing, discovery, ...
Materials Data Facility: Streamlined and automated data sharing,  discovery, ...Materials Data Facility: Streamlined and automated data sharing,  discovery, ...
Materials Data Facility: Streamlined and automated data sharing, discovery, ...Ian Foster
 
Data Science with the Help of Metadata
Data Science with the Help of MetadataData Science with the Help of Metadata
Data Science with the Help of MetadataJim Dowling
 
Benchmarking Cloud-based Tagging Services
Benchmarking Cloud-based Tagging ServicesBenchmarking Cloud-based Tagging Services
Benchmarking Cloud-based Tagging ServicesTanu Malik
 
Globus: Enabling the Open Storage Network
Globus: Enabling the Open Storage NetworkGlobus: Enabling the Open Storage Network
Globus: Enabling the Open Storage NetworkGlobus
 
20090701 Climate Data Staging
20090701 Climate Data Staging20090701 Climate Data Staging
20090701 Climate Data StagingHenning Bergmeyer
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for ScienceIan Foster
 
07 data structures_and_representations
07 data structures_and_representations07 data structures_and_representations
07 data structures_and_representationsMarco Quartulli
 
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and SolrLucidworks (Archived)
 
Gateways 2020 Tutorial - Large Scale Data Transfer with Globus
Gateways 2020 Tutorial - Large Scale Data Transfer with GlobusGateways 2020 Tutorial - Large Scale Data Transfer with Globus
Gateways 2020 Tutorial - Large Scale Data Transfer with GlobusGlobus
 
What's New in Globus - Internet2 TechEXtra
What's New in Globus - Internet2 TechEXtraWhat's New in Globus - Internet2 TechEXtra
What's New in Globus - Internet2 TechEXtraGlobus
 
Gateways 2020 Tutorial - Instrument Data Distribution with Globus
Gateways 2020 Tutorial - Instrument Data Distribution with GlobusGateways 2020 Tutorial - Instrument Data Distribution with Globus
Gateways 2020 Tutorial - Instrument Data Distribution with GlobusGlobus
 
Gateways 2020 Tutorial - Automated Data Ingest and Search with Globus
Gateways 2020 Tutorial - Automated Data Ingest and Search with GlobusGateways 2020 Tutorial - Automated Data Ingest and Search with Globus
Gateways 2020 Tutorial - Automated Data Ingest and Search with GlobusGlobus
 
Cloud Native Analysis Platform for NGS analysis
Cloud Native Analysis Platform for NGS analysisCloud Native Analysis Platform for NGS analysis
Cloud Native Analysis Platform for NGS analysisYaoyu Wang
 
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)Automating Research Data Flows with Globus (CHPC 2019 - South Africa)
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)Globus
 
Research Automation for Data-Driven Discovery
Research Automationfor Data-Driven DiscoveryResearch Automationfor Data-Driven Discovery
Research Automation for Data-Driven DiscoveryGlobus
 
Connecting Your System to Globus (APS Workshop)
Connecting Your System to Globus (APS Workshop)Connecting Your System to Globus (APS Workshop)
Connecting Your System to Globus (APS Workshop)Globus
 

What's hot (20)

SomeSlides
SomeSlidesSomeSlides
SomeSlides
 
Working with Instrument Data (GlobusWorld Tour - UMich)
Working with Instrument Data (GlobusWorld Tour - UMich)Working with Instrument Data (GlobusWorld Tour - UMich)
Working with Instrument Data (GlobusWorld Tour - UMich)
 
Materials Data Facility: Streamlined and automated data sharing, discovery, ...
Materials Data Facility: Streamlined and automated data sharing,  discovery, ...Materials Data Facility: Streamlined and automated data sharing,  discovery, ...
Materials Data Facility: Streamlined and automated data sharing, discovery, ...
 
Data Science with the Help of Metadata
Data Science with the Help of MetadataData Science with the Help of Metadata
Data Science with the Help of Metadata
 
Benchmarking Cloud-based Tagging Services
Benchmarking Cloud-based Tagging ServicesBenchmarking Cloud-based Tagging Services
Benchmarking Cloud-based Tagging Services
 
Globus: Enabling the Open Storage Network
Globus: Enabling the Open Storage NetworkGlobus: Enabling the Open Storage Network
Globus: Enabling the Open Storage Network
 
20090701 Climate Data Staging
20090701 Climate Data Staging20090701 Climate Data Staging
20090701 Climate Data Staging
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 
04 open source_tools
04 open source_tools04 open source_tools
04 open source_tools
 
07 data structures_and_representations
07 data structures_and_representations07 data structures_and_representations
07 data structures_and_representations
 
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 
Gateways 2020 Tutorial - Large Scale Data Transfer with Globus
Gateways 2020 Tutorial - Large Scale Data Transfer with GlobusGateways 2020 Tutorial - Large Scale Data Transfer with Globus
Gateways 2020 Tutorial - Large Scale Data Transfer with Globus
 
What's New in Globus - Internet2 TechEXtra
What's New in Globus - Internet2 TechEXtraWhat's New in Globus - Internet2 TechEXtra
What's New in Globus - Internet2 TechEXtra
 
Gateways 2020 Tutorial - Instrument Data Distribution with Globus
Gateways 2020 Tutorial - Instrument Data Distribution with GlobusGateways 2020 Tutorial - Instrument Data Distribution with Globus
Gateways 2020 Tutorial - Instrument Data Distribution with Globus
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
 
Gateways 2020 Tutorial - Automated Data Ingest and Search with Globus
Gateways 2020 Tutorial - Automated Data Ingest and Search with GlobusGateways 2020 Tutorial - Automated Data Ingest and Search with Globus
Gateways 2020 Tutorial - Automated Data Ingest and Search with Globus
 
Cloud Native Analysis Platform for NGS analysis
Cloud Native Analysis Platform for NGS analysisCloud Native Analysis Platform for NGS analysis
Cloud Native Analysis Platform for NGS analysis
 
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)Automating Research Data Flows with Globus (CHPC 2019 - South Africa)
Automating Research Data Flows with Globus (CHPC 2019 - South Africa)
 
Research Automation for Data-Driven Discovery
Research Automationfor Data-Driven DiscoveryResearch Automationfor Data-Driven Discovery
Research Automation for Data-Driven Discovery
 
Connecting Your System to Globus (APS Workshop)
Connecting Your System to Globus (APS Workshop)Connecting Your System to Globus (APS Workshop)
Connecting Your System to Globus (APS Workshop)
 

Viewers also liked

Big Process for Big Data
Big Process for Big DataBig Process for Big Data
Big Process for Big DataIan Foster
 
ADAS&ME presentation @ the SCOUT project expert workshop (22-02-2017, Brussels)
ADAS&ME presentation @ the SCOUT project expert workshop (22-02-2017, Brussels)ADAS&ME presentation @ the SCOUT project expert workshop (22-02-2017, Brussels)
ADAS&ME presentation @ the SCOUT project expert workshop (22-02-2017, Brussels)joseplaborda
 
Role of Amyloid Burden in cognitive decline
Role of Amyloid Burden in cognitive decline Role of Amyloid Burden in cognitive decline
Role of Amyloid Burden in cognitive decline Ravi Madduri
 
CI4CC sustainability-panel
CI4CC sustainability-panelCI4CC sustainability-panel
CI4CC sustainability-panelRavi Madduri
 
Jsm madduri-august-2015
Jsm madduri-august-2015Jsm madduri-august-2015
Jsm madduri-august-2015Ravi Madduri
 
Effective ansible
Effective ansibleEffective ansible
Effective ansibleWu Bigo
 
Big Data and Genomics
Big Data and GenomicsBig Data and Genomics
Big Data and GenomicsAl Costa
 
Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Dan Taylor
 
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsMay 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsYahoo Developer Network
 
Globus Genomics: Democratizing NGS Analysis
Globus Genomics: Democratizing NGS AnalysisGlobus Genomics: Democratizing NGS Analysis
Globus Genomics: Democratizing NGS AnalysisRavi Madduri
 

Viewers also liked (20)

Public.Cdsc.Middleton
Public.Cdsc.MiddletonPublic.Cdsc.Middleton
Public.Cdsc.Middleton
 
Big Process for Big Data
Big Process for Big DataBig Process for Big Data
Big Process for Big Data
 
ADAS&ME presentation @ the SCOUT project expert workshop (22-02-2017, Brussels)
ADAS&ME presentation @ the SCOUT project expert workshop (22-02-2017, Brussels)ADAS&ME presentation @ the SCOUT project expert workshop (22-02-2017, Brussels)
ADAS&ME presentation @ the SCOUT project expert workshop (22-02-2017, Brussels)
 
Role of Amyloid Burden in cognitive decline
Role of Amyloid Burden in cognitive decline Role of Amyloid Burden in cognitive decline
Role of Amyloid Burden in cognitive decline
 
Supporting Barack Obama for President
Supporting Barack Obama for PresidentSupporting Barack Obama for President
Supporting Barack Obama for President
 
CI4CC sustainability-panel
CI4CC sustainability-panelCI4CC sustainability-panel
CI4CC sustainability-panel
 
Jsm madduri-august-2015
Jsm madduri-august-2015Jsm madduri-august-2015
Jsm madduri-august-2015
 
HL7: Clinical Decision Support
HL7: Clinical Decision SupportHL7: Clinical Decision Support
HL7: Clinical Decision Support
 
Effective ansible
Effective ansibleEffective ansible
Effective ansible
 
Big Data and Genomics
Big Data and GenomicsBig Data and Genomics
Big Data and Genomics
 
Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2Internet2 Bio IT 2016 v2
Internet2 Bio IT 2016 v2
 
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer toolsMay 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
May 2013 HUG: Apache Sqoop 2 - A next generation of data transfer tools
 
Google Glass Breakdown
Google Glass BreakdownGoogle Glass Breakdown
Google Glass Breakdown
 
Globus Genomics: Democratizing NGS Analysis
Globus Genomics: Democratizing NGS AnalysisGlobus Genomics: Democratizing NGS Analysis
Globus Genomics: Democratizing NGS Analysis
 
Coded Photography - Ramesh Raskar
Coded Photography - Ramesh RaskarCoded Photography - Ramesh Raskar
Coded Photography - Ramesh Raskar
 
Raskar UIST Keynote 2015 November
Raskar UIST Keynote 2015 NovemberRaskar UIST Keynote 2015 November
Raskar UIST Keynote 2015 November
 
Leap Motion Development (Rohan Puri)
Leap Motion Development (Rohan Puri)Leap Motion Development (Rohan Puri)
Leap Motion Development (Rohan Puri)
 
What is SIGGRAPH NEXT? Intro by Ramesh Raskar
What is SIGGRAPH NEXT? Intro by Ramesh RaskarWhat is SIGGRAPH NEXT? Intro by Ramesh Raskar
What is SIGGRAPH NEXT? Intro by Ramesh Raskar
 
What is Media in MIT Media Lab, Why 'Camera Culture'
What is Media in MIT Media Lab, Why 'Camera Culture'What is Media in MIT Media Lab, Why 'Camera Culture'
What is Media in MIT Media Lab, Why 'Camera Culture'
 
Multiview Imaging HW Overview
Multiview Imaging HW OverviewMultiview Imaging HW Overview
Multiview Imaging HW Overview
 

Similar to Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS

GlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobus
 
Simplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus PlatformSimplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus PlatformGlobus
 
Introduction to Globus - XSEDE14 Tutorial
Introduction to Globus - XSEDE14 TutorialIntroduction to Globus - XSEDE14 Tutorial
Introduction to Globus - XSEDE14 TutorialGlobus
 
Introduction to the Globus SaaS (GlobusWorld Tour - STFC)
Introduction to the Globus SaaS (GlobusWorld Tour - STFC)Introduction to the Globus SaaS (GlobusWorld Tour - STFC)
Introduction to the Globus SaaS (GlobusWorld Tour - STFC)Globus
 
Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...mestato
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsCarole Goble
 
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Ian Foster
 
Big Process for Big Data @ PNNL, May 2013
Big Process for Big Data @ PNNL, May 2013Big Process for Big Data @ PNNL, May 2013
Big Process for Big Data @ PNNL, May 2013Ian Foster
 
Science cloud foster june 2013
Science cloud foster june 2013Science cloud foster june 2013
Science cloud foster june 2013Kirill Osipov
 
Science as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate DiscoveryScience as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate DiscoveryIan Foster
 
Research on vector spatial data storage scheme based
Research on vector spatial data storage scheme basedResearch on vector spatial data storage scheme based
Research on vector spatial data storage scheme basedAnant Kumar
 
Hadoop ecosystem for health/life sciences
Hadoop ecosystem for health/life sciencesHadoop ecosystem for health/life sciences
Hadoop ecosystem for health/life sciencesUri Laserson
 
DataFest 2019 Science Gateways
DataFest 2019 Science GatewaysDataFest 2019 Science Gateways
DataFest 2019 Science GatewaysRaminder Singh
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New UsersGlobus
 
Globus: Research Data Management as Service and Platform - pearc17
Globus: Research Data Management as Service and Platform - pearc17Globus: Research Data Management as Service and Platform - pearc17
Globus: Research Data Management as Service and Platform - pearc17Mary Bass
 
Building and Extensible Storage Ecosystem with WOS
Building and Extensible Storage Ecosystem with WOSBuilding and Extensible Storage Ecosystem with WOS
Building and Extensible Storage Ecosystem with WOSinside-BigData.com
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesGeoffrey Fox
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesGeoffrey Fox
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New UsersGlobus
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqEnis Afgan
 

Similar to Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS (20)

GlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobusWorld 2020 Keynote
GlobusWorld 2020 Keynote
 
Simplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus PlatformSimplified Research Data Management with the Globus Platform
Simplified Research Data Management with the Globus Platform
 
Introduction to Globus - XSEDE14 Tutorial
Introduction to Globus - XSEDE14 TutorialIntroduction to Globus - XSEDE14 Tutorial
Introduction to Globus - XSEDE14 Tutorial
 
Introduction to the Globus SaaS (GlobusWorld Tour - STFC)
Introduction to the Globus SaaS (GlobusWorld Tour - STFC)Introduction to the Globus SaaS (GlobusWorld Tour - STFC)
Introduction to the Globus SaaS (GlobusWorld Tour - STFC)
 
Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
 
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
Science Services and Science Platforms: Using the Cloud to Accelerate and Dem...
 
Big Process for Big Data @ PNNL, May 2013
Big Process for Big Data @ PNNL, May 2013Big Process for Big Data @ PNNL, May 2013
Big Process for Big Data @ PNNL, May 2013
 
Science cloud foster june 2013
Science cloud foster june 2013Science cloud foster june 2013
Science cloud foster june 2013
 
Science as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate DiscoveryScience as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate Discovery
 
Research on vector spatial data storage scheme based
Research on vector spatial data storage scheme basedResearch on vector spatial data storage scheme based
Research on vector spatial data storage scheme based
 
Hadoop ecosystem for health/life sciences
Hadoop ecosystem for health/life sciencesHadoop ecosystem for health/life sciences
Hadoop ecosystem for health/life sciences
 
DataFest 2019 Science Gateways
DataFest 2019 Science GatewaysDataFest 2019 Science Gateways
DataFest 2019 Science Gateways
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New Users
 
Globus: Research Data Management as Service and Platform - pearc17
Globus: Research Data Management as Service and Platform - pearc17Globus: Research Data Management as Service and Platform - pearc17
Globus: Research Data Management as Service and Platform - pearc17
 
Building and Extensible Storage Ecosystem with WOS
Building and Extensible Storage Ecosystem with WOSBuilding and Extensible Storage Ecosystem with WOS
Building and Extensible Storage Ecosystem with WOS
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New Users
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-Seq
 

More from Ed Dodds

Updated Policy Brief: Cooperatives Bring Fiber Internet Access to Rural America
Updated Policy Brief: Cooperatives Bring Fiber Internet Access to Rural AmericaUpdated Policy Brief: Cooperatives Bring Fiber Internet Access to Rural America
Updated Policy Brief: Cooperatives Bring Fiber Internet Access to Rural AmericaEd Dodds
 
ILSR 2019 12 rural coop policy brief update page 8
ILSR 2019 12 rural coop policy brief update page 8ILSR 2019 12 rural coop policy brief update page 8
ILSR 2019 12 rural coop policy brief update page 8Ed Dodds
 
Maximizing information and communications technologies for development in fai...
Maximizing information and communications technologies for development in fai...Maximizing information and communications technologies for development in fai...
Maximizing information and communications technologies for development in fai...Ed Dodds
 
Iris Ritter interconnection map
Iris Ritter interconnection mapIris Ritter interconnection map
Iris Ritter interconnection mapEd Dodds
 
Inoversity - Bob Metcalfe
Inoversity - Bob MetcalfeInoversity - Bob Metcalfe
Inoversity - Bob MetcalfeEd Dodds
 
Distributed Ledger Technology
Distributed Ledger TechnologyDistributed Ledger Technology
Distributed Ledger TechnologyEd Dodds
 
UCX: An Open Source Framework for HPC Network APIs and Beyond
UCX: An Open Source Framework for HPC Network APIs and BeyondUCX: An Open Source Framework for HPC Network APIs and Beyond
UCX: An Open Source Framework for HPC Network APIs and BeyondEd Dodds
 
Digital Inclusion and Meaningful Broadband Adoption Initiatives Colin Rhinesm...
Digital Inclusion and Meaningful Broadband Adoption Initiatives Colin Rhinesm...Digital Inclusion and Meaningful Broadband Adoption Initiatives Colin Rhinesm...
Digital Inclusion and Meaningful Broadband Adoption Initiatives Colin Rhinesm...Ed Dodds
 
Innovation Accelerators Report
Innovation Accelerators ReportInnovation Accelerators Report
Innovation Accelerators ReportEd Dodds
 
Strategy for American Innovation
Strategy for American InnovationStrategy for American Innovation
Strategy for American InnovationEd Dodds
 
Collaboration with NSFCloud
Collaboration with NSFCloudCollaboration with NSFCloud
Collaboration with NSFCloudEd Dodds
 
AppImpact: A Framework for Mobile Technology in Behavioral Healthcare
AppImpact: A Framework for Mobile Technology in Behavioral HealthcareAppImpact: A Framework for Mobile Technology in Behavioral Healthcare
AppImpact: A Framework for Mobile Technology in Behavioral HealthcareEd Dodds
 
Report to the President and Congress Ensuring Leadership in Federally Funded ...
Report to the President and Congress Ensuring Leadership in Federally Funded ...Report to the President and Congress Ensuring Leadership in Federally Funded ...
Report to the President and Congress Ensuring Leadership in Federally Funded ...Ed Dodds
 
Data Act Federal Register Notice Public Summary of Responses
Data Act Federal Register Notice Public Summary of ResponsesData Act Federal Register Notice Public Summary of Responses
Data Act Federal Register Notice Public Summary of ResponsesEd Dodds
 
Gloriad.flo con.2014.01
Gloriad.flo con.2014.01Gloriad.flo con.2014.01
Gloriad.flo con.2014.01Ed Dodds
 
2014 COMPENDIUM Edition of National Research and Education Networks in Europe
2014 COMPENDIUM Edition of National Research and  Education Networks in Europe2014 COMPENDIUM Edition of National Research and  Education Networks in Europe
2014 COMPENDIUM Edition of National Research and Education Networks in EuropeEd Dodds
 
New Westminster Keynote - Norman Jacknis
New Westminster Keynote - Norman JacknisNew Westminster Keynote - Norman Jacknis
New Westminster Keynote - Norman JacknisEd Dodds
 
HIMSS Innovation Pathways Summary
HIMSS Innovation Pathways SummaryHIMSS Innovation Pathways Summary
HIMSS Innovation Pathways SummaryEd Dodds
 

More from Ed Dodds (20)

Updated Policy Brief: Cooperatives Bring Fiber Internet Access to Rural America
Updated Policy Brief: Cooperatives Bring Fiber Internet Access to Rural AmericaUpdated Policy Brief: Cooperatives Bring Fiber Internet Access to Rural America
Updated Policy Brief: Cooperatives Bring Fiber Internet Access to Rural America
 
ILSR 2019 12 rural coop policy brief update page 8
ILSR 2019 12 rural coop policy brief update page 8ILSR 2019 12 rural coop policy brief update page 8
ILSR 2019 12 rural coop policy brief update page 8
 
Maximizing information and communications technologies for development in fai...
Maximizing information and communications technologies for development in fai...Maximizing information and communications technologies for development in fai...
Maximizing information and communications technologies for development in fai...
 
Iris Ritter interconnection map
Iris Ritter interconnection mapIris Ritter interconnection map
Iris Ritter interconnection map
 
Inoversity - Bob Metcalfe
Inoversity - Bob MetcalfeInoversity - Bob Metcalfe
Inoversity - Bob Metcalfe
 
Distributed Ledger Technology
Distributed Ledger TechnologyDistributed Ledger Technology
Distributed Ledger Technology
 
UCX: An Open Source Framework for HPC Network APIs and Beyond
UCX: An Open Source Framework for HPC Network APIs and BeyondUCX: An Open Source Framework for HPC Network APIs and Beyond
UCX: An Open Source Framework for HPC Network APIs and Beyond
 
Digital Inclusion and Meaningful Broadband Adoption Initiatives Colin Rhinesm...
Digital Inclusion and Meaningful Broadband Adoption Initiatives Colin Rhinesm...Digital Inclusion and Meaningful Broadband Adoption Initiatives Colin Rhinesm...
Digital Inclusion and Meaningful Broadband Adoption Initiatives Colin Rhinesm...
 
Jetstream
JetstreamJetstream
Jetstream
 
Innovation Accelerators Report
Innovation Accelerators ReportInnovation Accelerators Report
Innovation Accelerators Report
 
Work.
Work.Work.
Work.
 
Strategy for American Innovation
Strategy for American InnovationStrategy for American Innovation
Strategy for American Innovation
 
Collaboration with NSFCloud
Collaboration with NSFCloudCollaboration with NSFCloud
Collaboration with NSFCloud
 
AppImpact: A Framework for Mobile Technology in Behavioral Healthcare
AppImpact: A Framework for Mobile Technology in Behavioral HealthcareAppImpact: A Framework for Mobile Technology in Behavioral Healthcare
AppImpact: A Framework for Mobile Technology in Behavioral Healthcare
 
Report to the President and Congress Ensuring Leadership in Federally Funded ...
Report to the President and Congress Ensuring Leadership in Federally Funded ...Report to the President and Congress Ensuring Leadership in Federally Funded ...
Report to the President and Congress Ensuring Leadership in Federally Funded ...
 
Data Act Federal Register Notice Public Summary of Responses
Data Act Federal Register Notice Public Summary of ResponsesData Act Federal Register Notice Public Summary of Responses
Data Act Federal Register Notice Public Summary of Responses
 
Gloriad.flo con.2014.01
Gloriad.flo con.2014.01Gloriad.flo con.2014.01
Gloriad.flo con.2014.01
 
2014 COMPENDIUM Edition of National Research and Education Networks in Europe
2014 COMPENDIUM Edition of National Research and  Education Networks in Europe2014 COMPENDIUM Edition of National Research and  Education Networks in Europe
2014 COMPENDIUM Edition of National Research and Education Networks in Europe
 
New Westminster Keynote - Norman Jacknis
New Westminster Keynote - Norman JacknisNew Westminster Keynote - Norman Jacknis
New Westminster Keynote - Norman Jacknis
 
HIMSS Innovation Pathways Summary
HIMSS Innovation Pathways SummaryHIMSS Innovation Pathways Summary
HIMSS Innovation Pathways Summary
 

Recently uploaded

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 

Recently uploaded (20)

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 

Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS

  • 1. Experiences In Building Globus Genomics Using Galaxy, Globus Online and AWS Ravi K Madduri University of Chicago and ANL
  • 2. www.globus.org/genomics • Challenges in Sequencing Analysis • Proposed Approach Using Globus Genomics • Example Collaborations • Relevance to XSEDE • Q&A Outline
  • 3. www.globus.org/genomics (Re)Ru n Script (Re)Ru n Script Ins tal l Ins tal l M odi fy M odi fy Challenges in Sequencing Analysis Sequencing Centers Sequencing Centers Sequencing Centers Sequencing Centers Data Movement and Access Challenges Manual Data Analysis Public Data Public Data Storage Local Cluster/ CloudSeq Center Seq Center Research Lab • Data is distributed in different locations • Research labs need access to the data for analysis • Be able to Share data with other researchers/collaborators • Inefficient ways of data movement • Data needs to be available on the local and Distributed Compute Resources • Local Clusters, Cloud, Grid How do we analyze this Sequence Data Once we have the Sequence Data PicardPicard GATKGATK FastqFastq Ref GenomeRef Genome AlignmentAlignment Variant CallingVariant Calling • Manually move the data to the Compute node • Install all the tools required for the Analysis • BWA, Picard, GATK, Filtering Scripts, etc. • Shell scripts to sequentially execute the tools • Manually modify the scripts for any change • Error Prone, difficult to keep track, messy.. • Difficult to maintain and transfer the knowledge FTP, SCP, HTTP SCPFTP,SCP,HTTP
  • 4. www.globus.org/genomics Globus Genomics Sequencing Centers Sequencing Centers Sequencing Centers Sequencing Centers Public Data Public Data Storage Local Cluster/ CloudSeq Center Seq Center Research Lab Globus Online Provides a • High-performance • Fault-tolerant • Secure file transfer Service between all data-endpoints Data Management Data Analysis PicardPicard GATKGATK FastqFastq Ref GenomeRef Genome AlignmentAlignment Variant CallingVariant Calling Galaxy Data Libraries • Globus Online Integrated within Galaxy • Web-based UI • Drag-Drop workflow creations • Easily modify Workflows with new tools Globus Genomics on Amazon EC2 Globus Genomics on Amazon EC2 • Analytical tools are automatically run on the scalable compute resources when possible Galaxy Based Workflow Management System Globus Online Endpoints FTP, SCP, others FTP, SCP SCP Globus Genomics FTP,SCP,HTTP
  • 5. www.globus.org/genomics • Workflows can be easily defined and automated with integrated Galaxy Platform capabilities • Data movement is streamlined with integrated Globus file- transfer functionality • Resources can be provisioned on-demand with Amazon Web Services cloud based infrastructure Globus Genomics
  • 6. www.globus.org/genomics • Professionally managed and supported platform • Best practice pipelines • Enhanced workbench with breadth of analytic tools • Technical support and bioinformatics consulting • Access to pre-integrated end-points for reliable and high- performance data transfer (e.g. Broad Institute, Perkin Elmer, etc.) • Cost-effective solution with subscription-based pricing Additional Capabilities
  • 7. www.globus.org/genomics Globus Genomics – A flexible, scalable, simplified analysis platform Accessibility • Unified Web-interface for obtaining genomic data and applying computational tools to analyze the data • Easily integrate your own tools and scripts for analysis (CLI based tools) • Collection of tools (Tools Panel) that reflect good practices and community insights • Access every step of analysis and intermediate results:  View, Download, Visualize, Reuse (History Panel) Reproducibility • Track provenance and ensure repeatability of each analysis step:  input datasets, tools used, parameter values, and output datasets • Annotate each step or collection of steps to track and reproduce results • Intuitive Workflow Editor to create or modify complex workflows and use them as templates – Reusable and Reproducible Transparency • Publish and share metadata, histories, and workflows at multiple levels • Store public and generated datasets as Data Libraries – e.g: hg19 Ref Genome • Shared datasets and workflows can be imported by other users for reuse Publish Templates Data and Tools Globus Online Integration • Access GO Endpoints and transfer data from within Galaxy UI and into Galaxy workspace • Leverage local cluster or cloud based scalable computational resources for parallelizing the tools
  • 8. www.globus.org/genomics Example Collaborations Dobyns Lab Background: Investigate the nature and causes of a wide range of human developmental brain disorders Approach: Replaced manual analysis with Globus Genomics Results: Achieved greater than 10X speed-up in analysis of exome data Future Plans: Leverage scale-out capability of Globus Genomics by running increasingly larger data sets
  • 9. www.globus.org/genomics XSEDE’s Mission Statement accelerat[ing] open scientific discovery by enhancing the productivity of researchers, engineers, and scholars and making advanced digital resources easier to use.” Relevance to XSEDE Key XSEDE Goals That Globus Genomics Addresses • “Deepen and extend the impact of eScience infrastructure on research and education; in particular, to reach communities that have not previously made use of it; and • Expand the environment through the integration of new capabilities and resources such as instruments and data repositories based on the identified needs of the community.”
  • 10. www.globus.org/genomics • Globus Genomics leverages an XSEDE service – Globus Transfer for data movement – Globus Nexus for identity management – Globus Groups for group-based access management • Integrates advanced digital resources – sequencing centers, a commercial cloud provider, and NGS analysis pipelines • Reduces the cost and complexity of scientific discovery for a new community (NGS researchers) who have not historically made much use of advanced eScience infrastructures. Relevance to XSEDE (Cont..)
  • 11. www.globus.org/genomics • Globus Genomics achieves these goals without making use of XSEDE supercomputers • Choice to use Amazon cloud services rather than XSEDE systems for Globus Genomics computations is deliberate – scales at which our target users operate today, the costs associated with the use of Amazon cloud computers are modest, and Amazon’s on-demand, pay-as-you-go storage and computing capabilities match user needs better than the proposal- and queue- based access policies provided by XSEDE computers. • We plan to explore using XSEDE resources to execute Globus Genomics pipelines XSEDE Vs AWS
  • 12. www.globus.org/genomics • This work was supported in part by the NIH through the NHLBI grant: The Cardiovascular Research Grid (R24HL085343) and by the U.S. Department of Energy under contract DE-AC02-06CH11357. We are grateful to Amazon, Inc., for an award of Amazon Web Services time that facilitated early experiments. • The Globus Genomics and Globus Online teams at University of Chicago and Argonne National Laboratory Acknowledgments
  • 13. www.globus.org/genomics • More information on Globus Genomics and to sign up: www.globus.org/genomics • More information on Globus Online: www.globusonline.org • Questions? • Thank you! For more information

Editor's Notes

  1. <number>
  2. <number>