SlideShare a Scribd company logo
1 of 77
Download to read offline
Tavaxy
Integrating Taverna and Galaxy with Cloud
            Computing Support

           Mohamed Abouelhoda

              Nile University
                   Egypt




                                            1
Workflows in Bioinformatics
Finding Homologous Sequences


                                           Clustalw



      Get Protein               Get Id &
                    BLASTp
       Sequence                sequences


                                           T-Coffee




                                                      2
Implementing Scientific Workflows
Method 1: Write Python/Perl/Shell script

  Advantages
   • Reliable and efficient
   • Comprehensive programming capabilities (conditionals, loops, etc..)


 Disadvantages
  •   Requires programming skills , especially with HPC resources
  •   Scripts are workflow-specific
  •   Costly to create, debug and modify
  •   Requires installing and managing tools



                                                                           3
Implementing Scientific Workflows
Methods 2: Use of Workflow Systems


• Kepler
• Triana
• WildFire
• GenePattern
• Pegsus
• Taverna
• Galaxy                         Most popular in
• and many more                  Bioinformatics

 http://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systems

                                                                           4
Composing a Workflow using Galaxy

                                       Workflow
                                      Environment




                     Canvas for
                   workflow editing
    Tool Library


                                                    5
Composing a Workflow using Galaxy




          Create new workflow
           and give it a name




                                6
Composing a Workflow using Galaxy




     Click to create
      input node
                                7
Composing a Workflow using Galaxy




           Input node created,
           will read input from
               user account




                                  8
Composing a Workflow using Galaxy




                              BLAST node
                               is custom
                             one added to
                              the system

     Choose BLAST tool and
        click to create it

                                            9
Composing a Workflow using Galaxy




              1. BLAST
            node created

            2. Window opened
               to set BLAST
                parameters



                                10
Composing a Workflow using Galaxy




                 Connect output of
                   input node as
                  input to BLAST




                                     11
Composing a Workflow using Galaxy




    Complete
   composing
    workflow



                                12
Composing a Workflow using Galaxy




  Start running the
      workflow




                                13
Composing a Workflow using Galaxy




                     Screen showing parts
                     of the output files and
                     provides links to them




                                               14
Benefits of Workflow Systems
Accelerates computation
• Intuitive abstract means for describing computational experiments

• Requires no programming expertise

• Easy to modify

• Hide execution details: invocation and scheduling

• Direct use of parallel architectures




                                                                      15
Benefits of Workflow Systems
Formalize computation
• Come with library of tools or access directories
                                                      Tool accessibility


• Store experiment details (used tools, their parameters, and used data)
                                                     Reproducibility


• Share workflows with analysis history including intermediate results
                                                      Transparency


Mesirov. Science, 2010
B. Giardine, et al. Genome Research, 2005.



                                                                            16
Implementing Scientific Workflows
Methods 2: Use of Workflow Systems


• Kepler
• Triana
• WildFire
• GenePattern
• Pegsus
• Taverna
• Galaxy                         Most popular in
• and many more                  Bioinformatics

 http://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systems

                                                                           17
Composing a Workflow using Taverna
Example: Homology Workflow

                                         Clustalw



   Read Protein               Get Id &
                  BLASTp
    Sequence                 sequences


                                         T-Coffee




                                                    18
Composing a Workflow using Taverna




                      Open the workflow
                      editor, it is desktop
                          application




                                              19
Composing a Workflow using Taverna




                   • Choose a web-service and copy
                     its URL if not in the service
                     directory,
                   • Here we will use BLAST at EBI




                                                     20
Composing a Workflow using Taverna




                       Paste service URL so that
                            Taverna collects
                       information about it (i.e.,
                       methods and parameters)



                                                 21
Composing a Workflow using Taverna




                  Service
               information
                 retrieved




                                 22
Composing a Workflow using Taverna




                Create node for the
                   BLAST service




                                      23
Composing a Workflow using Taverna




                  Create node to
                   receive input




                                   24
Composing a Workflow using Taverna




                       Create nodes to
                      prepare input and
                      parameters in XML


                                          25
Composing a Workflow using Taverna




                         Create node to
                      receive Job ID from
                        service provider


                                            26
Composing a Workflow using Taverna




                             Input data and
                        parameters are merged in
                                XML file

                                            27
Composing a Workflow using Taverna




             Create sub-workflow to check
              status of the asynchronous
                         service

                                            28
Composing a Workflow using Taverna




                 Rename it Poll status



                                         29
Composing a Workflow using Taverna




         Put in place after job
              submission


                                  30
Composing a Workflow using Taverna




             Use get result method of the
             web-service to retrieve data
                      when ready

                                            31
Composing a Workflow using Taverna




               Get result are created

                                        32
Composing a Workflow using Taverna




           Start putting Get_Result in
             place in the workflow

                                         33
Composing a Workflow using Taverna




       Specify output type (BLAST xml)

                                         34
Composing a Workflow using Taverna



                  Short cut to
                 sub-workflow




       Start retrieval when
       check status is done

                                 35
Composing a Workflow using Taverna
                                                     Clustalw



               Read Protein               Get Id &
                               BLASTp
                Sequence                 sequences


                                                     T-Coffee



                              The rest of the
                                workflow




                                                          36
Taverna vs. Galaxy
                                                 Taverna                   Galaxy
           Purpose                                General               Bioinformatics
           Interface                              Desktop               Web-interface
           Usability                        More difficult                   Easy
           Engine                               Control flow              Data flow
           Control constructs                       Yes                       No
           Jobs                                 Web-service            Local invocation
           Programs                       Service directory            Library of tools
           Use of Local HPC                     Threads only           Threads/Cluster


Taverna                                                     Galaxy
D. Hull, et al. Nucleic Acids Research, 2006.               B. Giardine, et al. Genome Research, 2005.
T. Oinn, et al. Bioinformatics, 2004.
                                                                                                   37
Communities

Taverna MyExperiment                Galaxy Pages


                                          PIC




             What if we can make use of both
           repositories and what if we can have
              advantages of both systems??
                                                   38
Integrating Taverna and Galaxy Workflows




 Taverna                    Galaxy




                                           39
Integrating Taverna and Galaxy Workflows

               Tavaxy




                                           40
Tavaxy


• A standalone workflow system based on workflow patterns
• Integrates both Taverna and Galaxy workflows and improve their
  performance
• Easy to use and includes a large library of software tools
• Exploits high performance computing resources (parallel
  infrastructure) with all details being hidden
• Runs on local or cloud computing infrastructure
• Optimized to handle large datasets



                                                                   41
Tavaxy Environment




                     42
Tavaxy Nodes (Processing Units)




  Input node



                                                        Pattern
                                  • If-else conditional nodes
                                  •   Iteration
                                                                            Sub-
Tool node                         •   Data Merge
                                  •   Data Select                         workflow
                                                                           nodes


New in Tavaxy and not in Galaxy               New in Tavaxy and not in Taverna
• Pattern nodes, Sub-workflow nodes           • Easier to use interface
• Remote execution                            • Direct use of local HPC
                                                                                     43
Tools in Tavaxy

220 Tools organized in the following categories

 • EMBOSS: Package of sequence analysis tools
 • SAMtools: Package of scirpts and programs to handle NGS data
 • Fastx: Package for manipulating fasta files

 • Galaxy tools: A set of data utilities and tools developed by the galaxy team
 • Taverna tools: A set of tools based on web-services based on Taverna. collection
   This section includes as well a set of data manipulation utilities developed by the
   taverna team.

 • Tavaxy tools: This section includes extra utilities and tools for sequence analysis
   and genome comparison developed/added by the Tavaxy team.
 • Cloud utilities: A set of data manipulation and configuration of computing
   infrastructure on the Amazon cloud.


                                                                                         44
NGS Mapping Tools and Databases in Tavaxy

Tools
   •    MAQ
   •    BFAST
   •    BWA
   •    Bowtie
   •    …


Databases
 • Reference human genome
   hg36
 • Indexed genome for MAQ,
   Bowtie
 • NCBI nucleotide/protein DBs


                                          45
Data Patterns of Tavaxy

To facilitate execution of data-intensive tasks in cloud computing
cluster




                                                                     46
Domenstration 1
Importing Taverna Workflow




                             47
The Tavaxy Project
Single environment for integrating Galaxy and Taverna workflows:

Taverna or Galaxy workflows can be
    o imported ,  open and re-draw in Tavaxy environment

    o re-designed,  delete or re-order workflow nodes

    o enhanced,       add new nodes and sub-workflows

    o optimized,
                Tavaxy concstructs can be used to exploit parallelization
                remote Taverna calls can be replaced with local tools

    o and executed in Tavaxy
               on local (HPC) infrastructure
               on cloud
                                                                             48
Importing Taverna Workflow




                 Taverna
                 Workflow




                             49
Taverna Workflow




                   50
Imported Taverna Workflow




                            51
Optimized Imported Taverna Workflow




                                      53
Idea of Integration
Workflow Patterns




Bag of techniques

- Taverna is used as a secondary engine to execute Taverna sub-workflows
- The use of patterns as special nodes in the data-flow oriented engine
- Simulating control constructs over the data flow digraph representing the workflow
- Use of sub-workflow to enable iteration pattern

                                                                                       54
High Performance
Cloud Computing Support




                          55
Three Modes for Supporting Cloud

• Whole system instantiation
• Delegating execution of a sub-workflow to the cloud
• Delegating execution of one tool to the cloud




                                                        56
User case scenario
Whole System Instantiation


                                                               Cloud
                                                              Platform
                                  5
                                                                (Instance Disks)
                                                                            EBS
                                                        2

                                       1                                             S3
                                                                                   Storage
                                                                                   4
                                                                   3


   1 Create master node
   2 Master node creates worker nodes
   3 Instance disks are mounted
                                                                       Include Tavaxy tools
   4 Connection to S3 is established
                                                                             and DBs
   5 Submit data, command line (and executable) and execute                                   57
User case scenario
Sub-workflow and tool
     delegation

                                                             Cloud
                                                            Platform
                                5
                                                              (Instance Disks)
                                                                          EBS
                                                      2

                                     1                                             S3
                                                                                 Storage
                                                                                 4
                                                                 3


 1 Create master node
 2 Master node creates worker nodes
 3 Instance disks are mounted
 4 Connection to S3 is established
 5 Submit data, command line (and executable) and execute                                  58
Whole System Instantiation




                             59
Whole System Instantiation




                             60
Sub-workflow/Tool Instantiation




                                  61
Exploiting Cloud Computing
Two steps:
        1- Enter AWS credentials
        2- Define your cluster




                                             62
Supporting Task Parallelism

I. Parallelism due to branching

        • Branching in the DAG implies independent execution
        • Independent jobs can run in parallel
Supporting Data Intensive Tasks

Data parallelism

      • For an input as a list, node A can process the list items in parallel


                                     Tavaxy Data Patterns

                     [x1,.., xn]
                                                            [x1,.., xn]       [y1,.., yn]
                 A   
                     a =[A(x1),..,A(xn)]
                                                                          A
                 B
                                                        
            b =[B(A(x1)),..,B(A(xn))]                    a =[A((x1, y1)),..,A((xn, yn))]

           Single List                                           Multiple List
Personalized Medicine Workflow




                                 65
Personalized Medicine Workflow
Individual Whole Genome Mapping and Variant Annotation   Tonellato-Wall




                                                                      66
Workflow Sktech




            Variant
Alignment
            analysis




                                         67
Personalized Medicine Workflow on Tavaxy
Read Mapping
                                             SNP/Disease
                                              Database
                                               Queries



                        Variant
                        Calling




          Formatting




                                                   68
Exploiting Parallelization and Locality of Data

                                                 Data base are
                                                 hosted locally




                            Fork Pattern
Use of list data pattern
                            Parallel Task
Reads are defined as         Execution
 list which implies         (Parallel DBs
parallel processing of        Queries)
   blocks of reads



                                                           69
Read-Mapping with Crossbow

• Crossbow (based on EMR/Hadoop) is used to map set of
  human reads to human genome.
• With this cluster, we mounted EBS volumes including the
  reference human genome files (each including one
  chromosome)
• Read Datasets:
   – illumnia reads of around 13 Gbp (47 GB) from from the African
     genome,
   – the human genome version hg18, build 36




                                                                     70
Read-Mapping with Crossbow




                             71
Domenstration 3
Metagenomics Workflow




                        72
NGS-based Metagenomics Study




  Galaxy Workflow   Optimized Tavaxy Workflow


                                                73
NGS-based Metagenomics Study

               Mega-blast run on cloud




                                         74
NGS-based Metagenomics Study

• The MegaBLAST program is used to annotate a set of
  sequences coming from a metagenomics experiment.
• With this cluster, we mounted EBS volumes including the NCBI
  NT database
• Datasets: Windshield dataset composed of two collections of
  454 FLX reads:
   – Trip A: 138575 (25.3 Mbp) Mb104283 (18.8 Mbp)
   – Trip B: ) 151000 (30.2 Mbp) 79460 (12.7 Mbp)




                                                            75
Bioinformatics Experiment (2)
• We used the windshield dataset which is composed of
  two collections of 454 FLX reads.
• These reads came from the DNA of the organic matter
  on the windshield of a moving vehicle that visited two
  geographic locations (trips A and B).
• For each trip A or B, there are two subsets for the left
  and right part of the windshield.
• The number of reads are 66575 (12.3 Mbp) 71000
  (14.2 Mbp) 104283 (18.8 Mbp) 79460 (12.7 Mbp) for
  trips A Left, B Left, A Right, and B Right, respectively.
• The queries were queued in parallel, fetch strategy was
  used

                                                          76
Conclusions and Future Work
• Use of workflow systems provides flexibility and efficiency
• High performance and cloud computing resources are
  exploited with technical details being hidden
• Future work include
   – Further performance optimization for execution on local and cloud
     infrastructures.
   – Supporting multiple cloud providers
   – Handling larger data sizes in multi-use environment




                                                                         77
Thanks for attention




                       78

More Related Content

What's hot

Plone FSR
Plone FSRPlone FSR
Plone FSRfulv
 
Workflow User Interfaces Patterns
Workflow User Interfaces PatternsWorkflow User Interfaces Patterns
Workflow User Interfaces PatternsJean Vanderdonckt
 
How Parse Built a Mobile Backend as a Service on AWS (MBL307) | AWS re:Invent...
How Parse Built a Mobile Backend as a Service on AWS (MBL307) | AWS re:Invent...How Parse Built a Mobile Backend as a Service on AWS (MBL307) | AWS re:Invent...
How Parse Built a Mobile Backend as a Service on AWS (MBL307) | AWS re:Invent...Amazon Web Services
 
Spring Batch Workshop (advanced)
Spring Batch Workshop (advanced)Spring Batch Workshop (advanced)
Spring Batch Workshop (advanced)lyonjug
 
Batching and Java EE (jdk.io)
Batching and Java EE (jdk.io)Batching and Java EE (jdk.io)
Batching and Java EE (jdk.io)Ryan Cuprak
 
[DanNotes] XPages - Beyound the Basics
[DanNotes] XPages - Beyound the Basics[DanNotes] XPages - Beyound the Basics
[DanNotes] XPages - Beyound the BasicsUlrich Krause
 
Apache maven and its impact on java 9 (Java One 2017)
Apache maven and its impact on java 9 (Java One 2017)Apache maven and its impact on java 9 (Java One 2017)
Apache maven and its impact on java 9 (Java One 2017)Robert Scholte
 
Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink Slim Baltagi
 
MicroProfile: A Quest for a Lightweight and Modern Enterprise Java Platform
MicroProfile: A Quest for a Lightweight and Modern Enterprise Java PlatformMicroProfile: A Quest for a Lightweight and Modern Enterprise Java Platform
MicroProfile: A Quest for a Lightweight and Modern Enterprise Java PlatformMike Croft
 
The Play Framework at LinkedIn
The Play Framework at LinkedInThe Play Framework at LinkedIn
The Play Framework at LinkedInYevgeniy Brikman
 
Erlang as a cloud citizen, a fractal approach to throughput
Erlang as a cloud citizen, a fractal approach to throughputErlang as a cloud citizen, a fractal approach to throughput
Erlang as a cloud citizen, a fractal approach to throughputPaolo Negri
 
OpenNTF Domino API (ODA): Super-Charging Domino Development
OpenNTF Domino API (ODA): Super-Charging Domino DevelopmentOpenNTF Domino API (ODA): Super-Charging Domino Development
OpenNTF Domino API (ODA): Super-Charging Domino DevelopmentPaul Withers
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Michael Noll
 
Introduction to Puppet Scripting
Introduction to Puppet ScriptingIntroduction to Puppet Scripting
Introduction to Puppet ScriptingAchieve Internet
 
Apache Flink Crash Course by Slim Baltagi and Srini Palthepu
Apache Flink Crash Course by Slim Baltagi and Srini PalthepuApache Flink Crash Course by Slim Baltagi and Srini Palthepu
Apache Flink Crash Course by Slim Baltagi and Srini PalthepuSlim Baltagi
 
Chef for OpenStack - OpenStack Fall 2012 Summit
Chef for OpenStack  - OpenStack Fall 2012 SummitChef for OpenStack  - OpenStack Fall 2012 Summit
Chef for OpenStack - OpenStack Fall 2012 SummitMatt Ray
 
Novedades Denali Integration Services
Novedades Denali Integration ServicesNovedades Denali Integration Services
Novedades Denali Integration ServicesSolidQ
 
Introduction to the intermediate Python - v1.1
Introduction to the intermediate Python - v1.1Introduction to the intermediate Python - v1.1
Introduction to the intermediate Python - v1.1Andrei KUCHARAVY
 

What's hot (19)

Plone FSR
Plone FSRPlone FSR
Plone FSR
 
Workflow User Interfaces Patterns
Workflow User Interfaces PatternsWorkflow User Interfaces Patterns
Workflow User Interfaces Patterns
 
How Parse Built a Mobile Backend as a Service on AWS (MBL307) | AWS re:Invent...
How Parse Built a Mobile Backend as a Service on AWS (MBL307) | AWS re:Invent...How Parse Built a Mobile Backend as a Service on AWS (MBL307) | AWS re:Invent...
How Parse Built a Mobile Backend as a Service on AWS (MBL307) | AWS re:Invent...
 
Spring Batch Workshop (advanced)
Spring Batch Workshop (advanced)Spring Batch Workshop (advanced)
Spring Batch Workshop (advanced)
 
Batching and Java EE (jdk.io)
Batching and Java EE (jdk.io)Batching and Java EE (jdk.io)
Batching and Java EE (jdk.io)
 
[DanNotes] XPages - Beyound the Basics
[DanNotes] XPages - Beyound the Basics[DanNotes] XPages - Beyound the Basics
[DanNotes] XPages - Beyound the Basics
 
Apache maven and its impact on java 9 (Java One 2017)
Apache maven and its impact on java 9 (Java One 2017)Apache maven and its impact on java 9 (Java One 2017)
Apache maven and its impact on java 9 (Java One 2017)
 
Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink
 
MicroProfile: A Quest for a Lightweight and Modern Enterprise Java Platform
MicroProfile: A Quest for a Lightweight and Modern Enterprise Java PlatformMicroProfile: A Quest for a Lightweight and Modern Enterprise Java Platform
MicroProfile: A Quest for a Lightweight and Modern Enterprise Java Platform
 
The Play Framework at LinkedIn
The Play Framework at LinkedInThe Play Framework at LinkedIn
The Play Framework at LinkedIn
 
Erlang as a cloud citizen, a fractal approach to throughput
Erlang as a cloud citizen, a fractal approach to throughputErlang as a cloud citizen, a fractal approach to throughput
Erlang as a cloud citizen, a fractal approach to throughput
 
OpenNTF Domino API (ODA): Super-Charging Domino Development
OpenNTF Domino API (ODA): Super-Charging Domino DevelopmentOpenNTF Domino API (ODA): Super-Charging Domino Development
OpenNTF Domino API (ODA): Super-Charging Domino Development
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
 
Introduction to Puppet Scripting
Introduction to Puppet ScriptingIntroduction to Puppet Scripting
Introduction to Puppet Scripting
 
Apache Flink Crash Course by Slim Baltagi and Srini Palthepu
Apache Flink Crash Course by Slim Baltagi and Srini PalthepuApache Flink Crash Course by Slim Baltagi and Srini Palthepu
Apache Flink Crash Course by Slim Baltagi and Srini Palthepu
 
Apache Flink Hands On
Apache Flink Hands OnApache Flink Hands On
Apache Flink Hands On
 
Chef for OpenStack - OpenStack Fall 2012 Summit
Chef for OpenStack  - OpenStack Fall 2012 SummitChef for OpenStack  - OpenStack Fall 2012 Summit
Chef for OpenStack - OpenStack Fall 2012 Summit
 
Novedades Denali Integration Services
Novedades Denali Integration ServicesNovedades Denali Integration Services
Novedades Denali Integration Services
 
Introduction to the intermediate Python - v1.1
Introduction to the intermediate Python - v1.1Introduction to the intermediate Python - v1.1
Introduction to the intermediate Python - v1.1
 

Similar to Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System

AWS - Managing Your Cloud Assets 2013
AWS - Managing Your Cloud Assets 2013AWS - Managing Your Cloud Assets 2013
AWS - Managing Your Cloud Assets 2013Amazon Web Services
 
Taverna and myExperiment. SCAPE presentation at a Hack-a-thon
Taverna and myExperiment. SCAPE presentation at a Hack-a-thonTaverna and myExperiment. SCAPE presentation at a Hack-a-thon
Taverna and myExperiment. SCAPE presentation at a Hack-a-thonSCAPE Project
 
The Taverna Software Suite
The Taverna Software SuiteThe Taverna Software Suite
The Taverna Software SuitemyGrid team
 
Opscode-Eucalyptus Webinar 20110721
 Opscode-Eucalyptus Webinar 20110721 Opscode-Eucalyptus Webinar 20110721
Opscode-Eucalyptus Webinar 20110721Chef Software, Inc.
 
Managing Your Cloud Assets with AWS
Managing Your Cloud Assets with AWSManaging Your Cloud Assets with AWS
Managing Your Cloud Assets with AWSAmazon Web Services
 
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...Data cleaning with the Kurator toolkit: Bridging the gap between conventional...
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...Timothy McPhillips
 
What's new in Docker - InfraKit - Docker Meetup Berlin 2016
What's new in Docker - InfraKit - Docker Meetup Berlin 2016What's new in Docker - InfraKit - Docker Meetup Berlin 2016
What's new in Docker - InfraKit - Docker Meetup Berlin 2016Patrick Chanezon
 
Improving your Time to Market with AWS
Improving your Time to Market with AWSImproving your Time to Market with AWS
Improving your Time to Market with AWSAmazon Web Services
 
Build your own clouds with Chef and MCollective
Build your own clouds with Chef and MCollectiveBuild your own clouds with Chef and MCollective
Build your own clouds with Chef and MCollectiveJonathan Weiss
 
Jenkins Workflow Webinar - Dec 10, 2014
Jenkins Workflow Webinar - Dec 10, 2014Jenkins Workflow Webinar - Dec 10, 2014
Jenkins Workflow Webinar - Dec 10, 2014CloudBees
 
2014 Taverna tutorial introduction to Taverna workflows
2014 Taverna tutorial introduction to Taverna workflows2014 Taverna tutorial introduction to Taverna workflows
2014 Taverna tutorial introduction to Taverna workflowsmyGrid team
 
Puppet overview
Puppet overviewPuppet overview
Puppet overviewjoshbeard
 
Crash Course on Open Source Cloud Computing
Crash Course on Open Source Cloud ComputingCrash Course on Open Source Cloud Computing
Crash Course on Open Source Cloud ComputingMark Hinkle
 
Continuous Integration & Continuous Delivery
Continuous Integration & Continuous DeliveryContinuous Integration & Continuous Delivery
Continuous Integration & Continuous DeliveryDatabricks
 
State of Puppet 2013 - Puppet Camp DC
State of Puppet 2013 - Puppet Camp DCState of Puppet 2013 - Puppet Camp DC
State of Puppet 2013 - Puppet Camp DCPuppet
 
Introduction to data flow management using apache nifi
Introduction to data flow management using apache nifiIntroduction to data flow management using apache nifi
Introduction to data flow management using apache nifiAnshuman Ghosh
 
04_Azure Kubernetes Service: Basic Practices for Developers_GAB2019
04_Azure Kubernetes Service: Basic Practices for Developers_GAB201904_Azure Kubernetes Service: Basic Practices for Developers_GAB2019
04_Azure Kubernetes Service: Basic Practices for Developers_GAB2019Kumton Suttiraksiri
 
DIMT 2023 SG - Hands-on Workshop_ Getting started with Confluent Cloud.pdf
DIMT 2023 SG - Hands-on Workshop_ Getting started with Confluent Cloud.pdfDIMT 2023 SG - Hands-on Workshop_ Getting started with Confluent Cloud.pdf
DIMT 2023 SG - Hands-on Workshop_ Getting started with Confluent Cloud.pdfconfluent
 
OSCON 2012 OpenStack Automation and DevOps Best Practices
OSCON 2012 OpenStack Automation and DevOps Best PracticesOSCON 2012 OpenStack Automation and DevOps Best Practices
OSCON 2012 OpenStack Automation and DevOps Best PracticesMatt Ray
 
Automate your Development Environment with Vagrant & Chef
Automate your Development Environment with Vagrant & ChefAutomate your Development Environment with Vagrant & Chef
Automate your Development Environment with Vagrant & Chef Michael Lihs
 

Similar to Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System (20)

AWS - Managing Your Cloud Assets 2013
AWS - Managing Your Cloud Assets 2013AWS - Managing Your Cloud Assets 2013
AWS - Managing Your Cloud Assets 2013
 
Taverna and myExperiment. SCAPE presentation at a Hack-a-thon
Taverna and myExperiment. SCAPE presentation at a Hack-a-thonTaverna and myExperiment. SCAPE presentation at a Hack-a-thon
Taverna and myExperiment. SCAPE presentation at a Hack-a-thon
 
The Taverna Software Suite
The Taverna Software SuiteThe Taverna Software Suite
The Taverna Software Suite
 
Opscode-Eucalyptus Webinar 20110721
 Opscode-Eucalyptus Webinar 20110721 Opscode-Eucalyptus Webinar 20110721
Opscode-Eucalyptus Webinar 20110721
 
Managing Your Cloud Assets with AWS
Managing Your Cloud Assets with AWSManaging Your Cloud Assets with AWS
Managing Your Cloud Assets with AWS
 
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...Data cleaning with the Kurator toolkit: Bridging the gap between conventional...
Data cleaning with the Kurator toolkit: Bridging the gap between conventional...
 
What's new in Docker - InfraKit - Docker Meetup Berlin 2016
What's new in Docker - InfraKit - Docker Meetup Berlin 2016What's new in Docker - InfraKit - Docker Meetup Berlin 2016
What's new in Docker - InfraKit - Docker Meetup Berlin 2016
 
Improving your Time to Market with AWS
Improving your Time to Market with AWSImproving your Time to Market with AWS
Improving your Time to Market with AWS
 
Build your own clouds with Chef and MCollective
Build your own clouds with Chef and MCollectiveBuild your own clouds with Chef and MCollective
Build your own clouds with Chef and MCollective
 
Jenkins Workflow Webinar - Dec 10, 2014
Jenkins Workflow Webinar - Dec 10, 2014Jenkins Workflow Webinar - Dec 10, 2014
Jenkins Workflow Webinar - Dec 10, 2014
 
2014 Taverna tutorial introduction to Taverna workflows
2014 Taverna tutorial introduction to Taverna workflows2014 Taverna tutorial introduction to Taverna workflows
2014 Taverna tutorial introduction to Taverna workflows
 
Puppet overview
Puppet overviewPuppet overview
Puppet overview
 
Crash Course on Open Source Cloud Computing
Crash Course on Open Source Cloud ComputingCrash Course on Open Source Cloud Computing
Crash Course on Open Source Cloud Computing
 
Continuous Integration & Continuous Delivery
Continuous Integration & Continuous DeliveryContinuous Integration & Continuous Delivery
Continuous Integration & Continuous Delivery
 
State of Puppet 2013 - Puppet Camp DC
State of Puppet 2013 - Puppet Camp DCState of Puppet 2013 - Puppet Camp DC
State of Puppet 2013 - Puppet Camp DC
 
Introduction to data flow management using apache nifi
Introduction to data flow management using apache nifiIntroduction to data flow management using apache nifi
Introduction to data flow management using apache nifi
 
04_Azure Kubernetes Service: Basic Practices for Developers_GAB2019
04_Azure Kubernetes Service: Basic Practices for Developers_GAB201904_Azure Kubernetes Service: Basic Practices for Developers_GAB2019
04_Azure Kubernetes Service: Basic Practices for Developers_GAB2019
 
DIMT 2023 SG - Hands-on Workshop_ Getting started with Confluent Cloud.pdf
DIMT 2023 SG - Hands-on Workshop_ Getting started with Confluent Cloud.pdfDIMT 2023 SG - Hands-on Workshop_ Getting started with Confluent Cloud.pdf
DIMT 2023 SG - Hands-on Workshop_ Getting started with Confluent Cloud.pdf
 
OSCON 2012 OpenStack Automation and DevOps Best Practices
OSCON 2012 OpenStack Automation and DevOps Best PracticesOSCON 2012 OpenStack Automation and DevOps Best Practices
OSCON 2012 OpenStack Automation and DevOps Best Practices
 
Automate your Development Environment with Vagrant & Chef
Automate your Development Environment with Vagrant & ChefAutomate your Development Environment with Vagrant & Chef
Automate your Development Environment with Vagrant & Chef
 

More from GigaScience, BGI Hong Kong

IDW2022: A decades experiences in transparent and interactive publication of ...
IDW2022: A decades experiences in transparent and interactive publication of ...IDW2022: A decades experiences in transparent and interactive publication of ...
IDW2022: A decades experiences in transparent and interactive publication of ...GigaScience, BGI Hong Kong
 
Scott Edmunds: Preparing a data paper for GigaByte
Scott Edmunds: Preparing a data paper for GigaByteScott Edmunds: Preparing a data paper for GigaByte
Scott Edmunds: Preparing a data paper for GigaByteGigaScience, BGI Hong Kong
 
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...GigaScience, BGI Hong Kong
 
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...GigaScience, BGI Hong Kong
 
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...GigaScience, BGI Hong Kong
 
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...GigaScience, BGI Hong Kong
 
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...GigaScience, BGI Hong Kong
 
PAGAsia19 - The Digitalization of Ruili Botanical Garden Project: Production...
PAGAsia19 - The Digitalization of Ruili Botanical Garden Project:  Production...PAGAsia19 - The Digitalization of Ruili Botanical Garden Project:  Production...
PAGAsia19 - The Digitalization of Ruili Botanical Garden Project: Production...GigaScience, BGI Hong Kong
 
Democratising biodiversity and genomics research: open and citizen science to...
Democratising biodiversity and genomics research: open and citizen science to...Democratising biodiversity and genomics research: open and citizen science to...
Democratising biodiversity and genomics research: open and citizen science to...GigaScience, BGI Hong Kong
 
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU Guix
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU GuixRicardo Wurmus: Reproducible genomics analysis pipelines with GNU Guix
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU GuixGigaScience, BGI Hong Kong
 
Anil Thanki at #ICG13: Aequatus: An open-source homology browser
Anil Thanki at #ICG13: Aequatus: An open-source homology browserAnil Thanki at #ICG13: Aequatus: An open-source homology browser
Anil Thanki at #ICG13: Aequatus: An open-source homology browserGigaScience, BGI Hong Kong
 
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...GigaScience, BGI Hong Kong
 
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant scienceVenice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant scienceGigaScience, BGI Hong Kong
 
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...GigaScience, BGI Hong Kong
 
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...GigaScience, BGI Hong Kong
 
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global PerspectiveChris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global PerspectiveGigaScience, BGI Hong Kong
 
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...GigaScience, BGI Hong Kong
 
Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...GigaScience, BGI Hong Kong
 
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...GigaScience, BGI Hong Kong
 

More from GigaScience, BGI Hong Kong (20)

IDW2022: A decades experiences in transparent and interactive publication of ...
IDW2022: A decades experiences in transparent and interactive publication of ...IDW2022: A decades experiences in transparent and interactive publication of ...
IDW2022: A decades experiences in transparent and interactive publication of ...
 
Scott Edmunds: Preparing a data paper for GigaByte
Scott Edmunds: Preparing a data paper for GigaByteScott Edmunds: Preparing a data paper for GigaByte
Scott Edmunds: Preparing a data paper for GigaByte
 
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
STM Week: Demonstrating bringing publications to life via an End-to-end XML p...
 
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
Measuring richness. A RCT to quantify the benefits of metadata quality; Scott...
 
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...
Scott Edmunds: A new publishing workflow for rapid dissemination of genomes u...
 
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
Scott Edmunds: Quantifying how FAIR is Hong Kong: The Hong Kong Shareability ...
 
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
Scott Edmunds talk at IARC: How can we make science more trustworthy and FAIR...
 
PAGAsia19 - The Digitalization of Ruili Botanical Garden Project: Production...
PAGAsia19 - The Digitalization of Ruili Botanical Garden Project:  Production...PAGAsia19 - The Digitalization of Ruili Botanical Garden Project:  Production...
PAGAsia19 - The Digitalization of Ruili Botanical Garden Project: Production...
 
Democratising biodiversity and genomics research: open and citizen science to...
Democratising biodiversity and genomics research: open and citizen science to...Democratising biodiversity and genomics research: open and citizen science to...
Democratising biodiversity and genomics research: open and citizen science to...
 
Hong Kong Open Access & GigaScience: CCHK@10
Hong Kong Open Access & GigaScience: CCHK@10Hong Kong Open Access & GigaScience: CCHK@10
Hong Kong Open Access & GigaScience: CCHK@10
 
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU Guix
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU GuixRicardo Wurmus: Reproducible genomics analysis pipelines with GNU Guix
Ricardo Wurmus: Reproducible genomics analysis pipelines with GNU Guix
 
Anil Thanki at #ICG13: Aequatus: An open-source homology browser
Anil Thanki at #ICG13: Aequatus: An open-source homology browserAnil Thanki at #ICG13: Aequatus: An open-source homology browser
Anil Thanki at #ICG13: Aequatus: An open-source homology browser
 
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
Paul Pavlidis at #ICG13: Monitoring changes in the Gene Ontology and their im...
 
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant scienceVenice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
Venice Juanillas at #ICG13: Rice Galaxy: an open resource for plant science
 
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
Stefan Prost at #ICG13: Genome analyses show strong selection on coloration, ...
 
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
Lisa Johnson at #ICG13: Re-assembly, quality evaluation, and annotation of 67...
 
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global PerspectiveChris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
Chris Armit at IDW2018: Democratising Data Publishing: A Global Perspective
 
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
EMBL OA Week: FAIR or unfair? Principled publishing for more Open & Democrati...
 
Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...Reproducible method and benchmarking publishing for the data (and evidence) d...
Reproducible method and benchmarking publishing for the data (and evidence) d...
 
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
Mary Ann Tuli: What MODs can learn from Journals – a GigaDB curator’s perspec...
 

Recently uploaded

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 

Recently uploaded (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Mohamed Abouelhoda: Next Generation Workflow Systems on the Cloud: The Tavaxy System

  • 1. Tavaxy Integrating Taverna and Galaxy with Cloud Computing Support Mohamed Abouelhoda Nile University Egypt 1
  • 2. Workflows in Bioinformatics Finding Homologous Sequences Clustalw Get Protein Get Id & BLASTp Sequence sequences T-Coffee 2
  • 3. Implementing Scientific Workflows Method 1: Write Python/Perl/Shell script Advantages • Reliable and efficient • Comprehensive programming capabilities (conditionals, loops, etc..) Disadvantages • Requires programming skills , especially with HPC resources • Scripts are workflow-specific • Costly to create, debug and modify • Requires installing and managing tools 3
  • 4. Implementing Scientific Workflows Methods 2: Use of Workflow Systems • Kepler • Triana • WildFire • GenePattern • Pegsus • Taverna • Galaxy Most popular in • and many more Bioinformatics http://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systems 4
  • 5. Composing a Workflow using Galaxy Workflow Environment Canvas for workflow editing Tool Library 5
  • 6. Composing a Workflow using Galaxy Create new workflow and give it a name 6
  • 7. Composing a Workflow using Galaxy Click to create input node 7
  • 8. Composing a Workflow using Galaxy Input node created, will read input from user account 8
  • 9. Composing a Workflow using Galaxy BLAST node is custom one added to the system Choose BLAST tool and click to create it 9
  • 10. Composing a Workflow using Galaxy 1. BLAST node created 2. Window opened to set BLAST parameters 10
  • 11. Composing a Workflow using Galaxy Connect output of input node as input to BLAST 11
  • 12. Composing a Workflow using Galaxy Complete composing workflow 12
  • 13. Composing a Workflow using Galaxy Start running the workflow 13
  • 14. Composing a Workflow using Galaxy Screen showing parts of the output files and provides links to them 14
  • 15. Benefits of Workflow Systems Accelerates computation • Intuitive abstract means for describing computational experiments • Requires no programming expertise • Easy to modify • Hide execution details: invocation and scheduling • Direct use of parallel architectures 15
  • 16. Benefits of Workflow Systems Formalize computation • Come with library of tools or access directories  Tool accessibility • Store experiment details (used tools, their parameters, and used data)  Reproducibility • Share workflows with analysis history including intermediate results  Transparency Mesirov. Science, 2010 B. Giardine, et al. Genome Research, 2005. 16
  • 17. Implementing Scientific Workflows Methods 2: Use of Workflow Systems • Kepler • Triana • WildFire • GenePattern • Pegsus • Taverna • Galaxy Most popular in • and many more Bioinformatics http://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systems 17
  • 18. Composing a Workflow using Taverna Example: Homology Workflow Clustalw Read Protein Get Id & BLASTp Sequence sequences T-Coffee 18
  • 19. Composing a Workflow using Taverna Open the workflow editor, it is desktop application 19
  • 20. Composing a Workflow using Taverna • Choose a web-service and copy its URL if not in the service directory, • Here we will use BLAST at EBI 20
  • 21. Composing a Workflow using Taverna Paste service URL so that Taverna collects information about it (i.e., methods and parameters) 21
  • 22. Composing a Workflow using Taverna Service information retrieved 22
  • 23. Composing a Workflow using Taverna Create node for the BLAST service 23
  • 24. Composing a Workflow using Taverna Create node to receive input 24
  • 25. Composing a Workflow using Taverna Create nodes to prepare input and parameters in XML 25
  • 26. Composing a Workflow using Taverna Create node to receive Job ID from service provider 26
  • 27. Composing a Workflow using Taverna Input data and parameters are merged in XML file 27
  • 28. Composing a Workflow using Taverna Create sub-workflow to check status of the asynchronous service 28
  • 29. Composing a Workflow using Taverna Rename it Poll status 29
  • 30. Composing a Workflow using Taverna Put in place after job submission 30
  • 31. Composing a Workflow using Taverna Use get result method of the web-service to retrieve data when ready 31
  • 32. Composing a Workflow using Taverna Get result are created 32
  • 33. Composing a Workflow using Taverna Start putting Get_Result in place in the workflow 33
  • 34. Composing a Workflow using Taverna Specify output type (BLAST xml) 34
  • 35. Composing a Workflow using Taverna Short cut to sub-workflow Start retrieval when check status is done 35
  • 36. Composing a Workflow using Taverna Clustalw Read Protein Get Id & BLASTp Sequence sequences T-Coffee The rest of the workflow 36
  • 37. Taverna vs. Galaxy Taverna Galaxy Purpose General Bioinformatics Interface Desktop Web-interface Usability More difficult Easy Engine Control flow Data flow Control constructs Yes No Jobs Web-service Local invocation Programs Service directory Library of tools Use of Local HPC Threads only Threads/Cluster Taverna Galaxy D. Hull, et al. Nucleic Acids Research, 2006. B. Giardine, et al. Genome Research, 2005. T. Oinn, et al. Bioinformatics, 2004. 37
  • 38. Communities Taverna MyExperiment Galaxy Pages PIC What if we can make use of both repositories and what if we can have advantages of both systems?? 38
  • 39. Integrating Taverna and Galaxy Workflows Taverna Galaxy 39
  • 40. Integrating Taverna and Galaxy Workflows Tavaxy 40
  • 41. Tavaxy • A standalone workflow system based on workflow patterns • Integrates both Taverna and Galaxy workflows and improve their performance • Easy to use and includes a large library of software tools • Exploits high performance computing resources (parallel infrastructure) with all details being hidden • Runs on local or cloud computing infrastructure • Optimized to handle large datasets 41
  • 43. Tavaxy Nodes (Processing Units) Input node Pattern • If-else conditional nodes • Iteration Sub- Tool node • Data Merge • Data Select workflow nodes New in Tavaxy and not in Galaxy New in Tavaxy and not in Taverna • Pattern nodes, Sub-workflow nodes • Easier to use interface • Remote execution • Direct use of local HPC 43
  • 44. Tools in Tavaxy 220 Tools organized in the following categories • EMBOSS: Package of sequence analysis tools • SAMtools: Package of scirpts and programs to handle NGS data • Fastx: Package for manipulating fasta files • Galaxy tools: A set of data utilities and tools developed by the galaxy team • Taverna tools: A set of tools based on web-services based on Taverna. collection This section includes as well a set of data manipulation utilities developed by the taverna team. • Tavaxy tools: This section includes extra utilities and tools for sequence analysis and genome comparison developed/added by the Tavaxy team. • Cloud utilities: A set of data manipulation and configuration of computing infrastructure on the Amazon cloud. 44
  • 45. NGS Mapping Tools and Databases in Tavaxy Tools • MAQ • BFAST • BWA • Bowtie • … Databases • Reference human genome hg36 • Indexed genome for MAQ, Bowtie • NCBI nucleotide/protein DBs 45
  • 46. Data Patterns of Tavaxy To facilitate execution of data-intensive tasks in cloud computing cluster 46
  • 48. The Tavaxy Project Single environment for integrating Galaxy and Taverna workflows: Taverna or Galaxy workflows can be o imported ,  open and re-draw in Tavaxy environment o re-designed,  delete or re-order workflow nodes o enhanced,  add new nodes and sub-workflows o optimized,  Tavaxy concstructs can be used to exploit parallelization  remote Taverna calls can be replaced with local tools o and executed in Tavaxy  on local (HPC) infrastructure  on cloud 48
  • 49. Importing Taverna Workflow Taverna Workflow 49
  • 53. Idea of Integration Workflow Patterns Bag of techniques - Taverna is used as a secondary engine to execute Taverna sub-workflows - The use of patterns as special nodes in the data-flow oriented engine - Simulating control constructs over the data flow digraph representing the workflow - Use of sub-workflow to enable iteration pattern 54
  • 55. Three Modes for Supporting Cloud • Whole system instantiation • Delegating execution of a sub-workflow to the cloud • Delegating execution of one tool to the cloud 56
  • 56. User case scenario Whole System Instantiation Cloud Platform 5 (Instance Disks) EBS 2 1 S3 Storage 4 3 1 Create master node 2 Master node creates worker nodes 3 Instance disks are mounted Include Tavaxy tools 4 Connection to S3 is established and DBs 5 Submit data, command line (and executable) and execute 57
  • 57. User case scenario Sub-workflow and tool delegation Cloud Platform 5 (Instance Disks) EBS 2 1 S3 Storage 4 3 1 Create master node 2 Master node creates worker nodes 3 Instance disks are mounted 4 Connection to S3 is established 5 Submit data, command line (and executable) and execute 58
  • 61. Exploiting Cloud Computing Two steps: 1- Enter AWS credentials 2- Define your cluster 62
  • 62. Supporting Task Parallelism I. Parallelism due to branching • Branching in the DAG implies independent execution • Independent jobs can run in parallel
  • 63. Supporting Data Intensive Tasks Data parallelism • For an input as a list, node A can process the list items in parallel Tavaxy Data Patterns [x1,.., xn] [x1,.., xn] [y1,.., yn] A  a =[A(x1),..,A(xn)] A B   b =[B(A(x1)),..,B(A(xn))] a =[A((x1, y1)),..,A((xn, yn))] Single List Multiple List
  • 65. Personalized Medicine Workflow Individual Whole Genome Mapping and Variant Annotation Tonellato-Wall 66
  • 66. Workflow Sktech Variant Alignment analysis 67
  • 67. Personalized Medicine Workflow on Tavaxy Read Mapping SNP/Disease Database Queries Variant Calling Formatting 68
  • 68. Exploiting Parallelization and Locality of Data Data base are hosted locally Fork Pattern Use of list data pattern Parallel Task Reads are defined as Execution list which implies (Parallel DBs parallel processing of Queries) blocks of reads 69
  • 69. Read-Mapping with Crossbow • Crossbow (based on EMR/Hadoop) is used to map set of human reads to human genome. • With this cluster, we mounted EBS volumes including the reference human genome files (each including one chromosome) • Read Datasets: – illumnia reads of around 13 Gbp (47 GB) from from the African genome, – the human genome version hg18, build 36 70
  • 72. NGS-based Metagenomics Study Galaxy Workflow Optimized Tavaxy Workflow 73
  • 73. NGS-based Metagenomics Study Mega-blast run on cloud 74
  • 74. NGS-based Metagenomics Study • The MegaBLAST program is used to annotate a set of sequences coming from a metagenomics experiment. • With this cluster, we mounted EBS volumes including the NCBI NT database • Datasets: Windshield dataset composed of two collections of 454 FLX reads: – Trip A: 138575 (25.3 Mbp) Mb104283 (18.8 Mbp) – Trip B: ) 151000 (30.2 Mbp) 79460 (12.7 Mbp) 75
  • 75. Bioinformatics Experiment (2) • We used the windshield dataset which is composed of two collections of 454 FLX reads. • These reads came from the DNA of the organic matter on the windshield of a moving vehicle that visited two geographic locations (trips A and B). • For each trip A or B, there are two subsets for the left and right part of the windshield. • The number of reads are 66575 (12.3 Mbp) 71000 (14.2 Mbp) 104283 (18.8 Mbp) 79460 (12.7 Mbp) for trips A Left, B Left, A Right, and B Right, respectively. • The queries were queued in parallel, fetch strategy was used 76
  • 76. Conclusions and Future Work • Use of workflow systems provides flexibility and efficiency • High performance and cloud computing resources are exploited with technical details being hidden • Future work include – Further performance optimization for execution on local and cloud infrastructures. – Supporting multiple cloud providers – Handling larger data sizes in multi-use environment 77