The Third International Conference on Bioinformatics,
                        Biocomputational Systems and Biotechnologies (BIOTECHNO 2011)


Bio-UnaGrid: Easing Bioinformatics Workflow Execution Using LONI
               Pipeline and a Virtual Desktop Grid




Mario Villamizar, Harold Castro, David Mendez
Departments of Systems and Computer Engineering
Universidad de los Andes
Bogotá, Colombia

Silvia Restrepo, Luis Rodriguez
Department of Biological Sciences
Universidad de los Andes
Bogotá D.C., Colombia


             Agenda

The Problem Context
Related Work
Bio‐UnaGrid: Architecture
Bio‐UnaGrid: Implementation
Bio‐UnaGrid: Performance Tests
Conclusions and Future Work


                         Around the problem
Bioinformatics researchers use applications that require large computational
capabilities.

Cluster and grid computing are common solutions to support these
capabilities.




Cluster computing aggregates homogeneous computational resources for a
specific research group or organization.


                         Around the problem
Grid computing aggregates heterogeneous computational resources from
different organizations to support the larger computational capabilities
required by e-Science projects.




In grid computing, a set of organizations with a common objective (a virtual
organization, VO) shares computational resources.


                        Around the problem
High performance computing (HPC) infrastructures, such as clusters and grids,
provide results in a shorter time. However, when bioinformatics researchers
want to execute applications, they face several time-consuming problems.


                          Around the problem
1. There are many application suites for running different analyses, so
researchers must learn the command line syntax (options, input and output
files, distributed execution environment, etc.) of hundreds of applications.


                     Around the problem
2. Researchers must learn commands to manage and use distributed
computing infrastructures.


                        Around the problem
3. Applications require specific and complex configurations to operate in
distributed environments.


                          Around the problem
4. Researchers frequently execute a set of applications in a sequential and/or
coordinated manner, called pipelines or workflows.




In many cases, researchers run the workflow manually from the command line:
execute an application, wait, execute the next application, wait again.
This process can take a lot of time.


                        Around the problem
5. Very often researchers want to execute applications that require larger
processing capabilities than those provided by dedicated cluster computing
infrastructures, so they must wait weeks or months before getting their
results.


                             The problem
We were facing the following problem:

Manual, command-based execution of applications and workflows requires
bioinformatics researchers to spend most of their time configuring
application parameters, managing HPC infrastructures, managing scientific
data, and linking partial results from one application to another.

A new researcher or student who wants to use these applications spends a
lot of time learning all of this.


                             Related work
Several projects have been developed to facilitate automatic workflow
execution in bioinformatics and other scientific fields.


                               Related work
Most workflow tools have limitations for bioinformatics applications, such
as:

   They require applications to be recompiled or modified.

Modifying or recompiling every bioinformatics application is too complex.

   They support only internal data sources with specific data structures.

Bioinformatics data come in different formats and can be internal or
external.

   They operate only with specific platforms and cluster or grid middleware.

Institutions usually have different cluster and grid configurations. In our
case, we are part of several grid projects, such as GISELA, EGEE, and
OSG.


                              Related work
Most workflow tools have limitations for bioinformatics applications, such
as:


   They regularly require researchers to use complex commands that involve
   time-consuming tasks.

Bioinformatics researchers should be able to use easy GUIs to execute
workflows.


   They operate on dedicated cluster and grid infrastructures; however, a
   university or enterprise campus has tens or hundreds of under-utilized
   desktop computers.

Opportunistic infrastructures, such as desktop grid and volunteer computing
systems (DGVCSs), can provide large processing capabilities at low cost.


                      How to solve these problems

Define an infrastructure into which existing bioinformatics applications can
be easily incorporated without modifications.

Workflows should be created through GUIs and executed on different HPC
infrastructures.

The infrastructure should also allow the use of different data sources
(internal and external) and the incorporation of MPI applications.

To get results faster, the infrastructure should operate both on dedicated
computing infrastructures and on DGVCSs.

Bio-UnaGrid integrates the capabilities of LONI Pipeline and UnaGrid to
provide these features.


   LONI Pipeline
        LONI Pipeline is a free framework used to
        execute neuroscience workflows.


        From a computational perspective, LONI differs
        from other workflow tools in several features:

        It does not require external applications to be
        recompiled.
        It supports external data sources.
        It is hardware platform independent.
        It can be installed on different cluster or grid
        middleware using the Pipeline plugins.


        From a research user perspective, LONI was
        designed with features such as usability,
        portability, intuitiveness, transparency and
        abstraction of cluster or grid infrastructures
        in mind.


LONI Pipeline Architecture


                                UnaGrid




   Dedicated infrastructures are not a viable option for organizations or
countries with limited financial resources. However, these organizations have
many computer labs that are not fully utilized by employees or university
students.


              UnaGrid: Opportunistic virtual clusters
[Figure: a physical machine in a computer room running a Linux processing
virtual machine that uses X cores. Panel b: when no end user is using the
physical machine.]



    A virtual cluster is a set of interconnected commodity desktops that
execute virtual machines (VMs) in the background, at low priority, using
virtualization technologies. These VMs take advantage of the idle processing
capabilities available in computer labs on a university campus.


             UnaGrid: Opportunistic virtual clusters




    A virtual machine runs on each computer in a lab and acts as a cluster
slave; together, the running virtual machines make up a virtual processing
cluster. Each virtual cluster also needs a dedicated node that acts as the
cluster master.


             UnaGrid: Opportunistic virtual clusters




   Each research group can define its own virtual clusters with custom
application environments (middlewares, applications, configurations, etc.)

   A grid solution (several virtual clusters) can be deployed for supporting
the processing capabilities required by some applications.


                   Bio-UnaGrid Architecture




The LONI Pipeline Client is the main entry point for bioinformatics
researchers to the Bio-UnaGrid infrastructure.


                    Bio-UnaGrid Operation
Each researcher receives a username and a password.

He/she can then create workflows using the bioinformatics LONI Module
Library, drag and drop, and other tools provided by the LONI Pipeline
Client.

For each LONI Module (a bioinformatics application), the researcher
must define the input and output parameters, and how each LONI
Module is connected with other LONI Modules of the workflow.

Once a workflow has been created, researchers can execute it, sending it
to the LONI Pipeline Server.

Researchers can monitor and visualize the status of the workflow, and
download partial results, through the GUIs of the LONI Pipeline Client.

  The Grid/Cluster infrastructure is transparent for researchers.
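
The way connected modules determine an execution order can be sketched in a few lines of Python. This is only an illustration of the idea, not the LONI Pipeline implementation or its workflow format: the module names and the WORKFLOW dictionary are hypothetical, and the sketch only assumes that a module can start once every module it depends on has finished.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each module maps to the set of modules whose
# output it consumes. Module names are illustrative only.
WORKFLOW = {
    "blastx": {"input_sequences"},
    "hmmer_search": {"input_sequences"},
    "merge_reports": {"blastx", "hmmer_search"},
}


def parallel_stages(dependencies):
    """Group modules into stages; modules in one stage can run concurrently."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # modules whose inputs are all done
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    for number, stage in enumerate(parallel_stages(WORKFLOW), start=1):
        print(f"stage {number}: {', '.join(stage)}")
```

Modules that land in the same stage have no dependency on each other, so a server could dispatch them to different cluster nodes at the same time.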


Bio-UnaGrid: Bioinformatics Applications




      For executing workflows, bioinformatics researchers do
      not have to worry about applications’ commands or
      complex HPC infrastructures.


  Bio-UnaGrid: Incorporating Bioinformatics Applications
The process for incorporating existing and new bioinformatics application
suites, like NCBI BLAST, into the Bio-UnaGrid infrastructure is summarized
in five steps.

1) Installation and configuration of the application suite on a Shared
   Storage System.

2) Creation of a LONI Pipeline Module using the LONI Pipeline Client for
   each executable of the suite.

3) Individual tests for each LONI Pipeline Module of the suite from the
   LONI Pipeline Client.

4) Storage of the LONI Pipeline Module in the LONI Pipeline Server.

5) Use of the modules in workflows created by bioinformatics
   researchers in the LONI Pipeline Client.


        Bio-UnaGrid: MPI applications on Bio-UnaGrid




At the time of this development, LONI did not support MPI applications.

To allow the execution of MPI applications, we developed a wrapper, called
mpiJobManager, which uses the Distributed Resource Management
Application API (DRMAA).

To execute an MPI Application, such as mpiBLAST, a researcher uses the
MPI LONI Module of the MPI application to specify the number of MPI
processes to be used.

The LONI Pipeline Server receives the MPI job and proceeds to execute it
on the DRM slaves.
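
As a rough illustration of what a wrapper has to do with the number of MPI processes chosen by the researcher, the sketch below turns that number into a launch command. It is not the mpiJobManager code: the function name build_mpi_command is hypothetical, the mpiBLAST arguments and file names are assumptions made for the example, and the DRMAA submission step is left out because it needs a running resource manager.

```python
import shlex


def build_mpi_command(executable, num_processes, app_args, launcher="mpiexec"):
    """Return the argument list that launches an MPI application.

    Hypothetical helper: a real wrapper would hand this command to the
    resource manager instead of running it directly.
    """
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    return [launcher, "-n", str(num_processes), executable, *app_args]


if __name__ == "__main__":
    # Assumed example arguments; adapt them to the installed application.
    command = build_mpi_command(
        "mpiblast",
        20,
        ["-p", "blastn", "-d", "nt", "-i", "queries.fasta", "-o", "results.txt"],
    )
    print(shlex.join(command))
```

Keeping the command as a list rather than a single string avoids quoting problems when file names contain spaces.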


                     Bio-UnaGrid: Implementation
[Figure: Bio-UnaGrid implementation. A bioinformatics researcher uses the
LONI Client and the GUMA Client over the LAN/Internet. Bioinformatics
infrastructure: 6 dedicated OGE slaves (24 cores). S&C infrastructure:
NFS-NAS server, LONI Pipeline Server, OGE master, GUMA Portal and user
database. UnaGrid infrastructure: a Linux CVC with 21 OGE slaves (21 cores)
running as VMs in a Windows 7 lab. The components are connected through
155 Mbps, 1 Gbps and 10 Gbps links.]


Bio-UnaGrid was implemented to support several genomic projects related
to coffee, potato and cassava, focused on increasing their production.


                Bio-UnaGrid: Implementation

Software: LONI Pipeline Server version 5.0.2 and the OGE DRM master,
version 6.2 update 5.

Dedicated cluster: 6 servers, each with a 2.8 GHz quad-core Intel Xeon
X5560 processor and 4 GB of RAM.

Computer lab (Windows 7): desktops with a 3.46 GHz Intel i5 processor and
8 GB of RAM. 21 UnaGrid VMs were deployed, each configured with one CPU
core and 4 GB of RAM.

The user database was installed on a dedicated server using MySQL.

Shared storage is provided by an NFSv3 NAS.

All nodes are interconnected through 1 GbE and 10 GbE links.


                  Bio-UnaGrid: Implementation
Fault tolerance mechanisms were configured for bag-of-tasks (BoT)
applications using the OGE capabilities.

Four bioinformatics application suites have been installed on the
Bio-UnaGrid implementation, and 21 LONI Modules were created.

  NCBI BLAST version 2.2.20: BLASTn, BLASTp, BLASTx, tBLASTn,
  tBLASTx and MEGABLAST.

  HMMER version 2.3.2: Build, Search and Calibrate

  InterProScan version 4.6: BlastProdom, Coils, Gene3D, PIR, Panther,
  Pfam, SEG, SMART, SuperFamily, TIGRfam and fPrintScan

  mpiBLAST version 1.6.0: mpiBLAST

Because mpiBLAST requires an MPI implementation, MPICH-2 version 1.2.7
was installed.


Bio-UnaGrid: A Simple Workflow Example


Bio-UnaGrid: An Example of A More Complex Workflow


                   Performance Tests - BLASTn

We executed performance tests in both environments: the dedicated
cluster and the opportunistic UnaGrid CVC.

We selected the highly popular BLAST algorithm for these tests.

To test the performance of Bio-UnaGrid when executing BoT applications,
we used the NCBI BLASTn application, varying the number of CPU cores
and the size of the sequence set.

We executed BLASTn searches of nucleotide sets of 480 (487 KB) and 960
(954 KB) sequences against the nt database (28 GB).

The searches were executed using 5, 10, 15 and 20 independent BLASTn
processes (BoT BLASTn), using query segmentation.

Each search was executed using a single CPU core.
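
Query segmentation can be sketched as follows: the query set is split into as many parts as there are processes, and each part is searched independently against the whole database. The slides do not say how the sequences were distributed among processes, so the round-robin assignment below is an assumption, and the function names read_fasta and segment_queries are hypothetical.

```python
def read_fasta(text):
    """Split FASTA text into records (header line plus its sequence lines)."""
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            records.append([line])          # a header opens a new record
        elif line.strip() and records:
            records[-1].append(line)        # sequence line of current record
    return ["\n".join(record) for record in records]


def segment_queries(records, num_chunks):
    """Distribute records round-robin into at most num_chunks query sets."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    chunks = [[] for _ in range(num_chunks)]
    for index, record in enumerate(records):
        chunks[index % num_chunks].append(record)
    return [chunk for chunk in chunks if chunk]  # drop empty sets


if __name__ == "__main__":
    # Synthetic stand-in for a 480-sequence query set.
    fasta = "".join(f">seq{i}\nACGTACGT\n" for i in range(480))
    for cores in (5, 10, 15, 20):
        sizes = [len(chunk) for chunk in segment_queries(read_fasta(fasta), cores)]
        print(cores, "processes, sequences per process:", sizes[0])
```

Each resulting query set would be written to its own file and submitted as one independent single-core job, which is what makes the workload a bag of tasks.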


                   Performance Tests - BLASTn




  Execution time of BoT BLASTn searches between sets of 480 and 960
                     sequences, and the nt database.

480 seq. - Dedicated Cluster (20 Cores) 10.9x – UnaGrid (20 Cores) 13.1x
960 seq. - Dedicated Cluster (15 Cores) 6.1x – UnaGrid (20 Cores) 10x
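
One way to read these figures is to convert them into parallel efficiency, that is, speedup divided by the number of cores. The sketch below assumes that the "x" values are speedups relative to a single-core run, which the slides do not state explicitly; the function name parallel_efficiency is hypothetical.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speedup achieved on the given number of cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return speedup / cores


# (sequence set, environment, cores, reported speedup) as listed on the slide.
BOT_BLASTN_RESULTS = [
    ("480 seq.", "Dedicated Cluster", 20, 10.9),
    ("480 seq.", "UnaGrid", 20, 13.1),
    ("960 seq.", "Dedicated Cluster", 15, 6.1),
    ("960 seq.", "UnaGrid", 20, 10.0),
]

if __name__ == "__main__":
    for sequences, environment, cores, speedup in BOT_BLASTN_RESULTS:
        efficiency = parallel_efficiency(speedup, cores)
        print(f"{sequences} {environment} ({cores} cores): {efficiency:.1%}")
```

An efficiency well below 100% suggests that something other than CPU time, for example access to the shared database, limits the scaling; the slides themselves list a more scalable shared storage system as future work.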


                 Performance Tests - mpiBLAST

For testing the execution of MPI applications on both environments, we
executed the same tests using mpiBLAST.


In these tests we executed coordinated MPI processes on 5, 10, 15 and 20
CPU cores.


In each test, the additional mpiJobManager process was executed on
another CPU core.


We used database segmentation to divide the database into 5, 10, 15 and
20 partitions using the mpiformatdb tool installed with mpiBLAST.


                 Performance Tests - mpiBLAST




   Execution time of mpiBLAST searches between sets of 480 and 960
                     sequences, and the nt database.

480 seq. - Dedicated Cluster (15 Cores) 2.3x – UnaGrid (20 Cores) 4.7x
960 seq. - Dedicated Cluster (20 Cores) 3.0x – UnaGrid (20 Cores) 3.7x


    Conclusions
Using the idle processing capabilities of Windows,
Linux or Mac desktops in computer labs,
Bio-UnaGrid adds processing capabilities to
those provided by dedicated computational
clusters.

Bio-UnaGrid allows bioinformatics researchers to
easily define workflows using GUIs and execute
them on cluster and grid computing infrastructures
with a simple click.

Bio-UnaGrid is highly extensible as existing
bioinformatics applications can be incorporated
without modifications.

With Bio-UnaGrid, researchers focus on analysis
of application results, not on technical issues of
distributed computing infrastructures.


    Future Work
   We will incorporate more bioinformatics
   application suites into Bio-UnaGrid.

   We will execute new performance tests with
   other applications such as HMMER,
   MPI-HMMER, ClustalW and ClustalW-MPI.

   We will execute new performance tests in a
   larger grid deployment involving hundreds of
   heterogeneous desktop computers in different
   administrative domains.

   We will implement a shared storage system
   more scalable than NFSv3.

   We also plan to implement a fault tolerance
   mechanism for MPI applications.


Thanks for your attention !!!


                     Contact Information



mj.villamizar24@uniandes.edu.co                   mjvc007@hotmail.com




                     http://twitter.com/mjvc007




           http://linkedin.com/in/mariojosevillamizarcano


                Mario José Villamizar Cano

More Related Content

What's hot

Efficiency and Effectiveness: Shared services to support STEM subjects
Efficiency and Effectiveness: Shared services to support STEM subjectsEfficiency and Effectiveness: Shared services to support STEM subjects
Efficiency and Effectiveness: Shared services to support STEM subjectsJisc
 
Information Technology in Industry(ITII) - November Issue 2018
Information Technology in Industry(ITII) - November Issue 2018Information Technology in Industry(ITII) - November Issue 2018
Information Technology in Industry(ITII) - November Issue 2018ITIIIndustries
 
Efficient Point Cloud Pre-processing using The Point Cloud Library
Efficient Point Cloud Pre-processing using The Point Cloud LibraryEfficient Point Cloud Pre-processing using The Point Cloud Library
Efficient Point Cloud Pre-processing using The Point Cloud LibraryCSCJournals
 
TexGen: Open Source Software for Modelling of Textile Composites
TexGen: Open Source Software for Modelling of Textile CompositesTexGen: Open Source Software for Modelling of Textile Composites
TexGen: Open Source Software for Modelling of Textile CompositesJisc
 
[Thesis] Tangible Collaboration applied in Space Systems Concurrent Engineeri...
[Thesis] Tangible Collaboration applied in Space Systems Concurrent Engineeri...[Thesis] Tangible Collaboration applied in Space Systems Concurrent Engineeri...
[Thesis] Tangible Collaboration applied in Space Systems Concurrent Engineeri...Christopher Cerqueira
 
Smalltalk-80 : hardware and software
Smalltalk-80 : hardware and softwareSmalltalk-80 : hardware and software
Smalltalk-80 : hardware and softwareESUG
 
Design of an IT Capstone Subject - Cloud Robotics
Design of an IT Capstone Subject - Cloud RoboticsDesign of an IT Capstone Subject - Cloud Robotics
Design of an IT Capstone Subject - Cloud RoboticsITIIIndustries
 
A High Throughput Bioinformatics Distributed Computing Platform
A High Throughput Bioinformatics Distributed Computing PlatformA High Throughput Bioinformatics Distributed Computing Platform
A High Throughput Bioinformatics Distributed Computing PlatformHabibur Rahman
 
Serguei “SB” Beloussov - Future Of Computing at SIT Insights in Technology 2019
Serguei “SB” Beloussov - Future Of Computing at SIT Insights in Technology 2019Serguei “SB” Beloussov - Future Of Computing at SIT Insights in Technology 2019
Serguei “SB” Beloussov - Future Of Computing at SIT Insights in Technology 2019Schaffhausen Institute of Technology
 
On Modeling and Testing When Unpredictability Becomes the Pattern (April 2nd,...
On Modeling and Testing When Unpredictability Becomes the Pattern (April 2nd,...On Modeling and Testing When Unpredictability Becomes the Pattern (April 2nd,...
On Modeling and Testing When Unpredictability Becomes the Pattern (April 2nd,...Benoit Combemale
 
IRJET- A Real Time Yolo Human Detection in Flood Affected Areas based on Vide...
IRJET- A Real Time Yolo Human Detection in Flood Affected Areas based on Vide...IRJET- A Real Time Yolo Human Detection in Flood Affected Areas based on Vide...
IRJET- A Real Time Yolo Human Detection in Flood Affected Areas based on Vide...IRJET Journal
 
Weeds detection efficiency through different convolutional neural networks te...
Weeds detection efficiency through different convolutional neural networks te...Weeds detection efficiency through different convolutional neural networks te...
Weeds detection efficiency through different convolutional neural networks te...IJECEIAES
 
Dame ivoa interop_brescia_naples2011
Dame ivoa interop_brescia_naples2011Dame ivoa interop_brescia_naples2011
Dame ivoa interop_brescia_naples2011INAF-OAC
 
Large scale gpu cluster for ai
Large scale gpu cluster for aiLarge scale gpu cluster for ai
Large scale gpu cluster for aiKyunam Cho
 
Next Technology Wave
Next Technology WaveNext Technology Wave
Next Technology WaveFalascoj
 
Asymmetric image encryption scheme based on Massey Omura scheme
Asymmetric image encryption scheme based on Massey Omura scheme Asymmetric image encryption scheme based on Massey Omura scheme
Asymmetric image encryption scheme based on Massey Omura scheme IJECEIAES
 

Bio-UnaGrid: Easing bioinformatics workflow execution

  • 1. The Third International Conference on Bioinformatics, Biocomputational Systems and Biotechnologies (BIOTECHNO 2011). Bio-UnaGrid: Easing Bioinformatics Workflow Execution Using LONI Pipeline and a Virtual Desktop Grid. Mario Villamizar, Harold Castro, David Mendez (Departments of Systems and Computer Engineering, Universidad de los Andes, Bogotá, Colombia); Silvia Restrepo, Luis Rodriguez (Department of Biological Sciences, Universidad de los Andes, Bogotá D.C., Colombia).
  • 2. Agenda: The Problem Context; Related Work; Bio-UnaGrid: Architecture; Bio-UnaGrid: Implementation; Bio-UnaGrid: Performance Tests; Conclusions and Future Work.
  • 4. Around the problem: Bioinformatics researchers use applications that require large computational capabilities. Cluster and grid computing are common solutions to support these capabilities. Cluster computing aggregates homogeneous computational resources for a specific research group or organization.
  • 5. Around the problem: Grid computing aggregates heterogeneous computational resources from different organizations to support larger computational capabilities in e-Science projects. In grid computing, a set of organizations with a common objective, forming a virtual organization (VO), share computational resources.
  • 6. Around the problem: High performance computing (HPC) infrastructures, such as cluster or grid computing, provide results in a shorter time. However, when bioinformatics researchers want to execute applications, they face several time-consuming problems.
  • 7. Around the problem: 1. There are many application suites for running different analyses, so researchers must learn the command-line syntax (options, input and output files, distributed execution environment, etc.) of hundreds of applications.
  • 8. Around the problem: 2. Researchers must learn commands to manage and use distributed computing infrastructures.
  • 9. Around the problem: 3. Applications require specific and complex configurations to operate in distributed environments.
  • 10. Around the problem: 4. Researchers frequently execute a set of applications in a sequential and/or coordinated manner, called a pipeline or workflow. In many cases researchers execute the workflow manually from the command line: execute an application, wait, execute the next application, wait. This process may take a lot of time.
  • 11. Around the problem: 5. Researchers very often want to execute applications that require more processing capability than dedicated cluster computing infrastructures provide, so they must wait weeks or months before getting their results.
  • 12. The problem: We were having the following problem. Manual, command-based execution of applications and workflows requires bioinformatics researchers to spend most of their time configuring application parameters, managing HPC infrastructures, managing scientific data, and linking partial results from one application to another. When a new researcher or student wants to use the applications, he or she spends a lot of time learning all of those things.
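The manual cycle described above (execute an application, wait, pass its partial result to the next application) is what a workflow engine automates. A minimal Python sketch of that idea follows; it is not how LONI Pipeline or Bio-UnaGrid is implemented. The function name `run_pipeline` and the two placeholder steps are assumptions made for illustration, standing in for real bioinformatics tools.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run each command in order, feeding one step's stdout to the next step's stdin."""
    data = ""
    for argv in steps:
        # check=True raises CalledProcessError, so a failed step stops the pipeline
        # instead of silently passing bad data to the next application.
        result = subprocess.run(
            argv, input=data, capture_output=True, text=True, check=True
        )
        data = result.stdout
    return data


# Hypothetical two-step workflow: the first step emits a sequence,
# the second step reads it from stdin and lower-cases it.
steps = [
    [sys.executable, "-c", "print('ACGT')"],
    [sys.executable, "-c", "import sys; print(sys.stdin.read().strip().lower())"],
]

if __name__ == "__main__":
    print(run_pipeline(steps).strip())
```

The researcher describes the steps once and no longer has to wait at the terminal between applications; a failing step stops the run with an error.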
  • 13. Agenda: The Problem Context; Related Work; Bio-UnaGrid: Architecture; Bio-UnaGrid: Implementation; Bio-UnaGrid: Performance Tests; Conclusions and Future Work.
  • 14. Related work: Several projects have been developed to facilitate automatic workflow execution in bioinformatics and other scientific fields.
  • 15. Related work: Most workflow tools have limitations for bioinformatics applications, such as: They require applications to be recompiled or modified, and it is too complex to modify or recompile every bioinformatics application. They support internal data sources with specific data structures, whereas bioinformatics data come in different formats and can be internal or external. They operate with specific platforms and specific cluster or grid middleware, whereas institutions regularly have different cluster and grid configurations. In our case, we are part of different grid projects such as GISELA, EGEE, and OSG.
  • 16. Related work: Most workflow tools have limitations for bioinformatics applications, such as: They regularly require researchers to use complex commands involving time-consuming tasks, whereas bioinformatics researchers should be able to use easy GUIs to execute workflows. They operate on dedicated cluster and grid infrastructures; however, on a university or enterprise campus there are tens or hundreds of under-utilized desktop computers. Opportunistic infrastructures, or desktop grid and volunteer computing systems (DGVCSs), can provide large processing capabilities at low cost.
  • 17. How to solve these problems: Define an infrastructure into which existing bioinformatics applications can be easily incorporated without modifications. Workflows should be created through GUIs and executed on different HPC infrastructures. The infrastructure should also allow the use of different data sources (internal and external) and the incorporation of MPI applications. To get results faster, the infrastructure should operate on dedicated computing infrastructures and on DGVCSs. LONI Pipeline and UnaGrid: Bio-UnaGrid integrates the capabilities of LONI Pipeline and UnaGrid to provide these features.
  • 18. LONI Pipeline: LONI Pipeline is a free framework used to execute neuroscience workflows. From a computational perspective, LONI differs from other workflow tools in several ways: it does not require external applications to be recompiled; it supports external data sources; it is independent of the hardware platform; and it can be installed on different cluster or grid middleware using the Pipeline plugins. From a research user's perspective, LONI was designed with usability, portability, intuitiveness, transparency, and abstraction of cluster or grid infrastructures in mind.
  • 19. LONI Pipeline Architecture
  • 20. UnaGrid: Dedicated infrastructures are not a viable option for organizations or countries with limited financial resources. However, these organizations have many computer labs that are not fully utilized by employees or university students.
  • 21. UnaGrid: Opportunistic virtual clusters. [Figure: a Linux processing virtual machine (X cores) running on a physical machine in a computer room; panel b shows the case where no end user is using the physical machine.] A virtual cluster is a set of interconnected commodity desktops that run virtual machines (VMs) in the background and at low priority using virtualization technologies. These VMs take advantage of the idle processing capabilities available in computer labs on a university campus.
  • 22. UnaGrid: Opportunistic virtual clusters. A virtual machine runs on each computer in a lab and acts as a cluster slave; together, the running virtual machines make up a virtual processing cluster. Each virtual cluster also needs a dedicated node that acts as the cluster master.
  • 25. UnaGrid: Opportunistic virtual clusters. Each research group can define its own virtual clusters with custom application environments (middleware, applications, configurations, etc.). A grid solution (several virtual clusters) can be deployed to provide the processing capabilities that some applications require.
  • 27. Bio-UnaGrid Architecture: The LONI Pipeline Client is the main entry point for bioinformatics researchers to the Bio-UnaGrid infrastructure.
  • 28. Bio-UnaGrid Operation: Each researcher receives a username and a password and can then create workflows using the bioinformatics LONI Module Library, drag and drop, and the other tools provided by the LONI Pipeline Client. For each LONI Module (a bioinformatics application), the researcher must define the input and output parameters and how the module connects to the other LONI Modules in the workflow. Once a workflow has been created, researchers can execute it by sending it to the LONI Pipeline Server. Researchers can monitor and visualize the status of the workflow, and download partial results, through the GUIs of the LONI Pipeline Client. The grid/cluster infrastructure is transparent to researchers.
  • 29. Bio-UnaGrid: Bioinformatics Applications. To execute workflows, bioinformatics researchers do not have to worry about application commands or complex HPC infrastructures.
  • 30. Bio-UnaGrid: Incorporating Bioinformatics Applications. The process of incorporating existing and new bioinformatics application suites, such as NCBI BLAST, into the Bio-UnaGrid infrastructure is summarized in five steps. 1) Install and configure the application suite on a Shared Storage System. 2) Create a LONI Pipeline Module for each executable of the suite using the LONI Pipeline Client. 3) Test each LONI Pipeline Module of the suite individually from the LONI Pipeline Client. 4) Store the LONI Pipeline Modules in the LONI Pipeline Server. 5) Use the modules in workflows that bioinformatics researchers create in the LONI Pipeline Client.
  • 31. Bio-UnaGrid: Incorporating Bioinformatics Applications
  • 32. Bio-UnaGrid: MPI Applications. At the time of this development, LONI did not support MPI applications. To allow their execution, we developed a wrapper called mpiJobManager, which uses the Distributed Resource Management Application API (DRMAA). To execute an MPI application, such as mpiBLAST, a researcher uses that application's MPI LONI Module to specify the number of MPI processes to be used. The LONI Pipeline Server receives the MPI job and executes it on the DRM slaves.
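The slides do not show how mpiJobManager is written, so the following is only a minimal Python sketch of the idea: turning the process count chosen in an MPI LONI Module into a launch command. The names `MpiJobRequest`, `build_launch_command` and `slots_to_request` are hypothetical, as are the mpiBLAST arguments and file names in the usage note. The sketch uses the standard `mpiexec -n` launcher form and deliberately leaves out the DRMAA submission step. The extra slot for the wrapper follows the performance-test slide, which says mpiJobManager ran on another CPU core.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MpiJobRequest:
    """Hypothetical description of what an MPI LONI Module hands to the wrapper."""
    program: str             # MPI executable, e.g. "mpiblast"
    program_args: List[str]  # arguments passed through to the executable unchanged
    num_processes: int       # number of MPI processes chosen by the researcher


def build_launch_command(job: MpiJobRequest, launcher: str = "mpiexec") -> List[str]:
    """Build the argv list that starts the MPI program with the requested process count."""
    if job.num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    return [launcher, "-n", str(job.num_processes), job.program, *job.program_args]


def slots_to_request(job: MpiJobRequest, wrapper_slots: int = 1) -> int:
    """Cores to reserve: one per MPI process plus the cores used by the wrapper itself."""
    if wrapper_slots < 0:
        raise ValueError("wrapper_slots must not be negative")
    return job.num_processes + wrapper_slots
```

For instance, `build_launch_command(MpiJobRequest("mpiblast", ["-p", "blastn", "-d", "nt", "-i", "queries.fasta", "-o", "hits.txt"], 20))` yields a command that starts 20 MPI processes, and `slots_to_request` for the same job returns 21. A real wrapper would then pass this command to the resource manager rather than run it directly.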
  • 34. Bio-UnaGrid: Implementation. [Deployment diagram. Labels shown: Bioinformatics Infrastructure; S&C Infrastructure; UnaGrid Infrastructure; Bioinformatics Researcher; LONI Client; GUMA Client; LAN/Internet; LONI Pipeline Server; OGE Master; GUMA Portal; User Database; NFS-NAS Server; 6 Dedicated OGE Slaves (24 cores); Linux CVC with 21 OGE Slaves (21 cores) running as VMs in a Windows 7 lab. Link speeds shown: 155 Mbps, 1 Gbps and 10 Gbps.] Bio-UnaGrid was implemented to support several genomic projects related to coffee, potato and cassava, focused on increasing their production.
  • 35. Bio-UnaGrid: Implementation. Software: LONI Pipeline Server version 5.0.2 and the DRM master of OGE version 6.2 update 5. Dedicated cluster: 6 servers with Intel Xeon X5560 quad-core processors at 2.8 GHz and 4 GB of RAM. Computer lab (Windows 7): Intel i5 processors at 3.46 GHz and 8 GB of RAM. 21 VMs (UnaGrid) were deployed, each configured with one CPU core and 4 GB of RAM. The user database was installed on a dedicated server using MySQL. Shared storage is an NFSv3 NAS. All nodes are interconnected through 1 GbE and 10 GbE links.
  • 36. Bio-UnaGrid: Implementation. Fault tolerance mechanisms were configured for BoT applications using the OGE capabilities. Four bioinformatics application suites have been installed on the Bio-UnaGrid implementation, and 21 LONI Modules were created. NCBI BLAST version 2.2.20: BLASTn, BLASTp, BLASTx, tBLASTn, tBLASTx and MEGABLAST. HMMER version 2.3.2: Build, Search and Calibrate. InterProScan version 4.6: BlastProdom, Coils, Gene3D, PIR, Panther, Pfam, SEG, SMART, SuperFamily, TIGRfam and fPrintScan. mpiBLAST version 1.6.0: mpiBLAST. Because mpiBLAST requires an MPI implementation, the MPICH-2 implementation version 1.2.7 was installed.
  • 37. Bio-UnaGrid: A Simple Workflow Example
  • 38. Bio-UnaGrid: An Example of a More Complex Workflow
  • 40. Performance Tests - BLASTn: We executed performance tests in both environments: the dedicated cluster and the opportunistic UnaGrid CVC. We selected the highly popular BLAST algorithm for these tests. To test the performance of Bio-UnaGrid when executing BoT applications, we used the NCBI BLASTn application, varying the number of CPU cores and the size of the sequence set. We executed BLASTn searches of nucleotide sets of 480 (487 KB) and 960 (954 KB) sequences against the nt database (28 GB). The searches were executed using 5, 10, 15 and 20 independent BLASTn processes (BoT BLASTn) with query segmentation. Each search was executed using a single CPU core.
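Query segmentation, as used in the BoT BLASTn tests, means dividing the query sequences among independent processes that each search the full database. The slides do not show the splitting tool that was used, so the Python sketch below is only an illustration under that assumption; `parse_fasta` and `segment_queries` are hypothetical helper names.

```python
from typing import List


def parse_fasta(text: str) -> List[str]:
    """Return each FASTA record (header line plus its sequence lines) as one string."""
    records: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.startswith(">"):
            # A new header closes the record collected so far.
            if current:
                records.append("\n".join(current))
            current = [line]
        elif line.strip() and current:
            current.append(line.strip())
    if current:
        records.append("\n".join(current))
    return records


def segment_queries(records: List[str], n_jobs: int) -> List[List[str]]:
    """Split records into n_jobs contiguous chunks whose sizes differ by at most one."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    base, extra = divmod(len(records), n_jobs)
    chunks: List[List[str]] = []
    start = 0
    for i in range(n_jobs):
        # The first `extra` chunks take one additional record each.
        size = base + (1 if i < extra else 0)
        chunks.append(records[start:start + size])
        start += size
    return chunks
```

With the set sizes used in the tests, the split is even: 480 sequences over 5, 10, 15 and 20 processes gives 96, 48, 32 and 24 sequences per process, and 960 sequences gives twice those counts. Each chunk would then be written to its own file and submitted as a separate single-core BLASTn job.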
  • 41. Performance Tests - BLASTn. [Chart: execution time of BoT BLASTn searches of sets of 480 and 960 sequences against the nt database.] Highlighted results: 480 sequences: Dedicated Cluster (20 cores) 10.9x, UnaGrid (20 cores) 13.1x. 960 sequences: Dedicated Cluster (15 cores) 6.1x, UnaGrid (20 cores) 10x.
  • 42. Performance Tests - mpiBLAST: To test the execution of MPI applications in both environments, we ran the same tests using mpiBLAST. In these tests, the coordinated MPI processes ran on 5, 10, 15 and 20 CPU cores. In each test, the additional mpiJobManager process ran on another CPU core. We used database segmentation to divide the database into 5, 10, 15 and 20 partitions using the mpiformatdb tool installed with mpiBLAST.
  • 43. Performance Tests - mpiBLAST. [Chart: execution time of mpiBLAST searches of sets of 480 and 960 sequences against the nt database.] Highlighted results: 480 sequences: Dedicated Cluster (15 cores) 2.3x, UnaGrid (20 cores) 4.7x. 960 sequences: Dedicated Cluster (20 cores) 3.0x, UnaGrid (20 cores) 3.7x.
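One way to compare the highlighted figures from the BLASTn and mpiBLAST slides is parallel efficiency, that is, speedup divided by the number of cores. The Python sketch below assumes that each "x" value is a speedup relative to a single-core run; the slides do not state the baseline, so the resulting percentages are only indicative. The `Result` type and function names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Result(NamedTuple):
    test: str          # "BoT BLASTn" or "mpiBLAST"
    sequences: int     # size of the query set
    environment: str   # "Dedicated Cluster" or "UnaGrid"
    cores: int         # core count shown next to the figure
    factor: float      # the "x" figure from the slide, assumed to be a speedup


# Values copied from the two results slides.
REPORTED: List[Result] = [
    Result("BoT BLASTn", 480, "Dedicated Cluster", 20, 10.9),
    Result("BoT BLASTn", 480, "UnaGrid", 20, 13.1),
    Result("BoT BLASTn", 960, "Dedicated Cluster", 15, 6.1),
    Result("BoT BLASTn", 960, "UnaGrid", 20, 10.0),
    Result("mpiBLAST", 480, "Dedicated Cluster", 15, 2.3),
    Result("mpiBLAST", 480, "UnaGrid", 20, 4.7),
    Result("mpiBLAST", 960, "Dedicated Cluster", 20, 3.0),
    Result("mpiBLAST", 960, "UnaGrid", 20, 3.7),
]


def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return speedup per core; 1.0 would mean perfectly linear scaling."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return speedup / cores


def efficiency_table(results: List[Result]) -> List[Tuple[Result, float]]:
    """Pair every reported result with its parallel efficiency."""
    return [(r, parallel_efficiency(r.factor, r.cores)) for r in results]


if __name__ == "__main__":
    for r, eff in efficiency_table(REPORTED):
        print(f"{r.test:<10} {r.sequences:>4} seq  {r.environment:<17} "
              f"{r.cores:>2} cores  {r.factor:>4.1f}x  efficiency {eff:.1%}")
```

Under that assumption, the BoT BLASTn runs come out well ahead of the mpiBLAST runs in efficiency on both environments.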
  • 45. Conclusions: By using the idle processing capabilities of Windows, Linux or Mac desktops in computer labs, Bio-UnaGrid adds processing capabilities to those provided by dedicated computational clusters. Bio-UnaGrid allows bioinformatics researchers to define workflows easily using GUIs and to execute them on cluster and grid computing infrastructures with a single click. Bio-UnaGrid is highly extensible, because existing bioinformatics applications can be incorporated without modifications. With Bio-UnaGrid, researchers focus on analyzing application results, not on the technical issues of distributed computing infrastructures.
  • 46. Future Work: We will incorporate more bioinformatics application suites into Bio-UnaGrid. We will run new performance tests with other applications, such as HMMER, MPI-HMMER, ClustalW and ClustalW-MPI. We will run new performance tests on a larger grid deployment involving hundreds of heterogeneous desktop computers in different administrative domains. We will implement a shared storage system that is more scalable than NFSv3. We also plan to implement a fault tolerance mechanism for MPI applications.
  • 47. Thanks for your attention!
  • 48. Contact Information: Mario José Villamizar Cano, mj.villamizar24@uniandes.edu.co, mjvc007@hotmail.com, http://twitter.com/mjvc007, http://linkedin.com/in/mariojosevillamizarcano