VIP: design and implementation
    of the portal and execution service

                  Rafael FERREIRA DA SILVA
         CNRS, CREATIS, INSA-Lyon, Université Lyon 1, INSERM

                   For the VIP Project Consortium:




                       VIP Launching Workshop
                      Lyon, December 14th 2012


                                                 Rafael Ferreira da Silva – rafael.silva@creatis.insa-lyon.fr
Outline

  Introduction
  VIP Architecture
       Web Portal
       Data Transfers
       Workflow Execution

  Workflow Self-Healing
  Conclusions




    http://vip.creatis.insa-lyon.fr
Platform goals
                                        Multi-modality medical image simulators
                                          MRI, US, CT and PET

                                        Objectives
                                          Workflow execution on EGI
                                          Access to storage resources
                                          High-level interface for non-experts

                                        No IT required
                                            Software as a Service (SaaS)
                                            No client software installation
                                            New features automatically available
                                            Consolidated support and troubleshooting


VIP – Architecture
      (Figure: VIP architecture, with Data Management, Object Model Repository, Simulated Data Repository, Workflow Engine, GASW Job Generation and Job Scheduler components)
VIP – Web Portal

         User Front-End
               Openly-accessible web portal
               Access point to models and simulators
               User-friendly interface that assists users in running image
                simulators
               Modular code design (GWT + SmartGWT)




Users/Apps Management
      (Figure panels: Users, Groups, Application Classes, Applications)




VIP – GRIDA


         Grid Data Management Agent
               Handles file catalog and transfer operations by pooling
               Performs data replication
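A minimal sketch of a pooled data-management agent, to make the two bullets above concrete. It reads "pooling" as a pool of worker threads that process queued catalog and transfer operations asynchronously, and it models replication as copying each uploaded file to several storage elements. `GridStore`, `PooledTransferAgent`, the in-memory storage elements and the naive replica placement are all hypothetical; this is not GRIDA's code or API.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GridStore:
    """Hypothetical in-memory stand-in for grid storage elements plus a file catalog."""
    elements: Dict[str, Dict[str, bytes]]                        # storage element -> {path: data}
    catalog: Dict[str, List[str]] = field(default_factory=dict)  # path -> replica locations


class PooledTransferAgent:
    """Hypothetical sketch: catalog and transfer operations are queued on a worker pool."""

    def __init__(self, store: GridStore, workers: int = 4, n_replicas: int = 2) -> None:
        self.store = store
        self.n_replicas = n_replicas
        self.lock = threading.Lock()  # guards the shared store and catalog
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def _upload(self, path: str, data: bytes) -> List[str]:
        # Replicate the file on the first n_replicas storage elements (naive placement).
        targets = sorted(self.store.elements)[: self.n_replicas]
        with self.lock:
            for se in targets:
                self.store.elements[se][path] = data
            self.store.catalog[path] = targets
        return targets

    def _download(self, path: str) -> bytes:
        # Try each registered replica in turn, so one lost copy does not fail the transfer.
        with self.lock:
            for se in self.store.catalog.get(path, []):
                if path in self.store.elements[se]:
                    return self.store.elements[se][path]
        raise FileNotFoundError(path)

    def upload(self, path: str, data: bytes) -> "Future[List[str]]":
        return self.pool.submit(self._upload, path, data)

    def download(self, path: str) -> "Future[bytes]":
        return self.pool.submit(self._download, path)

    def shutdown(self) -> None:
        self.pool.shutdown(wait=True)


if __name__ == "__main__":
    store = GridStore({"se-a": {}, "se-b": {}, "se-c": {}})
    agent = PooledTransferAgent(store)
    print(agent.upload("/vip/inputs/model.zip", b"...").result())  # ['se-a', 'se-b']
    del store.elements["se-a"]["/vip/inputs/model.zip"]            # simulate a lost replica
    print(agent.download("/vip/inputs/model.zip").result())        # b'...'
    agent.shutdown()
```

Callers get a `Future` back immediately, so the portal never blocks on a slow grid transfer; the replica fallback in `_download` is what makes replication useful when one storage element loses a file.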




Data Transfers Management
      (Figure: transfers between User Machine, VIP Server and Grid Storage)
        Upload: the user uploads a file to the VIP Server; GRIDA then uploads the file to the grid (replication)
        Download: GRIDA downloads the file to the VIP Server; the user then downloads the file




VIP – Data Repositories

      Easy integration of third-party libraries
           NeuSemStore-Provenance for simulated data
           NeuSemStore-Simulated-Objects for the model catalog
           Encapsulation of objects as GWT serialized beans
      (Figure: GWT Client, GWT Server and Databases (NeuSemStore); the client issues RPC calls and objects are exchanged as GWT Beans)

    More details in the presentation by B. Gibaud

VIP – Workflow Engine
                MOTEUR workflow engine
                      Applications described in a formal language
                          http://modalis.i3s.unice.fr/softwares/moteur




                Generic Application Service Wrapper (GASW)
                      Bash scripts wrapped in grid jobs
                      Self-healing of workflow execution
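How a workflow engine turns input lists into invocations, and how a wrapper turns each invocation into a command line, can be sketched as below. The sketch assumes dot-product and cross-product iteration strategies over named input lists and a plain `bash script --name value` calling convention; the function names, the `simulate.sh` script and its flags are hypothetical and do not reproduce MOTEUR's or GASW's actual formats.

```python
import itertools
import shlex
from typing import Dict, List, Sequence


def cross_product_invocations(inputs: Dict[str, Sequence[object]]) -> List[Dict[str, object]]:
    """One invocation per combination of input values (cross-product iteration)."""
    names = sorted(inputs)
    return [dict(zip(names, combo)) for combo in itertools.product(*(inputs[n] for n in names))]


def dot_product_invocations(inputs: Dict[str, Sequence[object]]) -> List[Dict[str, object]]:
    """Pairs the i-th values of every input (dot-product iteration); lengths must match."""
    names = sorted(inputs)
    if len({len(inputs[n]) for n in names}) > 1:
        raise ValueError("dot product requires inputs of equal length")
    return [dict(zip(names, combo)) for combo in zip(*(inputs[n] for n in names))]


def wrap_as_job(script: str, invocation: Dict[str, object]) -> str:
    """Hypothetical GASW-style wrapping: one invocation becomes one bash command line."""
    args = " ".join(f"--{k} {shlex.quote(str(v))}" for k, v in sorted(invocation.items()))
    return f"bash {shlex.quote(script)} {args}"


if __name__ == "__main__":
    for inv in cross_product_invocations({"model": ["brain.zip"], "seed": [1, 2, 3]}):
        print(wrap_as_job("simulate.sh", inv))
```

With one model and three seeds the cross product yields three invocations, the first being `bash simulate.sh --model brain.zip --seed 1`; quoting every value with `shlex.quote` keeps file names with spaces from breaking the generated command.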




VIP – Architecture
                                         Workload Management
                                         System with Pilot Jobs
                                           Distributed Infrastructure with
                                           Remote Agent Control (DIRAC)
                                           [CPPM-LHCb]
                                           http://diracgrid.org

                                           Hosted by CC-IN2P3
                                           French National Instance




                                         Data Storage and Computing
                                         Back-End
                                           EGI infrastructure, Biomed VO
                                            http://www.egi.eu



Workflow Execution

     1. Input data upload
     2. User launches a simulation
     3. MOTEUR generates invocations
     4. GASW generates grid jobs
     5. Jobs are submitted to DIRAC
     6. Pilot jobs are submitted to EGI
     7. Pilot jobs fetch grid jobs
     8. Inputs download
     9. Execution
     10. Results upload
     11. Download results
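Steps 5 to 7 follow the pilot-job (pull) model: jobs wait in a central queue and pilots running on worker nodes fetch them, rather than each job being pushed to a site. A minimal thread-based sketch of that model, assuming an in-process queue and Python callables standing in for grid jobs; `run_with_pilots` is hypothetical and is not DIRAC's API.

```python
import queue
import threading
from typing import Callable, Dict


def run_with_pilots(jobs: Dict[str, Callable[[], str]], n_pilots: int) -> Dict[str, str]:
    """Pull-based execution: jobs wait in a central task queue and each pilot
    repeatedly fetches the next job and runs it."""
    task_queue: "queue.Queue" = queue.Queue()
    for job_id, fn in jobs.items():  # step 5: jobs submitted to the workload manager
        task_queue.put((job_id, fn))

    results: Dict[str, str] = {}
    lock = threading.Lock()

    def pilot() -> None:  # step 7: a pilot fetches grid jobs
        while True:
            try:
                job_id, fn = task_queue.get_nowait()
            except queue.Empty:
                return  # simplification: the pilot exits as soon as the queue is empty
            output = fn()  # steps 8 to 10 (inputs, execution, results) happen inside the job
            with lock:
                results[job_id] = output

    pilots = [threading.Thread(target=pilot) for _ in range(n_pilots)]  # step 6: pilots start
    for p in pilots:
        p.start()
    for p in pilots:
        p.join()
    return results
```

For example, `run_with_pilots({f"job{i}": (lambda i=i: f"out{i}") for i in range(6)}, n_pilots=3)` returns one result per job. Because pilots only pull work once they are actually running, a job is never stuck in the queue of a site whose pilot never starts.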

Workflow Self-Healing
  Problem: costly manual operations
         Rescheduling tasks, restarting services, killing misbehaving
             experiments or replicating data files


  Objective: automated platform administration
         Autonomous detection of operational incidents
         Perform appropriate set of actions


  Assumptions: online and non-clairvoyant
         Only partial information available
         Decisions must be fast
         Production conditions: no prediction of user activity or workloads

General MAPE-K loop
  Monitoring: triggered by an event (job completion or failure) or by a timeout
  Analysis: uses the monitoring data; each incident i has a degree η_i and one of three levels
         Example: Incident 1 (η = 0.8), Incident 2 (η = 0.4), Incident 3 (η = 0.1)
         Roulette wheel selection: incident i is selected with probability
             $p_i = \frac{\eta_i}{\sum_{j=1}^{n} \eta_j}$
         Selected: Incident 1
  Planning: roulette wheel selection based on the association rules stored in the Knowledge base
         Association rules for Incident 1, each weighted by ρ × η (η of the rule's left-hand incident)

         Rule     Confidence (ρ)   ρ × η
         2 → 1         0.8          0.32
         3 → 1         0.2          0.02
         1 → 1         1.0          0.80

         Selected: Incident 2
  Execution: the set of actions associated with the selected incident is performed
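The two roulette-wheel stages above can be sketched as follows, using the incident degrees and rule confidences from the example. The function names, the `(antecedent, consequent, confidence)` rule layout and the fallback when no rule matches are assumptions; weighting each rule by ρ times the degree of its left-hand incident is the reading that reproduces the 0.32, 0.02 and 0.80 values in the table.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def roulette_select(items: Sequence[T], weights: Sequence[float], r: float) -> T:
    """Roulette wheel selection: item i is chosen with probability w_i / sum(w).
    r is a uniform draw in [0, 1), passed in so the choice is reproducible."""
    threshold = r * sum(weights)
    cumulative = 0.0
    for item, w in zip(items, weights):
        cumulative += w
        if threshold < cumulative:
            return item
    return items[-1]  # guards against floating-point round-off when r is close to 1


def select_incident(eta: Dict[str, float],
                    rules: List[Tuple[str, str, float]],
                    rng: random.Random) -> Tuple[str, str]:
    """Analysis picks an incident by degree; planning then picks, among the rules
    that conclude on it, one weighted by rho * eta(left-hand incident).
    Returns (incident selected by analysis, incident selected by planning)."""
    names = list(eta)
    analysed = roulette_select(names, [eta[n] for n in names], rng.random())
    candidates = [(a, rho * eta[a]) for a, c, rho in rules if c == analysed]
    if not candidates:  # hypothetical fallback: no rule, handle the analysed incident
        return analysed, analysed
    planned = roulette_select([a for a, _ in candidates],
                              [w for _, w in candidates], rng.random())
    return analysed, planned


if __name__ == "__main__":
    eta = {"1": 0.8, "2": 0.4, "3": 0.1}
    rules = [("2", "1", 0.8), ("3", "1", 0.2), ("1", "1", 1.0)]
    print(select_incident(eta, rules, random.Random(42)))
```

With these degrees Incident 1 is picked by the analysis with probability 0.8 / 1.3 ≈ 0.62; passing the random draw in explicitly keeps the selection testable.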

Incident: Activity Blocked
  An invocation is late compared to the others




      (Figures: Invocations completion rate for a simulation; Job flow for a simulation)



  Possible causes
         Longer waiting times
         Lost tasks (e.g. killed by site due to quota violation)
         Resources with poor performance



Activity blocked: degree
      Degree computed from all completed jobs of the activity
         Job phases: setup → inputs download → execution → outputs upload
         Assumption: bag-of-tasks (all jobs have equal durations)
         Median-based estimation:

             Phase             Median duration   Estimated job   Real job
                               of job phases     duration        duration
             setup                  50s              42s            42s       (completed)
             inputs download       250s             300s           300s       (completed)
             execution             400s             400s*           20s       (current)
             outputs upload         15s              15s             ?
             Total             M_i = 715s        E_i = 757s

             *: max(400s, 20s) = 400s

      Incident degree: job performance w.r.t. the median

             $d = \frac{E_i}{M_i + E_i} \in [0, 1]$
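The median-based estimation and the degree formula can be checked against the table with a short sketch. The estimation rule is read off the table: completed phases keep their real duration, the current phase counts as max(median, elapsed), and later phases fall back to the median. The function names and the list-based job representation are assumptions made for illustration.

```python
from statistics import median
from typing import List, Optional


def phase_medians(completed_jobs: List[List[float]]) -> List[float]:
    """Median duration of each phase over the completed jobs of the activity."""
    return [median(phase) for phase in zip(*completed_jobs)]


def estimated_duration(medians: List[float], real: List[Optional[float]], current: int) -> float:
    """E_i: real duration for completed phases, max(median, elapsed) for the
    current phase, median for phases that have not started yet."""
    total = 0.0
    for k, med in enumerate(medians):
        if k < current:
            if real[k] is None:
                raise ValueError(f"phase {k} is completed but has no observed duration")
            total += real[k]
        elif k == current:
            total += max(med, real[k] or 0.0)
        else:
            total += med
    return total


def blocked_degree(medians: List[float], real: List[Optional[float]], current: int) -> float:
    """Incident degree d = E_i / (M_i + E_i); d = 0.5 when the job matches the median."""
    m_i = sum(medians)
    e_i = estimated_duration(medians, real, current)
    return e_i / (m_i + e_i)


if __name__ == "__main__":
    med = [50, 250, 400, 15]     # setup, inputs download, execution, outputs upload
    real = [42, 300, 20, None]   # execution has been running for 20s so far
    print(estimated_duration(med, real, current=2))  # 757.0
    print(round(blocked_degree(med, real, current=2), 3))  # 0.514
```

For the job in the table, d = 757 / (715 + 757) ≈ 0.51, close to the median behaviour; a job stuck for 4000s in its execution phase would instead reach d ≈ 0.86.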

Activity blocked: levels and actions
  Levels: identified from the platform logs
         Level 1 (d below threshold τ1): no actions
         Level 2 (d above threshold τ1): replicate jobs
      (Figure: Replication process for one task)



  Actions
         Job replication
               Cancel replicas with
                  bad performance
               Replicate only if all
                  active replicas are running
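A minimal sketch of the level classification and the two replication rules for one task. Several details are assumptions: the boundary case d = τ1 is assigned to level 2, only running replicas are judged on performance, "bad performance" means being at level 2, and the running replica with the lowest degree is always kept. `Replica`, `level` and `plan_replication` are hypothetical names, and the τ1 value used below is illustrative only (the slide says the thresholds are identified from platform logs).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Replica:
    job_id: str
    status: str           # "queued" or "running" (hypothetical status names)
    degree: float = 0.0   # activity-blocked degree d of this replica, in [0, 1]


def level(d: float, tau1: float) -> int:
    """Level 1 (no action) below tau1, level 2 (replicate jobs) from tau1 upwards."""
    return 1 if d < tau1 else 2


def plan_replication(replicas: List[Replica], tau1: float) -> Tuple[List[Replica], bool]:
    """Return (replicas to cancel, whether to submit one more replica) for one task."""
    running = sorted((r for r in replicas if r.status == "running"), key=lambda r: r.degree)
    queued = [r for r in replicas if r.status != "running"]
    if not running:
        return [], False  # nothing is running yet, so there is no performance to judge
    best = running[0]
    # Cancel replicas with bad performance, but never the best running one.
    to_cancel = [r for r in running[1:] if level(r.degree, tau1) == 2]
    blocked = level(best.degree, tau1) == 2
    # Replicate only if the task is still blocked and all active replicas are running.
    return to_cancel, blocked and not queued


if __name__ == "__main__":
    task = [Replica("j1", "running", 0.9), Replica("j2", "running", 0.7)]
    cancel, replicate = plan_replication(task, tau1=0.6)
    print([r.job_id for r in cancel], replicate)  # ['j1'] True
```

Waiting until no replica is queued avoids piling up copies of the same task in site queues, which would waste resources without speeding anything up.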

Experimental results
   Goal: Self-Healing vs No-Healing
         Cope with recoverable errors

   Metrics
         Makespan of the activity execution
         Resource waste

             $w = \frac{(\mathrm{CPU} + \mathrm{data})_{\text{self-healing}}}{(\mathrm{CPU} + \mathrm{data})_{\text{no-healing}}} - 1$

       For w < 0: self-healing consumed fewer resources
       For w > 0: self-healing wasted resources

   Self-Healing speeds up FIELD-II execution by up to a factor of 4

             Repetition      w
                 1         -0.10
                 2         -0.15
                 3         -0.09
                 4          0.05
                 5         -0.26

   The Self-Healing process reduced resource consumption by up to 26% compared
   to the No-Healing execution
   R. Ferreira da Silva, T. Glatard, F. Desprez, "Self-healing of operational workflow
   incidents on distributed computing infrastructures", IEEE/ACM International
   Symposium on Cluster, Cloud and Grid Computing (CCGrid), Ottawa, Canada, 2012.
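The resource-waste metric is a simple ratio; a short sketch makes the sign convention and the "up to 26%" reading of the table explicit. It assumes CPU and data consumption are expressed in the same unit (for example seconds) so they can be added; the function names are hypothetical, and the only data used are the w values from the table.

```python
from typing import Dict, List


def resource_waste(cpu_sh: float, data_sh: float, cpu_nh: float, data_nh: float) -> float:
    """w = (CPU + data)_self-healing / (CPU + data)_no-healing - 1.
    Assumes CPU and data consumption share one unit so they can be summed."""
    return (cpu_sh + data_sh) / (cpu_nh + data_nh) - 1


def summarize(ws: List[float]) -> Dict[str, float]:
    """Largest relative saving and how many repetitions saved resources (w < 0)."""
    return {
        "max_saving": -min(ws),
        "runs_saving": sum(1 for w in ws if w < 0),
        "runs": len(ws),
    }


if __name__ == "__main__":
    table_w = [-0.10, -0.15, -0.09, 0.05, -0.26]  # repetitions 1 to 5 from the table
    print(summarize(table_w))
```

The largest saving, 0.26, comes from repetition 5, which is where the 26% figure originates; repetition 4 is the only run in which self-healing consumed more than the baseline.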

VIP – Facts
            321 registered users, from
                 38 countries

            Most used portal certificate in
                 EGI (August 2012)
                 https://wiki.egi.eu/wiki/EGI_robot_certificate_users


            Consumed 379 CPU years from
                 January 2011 to August 2012
                 http://accounting.egi.eu


            1/10 of the total activity of the
                 biomed international VO, making it one of
                 the most active users



VIP – Facts

  Users
     (Figure: Distribution of portal users on EGI, August 2012; source: https://wiki.egi.eu/wiki/EGI_robot_certificate_users)

  Applications
     1155 simulations executed during the last year (~3/day)
     (Figure: Distribution of application executions in VIP, Nov 2011 to Oct 2012)



Concluding remarks
  VIP is an openly-accessible web portal for multi-modality
      medical image simulators
        MRI, US, CT, PET and other tools
        Workflow execution on EGI
        Access to storage resources
        High-level interface for non-experts
  No IT required (Software as a Service)
  Facts
        321 registered users from 38 countries
        Consumed about 400 CPU years (January 2011 to August 2012)
  Limits and perspectives
        Fair resource allocation among workflows
        User support
        Heavy data transfers
VIP: design and implementation
of the portal and execution services

      Thank you for your attention.
              Questions?

    http://vip.creatis.insa-lyon.fr




Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 

VIP: design and implementation of the portal and execution service

  • 1. VIP: design and implementation of the portal and execution service
    Rafael FERREIRA DA SILVA
    CNRS, CREATIS, INSA-Lyon, Université Lyon 1, INSERM
    For the VIP Project Consortium
    VIP Launching Workshop, Lyon, December 14th 2012
    Rafael Ferreira da Silva – rafael.silva@creatis.insa-lyon.fr
  • 2. Outline
    - Introduction
    - VIP Architecture
      - Web Portal
      - Data Transfers
      - Workflow Execution
    - Workflow Self-Healing
    - Conclusions
  • 3. Platform goals
    - Multi-modality medical image simulators: MRI, US, CT and PET
    - Objectives
      - Workflow execution on EGI
      - Access to storage resources
      - High-level interface for non-experts
    - No IT required
      - Software as a Service (SaaS)
      - No client software installation
      - New features automatically available
      - Consolidated support and troubleshooting
  • 4. VIP – Architecture
    Architecture diagram. Components: Object Model Repository, Data Management, Simulated Data, GASW Repository, Job Generation, Workflow Engine, Job Scheduler.
  • 5. VIP – Web Portal
    - User front-end
      - Openly accessible web portal
      - Access point to models and simulators
      - User-friendly interface that helps users run image simulators
      - Modular code design (GWT + SmartGWT)
  • 6. Users/Apps Management
    Screenshots: management of Users, Groups, Application Classes and Applications.
  • 7. VIP – GRIDA
    - Grid Data Management Agent
      - Handles file catalog and transfer operations by pooling them
      - Performs data replication
  • 8. Data Transfers Management
    Diagram: transfers between the User Machine, the VIP Server and Grid Storage.
    - Upload: the user uploads a file to the VIP server; GRIDA then uploads the file to the grid (with replication).
    - Download: GRIDA downloads the file from the grid to the VIP server; the user then downloads it.
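Slides 7 and 8 describe the transfer path only at the diagram level. The Python sketch below illustrates it under stated assumptions: storage elements are simulated with in-memory dictionaries, "pooling" is read as running transfer operations on a worker pool, and the names `TransferAgent`, `upload` and `download` are hypothetical. This is not GRIDA's actual interface.

```python
from concurrent.futures import ThreadPoolExecutor


class TransferAgent:
    """Hypothetical GRIDA-like agent: stages files on the server and replicates them."""

    def __init__(self, storage_elements, workers=4):
        self.server_cache = {}                    # files staged on the VIP server
        self.storage_elements = storage_elements  # SE name -> {path: bytes}
        self.catalog = {}                         # path -> SE names holding a replica
        self.pool = ThreadPoolExecutor(max_workers=workers)  # assumed operation pool

    def _copy_to_se(self, se_name, path, data):
        self.storage_elements[se_name][path] = data
        return se_name

    def upload(self, path, data, replicas=2):
        """User -> VIP server, then VIP server -> grid storage with replication."""
        self.server_cache[path] = data
        targets = list(self.storage_elements)[:replicas]
        futures = [self.pool.submit(self._copy_to_se, se, path, data) for se in targets]
        self.catalog[path] = [f.result() for f in futures]  # wait for every replica
        return self.catalog[path]

    def download(self, path):
        """Grid storage -> VIP server (if not staged yet), then VIP server -> user."""
        if path not in self.server_cache:
            se_name = self.catalog[path][0]  # first registered replica
            self.server_cache[path] = self.storage_elements[se_name][path]
        return self.server_cache[path]

    def close(self):
        self.pool.shutdown(wait=True)
```

In this sketch `upload` blocks until all replicas are written; an asynchronous variant would return the futures and register replicas in the catalog as they complete.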
  • 9. VIP – Data Repositories
    - Easy integration of third-party libraries
      - NeuSemStore-Provenance for simulated data
      - NeuSemStore-Simulated-Objects for the model catalog
    - Encapsulation of objects as GWT serialized beans
    Diagram: GWT Client → (RPC call) → GWT Server → NeuSemStore → Databases; objects are exchanged as GWT beans.
    More details in the presentation by B. Gibaud.
  • 10. VIP – Workflow Engine
    - MOTEUR workflow engine (http://modalis.i3s.unice.fr/softwares/moteur)
      - Applications described in a formal language
    - Generic Application Service Wrapper (GASW)
      - Bash scripts wrapped in grid jobs
      - Self-healing of workflow execution
  • 11. VIP – Architecture
    - Workload Management System with pilot jobs
      - Distributed Infrastructure with Remote Agent Control (DIRAC) [CPPM-LHCb], http://diracgrid.org
      - French national instance hosted by CC-IN2P3
    - Data storage and computing back-end
      - EGI infrastructure, biomed VO (http://www.egi.eu)
  • 12. Workflow Execution
    1. Input data upload
    2. User launches a simulation
    3. MOTEUR generates invocations
    4. GASW generates grid jobs
    5. Jobs are submitted to DIRAC
    6. Pilot jobs are submitted to EGI
    7. Pilot jobs fetch grid jobs
    8. Inputs download
    9. Execution
    10. Results upload
    11. Download results
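Steps 5 to 10 of slide 12 rely on pilot jobs: DIRAC queues the grid jobs, and pilots running on EGI worker nodes pull them once they start (late binding). The sketch below models only that pull pattern, assuming a shared queue stands in for DIRAC and threads stand in for pilots; `run_pilots` and the `(job_id, payload)` job format are hypothetical.

```python
import queue
import threading


def run_pilots(grid_jobs, n_pilots=3):
    """Execute (job_id, payload) pairs with pilots that pull work from a queue."""
    task_queue = queue.Queue()
    for job in grid_jobs:              # step 5: jobs are submitted to the central queue
        task_queue.put(job)

    results = {}
    lock = threading.Lock()

    def pilot(pilot_id):
        # Step 7: a running pilot repeatedly fetches the next available grid job.
        while True:
            try:
                job_id, payload = task_queue.get_nowait()
            except queue.Empty:
                return                 # no work left: the pilot exits
            output = payload()         # steps 8-10 collapsed into one call
            with lock:
                results[job_id] = (pilot_id, output)

    pilots = [threading.Thread(target=pilot, args=(i,)) for i in range(n_pilots)]
    for p in pilots:                   # step 6: pilots start on the infrastructure
        p.start()
    for p in pilots:
        p.join()
    return results
```

The usual motivation for this design is that a task is bound to a resource only once that resource is actually running, so queue waits at individual sites are paid by the pilot rather than by the task.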
  • 13. Outline
    - Introduction
    - VIP Architecture
      - Web Portal
      - Data Transfers
      - Workflow Execution
    - Workflow Self-Healing
    - Conclusions
  • 14. Workflow Self-Healing
    - Problem: costly manual operations
      - Rescheduling tasks, restarting services, killing misbehaving experiments or replicating data files
    - Objective: automated platform administration
      - Autonomous detection of operational incidents
      - Execution of an appropriate set of actions
    - Assumptions: online and non-clairvoyant
      - Only partial information is available
      - Decisions must be fast
      - Production conditions: no prediction of user activity or workloads
  • 15. General MAPE-K loop
    Diagram: Monitoring → Analysis → Planning → Execution loop sharing a Knowledge base.
    - Monitoring: events (job completions and failures) or timeouts produce monitoring data.
    - Analysis: each detected incident gets a degree η and a level (1, 2 or 3).
      Example: Incident 1, η = 0.8; Incident 2, η = 0.4; Incident 3, η = 0.1.
    - Planning: an incident is chosen by roulette wheel selection, incident i with probability
      $$ p_i = \frac{\eta_i}{\sum_{j=1}^{n} \eta_j} $$
      A second roulette wheel selection, based on association rules and weighted by rule confidence ρ times incident degree η, then determines the set of actions. Association rules for incident 1:

      Rule     Confidence (ρ)   ρ × η
      2 → 1    0.8              0.32
      3 → 1    0.2              0.02
      1 → 1    1.0              0.80

      In the example, Incident 1 is selected first, and the rule-based selection then picks Incident 2.
    - Execution: the selected set of actions is performed.
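The two selection steps of slide 15 can be sketched in Python as follows, using the slide's example values. The reading that a rule `j → i` weights incident j by its own degree (ρ × η_j, which reproduces the 0.32, 0.02 and 0.80 of the table) and that the selected antecedent is the incident to treat is our interpretation of the figure; `roulette_select`, `plan` and the fallback for incidents without rules are hypothetical.

```python
import random


def roulette_select(weights, rng=random.random):
    """Pick a key of `weights` with probability weights[key] / sum(weights)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    threshold = rng() * total          # rng() is uniform in [0, 1)
    cumulative = 0.0
    last = None
    for key, weight in weights.items():
        cumulative += weight
        last = key
        if threshold < cumulative:
            return key
    return last                        # floating-point guard at the upper end


def plan(degrees, rules, rng=random.random):
    """Two-stage selection: an incident by degree, then an incident to treat by rule.

    degrees: incident -> degree eta
    rules: consequent incident i -> {antecedent incident j: confidence rho}
    """
    incident = roulette_select(degrees, rng)
    # Assumption: with no rule for the incident, treat the incident itself.
    candidates = rules.get(incident, {incident: 1.0})
    weights = {j: rho * degrees[j] for j, rho in candidates.items()}
    return incident, roulette_select(weights, rng)


# Example values from slide 15
DEGREES = {1: 0.8, 2: 0.4, 3: 0.1}
RULES = {1: {2: 0.8, 3: 0.2, 1: 1.0}}   # rules 2 -> 1, 3 -> 1, 1 -> 1

probabilities = {i: eta / sum(DEGREES.values()) for i, eta in DEGREES.items()}
# incident 1: ~0.62, incident 2: ~0.31, incident 3: ~0.08
```

Passing a custom `rng` makes the selection reproducible, which is convenient when replaying a decision from logs.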
  • 16. Incident: Activity Blocked
    - An invocation is late compared to the others
      Figures: invocation completion rate for a simulation; job flow for a simulation.
    - Possible causes
      - Longer waiting times
      - Lost tasks (e.g. killed by the site due to a quota violation)
      - Resources with poor performance
  • 17. Activity blocked: degree
    - Degree computed from all completed jobs of the activity
      - Job phases: setup → inputs download → execution → outputs upload
      - Assumption: bag of tasks (all jobs have equal durations)
    - Median-based estimation, example for a job currently in its execution phase:

      Phase             Median duration    Estimated job   Real job
                        of jobs            duration        duration
      setup             50s                42s             42s (completed)
      inputs download   250s               300s            300s (completed)
      execution         400s               400s*           20s (current)
      outputs upload    15s                15s             ?
      Total             M_i = 715s         E_i = 757s

      *: max(400s, 20s) = 400s
    - Incident degree: job performance w.r.t. the median
      $$ d = \frac{E_i}{M_i + E_i} \in [0, 1] $$
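The median-based estimate of slide 17 can be written as a short Python sketch: completed phases use the real duration, the running phase uses max(median, elapsed), and phases not yet started use the median. Under the bag-of-tasks assumption the same per-phase medians apply to every job. The helper names are hypothetical; the example values are those of the slide.

```python
from statistics import median

PHASES = ("setup", "inputs download", "execution", "outputs upload")


def phase_medians(completed_jobs):
    """Median duration of each phase over the completed jobs of the activity."""
    return {p: median(job[p] for job in completed_jobs) for p in PHASES}


def estimate_duration(medians, real, current):
    """Estimated total duration E_i of a running job.

    real: durations observed so far (elapsed time for the running phase)
    current: index in PHASES of the phase the job is running
    """
    total = 0.0
    for k, phase in enumerate(PHASES):
        if k < current:                 # completed phase: real duration
            total += real[phase]
        elif k == current:              # running phase: at least the median
            total += max(medians[phase], real[phase])
        else:                           # phase not started: assume the median
            total += medians[phase]
    return total


def blocked_degree(medians, real, current):
    """d = E_i / (M_i + E_i), in [0, 1]; d = 0.5 when E_i equals M_i."""
    m_i = sum(medians[p] for p in PHASES)
    e_i = estimate_duration(medians, real, current)
    return e_i / (m_i + e_i)


# Example values from slide 17
MEDIANS = {"setup": 50, "inputs download": 250, "execution": 400, "outputs upload": 15}
REAL = {"setup": 42, "inputs download": 300, "execution": 20}
d = blocked_degree(MEDIANS, REAL, current=2)   # 757 / (715 + 757) ≈ 0.514
```

With d ≈ 0.51 the job is roughly on par with the median; d approaches 1 as the estimated duration grows far beyond M_i.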
  • 18. Activity blocked: levels and actions
    - Levels, identified from the platform logs: a threshold τ1 on d separates level 1 (d below τ1, no action) from level 2 (d above τ1, action: replicate jobs)
      Figure: replication process for one task.
    - Actions
      - Job replication
      - Cancel replicas with bad performance
      - Replicate only if all active replicas are running
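The level and action rules of slide 18 can be sketched as one decision function. The slide does not give the value of τ1 (it is identified from the platform logs), so it is a parameter here; the per-replica `bad_degree` cut-off, the choice to always keep the best replica, and the `Replica` type are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    status: str    # hypothetical states: "queued" or "running"
    degree: float  # performance degree of this replica, in [0, 1]


def plan_actions(d, replicas, tau1, bad_degree):
    """Actions for one task of a blocked activity (sketch).

    d: incident degree of the task; tau1: level threshold (from platform logs)
    bad_degree: hypothetical cut-off above which a replica counts as bad
    replicas: current replicas of the task (at least one)
    """
    if d < tau1:
        return []                                   # level 1: no action
    # Level 2. Cancel badly performing replicas, but keep the best one (assumption).
    best = min(range(len(replicas)), key=lambda i: replicas[i].degree)
    cancelled = sorted(
        i for i, r in enumerate(replicas) if r.degree >= bad_degree and i != best
    )
    actions = [("cancel", i) for i in cancelled]
    # Replicate only if all remaining active replicas are running.
    active = [r for i, r in enumerate(replicas) if i not in cancelled]
    if all(r.status == "running" for r in active):
        actions.append(("replicate",))
    return actions
```

The "all running" condition presumably avoids piling up new replicas while an existing one is still waiting in a queue.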
  • 19. Experimental results
    - Goal: Self-Healing vs. No-Healing
      - Cope with recoverable errors
    - Metrics
      - Makespan of the activity execution: self-healing speeds up FIELD-II execution by a factor of up to 4
      - Resource waste:
        $$ w = \frac{(\mathrm{CPU} + \mathrm{data})_{\text{self-healing}}}{(\mathrm{CPU} + \mathrm{data})_{\text{no-healing}}} - 1 $$
        For w < 0, self-healing consumed fewer resources; for w > 0, self-healing wasted resources.

      Repetition   w
      1            -0.10
      2            -0.15
      3            -0.09
      4             0.05
      5            -0.26

    - The self-healing process reduced resource consumption by up to 26% compared to the No-Healing execution.
    R. Ferreira da Silva, T. Glatard, F. Desprez, Self-healing of operational workflow incidents on distributed computing infrastructures, IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Ottawa, Canada, 2012.
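The resource-waste metric of slide 19 and its table can be checked with a few lines of Python. Adding CPU and data costs assumes both are expressed in the same unit (e.g. seconds); the function names are hypothetical, and the w values are copied from the slide.

```python
def resource_waste(self_healing, no_healing):
    """w = (CPU + data)_self-healing / (CPU + data)_no-healing - 1.

    Both arguments are dicts with "cpu" and "data" costs in the same unit
    (assumption: e.g. seconds).
    """
    consumed = self_healing["cpu"] + self_healing["data"]
    baseline = no_healing["cpu"] + no_healing["data"]
    return consumed / baseline - 1


def summarize(ws):
    """Number of runs where self-healing saved resources, and the largest saving."""
    savings = [-w for w in ws if w < 0]
    return {"runs_saving": len(savings), "max_reduction": max(savings, default=0.0)}


# w per repetition, from slide 19
W_BY_REPETITION = {1: -0.10, 2: -0.15, 3: -0.09, 4: 0.05, 5: -0.26}
print(summarize(W_BY_REPETITION.values()))  # {'runs_saving': 4, 'max_reduction': 0.26}
```

The largest saving, 0.26, is the 26% reduction reported on the slide; repetition 4 (w = 0.05) is the only run in which self-healing used more resources.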
  • 20. VIP – Facts
    - 321 registered users from 38 countries
    - Most used portal certificate in EGI (August 2012), https://wiki.egi.eu/wiki/EGI_robot_certificate_users
    - Consumed 379 CPU years from January 2011 to August 2012 (http://accounting.egi.eu)
      - 1/10 of the total activity of the biomed international VO; one of the most active users
  • 21. VIP – Facts
    - Applications: 1155 simulations executed during the last year (~3/day)
      Figure: distribution of application executions in VIP (Nov 2011 – Oct 2012).
    - Users: distribution of portal users on EGI (August 2012) (source: https://wiki.egi.eu/wiki/EGI_robot_certificate_users)
  • 22. Concluding remarks
    - VIP is an openly accessible web portal for multi-modality medical image simulators
      - MRI, US, CT, PET and other tools
      - Workflow execution on EGI
      - Access to storage resources
      - High-level interface for non-experts
      - No IT required (Software as a Service)
    - Facts
      - 321 registered users from 38 countries
      - Consumed about 400 CPU years per year
    - Limits and perspectives
      - Fair resource allocation among workflows
      - User support
      - Heavy data transfers
  • 23. VIP: design and implementation of the portal and execution services
    Thank you for your attention. Questions?
    http://vip.creatis.insa-lyon.fr