A science-gateway workload archive
    to study pilot jobs, user activity, bag of tasks,
       task sub-steps and workflow executions

                        Rafael FERREIRA DA SILVA and Tristan GLATARD
                          University of Lyon, CNRS, INSERM, CREATIS
                                     Villeurbanne, France




              CoreGRID/ERCIM Workshop on Grids, Clouds and P2P Computing
                                  August 27th 2012



                                                            Rafael Ferreira da Silva – rafael.silva@creatis.insa-lyon.fr
Context: Workload Archives

  Information produced by grid workflow executions
       exit_code, task_status, submit_time, execution_time,
       site_name, input_file, workflow_id, activity_name

  Useful for
       Assumptions validation
       Computational activity modeling
       Methods evaluation (simulation or experimental)

Science-gateway architecture

  Components
       User, Web Portal, Storage Element, Workflow Engine,
       Pilot Manager, Meta-Scheduler, Computing site

  Steps
        0. Login
        1. Send input data
        2. Transfer input files
        3. Launch workflow
        4. Generate and submit task
        5. Submit pilot jobs
        6. Schedule pilot jobs
        7. Get task
        8. Get files
        9. Execute
       10. Upload results

State of the Art

  Grid Workload Archives
       Information gathered at infrastructure-level (tasks)
       •  Parallel Workloads Archive
          (http://www.cs.huji.ac.il/labs/parallel/workload/)
       •  Grid Workloads Archive
          (http://gwa.ewi.tudelft.nl/pmwiki/)

  [Figure: word cloud of trace fields: exit_code, task_status, submit_time,
   execution_time, site_name, input_file, workflow_id, activity_name]

  Lack of critical information:
       •  Dependencies among tasks
       •  Task sub-steps
       •  Application-level scheduling artifacts
       •  User

At infrastructure-level

  [Figure: science-gateway architecture diagram (steps 0 to 10), repeated
   from the "Science-gateway architecture" slide]

Outline
  A science-gateway workload archive
  Case studies
        Pilot Jobs
        Accounting
        Task analysis
        Bag of tasks
        Workflows

  Conclusions




Our approach

  Science-Gateway Workload Archive
       Information gathered at science-gateway level (workflow executions)

  [Figure: word cloud of trace fields: exit_code, task_status, submit_time,
   execution_time, site_name, input_file, workflow_id, activity_name]

  Advantages:
       •  Fine-grained information about tasks
       •  Dependencies among tasks
       •  Workflow characterization
       •  Accounting

At science-gateway level

  [Figure: science-gateway architecture diagram (steps 0 to 10), repeated
   from the "Science-gateway architecture" slide]

Virtual Imaging Platform
  Virtual Imaging Platform (VIP)
       Medical imaging science-gateway
       Grid of 129 sites (EGI, http://www.egi.eu)

  Significant usage
       Registered users: 244 from 26 countries
       Applications: 18
       Consumed 32 CPU years in 2011

  [Figure: VIP (http://vip.creatis.insa-lyon.fr), with the labels
   "Applications" and "File transfer"]
  [Figure: VIP usage in 2011: CPU consumption of VIP and related
   platforms on EGI]

SGWA
  Science Gateway Workload Archive (SGWA)
       Archive is extracted from VIP

  [Figure: Science-gateway archive model]
       Task, Site and Workflow Execution: acquired from databases
          populated by the workflow engine at runtime
       File and Pilot Job: extracted from the parsing of task standard
          output and error files

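The archive model can be pictured as a handful of linked record types. The sketch below is only an illustration: the five entity names come from the slide and most field names come from the trace fields shown earlier in the deck, but `task_id`, `pilot_id`, `file_name` and `user` are hypothetical, and this is not the actual SGWA schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Site:
    site_name: str


@dataclass
class File:
    file_name: str  # hypothetical field, parsed from task logs in this sketch


@dataclass
class PilotJob:
    pilot_id: str  # hypothetical identifier, parsed from task stdout/stderr
    site: Optional[Site] = None


@dataclass
class Task:
    task_id: str  # hypothetical identifier
    task_status: str
    exit_code: Optional[int]
    submit_time: float
    execution_time: Optional[float]
    activity_name: str
    workflow_id: str
    pilot: Optional[PilotJob] = None
    input_files: List[File] = field(default_factory=list)


@dataclass
class WorkflowExecution:
    workflow_id: str
    user: str  # hypothetical: the gateway user who launched the execution
    tasks: List[Task] = field(default_factory=list)

    def tasks_by_status(self, status: str) -> List[Task]:
        """Return the tasks of this execution that have the given status."""
        return [t for t in self.tasks if t.task_status == status]
```

Keeping the user on the workflow execution and the pilot on the task is one way to preserve the user-task and task-pilot associations that the following slides rely on.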
Workload for Case Studies
  Based on the workload of VIP
        January 2011 to April 2012

  112 users
  2,941 workflow executions
  339,545 pilot jobs
  680,988 tasks
       338,989 completed
       138,480 error
       105,488 aborted
        15,576 aborted replicas
        48,293 stalled
        34,162 queued

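As a quick consistency check, the six per-status counts on this slide add up to the reported total of 680,988 tasks. A minimal sketch of that arithmetic, using only the numbers from the slide (the variable names are illustrative):

```python
# Task counts per final status, as reported on the slide.
task_counts = {
    "completed": 338_989,
    "error": 138_480,
    "aborted": 105_488,
    "aborted replicas": 15_576,
    "stalled": 48_293,
    "queued": 34_162,
}

total_tasks = sum(task_counts.values())  # 680988, the total on the slide

# Share of each status in percent, rounded to one decimal place.
shares = {status: round(100 * n / total_tasks, 1) for status, n in task_counts.items()}

for status, pct in shares.items():
    print(f"{status:>17}: {pct:5.1f}%")
```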
Pilot Jobs
  A single pilot can wrap several tasks and users

  At infrastructure-level
       Assimilates pilot jobs to tasks and users
       Valid for only 62% of the tasks
       Valid for 95% of user-task associations

  At science-gateway level
       Users and tasks are correctly associated with pilots

  [Figure: histogram of tasks per pilot]
       Tasks per pilot:    1        2       3       4      5
       Frequency:          282331   28121   11885   6721   10487

  [Figure: histogram of users per pilot]
       Users per pilot:    1        2       3      4    5
       Frequency:          323214   15178   1079   70   4

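The 62% and 95% figures can be reproduced from the two histograms under one plausible reading, sketched below. The assumptions are mine, not stated on the slide: the last bin is taken to mean exactly 5 tasks (or users) per pilot, "valid" tasks are those that ran alone in their pilot, and "valid" associations are pilots that served a single user.

```python
# Histogram values read from the slide: {tasks (or users) per pilot: number of pilots}.
tasks_per_pilot = {1: 282_331, 2: 28_121, 3: 11_885, 4: 6_721, 5: 10_487}
users_per_pilot = {1: 323_214, 2: 15_178, 3: 1_079, 4: 70, 5: 4}

n_pilots = sum(tasks_per_pilot.values())

# Assumption: the bin labelled 5 means exactly 5 tasks per pilot.
n_tasks_in_pilots = sum(k * n for k, n in tasks_per_pilot.items())

# Tasks for which "one pilot = one task" actually holds.
share_tasks_valid = tasks_per_pilot[1] / n_tasks_in_pilots

# Pilots for which "one pilot = one user" actually holds.
share_single_user = users_per_pilot[1] / sum(users_per_pilot.values())

print(f"pilots: {n_pilots}")
print(f"tasks alone in their pilot: {share_tasks_valid:.0%}")
print(f"pilots with a single user:  {share_single_user:.0%}")
```

Both histograms sum to 339,545, the number of pilot jobs given on the "Workload for Case Studies" slide.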
Accounting: Users
  Authentications based on login and password are mapped to
     X.509 robot certificates

  At infrastructure-level
       All VIP users are reported as a single user

  At science-gateway level
       Maps task executions to VIP users

  [Figure: Number of reported EGI and VIP users. x-axis: Months (1 to 16);
   y-axis: Users (0 to 40); series: EGI, VIP]

Accounting: CPU and Wall-clock Time
  Huge discrepancy of values
       Pilot jobs do not register to the pilot system
       Absence of workload
       Outputs unretrievable
       Pilot setup time
       Lost tasks (a.k.a. stalled)

  Undetectable at infrastructure-level

  [Figure: Number of submitted pilot jobs by EGI and VIP. x-axis: Month;
   y-axis: Number of jobs; series: VIP jobs, EGI jobs]
  [Figure: Consumed CPU and wall-clock time by EGI and VIP. x-axis: Month;
   y-axis: Years; series: VIP CPU time, VIP Wall-clock time, EGI CPU time,
   EGI Wall-clock time]

Task Analysis
  At infrastructure-level
       Limited to task exit codes

  At science-gateway level
         Fine-grained information
         Steps in task life
         Error causes
         Replicas per task

  [Figure: Error causes. y-axis: Number of tasks]
       application: 55165, input: 50925, stalled: 48293,
       output: 19463, folder: 1123

  [Figure: Replicas per task. y-axis: Frequency]
       1: 1191, 2: 401, 3: 347, 4: 322, 5: 1285, +5: 6

  [Figure: Different steps in task life. CDF of Time (s), from 1 to 10000;
   series: download, execution, upload]

Bag of Tasks: at Infrastructure level
  Evaluation of the accuracy of the method of Iosup et al. [8] to detect
     bags of tasks (BoT)

  Two successively submitted tasks are in the same BoT if the time
     interval between their submission times is lower than or equal to Δ.

  [Figure: timeline of Task 1, Task 2 and Task 3, submitted at t1, t2 and t3]
       BoT 1: Task 1 and Task 2, since Δ1,2 = |t1 - t2| ≤ Δ
       BoT 2: Task 3, since Δ2,3 = |t2 - t3| > Δ

  [8] Iosup, A., Jan, M., Sonmez, O., Epema, D.: The characteristics and
      performance of groups of jobs in grids. In: Euro-Par. (2007) 382-393

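The grouping rule on this slide can be sketched in a few lines. This is a minimal illustration of the rule exactly as the slide states it (consecutive submissions at most Δ apart share a BoT); the method of Iosup et al. [8] may apply further criteria that are not reproduced here, and the function name and the sample values are hypothetical.

```python
from typing import List, Sequence


def detect_bots(submit_times: Sequence[float], delta: float) -> List[List[float]]:
    """Group submission times into bags of tasks.

    A task joins the current BoT when it was submitted at most `delta`
    seconds after the previous task; otherwise it starts a new BoT.
    """
    bots: List[List[float]] = []
    for t in sorted(submit_times):
        if bots and t - bots[-1][-1] <= delta:
            bots[-1].append(t)
        else:
            bots.append([t])
    return bots


# Illustrative values only: t1=0, t2=50, t3=400 with delta=120 seconds.
print(detect_bots([0, 50, 400], delta=120))  # [[0, 50], [400]]
```

Because the rule only looks at submission times, tasks from unrelated workflows submitted close together end up in the same detected BoT, which is the kind of inaccuracy the following slides quantify against the ground truth.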
Bag of Tasks: Size and Duration
              Infrastructure vs science-gateway
  90% of Batch BoTs have a size between 2 and 10 tasks, whereas this
     range covers only 50% of Real Batch BoTs

  Non-Batch duration is overestimated by up to 400%

  [Figure: CDF of Size (number of tasks); series: Real Batch, Batch]
  [Figure: CDF of Duration (s); series: Real Batch, Real Non-Batch,
   Batch, Non-Batch]

  Legend
       Real Batch = ground-truth BoT
       Real Non-Batch = ground-truth non-BoT
       Batch = Iosup et al. BoT
       Non-Batch = Iosup et al. non-BoT

Bag of Tasks: Inter-arrival Time and Consumed CPU Time
  Batch and Non-Batch inter-arrival times are underestimated by
     about 30%

  CPU times are underestimated by 25% for Non-Batch and by about
     20% for Batch

  [Figure: CDF of Inter-Arrival Time (s); series: Real Batch,
   Real Non-Batch, Batch, Non-Batch]
  [Figure: CDF of Consumed CPU Time (KCPUs); series: Real Batch,
   Real Non-Batch, Batch, Non-Batch]

  Legend
       Real Batch = ground-truth BoT
       Real Non-Batch = ground-truth non-BoT
       Batch = Iosup et al. BoT
       Non-Batch = Iosup et al. non-BoT

Workflow Characterization
  At infrastructure-level
       Hardly possible

  At science-gateway level
       Small (52%): ≤ 100 tasks
       Medium (31%): between 101 and 500 tasks
       Large (17%): > 500 tasks

  [Figure: four CDF plots, one each for Size (number of tasks),
   Makespan (s), Speedup and Critical path length;
   series: small, medium, large, total]

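The three size classes translate directly into a threshold function. The sketch below uses the boundaries from the slide (100 and 500 tasks); the function name and the sample sizes are hypothetical and are not taken from the VIP traces.

```python
from collections import Counter
from typing import Iterable


def size_class(n_tasks: int) -> str:
    """Classify a workflow execution by its number of tasks."""
    if n_tasks <= 100:
        return "small"
    if n_tasks <= 500:
        return "medium"
    return "large"


def class_distribution(sizes: Iterable[int]) -> Counter:
    """Count how many workflow executions fall into each size class."""
    return Counter(size_class(n) for n in sizes)


# Made-up sizes, for illustration only.
print(class_distribution([12, 100, 101, 350, 500, 501, 8000]))
```

Applied to the task counts of the 2,941 workflow executions in the archive, such a count would be expected to give roughly the 52% / 31% / 17% split reported on the slide.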
Conclusions
  Science-gateway model of workload archive
       Illustrated using traces of the VIP from 2011/2012

  Added value when compared to infrastructure-level traces
         Exact identification of tasks and users
         Distinction between additional workload artifacts and real workload
         Fine-grained information about tasks
         Ground truth of bags of tasks
         Workflow characterization

  Traces are available to the community in the
     Grid Observatory
       http://www.grid-observatory.org

Thank you for your attention.
Questions?

  Acknowledgments
       VIP users and project members
       French National Agency for Research (ANR-09-COSI-03)
       European Grid Initiative (EGI)
       France-Grilles

A science-gateway workload archive to study pilot jobs, user activity, bag of tasks, task sub-steps, and workflow executions

  • 1. A science-gateway workload archive to study pilot jobs, user activity, bag of tasks, task sub-steps and workflow executions. Rafael FERREIRA DA SILVA and Tristan GLATARD, University of Lyon, CNRS, INSERM, CREATIS, Villeurbanne, France. CoreGRID/ERCIM Workshop on Grids, Clouds and P2P Computing, August 27th 2012. Contact: Rafael Ferreira da Silva, rafael.silva@creatis.insa-lyon.fr
  • 2. Context: Workload Archives. Information produced by grid workflow executions (exit_code, task_status, submit_time, execution_time, site_name, input_file, workflow_id, activity_name) is useful for: assumptions validation; computational activity modeling; methods evaluation (simulation or experimental).
  • 3. Science-gateway architecture. Components: User, Web Portal, Workflow Engine, Storage Element, Pilot Manager, Meta-Scheduler, Computing site. Steps: 0. Login; 1. Send input data; 2. Transfer input files; 3. Launch workflow; 4. Generate and submit task; 5. Submit pilot jobs; 6. Schedule pilot jobs; 7. Get task; 8. Get files; 9. Execute; 10. Upload results.
  • 4. State of the Art: Grid Workload Archives. Information (exit_code, task_status, submit_time, execution_time, site_name, input_file, workflow_id, activity_name) gathered at infrastructure-level. Existing archives: Parallel Workloads Archive (http://www.cs.huji.ac.il/labs/parallel/workload/); Grid Workloads Archive (http://gwa.ewi.tudelft.nl/pmwiki/). Lack of critical information: dependencies among tasks; task sub-steps; application-level scheduling artifacts; user.
  • 5. At infrastructure-level. Architecture diagram with the same components and steps as slide 3 (0. Login through 10. Upload results).
  • 6. Outline. A science-gateway workload archive; Case studies: Pilot Jobs, Accounting, Task analysis, Bag of tasks, Workflows; Conclusions.
  • 7. Our approach: Science-Gateway Workload Archive. Information (exit_code, task_status, submit_time, execution_time, site_name, input_file, workflow_id, activity_name) gathered at science-gateway level from workflow executions. Advantages: fine-grained information about tasks; dependencies among tasks; workflow characterization; accounting.
  • 8. At science-gateway level. Architecture diagram with the same components and steps as slide 3 (0. Login through 10. Upload results).
  • 9. Virtual Imaging Platform (VIP, http://vip.creatis.insa-lyon.fr). Medical imaging science-gateway; grid of 129 sites (EGI, http://www.egi.eu). Significant usage: 244 registered users from 26 countries; 18 applications; 32 CPU years consumed in 2011. Screenshot labels: Applications, File transfer. Figure: VIP usage in 2011, CPU consumption of VIP and related platforms on EGI.
  • 10. SGWA. Science Gateway Workload Archive (SGWA); the archive is extracted from VIP. Science-gateway archive model: Task, Site and Workflow Execution are acquired from databases populated by the workflow engine at runtime; File and Pilot Job are extracted from the parsing of task standard output and error files.
  • 11. Workload for Case Studies. Based on the workload of VIP, January 2011 to April 2012: 112 users; 2,941 workflow executions; 339,545 pilot jobs; 680,988 tasks (338,989 completed; 138,480 error; 105,488 aborted; 15,576 aborted replicas; 48,293 stalled; 34,162 queued).
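The per-status counts on this slide can be checked against the reported task total. The short Python sketch below assumes that the six status counts partition the 680,988 tasks, which the slide does not state explicitly; the variable names are illustrative only.

```python
# Task counts by final status, as listed on slide 11.
status_counts = {
    "completed": 338_989,
    "error": 138_480,
    "aborted": 105_488,
    "aborted replicas": 15_576,
    "stalled": 48_293,
    "queued": 34_162,
}

reported_total = 680_988  # "680,988 tasks" on the slide

# Assumption: the six statuses are mutually exclusive and cover all tasks.
total = sum(status_counts.values())
consistent = total == reported_total

# Share of each status, in percent of all tasks.
shares = {status: 100 * n / total for status, n in status_counts.items()}

print(total, consistent)
for status, pct in shares.items():
    print(f"{status}: {pct:.1f}%")
```

Under that assumption the counts add up exactly to the reported total.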
  • 12. Pilot Jobs. A single pilot can wrap several tasks and users. At infrastructure-level: assimilates pilot jobs to tasks and users; valid for only 62% of the tasks; valid for 95% of user-task associations. At science-gateway level: users and tasks are correctly associated to pilots. Figure: histogram of tasks per pilot (x: tasks per pilot, 1 to 5; y: frequency; bar labels 282331, 28121, 11885, 6721, 10487). Figure: histogram of users per pilot (x: users per pilot, 1 to 5; y: frequency; bar labels 323214, 15178, 1079, 70, 4).
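The 62% and 95% figures on this slide can be approximately recovered from the histogram bar labels. The sketch below is a reading of the extracted figure text, not the authors' computation: it assumes the labels are counts of pilots wrapping exactly 1 to 5 tasks (or users), and it uses the share of single-user pilots as a proxy for correct user-task associations.

```python
# Assumed reading of the bar labels: {tasks (or users) per pilot: number of pilots}.
tasks_per_pilot = {1: 282_331, 2: 28_121, 3: 11_885, 4: 6_721, 5: 10_487}
users_per_pilot = {1: 323_214, 2: 15_178, 3: 1_079, 4: 70, 4 + 1: 4}

# Both histograms should describe the same set of pilots.
total_pilots = sum(tasks_per_pilot.values())
assert total_pilots == sum(users_per_pilot.values())

# Number of tasks wrapped by the pilots in the histogram.
total_wrapped_tasks = sum(n * pilots for n, pilots in tasks_per_pilot.items())

# "One pilot = one task" holds only for tasks that ran alone in their pilot.
share_tasks_alone = tasks_per_pilot[1] / total_wrapped_tasks

# Proxy for "one pilot = one user": pilots that served a single user.
share_single_user = users_per_pilot[1] / total_pilots

print(total_pilots)                 # 339545
print(round(share_tasks_alone, 2))  # 0.62
print(round(share_single_user, 2))  # 0.95
```

Under these assumptions both histograms sum to the 339,545 pilot jobs reported for the workload, and the two ratios match the percentages quoted on the slide.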
  • 13. Accounting: Users. Authentications based on login and password are mapped to X.509 robot certificates. At infrastructure-level: all VIP users are reported as a single user. At science-gateway level: task executions are mapped to VIP users. [Figure: number of reported EGI and VIP users per month, months 1 to 16.]
  • 14. Accounting: CPU and Wall-clock Time. Huge discrepancy of values. Pilot jobs do not register to the pilot system; absence of workload; outputs unretrievable; pilot setup time; lost tasks (a.k.a. stalled). Undetectable at infrastructure-level. [Figure: number of submitted pilot jobs by EGI and VIP, per month.] [Figure: consumed CPU and wall-clock time (years) by EGI and VIP, per month.]
  • 15. Task Analysis. At infrastructure-level: limited to task exit codes. At science-gateway level: fine-grained information on steps in task life, error causes and replicas per task. [Figure: error causes: number of tasks per cause (application, input, stalled, output, folder).] [Figure: replicas per task: frequency for 1 to 5 and more than 5 replicas.] [Figure: different steps in task life: CDF of time (s) for download, execution and upload.]
  • 16. Bag of Tasks: at Infrastructure level. Evaluation of the accuracy of the method of Iosup et al. [8] to detect bags of tasks (BoT). Two successively submitted tasks are in the same BoT if the time interval between their submission times is lower than or equal to Δ. [Diagram: Tasks 1, 2 and 3 are submitted at times t1, t2 and t3; |t1 - t2| ≤ Δ, so Task 1 and Task 2 form BoT 1; |t2 - t3| > Δ, so Task 3 starts BoT 2.] [8] Iosup, A., Jan, M., Sonmez, O., Epema, D.: The characteristics and performance of groups of jobs in grids. In: Euro-Par. (2007) 382-393
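The grouping rule described on this slide can be sketched in a few lines. This is a minimal illustration of the rule as stated here (successive submissions at most Δ apart share a BoT), not the implementation of Iosup et al.; the function name `detect_bots`, the sample submission times and the value of Δ are all hypothetical, and any per-user separation of tasks is left out.

```python
from typing import List, Sequence


def detect_bots(submit_times: Sequence[float], delta: float) -> List[List[float]]:
    """Group submission times into bags of tasks (BoTs).

    Two successively submitted tasks fall in the same BoT when the
    interval between their submission times is <= delta.
    """
    bots: List[List[float]] = []
    for t in sorted(submit_times):
        if bots and t - bots[-1][-1] <= delta:
            # Close enough to the previous task: same BoT.
            bots[-1].append(t)
        else:
            # Gap larger than delta (or first task): start a new BoT.
            bots.append([t])
    return bots


# Hypothetical submission times (seconds) and threshold, for illustration only.
example_times = [0.0, 50.0, 300.0]
example_delta = 120.0
example_bots = detect_bots(example_times, example_delta)
print(example_bots)  # [[0.0, 50.0], [300.0]]
```

With these sample values the first two tasks form one BoT and the third starts a new one, mirroring the Task 1, Task 2, Task 3 diagram.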
  • 17. Bag of Tasks: Size and Duration. Infrastructure vs science-gateway. 90% of Batch BoTs have a size between 2 and 10 tasks, whereas this range covers 50% of Real Batch BoTs. Non-Batch duration is overestimated by up to 400%. [Figure: CDF of size (number of tasks) for Real Batch and Batch.] [Figure: CDF of duration (s) for Real Batch, Real Non-Batch, Batch and Non-Batch.] Legend: Real Batch = ground-truth BoT; Real Non-Batch = ground-truth non-BoT; Batch = Iosup et al. BoT; Non-Batch = Iosup et al. non-BoT.
  • 18. Bag of Tasks: Inter-arrival Time and Consumed CPU Time. Batch and Non-Batch inter-arrival times are underestimated by about 30%. CPU times are underestimated by 25% for Non-Batch and by about 20% for Batch. [Figure: CDF of inter-arrival time (s) for Real Batch, Real Non-Batch, Batch and Non-Batch.] [Figure: CDF of consumed CPU time (KCPUs) for Real Batch, Real Non-Batch, Batch and Non-Batch.] Legend: Real Batch = ground-truth BoT; Real Non-Batch = ground-truth non-BoT; Batch = Iosup et al. BoT; Non-Batch = Iosup et al. non-BoT.
  • 19. Workflow Characterization. At infrastructure-level: hardly possible. At science-gateway level: workflows are classified by size. Small (52%): ≤ 100 tasks. Medium (31%): between 101 and 500 tasks. Large (17%): > 500 tasks. [Figures: CDFs of size (number of tasks), makespan (s), speedup and critical path length, for small, medium, large and all workflows.]
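The size classes above translate directly into a small classifier. The sketch below only encodes the thresholds given on the slide; the function name `workflow_size_class` is hypothetical and the sample task counts are illustrative, not taken from the archive.

```python
def workflow_size_class(n_tasks: int) -> str:
    """Return the size class of a workflow execution from its task count.

    Thresholds follow the slide: small <= 100, medium 101..500, large > 500.
    """
    if n_tasks < 0:
        raise ValueError("n_tasks must be non-negative")
    if n_tasks <= 100:
        return "small"
    if n_tasks <= 500:
        return "medium"
    return "large"


# Illustrative task counts (not from the archive).
sample_sizes = [12, 100, 101, 500, 501, 8000]
sample_classes = [workflow_size_class(n) for n in sample_sizes]
print(list(zip(sample_sizes, sample_classes)))
```

Keeping the boundaries inclusive on the lower class (100 is small, 500 is medium) matches the "≤ 100" and "between 101 and 500" wording.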
  • 20. Conclusions. Science-gateway model of workload archive, illustrated with traces of VIP from 2011/2012. Added value compared to infrastructure-level traces: exact identification of tasks and users; distinction between additional workload artifacts and real workload; fine-grained information about tasks; ground truth of bags of tasks; workflow characterization. Traces are available to the community in the Grid Observatory: http://www.grid-observatory.org
  • 21. Thank you for your attention. Questions? Acknowledgments: VIP users and project members; French National Agency for Research (ANR-09-COSI-03); European Grid Initiative (EGI); France-Grilles.