Workflow Overhead Analysis
    and Optimizations
     Weiwei Chen, Ewa Deelman
        Information Sciences Institute
       University of Southern California
          {wchen,deelman}@isi.edu
      WORKS11, Nov 14 2011, Seattle WA
Outline

•    Introduction
•    Overhead modeling
•    Cumulative overhead
•    Experiments and evaluations
•    Conclusions and future work
Introduction
• Workflow Optimization
   • Scheduling
   • Reducing Runtime
   • Reducing and Overlapping Overheads
• Overheads
• Benefits
   • Workflow modeling and simulation
   • Performance evaluation
   • New optimization methods

[Fig. 1: System Overview]
Modeling Overheads




Job Events:
•    Job Release
          Workflow engine delay
•    Job Submit
          Queue delay
•    Job Execute
          Runtime
•    Job Terminate
•    Postscript Start
          Postscript delay
•    Postscript Terminate

Workflow Events:
•    Workflow Engine Start
          Makespan
•    Workflow Engine Finished

1  http://pegasus.isi.edu/wms/docs/3.1/monitoring_debugging_stats.php#plotting_statistics
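Each interval in the event model above is the gap between two consecutive events. The following minimal Python sketch computes those intervals for one job, assuming the per-job timestamps (in seconds) are already available; the `JobEvents` record and its field names are hypothetical and are not the Pegasus log format. Gaps the slide does not name (for example, between Job Terminate and Postscript Start) are ignored.

```python
from dataclasses import dataclass


@dataclass
class JobEvents:
    """Hypothetical per-job event timestamps in seconds (not the Pegasus log schema)."""
    release: float
    submit: float
    execute: float
    terminate: float
    post_start: float
    post_terminate: float


def job_overheads(ev: JobEvents) -> dict[str, float]:
    """Duration of each job-level interval shown on the slide."""
    return {
        "workflow engine delay": ev.submit - ev.release,
        "queue delay": ev.execute - ev.submit,
        "runtime": ev.terminate - ev.execute,
        "postscript delay": ev.post_terminate - ev.post_start,
    }


def makespan(engine_start: float, engine_finished: float) -> float:
    """Workflow-level interval from Workflow Engine Start to Workflow Engine Finished."""
    return engine_finished - engine_start


if __name__ == "__main__":
    ev = JobEvents(release=0, submit=10, execute=30, terminate=90,
                   post_start=95, post_terminate=105)
    print(job_overheads(ev))
    # {'workflow engine delay': 10, 'queue delay': 20, 'runtime': 60, 'postscript delay': 10}
    print(makespan(0, 120))  # 120
```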
  
Cumulative Overhead (O1)
•  O1 simply sums the overheads of the same type across all jobs.




     O1(workflow engine delay)=10+10+10=30
     O1(queue delay)=10+20+10=40
     O1(data transfer delay)=10
     O1(postscript delay)=10+20+10=40
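O1 treats every overhead interval independently, so overlapping intervals are counted in full. A minimal Python sketch, assuming each overhead is recorded as a (start, end) interval in seconds; the intervals below are made up so that the sums match the slide's O1 numbers and are not the data behind the figure.

```python
Interval = tuple[float, float]

# Hypothetical (start, end) intervals in seconds, grouped by overhead type.
INTERVALS: dict[str, list[Interval]] = {
    "workflow engine delay": [(0, 10), (0, 10), (10, 20)],
    "queue delay": [(20, 30), (30, 50), (40, 50)],
    "data transfer delay": [(50, 60)],
    "postscript delay": [(40, 50), (60, 80), (80, 90)],
}


def o1(intervals: list[Interval]) -> float:
    """O1: plain sum of interval durations, ignoring any overlap."""
    return sum(end - start for start, end in intervals)


if __name__ == "__main__":
    for kind, ivs in INTERVALS.items():
        print(kind, o1(ivs))  # 30, 40, 10, 40 as on the slide
```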
Cumulative Overhead (O2)
•  O2 subtracts from O1 the time during which overheads of the same type overlap.




     O2(workflow engine delay)=20
     O2(queue delay)=30
     O2(data transfer delay)=10
     O2(postscript delay)=40
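O2 is the length of the union of all intervals of one overhead type: time during which several jobs are in the same kind of overhead is counted once. A minimal Python sketch using a sort-and-merge sweep; the intervals are hypothetical, chosen only so the results match the slide's O2 values (20, 30, 10, 40).

```python
Interval = tuple[float, float]

# Hypothetical (start, end) intervals in seconds, grouped by overhead type.
INTERVALS: dict[str, list[Interval]] = {
    "workflow engine delay": [(0, 10), (0, 10), (10, 20)],
    "queue delay": [(20, 30), (30, 50), (40, 50)],
    "data transfer delay": [(50, 60)],
    "postscript delay": [(40, 50), (60, 80), (80, 90)],
}


def o2(intervals: list[Interval]) -> float:
    """O2: total time covered by at least one interval, so overlaps count once."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end:
            # Overlapping or touching the running span: extend it.
            cur_end = max(cur_end, end)
        else:
            # Disjoint: close the running span (if any) and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start
    return total


if __name__ == "__main__":
    for kind, ivs in INTERVALS.items():
        print(kind, o2(ivs))
```

Compared with O1, only types whose intervals overlap each other change (here workflow engine delay and queue delay).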
Cumulative Overhead (O3)
    •  O3 further subtracts from O2 the overlaps between different types of overhead.




     O3(workflow engine delay)=20
     O3(queue delay)=20
     O3(data transfer delay)=10
     O3(postscript delay)=30
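The slide does not say how time shared by two different overhead types is attributed. A minimal Python sketch under one assumption: any time a given type shares with any other overhead type is removed from each type involved. The intervals are hypothetical; they give the same numbers as the slide's example (20, 20, 10, 30), which shows the rule is consistent with it, not that these are the figure's intervals.

```python
Interval = tuple[float, float]

# Hypothetical (start, end) intervals in seconds, grouped by overhead type.
INTERVALS: dict[str, list[Interval]] = {
    "workflow engine delay": [(0, 10), (0, 10), (10, 20)],
    "queue delay": [(20, 30), (30, 50), (40, 50)],
    "data transfer delay": [(50, 60)],
    "postscript delay": [(40, 50), (60, 80), (80, 90)],
}


def merge(intervals: list[Interval]) -> list[Interval]:
    """Sort intervals and merge overlapping or touching ones into disjoint spans."""
    spans: list[list[float]] = []
    for start, end in sorted(intervals):
        if spans and start <= spans[-1][1]:
            spans[-1][1] = max(spans[-1][1], end)
        else:
            spans.append([start, end])
    return [(s, e) for s, e in spans]


def overlap_length(a: list[Interval], b: list[Interval]) -> float:
    """Length of the intersection of two sorted lists of disjoint spans."""
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance the span that finishes first; it cannot overlap anything later.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def o3(kind: str, by_kind: dict[str, list[Interval]]) -> float:
    """Assumed rule: O2 of this type minus the time it shares with any other type."""
    own = merge(by_kind[kind])
    others = merge([iv for k, ivs in by_kind.items() if k != kind for iv in ivs])
    o2 = sum(e - s for s, e in own)
    return o2 - overlap_length(own, others)


if __name__ == "__main__":
    for kind in INTERVALS:
        print(kind, o3(kind, INTERVALS))
```

Here the 10 seconds where queue delay and postscript delay coincide are removed from both, which is why those two types drop by 10 from O2 to O3.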
  
Experiments
•  Environments:
          •  Amazon EC2
          •  HPCC
          •  FutureGrid
          •  Other clusters
•  Applications:
          •  Biology: Epigenomics, Proteomics, SIPHT
          •  Earthquake science: Broadband, CyberShake
          •  Astronomy: Montage
          •  Physics: LIGO
•  Optimizations:
          •  Job Clustering
          •  Data Pre-Staging
          •  Resource Provisioning
          •  Throttling

Data are available at http://pegasus.isi.edu/workflow_gallery/
Experiments
Distribution of Overheads
Job Clustering
•  Merge small jobs into a single clustered job

[Figure: overheads without vs. with clustering, three comparisons]

   Percentage (%) = 100 × cumulative overhead (seconds) / makespan (seconds)

   With job clustering, cumulative overheads drop substantially
   because the workflow contains fewer jobs.
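Job clustering helps because fixed per-job costs (such as workflow engine delay and queue delay) are paid once per clustered job rather than once per task. A minimal Python sketch, assuming a simple fixed-size grouping of independent tasks and a constant per-job overhead; both the grouping rule and the 10-second overhead are illustrative assumptions, not the clustering configuration used in the experiments.

```python
def cluster(tasks: list[str], k: int) -> list[list[str]]:
    """Assumed grouping rule: pack tasks, in order, into clustered jobs of at most k tasks."""
    if k < 1:
        raise ValueError("cluster size k must be >= 1")
    return [tasks[i:i + k] for i in range(0, len(tasks), k)]


def o1_constant_overhead(num_jobs: int, overhead_per_job: float) -> float:
    """O1 when every job pays the same fixed overhead (a simplifying assumption)."""
    return num_jobs * overhead_per_job


if __name__ == "__main__":
    tasks = [f"task{i}" for i in range(100)]  # hypothetical set of 100 small tasks
    jobs = cluster(tasks, k=10)
    print(len(tasks), len(jobs))  # 100 10
    print(o1_constant_overhead(len(tasks), 10.0),
          o1_constant_overhead(len(jobs), 10.0))  # 1000.0 100.0
```

Under this constant-overhead assumption, O1 shrinks by the same factor as the job count; real overheads also depend on how the clustered job's tasks run inside it.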
Resource Provisioning
         •  Deploy pilot jobs as resource placeholders

[Figure: overheads with vs. without provisioning, three comparisons]

O2 and O3 show more clearly than O1 that provisioning
increases the share of makespan spent in runtime.
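Pilot jobs wait in the batch queue once and then run workflow jobs directly, so most per-job queue delay disappears. The toy model below sketches that effect on O1(queue delay); the constant queue wait, the number of pilots, and the per-job dispatch delay are all hypothetical parameters, not measurements from the experiments.

```python
def queue_delay_o1(num_jobs: int, queue_wait: float, pilots: int = 0,
                   dispatch_delay: float = 0.0) -> float:
    """Toy O1(queue delay) model with hypothetical, constant delays.

    Without provisioning (pilots == 0), every job waits queue_wait in the
    batch queue. With provisioning, only the pilots wait queue_wait, once
    each; jobs then start on a running pilot after dispatch_delay (assumed).
    """
    if pilots == 0:
        return num_jobs * queue_wait
    return pilots * queue_wait + num_jobs * dispatch_delay


if __name__ == "__main__":
    print(queue_delay_o1(200, 30.0))                                # 6000.0
    print(queue_delay_o1(200, 30.0, pilots=8, dispatch_delay=1.0))  # 440.0
```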
Conclusions and Future Work

Conclusions
  •    Overhead Analysis
  •    Three cumulative overhead metrics (O1, O2, O3) that together give a complete view of overheads

Future Work
  •  More optimization methods
  •  Dynamic provisioning
Q&A

•  Pegasus Group: http://pegasus.isi.edu/
•  FutureGrid: https://portal.futuregrid.org/
•  Scripts are available at
      http://isi.edu/~wchen/techniques.html
•  Data are available at
      http://pegasus.isi.edu/workflow_gallery/

Overhead Supercomputing 2011
