Data Flow Driven Scheduling of BPEL Workflows Using Cloud Resources, IEEE CLOUD 2010, Miami

An approach to assign BPEL workflow steps to available resources is presented. The approach takes data dependencies between workflow steps and the utilization of resources at runtime into account.
The developed scheduling algorithm simulates whether the makespan of workflows can be reduced by provisioning additional resources from a Cloud infrastructure. If so, Cloud resources are automatically set up and used to increase throughput.
The proposed approach does not require any changes to the BPEL standard. An implementation based on the ActiveBPEL engine and Amazon's Elastic Compute Cloud is presented.
Experimental results for a real-life workflow from a medical application indicate that workflow execution times can be reduced significantly.


  1. Data Flow Driven Scheduling of BPEL Workflows Using Cloud Resources
     Tim Dörnemann, Ernst Juhnke, Thomas Noll, Dominik Seiler, Bernd Freisleben
     {doernemt, ejuhnke, noll, seiler, freisleb}@informatik.uni-marburg.de
  2. Agenda
     • Introduction and Motivation
     • Architecture
     • Implementation
     • Conclusion
  3. Business Process Execution Language
     • BPEL is the de-facto standard for workflow / business process modeling in the web service area
     • Programming in the large: complex applications are built by composing existing components (web services)
     • The composed process is exposed as a web service itself and integrates perfectly into SOAs
  4. <process ... name="processname" suppressJoinFailure="no"
              targetNamespace="http://namespace.de/target">
       <variables> ...
         <variable name="inVar" messageType="sns:inputMsg" />
         <variable name="outVar" messageType="sns:outMsg" />
       </variables>
       <partnerLinks>
         <partnerLink name="startPL" partnerLinkType="sns:startProcessPLT" myRole="startRole" />
         ...
       </partnerLinks>
       <flow name="Flow1">
         <links> ... </links>
         <receive name="receiveVideoFile" createInstance="yes" operation="startProcess"
                  partnerLink="startPL" portType="sns:invokePT" variable="inVar">
           <source linkName="Connection2" />
           <source linkName="Connection3" />
         </receive>
         <sequence name="faceSequence">
           <assign name="Vid2FaceDet" />
           <invoke name="FaceDetection" partnerLink="gsPL" portType="gs:FaceDetPort"
                   operation="doFaceDet" inputVariable="inVar" outputVariable="outVar" />
           <assign name="face2MP7" />
           <source linkName="Connection4" />
           <target linkName="Connection2" />
         </sequence>
         ...
         <reply name="replyToUser" operation="startProcess" partnerLink="startPL"
                portType="sns:invokePT" variable="outVar">
           <target linkName="Connection1" />
         </reply>
  5. BPEL – dynamic resource selection
     • Destinations of invoke operations are typically set at design time
     • Setting them at runtime is possible, but complicated:
       – mix-up of business logic and infrastructural settings
       – very high modeling overhead to make resource selection dynamic
     <assign>
       <copy>
         <from>
           <literal>
             <wsa:EndpointReference xmlns:ns="NSPACE">
               <wsa:Address>http://FQDN:PORT/SERVICE-ADDRESS</wsa:Address>
               <wsa:ServiceName PortName="Port">ns:SERVICE-NAME</wsa:ServiceName>
               <wsa:ReferenceParameters>
                 <wsa:To>...</wsa:To>
                 <wsa:Action>...</wsa:Action>
               </wsa:ReferenceParameters>
             </wsa:EndpointReference>
           </literal>
         </from>
         <to variable="targetEPR"/>
       </copy>
       <copy>
         <from variable="targetEPR" />
         <to partnerLink="targetPL" />
       </copy>
     </assign>
  6. Peak Load Scenario
     • Scenario: static/pre-defined target hosts, workflow is invoked many times in parallel
     • Leads to high load on the workflow's target machines
       – increase of workflow runtime / response time
       – negative user experience
       – loss of stability
     • Worst case: abandonment of workflow
       – waste of CPU hours (lost intermediate results)
     [diagram: BPEL Engine]
  10. Desired behavior
      [diagram: BPEL Engine with on-demand resources]
  11. Sample Application
      • Workflow from the medical domain: apnoea research
      • Scheduling must respect data dependencies
  12. Solution Requirements
      • BPEL is a non-DAG workflow language ((while) loops)
      • Rescheduling
      • Low computation time
      • Example: for a workflow with 10 activities and 6 available resources,
        10⁶ matches have to be computed
      • A heuristic algorithm is necessary
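The combinatorial argument above can be sanity-checked with a one-liner: if each of n activities may independently run on any of m resources, an exhaustive search must score mⁿ complete assignments. A minimal sketch; the helper name is illustrative, not from the paper.

```python
# Size of the exhaustive search space for assigning workflow activities to
# resources: each of the n activities can go to any of the m resources,
# so there are m**n complete candidate schedules.

def search_space(activities: int, resources: int) -> int:
    """Number of complete activity-to-resource assignments."""
    return resources ** activities

# Even modest workflows explode: 10 activities on 6 resources already
# gives tens of millions of assignments, far too many to score at runtime.
print(search_space(10, 6))  # 60466176
```

Exponential growth in the number of activities is exactly why the slides argue for a heuristic instead of exhaustive matching.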
  13. Design: Genetic Algorithm
      • Widely used approach in the literature
      • Natural choice:
        – chromosome → invoke activity
        – genome → list of activities
        – population → set of candidate resource allocations
      • Low risk of the local-minimum problem
  14. Design: GA (cont'd)
      Start population → Selection → Crossover → Mutation → new population (repeat)
  15. Design: Critical Paths
      • A critical path (CP) is a linear part of the data-flow graph
      • Reduces assignment complexity
      • CPs are sorted according to their estimated runtime (descending)
        – the GA computes schedules for the CPs in this order
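The longest-first ordering of critical paths can be sketched in a few lines: estimate each path's runtime as the sum of its activities' estimated runtimes, then schedule the paths in descending order so the GA spends its budget on the chains that dominate the makespan. Names and the example data are illustrative, not the paper's API.

```python
# Hedged sketch of the critical-path ordering step: paths are linear chains
# of the data-flow graph, and the GA processes them longest-first.

def order_critical_paths(paths):
    """Sort critical paths by estimated runtime, descending.

    `paths` is a list of (name, [activity runtime estimates]) pairs;
    a path's estimate is the sum of its activities' estimates.
    """
    return sorted(paths, key=lambda p: sum(p[1]), reverse=True)

# Illustrative paths with per-activity runtime estimates in seconds.
paths = [("preprocess", [4.0, 2.5]),
         ("face-detection", [10.0, 8.0, 3.0]),
         ("annotate", [1.0])]
for name, runtimes in order_critical_paths(paths):
    print(name, sum(runtimes))
```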
  16. Design: Reservation
      • Reserves resources for a certain time (exclusive allocation)
        – prevents overloading of resources
      • Coordinates re-scheduling of subgraphs
        – reservations are removed when execution of an operation has finished
        – if an operation has no reservation, either
          • the reservation was violated and therefore removed, or
          • the operation is inside a cycle (while loop)
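The reservation bookkeeping described above can be modeled as a small expiring table: an operation holds a resource exclusively until a deadline, the entry is dropped on completion, and a missing or expired entry is the signal to re-schedule. This is an illustrative sketch under those assumptions, not the paper's implementation.

```python
# Illustrative reservation table: operation id -> (resource, expiry time).
# An expired entry models a violated reservation; a missing entry also
# covers operations inside a while-loop cycle, which are never reserved
# ahead of time.
import time

class Reservations:
    def __init__(self):
        self._table = {}  # operation id -> (resource, expiry timestamp)

    def reserve(self, op, resource, duration, now=None):
        """Exclusively allocate `resource` to `op` for `duration` seconds."""
        now = time.time() if now is None else now
        self._table[op] = (resource, now + duration)

    def complete(self, op):
        """Execution finished: remove the reservation, freeing the resource."""
        self._table.pop(op, None)

    def holder(self, op, now=None):
        """Resource reserved for `op`, or None if the reservation was
        violated (expired and therefore removed) or never existed."""
        now = time.time() if now is None else now
        entry = self._table.get(op)
        if entry is None or entry[1] < now:
            self._table.pop(op, None)  # drop violated reservations
            return None
        return entry[0]
```

A scheduler polling `holder()` and finding `None` would then trigger re-scheduling of the remaining subgraph, as the slide describes.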
  17. GA: Pseudo code

      for( path : criticalPaths ) {
        pop = generateStartPopulation( path );
        while( evolutionNotFinished() ) {
          newPop  = survivalOfTheFittest( pop );
          newPopC = crossover( newPop );
          newPopM = mutate( newPop );
          newPop.add( newPopC, newPopM );
          pop = newPop;
        }
      }
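The pseudo code above can be made concrete as a runnable sketch, under simplified assumptions: a candidate solution assigns each activity of one critical path to a resource index, and fitness is the path's runtime on that assignment (activity cost divided by resource speed). Operator and parameter choices here (truncation selection, one-point crossover, single-gene mutation) are illustrative, not the paper's.

```python
# Toy GA for one critical path: minimize runtime of an activity-to-resource
# assignment. Elitist loop mirroring the slide's pseudo code:
# selection -> crossover -> mutation -> next population.
import random

def evolve_path(costs, speeds, pop_size=20, generations=40, seed=42):
    rng = random.Random(seed)
    n, m = len(costs), len(speeds)

    def runtime(assign):  # fitness: lower is better
        return sum(c / speeds[r] for c, r in zip(costs, assign))

    def crossover(a, b):  # one-point crossover of two assignments
        cut = rng.randrange(1, n) if n > 1 else 0
        return a[:cut] + b[cut:]

    def mutate(a):  # reassign one randomly chosen activity
        a = list(a)
        a[rng.randrange(n)] = rng.randrange(m)
        return a

    # start population: random assignments
    pop = [[rng.randrange(m) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=runtime)
        survivors = pop[: pop_size // 2]  # survival of the fittest
        children = [crossover(*rng.sample(survivors, 2)) for _ in survivors]
        mutants = [mutate(rng.choice(survivors)) for _ in survivors]
        pop = survivors + children + mutants
    return min(pop, key=runtime)

# Three activities, two resources (the second twice as fast): the GA
# should converge on routing work to the fast resource.
best = evolve_path(costs=[4.0, 2.0, 6.0], speeds=[1.0, 2.0])
print(best)
```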
  18. Architecture
      • Target hosts for service calls are determined at execution time of BPEL workflows instead of at design time
  19. Implementation
  20. Implementation (cont'd)
      • Pass a reference (Flex-SwA) instead of the actual data
      Details: Steffen Heinzl, Markus Mathes, Thomas Friese, Matthew Smith, Bernd Freisleben:
      "Flex-SwA: Flexible Exchange of Binary Data Based on SOAP Messages with Attachments",
      Proceedings of the IEEE International Conference on Web Services (ICWS), pp. 3-10,
      IEEE Computer Society Press, 2006
  21. Evaluation
      • Sample application stems from medical research (apnoea detection)
        – heavily uses native code (PhysioToolkit)
      • Total amount of transferred data per workflow:
        – 258 MB
        – plus 118 MB from client to engine
      • Test bed:
        – dedicated resources: Core2Duo E6850, 2 GB RAM
        – Cloud resources: "High-CPU Medium Instance", 5 EC2 Compute Units, 1.7 GB RAM
  22. Evaluation (cont'd)
      Three load patterns:
      1. a new workflow every 30 seconds
      2. at an interval of 90 seconds, two workflows are started
      3. four workflows are started concurrently
      [diagram: start times of Workflows 1-5 over time]
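The three load patterns can be written out as start-time offsets, which makes the experiment setup concrete. The pattern definitions follow the slide; the helper itself is an illustrative sketch.

```python
# Start-time offsets (in seconds) for batched workflow submission:
# `per_batch` workflows are started together every `interval` seconds.

def start_offsets(interval, per_batch, batches):
    """Offsets at which each workflow instance is started."""
    return [b * interval for b in range(batches) for _ in range(per_batch)]

print(start_offsets(30, 1, 5))  # pattern 1: one new workflow every 30 s
print(start_offsets(90, 2, 2))  # pattern 2: two workflows every 90 s
print(start_offsets(0, 4, 1))   # pattern 3: four workflows concurrently
```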
  23. Conclusion
      • Data-flow-aware scheduler for BPEL
        – uses a genetic algorithm as heuristic
        – reduces the makespan of workflows
        – utilizes existing and virtual resources more efficiently
      • Future work
        – extend the approach to support multi-objective scheduling
          (e.g., joint cost and performance optimization)
        – implementation details such as ahead-of-time provisioning of VMs to avoid delays
  24. Thank you for your attention! Any questions or remarks?
      {doernemt, ejuhnke, noll, seiler, freisleb}@informatik.uni-marburg.de
