Towards Scalable Service Composition on Multicores

Presentation by Daniele Bonetta at OTM/SWWS 2010, Hersonissos, Crete

  1. 1. Towards Scalable Service Composition on Multicores. Daniele Bonetta, Achille Peternier, Cesare Pautasso, Walter Binder. Faculty of Informatics, University of Lugano - USI, Switzerland. http://sosoa.inf.usi.ch daniele.bonetta@usi.ch
  2. 2. Service Composition: Build services by reusing existing Web services. [Diagram labels: Client, Composite Web Service, Web Service (x3)]
  3. 3. Composition Engines. Focus: Service Composition Runtime Execution Environment. [Diagram labels: Client, Composite Web Service, Service Composition Engine, Web Service (x3)]
  4. 4. How to scale? [Diagram labels: Client (many), Composite Web Service, Service Composition Engine, Web Service (x3)]
  5. 5. How to scale? [Same diagram with an even larger number of clients]
  6. 6. Outline 1. Problem: Scalable Service Composition 2. Opportunity: Multicores 3. Scalable Composition Engine Architecture 4. Multicore-Aware Deployment 5. Preliminary Evaluation 6. Conclusion & Outlook
  7. 7. Scalability Constraints • Service Level Agreement • Response Time • Throughput • Portability • Heterogeneous environments
  8. 8. Existing solutions • Centralized: scale on a cluster of computers (Beowulf) • Decentralized: scale on P2P networks [Diagram labels: s.c.e., sce (x4), ws, cli]
  9. 9. Scalability on the Cloud: Today’s challenge. [Diagram labels: The Cloud, Services, Data, Code, Service Composition Engine]
  10. 10. Portability: The cloud is a very heterogeneous environment. Before scaling out on the cloud, it is important to make efficient use of the hardware architectures that are available there. [Diagram labels: The Cloud, Service Composition Engine]
  11. 11. Scalability on Multicores [Diagram: grids of cores, one of them labeled Quad-Core AMD Opteron]
  12. 12. Scalability on Multicores On top of today’s heterogeneous hardware • Different number of cores • Different type of cores (SMT = n) • Different chip memory layouts (cache levels, cache size, NUMA)
  13. 13. Engine Architecture: Run a large number of concurrent compositions with a limited number of execution threads. Stages: Request Handler, Kernel, Invoker
  14. 14. Engine Architecture (Request Handler, Kernel, Invoker) • 3-stage pipeline • Thread pools • Non-blocking I/O
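The three stages named on the slides above (Request Handler, Kernel, Invoker) can be pictured as a pipeline of queues, each drained by a small fixed pool of threads. The following is only a minimal sketch of that general pattern, not the engine's actual code: the function names, the per-stage work, and the pool size are all assumptions made for illustration, and non-blocking I/O is not modeled.

```python
import queue
import threading

STOP = object()  # sentinel that tells a worker thread to exit


def worker(work, source, sink):
    """Take items from `source`, apply `work`, and pass the result to `sink`."""
    while True:
        item = source.get()
        if item is STOP:
            return
        sink.put(work(item))


def run_pipeline(requests, threads_per_stage=2):
    """Run requests through three stages, each backed by a fixed-size pool."""
    queues = [queue.Queue() for _ in range(4)]  # input, two hand-offs, output
    # Placeholder work per stage (hypothetical); a real engine would parse the
    # request, advance the composition state, and call the remote service.
    stage_work = [
        lambda req: {"id": req, "trace": ["request_handler"]},
        lambda msg: {**msg, "trace": msg["trace"] + ["kernel"]},
        lambda msg: {**msg, "trace": msg["trace"] + ["invoker"]},
    ]
    pools = []
    for i, work in enumerate(stage_work):
        pool = [
            threading.Thread(target=worker, args=(work, queues[i], queues[i + 1]))
            for _ in range(threads_per_stage)
        ]
        for t in pool:
            t.start()
        pools.append(pool)

    for req in requests:
        queues[0].put(req)

    # Shut down stage by stage so that no item is left behind a sentinel.
    for i, pool in enumerate(pools):
        for _ in pool:
            queues[i].put(STOP)
        for t in pool:
            t.join()

    results = []
    while not queues[3].empty():
        results.append(queues[3].get())
    return sorted(results, key=lambda m: m["id"])
```

Because the number of threads is fixed per stage, the number of in-flight compositions is bounded by queue contents rather than by thread count, which is the property slide 13 asks for.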
  15. 15. Deployment on Multicores: Request Handler, Kernel, Invoker [Diagram labels: cores; #2, #4, #6, ... #n parallel (//) threads]
  16. 16. Deployment on Multicores: Request Handler, Kernel, Invoker. How? [Same diagram]
  17. 17. Deployment Challenge
  18. 18. [Figure; content not recoverable from the transcript]
  19. 19. [Figure; content not recoverable from the transcript]
  20. 20. [Figure; content not recoverable from the transcript]
  21. 21. [Figure] • 4 P7 CPUs • 32 cores • 128 parallel (//) threads
  22. 22. Deployment Challenge How to scale on multicores? Just increase the number of parallel concurrent threads in the engine?
  23. 23. Experimental Results: Just increasing the number of threads... [Plot: throughput (instances/sec, 200 to 1800) vs. number of threads per pool (0 to 140); series: ForEach, Sequential, Parallel, Loop]
  24. 24. Our Proposal Topology-Aware deployment
  25. 25. Our Proposal • Replicate the architecture instead of just increasing the number of threads Topology-Aware deployment
  26. 26. Our Proposal • Replicate the architecture instead of just increasing the number of threads • Bind threads to specific affinity groups Topology-Aware deployment
  27. 27. Our Proposal • Replicate the architecture instead of just increasing the number of threads • Bind threads to specific affinity groups • Distribute resources (memory/threads) among replicas in proportion to the hardware resources and the number of replicas. Topology-Aware deployment
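The last bullet above, distributing threads among replicas in proportion to hardware resources, comes down to a small piece of integer arithmetic. The sketch below is one possible way to do it, assuming the "hardware resource" of a replica is simply the number of cores it is bound to; the function name `distribute` and the largest-remainder rounding are illustrative choices, not something taken from the slides.

```python
def distribute(total, cores_per_replica):
    """Split `total` threads among replicas in proportion to their core counts."""
    all_cores = sum(cores_per_replica)
    # Integer share for each replica, rounded down.
    shares = [total * c // all_cores for c in cores_per_replica]
    # Hand out the threads lost to rounding, largest remainder first.
    leftovers = total - sum(shares)
    by_remainder = sorted(
        range(len(shares)),
        key=lambda i: (total * cores_per_replica[i]) % all_cores,
        reverse=True,
    )
    for i in by_remainder[:leftovers]:
        shares[i] += 1
    return shares
```

For example, `distribute(36, [6, 6])` gives each of two equal replicas 18 threads, while `distribute(10, [1, 2])` gives the larger replica 7 of the 10 threads.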
  28. 28. Example • 4 cores, 4 L1 caches, 2 L2 caches [Diagram: cores C1 to C4, one L1 cache per core, two L2 caches]
  29. 29. Single Instance: This baseline deployment lets the OS thread scheduler map the engine threads on all cores. [Diagram: one engine instance (8 threads) over cores C1 to C4]
  30. 30. Two instances: The threads of each instance are bound to specific cores. [Diagram: Instance #1 (4 threads) and Instance #2 (4 threads) over cores C1 to C4]
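Binding the threads of an instance to specific cores, as in the two-instance example above, can be sketched as follows. This is an assumption-laden illustration rather than the engine's mechanism: it relies on `os.sched_setaffinity`, which Python exposes only on some platforms (notably Linux), and it silently skips the binding where the call is unavailable or the requested cores do not exist. The helper names `bind_current_thread` and `start_instance` are hypothetical.

```python
import os
import threading


def bind_current_thread(cores):
    """Pin the calling thread to `cores` if the OS allows it; report success."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # platform without this API
    try:
        allowed = set(cores) & os.sched_getaffinity(0)
        if not allowed:
            return False  # none of the requested cores is usable here
        os.sched_setaffinity(0, allowed)  # pid 0 = the calling thread
        return True
    except OSError:
        return False


def start_instance(name, cores, n_threads, body):
    """Start `n_threads` threads for one engine instance, all bound to `cores`."""
    def run(index):
        bind_current_thread(cores)
        body(name, index)

    threads = [threading.Thread(target=run, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    return threads


if __name__ == "__main__":
    # Two instances with 4 threads each, mirroring the example slide
    # (cores numbered from 0 here instead of C1 to C4).
    started = start_instance("instance-1", {0, 1}, 4, print)
    started += start_instance("instance-2", {2, 3}, 4, print)
    for t in started:
        t.join()
```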
  31. 31. Hardware Awareness. Self-configuration at startup: 1. Gather hardware topology information: • #cores, #caches, #cache-levels, ... 2. Replicate the engine architecture: • One instance per last-level shared cache • Configure the thread pool sizes
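The self-configuration steps above (gather the topology, then create one instance per last-level shared cache) can be approximated on Linux by reading the cache description that the kernel publishes under sysfs. The sketch below is a hedged illustration, not the tool used in the talk: it assumes the `cache/index*/level` and `cache/index*/shared_cpu_list` files are present, and the function names are made up for this example. The grouping step is a pure function so it can be tried without any particular hardware.

```python
import glob
import os


def parse_cpu_list(text):
    """Turn a kernel CPU list such as '0-3,8' into a set of core numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def group_by_llc(shared_lists):
    """Group cores that share a last-level cache; one group per engine replica."""
    groups = {frozenset(parse_cpu_list(text)) for text in shared_lists.values()}
    return sorted((sorted(g) for g in groups if g), key=lambda g: g[0])


def read_llc_lists(root="/sys/devices/system/cpu"):
    """Read, per CPU, the sharing list of its highest-level cache (Linux sysfs)."""
    result = {}
    for cpu_dir in glob.glob(os.path.join(root, "cpu[0-9]*")):
        best_level, best_list = -1, None
        for index_dir in glob.glob(os.path.join(cpu_dir, "cache", "index*")):
            try:
                with open(os.path.join(index_dir, "level")) as f:
                    level = int(f.read())
                with open(os.path.join(index_dir, "shared_cpu_list")) as f:
                    cpus = f.read()
            except (OSError, ValueError):
                continue  # entry missing or unreadable; skip it
            if level > best_level:
                best_level, best_list = level, cpus
        if best_list is not None:
            result[os.path.basename(cpu_dir)] = best_list
    return result
```

For the 4-core, 2-L2-cache machine of slide 28, a topology such as `{"cpu0": "0-1", "cpu1": "0-1", "cpu2": "2-3", "cpu3": "2-3"}` yields two groups, `[[0, 1], [2, 3]]`, which corresponds to the two-instance deployment of slide 30.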
  32. 32. Experimental Results: test compositions, each with 6x Invoke: (a) Sequential (Sequence) (b) Foreach (Foreach) (c) Parallel (Flow) (d) Loop (While)
  33. 33. Experimental Results: Just increasing the number of threads... [Plot: throughput (instances/sec, 200 to 1800) vs. number of threads per pool (0 to 140); series: ForEach, Sequential, Parallel, Loop]
  34. 34. Experimental Results: Fixing the number of threads to the optimal value
      # of Replicas   Request Handler   Kernel   Invoker   Total
      1               12                12       12        36
      2               6                 6        6         36
      6               2                 2        2         36
      12              1                 1        1         36
  35. 35. Experimental Results: Scalability (throughput up to 3300 clients) on 2 x AMD Barcelona 6-core processors with 2 LLC. [Plot: throughput (instances/sec, 300 to 2100) vs. number of clients (300 to 3300); series: 1 Replica, 2 Replicas, 6 Replicas, 12 Replicas]
  36. 36. Experimental Results: Relative speedup at saturation on 4 x Intel Xeon 4-core processors with 4 LLC. [Chart: 1 Rep, 2 Rep, 3 Rep, 4 Rep]
  37. 37. Conclusion • Targeting different multicores with the same engine architecture is challenging • Simply increasing the number of threads is not always the optimal approach • Scalability depends on how a limited number of threads is mapped to the cores • Hardware-aware deployment can improve performance by up to 30%
  38. 38. What’s next? • Heterogeneous replicas for heterogeneous hardware • Runtime auto-tuning • Load balancing • Failure handling
  39. 39. Thank you! Self-Organizing Service Oriented Architectures: http://sosoa.inf.usi.ch JOpera service composition engine: http://www.jopera.org me: daniele.bonetta@usi.ch
