Performance Modeling of Stream Joins

These are the slides for the paper "Performance Modeling of Stream Joins", presented at the ACM International Conference on Distributed and Event-based Systems (DEBS).

Published in: Science

Performance Modeling of Stream Joins

  1. Performance Modeling of Stream Joins. Vincenzo Gulisano¹, Alessandro V. Papadopoulos², Yiannis Nikolakopoulos¹, Marina Papatriantafilou¹, Philippas Tsigas¹
  2. Agenda – Performance modeling of stream joins • Performance modeling of stream joins • Performance modeling of stream joins • The model • Evaluation • Conclusions (footer: V. Gulisano, Performance Modeling of Stream Joins)
  3. Streaming applications [diagram: chain of operators OP, OP, OP]. Static/dynamic decisions: deployment, load balancing, elasticity, load shedding. These decisions are f(operators' cost / throughput / latency ...), where the inputs either are measured (live) or come from a model, and are then used to take decisions.
  4. Agenda – Performance modeling of stream joins • Performance modeling of stream joins • Performance modeling of stream joins • The model • Evaluation • Conclusions
  5. Stream joins. Tuples have the form <ts, A1, ..., An>. [diagram: streams R and S, windows WR and WS, output Out] Time-based windows: FIXED interval of time, VARIABLE number of tuples. Tuple-based windows: VARIABLE interval of time, FIXED number of tuples.
  6. Stream joins – deterministic execution. Deterministic execution: results do not depend on the interleaving of R and S tuples. Process ready tuples in timestamp order. A tuple is ready if its timestamp is less than or equal to the minimum of the timestamps of the last tuples received from R and S. [diagram: R, S, READY, WR, WS]
  7. Stream joins – multiple physical streams. [diagram: |R| and |S| physical streams, READY, WR, WS]
  8. Stream joins – parallel execution. n threads, each with its own WR and WS; each thread handles ~1/n of comparisons and 1/n of outputs. [diagram: READY, n threads, READY, Out]
  9. Agenda – Performance modeling of stream joins • Performance modeling of stream joins • Performance modeling of stream joins • The model • Evaluation • Conclusions
  10. Modeling goal. Given the characteristics of the input streams (R, S) and the configuration of the stream join, model for the output (Out), over time interval i, the throughput y_i and the latency l_i (latency: time difference).
  11. Scope of the presentation • The presentation covers only a portion of the whole model in the paper • Equations can be found in the paper • The presentation covers and discusses the main dependencies between: input characteristics <-> throughput / latency; join configuration <-> throughput / latency
  12. Step-by-step model: 1. Non-deterministic stream join 2. Deterministic stream join with multiple physical streams 3. Deterministic parallel stream join with multiple physical streams
  13. Non-deterministic stream join – Dependencies. [diagram: R (r_i), S (s_i), WR (ω^R_i), WS (ω^S_i), Out] Quantities y and l_join, for time-based and tuple-based windows. Entries in slide order: ∝ r_i, s_i; ∝ ω^R_i, ω^S_i; ∝ ω^R_i, ω^S_i; ∝ r_i, s_i; ∝ r_i, s_i.
  14. Deterministic stream join with multiple physical streams. [diagram: |R| and |S| physical streams, READY, thread with WR and WS]
  15. Deterministic stream join with multiple physical streams: latency overhead for results produced by 1. [diagram: items labeled 1, 2, 3]
  16. Deterministic stream join with multiple physical streams: latency overhead for results produced by 2. [diagram: items labeled 2, 3, 4]
  17. Deterministic stream join with multiple physical streams: latency overhead for results produced by 3. [diagram: items labeled 3, 4, 5]
  18. Deterministic stream join with multiple physical streams – Dependencies. [diagram: |R| and |S| physical streams, READY, thread with WR and WS] Quantities y, l_join and l_in, for time-based and tuple-based windows. Entries in slide order: ∝ r_i, s_i; ∝ ω^R_i, ω^S_i; ∝ ω^R_i, ω^S_i; ∝ r_i, s_i; ∝ r_i, s_i; ∝ 1/r_i, 1/s_i; ∝ |R|, |S|.
  19. Deterministic parallel stream join with multiple physical streams – Dependencies. [diagram: READY, n threads each with WR and WS, READY, Out] Quantities y, l_join, l_in and l_out, for time-based and tuple-based windows. Entries in slide order: ∝ r_i, s_i; ∝ ω^R_i, ω^S_i; ∝ ω^R_i, ω^S_i; ∝ r_i, s_i; ∝ r_i, s_i; ∝ 1/r_i, 1/s_i; ∝ |R|, |S|; ∝ 1/n; ∝ n.
  20. Agenda – Performance modeling of stream joins • Performance modeling of stream joins • Performance modeling of stream joins • The model • Evaluation • Conclusions
  21. Evaluation • Runs a common benchmark (Handshake Join, ScaleJoin) • Results compare the simulator's output with a Java implementation (available at https://github.com/dcs-chalmers/Join_Model)
  22. Evaluation – Throughput [plot]
  23. Evaluation – Throughput [plot]
  24. Evaluation – Throughput [plot]
  25. Evaluation – Latency [plot: Non-deterministic]
  26. Evaluation – Latency [plot: Non-deterministic]
  27. Evaluation – Latency [plots: Non-deterministic; Deterministic, 5 physical streams]
  28. Evaluation – Latency [plots: Non-deterministic; Deterministic, 5 physical streams]
  29. Evaluation – Latency [plots: Deterministic, 5 physical streams; Deterministic, parallel (3 threads), 5 physical streams]
  30. Evaluation – Latency [plots: Deterministic, 5 physical streams; Deterministic, parallel (3 threads), 5 physical streams]
  31. Agenda – Performance modeling of stream joins • Performance modeling of stream joins • Performance modeling of stream joins • The model • Evaluation • Conclusions
  32. Conclusions. Comprehensive dynamic model for stream joins • Non-deterministic vs deterministic execution • Single vs multiple physical streams • Centralized vs parallel • Non-saturated vs saturated. The model closely matches the empirical measurements. Open for future work, for instance: • Other operators • Worst case vs average case • ...
  33. Thanks! Questions?
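
The readiness rule from slide 6 (a tuple is ready once its timestamp is at most the minimum of the last timestamps seen on R and S) can be illustrated with a small Python sketch. This is not the authors' implementation: the function name `ready_in_order`, the `(stream, timestamp)` event representation, and the tie-breaking by stream name are assumptions made for the example, and each stream is assumed to deliver tuples with non-decreasing timestamps.

```python
import heapq

def ready_in_order(arrivals):
    """Yield (stream, ts) pairs in timestamp order once they become ready.

    `arrivals` is an iterable of (stream, ts) with stream in {"R", "S"}.
    Assumption: timestamps are non-decreasing within each stream.
    """
    last_ts = {"R": None, "S": None}  # last timestamp seen per stream
    pending = []                      # min-heap of (ts, stream)
    for stream, ts in arrivals:
        last_ts[stream] = ts
        heapq.heappush(pending, (ts, stream))
        if last_ts["R"] is None or last_ts["S"] is None:
            continue  # nothing is ready until both streams have delivered
        bound = min(last_ts["R"], last_ts["S"])
        # Release every buffered tuple whose timestamp is <= the bound.
        while pending and pending[0][0] <= bound:
            t, s = heapq.heappop(pending)
            yield s, t

# Two different interleavings of the same R and S tuples.
a = [("R", 1), ("S", 2), ("S", 3), ("R", 4), ("R", 6), ("S", 7)]
b = [("S", 2), ("S", 3), ("S", 7), ("R", 1), ("R", 4), ("R", 6)]
print(list(ready_in_order(a)))
print(list(ready_in_order(b)))
```

Both interleavings release the same tuples in the same order, which is the property slide 6 calls deterministic execution. The tuple with timestamp 7 stays buffered in both runs because R has not yet delivered a tuple with a timestamp of 7 or more; waiting of this kind is where an input-side latency overhead can come from.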

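
The contrast between the two window types on slide 5 can also be sketched in Python. The class names `TimeWindow` and `TupleWindow`, the `insert` method, and the eviction rule (drop tuples that are `size` or more time units older than the newest one) are hypothetical choices for this illustration, not taken from the paper.

```python
from collections import deque

class TimeWindow:
    """Fixed interval of time, variable number of tuples."""
    def __init__(self, size):
        self.size = size
        self.items = deque()  # (ts, value), oldest first

    def insert(self, ts, value):
        self.items.append((ts, value))
        # Evict tuples that fell out of the time interval.
        while self.items[0][0] <= ts - self.size:
            self.items.popleft()

class TupleWindow:
    """Fixed number of tuples, variable interval of time."""
    def __init__(self, count):
        self.items = deque(maxlen=count)  # oldest tuple dropped automatically

    def insert(self, ts, value):
        self.items.append((ts, value))

def span(window):
    """Time covered by the tuples currently in the window."""
    return window.items[-1][0] - window.items[0][0]

timestamps = [0, 1, 2, 3, 10, 11]  # a burst, a gap, then two more tuples
tw, cw = TimeWindow(size=5), TupleWindow(count=3)
sizes, spans = [], []
for ts in timestamps:
    tw.insert(ts, None)
    cw.insert(ts, None)
    sizes.append(len(tw.items))
    spans.append(span(cw))
print(sizes)  # number of tuples held by the time-based window
print(spans)  # time covered by the tuple-based window
```

With this input the time-based window holds a varying number of tuples while the tuple-based window covers a varying stretch of time, which matches the FIXED/VARIABLE distinction on the slide.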