Efficient Window Aggregation with General Stream Slicing (EDBT 2019, Best Paper)

The paper "Efficient Window Aggregation with General Stream Slicing" by Jonas Traub, Philipp M. Grulich, Alejandro Rodríguez Cuéllar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, and Volker Markl was selected as the best paper of the International Conference on Extending Database Technology (EDBT) 2019.

Abstract:
Window aggregation is a core operation in data stream processing. Existing aggregation techniques focus on reducing latency, eliminating redundant computations, and minimizing memory usage. However, each technique operates under different assumptions with respect to workload characteristics such as properties of aggregation functions (e.g., invertible, associative), window types (e.g., sliding, sessions), windowing measures (e.g., time- or count-based), and stream (dis)order. Violating the assumptions of a technique can deem it unusable or drastically reduce its performance.
In this paper, we present the first general stream slicing technique for window aggregation. General stream slicing automatically adapts to workload characteristics to improve performance without sacrificing its general applicability. As a prerequisite, we identify workload characteristics which affect the performance and applicability of aggregation techniques. Our experiments show that general stream slicing outperforms alternative concepts by up to one order of magnitude.

Published in: Data & Analytics

Efficient Window Aggregation with General Stream Slicing (EDBT 2019, Best Paper)

  1. Efficient Window Aggregation with General Stream Slicing. Jonas Traub, Philipp M. Grulich, Alejandro Rodríguez Cuéllar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl. 22nd International Conference on Extending Database Technology, March 26-29, 2019, Lisbon, Portugal.
  2-5. Stream Processing Pipelines: A stream processing pipeline is a series of concurrently running operators. Figure label: Window Aggregation. (Slide footer: 27.03.2019, Efficient Window Aggregation with General Stream Slicing.)
  6-7. Motivation
  8. Stream Slicing Example
  9-10. Stream Slicing Example: The number of slices depends on the workload.
  11. Stream Slicing Example
  12. Stream Slicing Example
  13. Stream Slicing Example
  14-15. Stream Slicing Example: We store partial aggregates instead of all tuples. → Small memory footprint.
  16-17. Stream Slicing Example: We assign each tuple to exactly one slice. → O(1) per-tuple complexity.
  18-19. Stream Slicing Example: We require just a few computation steps to calculate final aggregates. → Low latency.
  20-21. Stream Slicing Example: We share partial aggregations among all users and queries. → Efficiency by preventing redundancy.
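
The four properties named on the Stream Slicing Example slides can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes an in-order stream of (timestamp, value) tuples, a sum aggregation, and sliding-window queries given as hypothetical (length, slide) pairs. The function names `window_bounds`, `build_slices`, and `window_sum` are invented for this illustration.

```python
import bisect

def window_bounds(length, slide, horizon):
    """All [start, end) windows of one (length, slide) query that end by `horizon`."""
    return [(s, s + length) for s in range(0, horizon - length + 1, slide)]

def build_slices(tuples, queries, horizon):
    """Cut [0, horizon) at every window edge and keep one partial sum per slice."""
    edges = {0, horizon}
    for length, slide in queries:
        for start, end in window_bounds(length, slide, horizon):
            edges.update((start, end))
    edges = sorted(edges)
    partials = [0] * (len(edges) - 1)   # partial aggregates, not raw tuples
    for ts, value in tuples:
        # Each tuple updates exactly one slice. bisect is used for brevity;
        # an in-order stream could simply update the most recent slice.
        partials[bisect.bisect_right(edges, ts) - 1] += value
    return edges, partials

def window_sum(edges, partials, start, end):
    """Final aggregate of one window: combine the few slices it covers."""
    return sum(partials[edges.index(start):edges.index(end)])

# Made-up input: (event time, value) pairs, all with event time < horizon.
tuples = [(1, 4), (3, 2), (5, 7), (8, 1), (9, 3), (12, 5), (14, 6), (17, 2)]
queries = [(10, 10), (10, 5)]           # a tumbling and a sliding window query
edges, partials = build_slices(tuples, queries, horizon=20)

# Both queries read the same shared slices.
results = {
    q: [window_sum(edges, partials, s, e) for s, e in window_bounds(*q, 20)]
    for q in queries
}
```

Here the two queries produce four slices, and every window result is the sum of at most two partial aggregates.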
  22-28. General Stream Slicing. Workload Characteristics: Aggregation Functions (distributive, algebraic, holistic; associativity, commutativity, invertibility); Window Types (Context Free, Forward Context Free, Forward Context Aware); Window Measures (time, tuple count, arbitrary); Stream Order (in-order, out-of-order). General Stream Slicing combines generality and efficiency in a single solution.
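
The aggregation function classes listed on this slide can be made concrete with a short Python sketch. It uses the standard textbook meaning of distributive, algebraic, and holistic functions, which is assumed here rather than taken from the slides. The sample values are the fifteen tuples from the example later in the deck, grouped into three slices, and the helper names are hypothetical.

```python
from statistics import median

# Three slices of raw values (the 15 tuples from the deck's later example).
slices = [[1, 2, 1, 4, 3], [1, 5, 2, 2, 3], [6, 1, 2, 2, 1]]

def combined_sum(slices):
    """Distributive: the final sum is the sum of per-slice partial sums."""
    return sum(sum(s) for s in slices)

def combined_avg(slices):
    """Algebraic: a fixed-size partial (sum, count) per slice is enough."""
    partials = [(sum(s), len(s)) for s in slices]
    return sum(p[0] for p in partials) / sum(p[1] for p in partials)

def combined_median(slices):
    """Holistic: no constant-size partial exists, so the values are needed."""
    return median([v for s in slices for v in s])

def max_after_removal(values, removed):
    """Max is not invertible: removing a value forces a re-scan of the rest."""
    remaining = list(values)
    remaining.remove(removed)
    return max(remaining)

total = combined_sum(slices)
total_without_3 = total - 3   # sum is invertible: subtract the removed value
```

Whether a function is invertible decides if a tuple can be removed from a partial aggregate without going back to the stored tuples.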
  29. Window Aggregation Concepts: Non-Slicing Techniques; Variations of Stream Slicing.
  30-31. General Slicing Core: The General Slicing Core adapts to workload characteristics and provides extension points for user-defined window types and aggregation functions.
  32-42. General Stream Slicing Internals. Part 1: Three Fundamental Operations on Slices: Merge Slices, Split Slices, Update Slices. Part 2: Adapt to Workload Characteristics: Do we need to store original tuples? Do we potentially need to split slices? Do we potentially need to remove tuples from slices? General Stream Slicing adapts to current workload characteristics.
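
The three slice operations named on these slides (merge, split, update) might look roughly like the following Python sketch. The `Slice` class, its fields, and the function signatures are hypothetical and are not taken from the authors' code. The sketch hard-codes a sum aggregation and always stores the original tuples, which is only necessary when splits or removals can occur.

```python
from dataclasses import dataclass, field

@dataclass
class Slice:
    start: int                 # inclusive lower edge (event time)
    end: int                   # exclusive upper edge (event time)
    agg: int = 0               # partial aggregate (sum in this sketch)
    tuples: list = field(default_factory=list)  # kept only to allow splits

    def update(self, ts, value):
        """Update: add one tuple to the slice's partial aggregate."""
        self.agg += value
        self.tuples.append((ts, value))

def merge(a, b):
    """Merge: two adjacent slices become one; partial aggregates combine."""
    return Slice(a.start, b.end, a.agg + b.agg, a.tuples + b.tuples)

def split(s, at):
    """Split: recompute both halves from the stored original tuples."""
    left, right = Slice(s.start, at), Slice(at, s.end)
    for ts, value in s.tuples:
        (left if ts < at else right).update(ts, value)
    return left, right

# Usage with made-up tuples.
s = Slice(0, 10)
for ts, value in [(2, 4), (7, 1), (9, 3)]:
    s.update(ts, value)
left, right = split(s, 5)
merged = merge(left, right)
```

In this sketch a split is the only operation that reads the stored tuples, which is one way to see why the slides ask whether original tuples must be kept.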
  43-58. Impact of Workload Characteristics (Example): Count-based tumbling window with a length of 5 tuples.
      Values: 1 2 1 4 3 1 5 2 2 3 6 1 2 2 1
      Tuple Count: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
      Event Time: 5 12 13 20 35 37 42 46 48 51 52 57 63 64 65
      Window aggregates (sums): 11, 13, 12
      What if the stream is out-of-order? Out-of-order Tuple: value 5, event time 49. Window updates: +5 -3 (second window), +3 -1 (third window).
      What if the aggregation function is not invertible?
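
The arithmetic of this example can be replayed in a few lines of Python. The sketch assumes that tuples are counted in event-time order, so the late tuple (value 5, event time 49) lands between event times 48 and 51 and pushes one tuple across each later window boundary. The function names are hypothetical; the input numbers are the ones on the slides.

```python
import bisect

values = [1, 2, 1, 4, 3, 1, 5, 2, 2, 3, 6, 1, 2, 2, 1]
times = [5, 12, 13, 20, 35, 37, 42, 46, 48, 51, 52, 57, 63, 64, 65]
N = 5  # count-based tumbling window of 5 tuples

def window_aggs(vals, n, agg=sum):
    """Aggregate every complete window of n consecutive tuples."""
    return [agg(vals[i:i + n]) for i in range(0, len(vals) - n + 1, n)]

def apply_late_tuple(values, times, sums, n, ts, value):
    """Invertible case (sum): fix each affected window with one add and one subtract."""
    pos = bisect.bisect_left(times, ts)    # position of the late tuple by event time
    updated = list(sums)
    carry = value                          # tuple entering the current window
    for k in range(pos // n, len(updated)):
        shifted_out = values[(k + 1) * n - 1]   # last tuple moves to the next window
        updated[k] += carry - shifted_out
        carry = shifted_out
    return updated

sums = window_aggs(values, N)
updated = apply_late_tuple(values, times, sums, N, ts=49, value=5)

# Non-invertible case (max): no subtraction is possible, so the affected
# windows are recomputed from the stored tuples.
pos = bisect.bisect_left(times, 49)
new_values = values[:pos] + [5] + values[pos:]
maxima = window_aggs(new_values, N, agg=max)
```

With sum, the second window changes by +5 -3 and the third by +3 -1, matching the updates shown on the slides. With max, the same late tuple requires access to the original tuples of the affected windows.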
  59-60. In-order Processing with Context Free Windows: Slicing techniques scale to large numbers of concurrent windows.
  61-62. Impact of Stream Order: Slicing techniques are robust against out-of-order tuples.
  63-64. Impact of Aggregation Functions (20% out-of-order): Stream Slicing performs well on many different kinds of aggregation functions.
  65-69. Efficient Window Aggregation with General Stream Slicing: • We identify workload characteristics which impact applicability and performance of window aggregation techniques. • We present a generally applicable and highly efficient solution for streaming window aggregation. • We show that general stream slicing is generally applicable and offers better performance than alternative approaches. Open Source Repository: tu-berlin-dima.github.io/scotty-window-processor
