
Tutorial: The Role of Event-Time Order in Data Streaming Analysis

Slides for our tutorial, titled “The Role of Event-Time Order in Data Streaming Analysis”, presented at the 14th ACM International Conference on Distributed and Event-Based Systems (DEBS 2020). We have recorded the tutorial, and you can find the videos at the following links:
Part 1: https://youtu.be/SW_WS6ULsdY
Part 2: https://youtu.be/bq3ECNvPwOU

You can find these slides, as well as the code examples, at https://github.com/vincenzo-gulisano/debs2020_tutorial_event_time and on SlideShare.



  1. 1. Tutorial: The Role of Event-Time Order in Data Streaming Analysis. Vincenzo Gulisano, Chalmers University of Technology, Gothenburg, Sweden, vincenzo.gulisano@chalmers.se; Dimitris Palyvos-Giannas, Chalmers University of Technology, Gothenburg, Sweden, palyvos@chalmers.se; Bastian Havers, Chalmers University of Technology & Volvo Cars, Gothenburg, Sweden, havers@chalmers.se; Marina Papatriantafilou, Chalmers University of Technology, Gothenburg, Sweden, ptrianta@chalmers.se
  2. 2. 2
  3. 3. Agenda • Motivation, preliminaries and examples about data streaming and Stream Processing Engines • Causes of out-of-order data and solutions enforcing total-ordering • Pros/Cons of total-ordering • Relaxation of total-ordering and the watermarks 3 https://github.com/ vincenzo- gulisano/debs2020_tutorial_event_time
  4. 4. Agenda • Motivation, preliminaries and examples about data streaming and Stream Processing Engines • Causes of out-of-order data and solutions enforcing total-ordering • Pros/Cons of total-ordering • Relaxation of total-ordering and the watermarks 4
  5. 5. The era of Big Data: from the Terabyte scale (10^12 bytes) in 1984 to the Zettabyte scale (10^9 Terabytes).
  6. 6. Where does Big Data originate? 1 trillion sensors by 2030 in the Internet of Things (IoT); 500 hours of video uploaded to YouTube every minute; 2.32 billion Facebook users; 219 billion photos uploaded to Facebook; 1 online interaction every 18 seconds by 2025.
  7. 7. Main Memory: Database Management Systems (DBMSs) vs. Stream Processing Engines (SPEs). [Figure: in a DBMS, (1) data is first stored on disk, (2) a query is issued, and (3) query results are produced; in an SPE, a continuous query runs in main memory and produces query results as the data streams through it.]
  8. 8. [Timeline of Stream Processing Engines, including COUGAR, NiagaraCQ, the Aurora Project, STREAM (STanford stREam datA Manager), Borealis, and StreamCloud.]
  9. 9. Flink-related images / code snippets in the following are taken from: https://flink.apache.org/ 9
  10. 10. Data Stream: unbounded sequence of tuples sharing the same schema. Example: vehicles’ speed reports with fields vehicle id (text), time in secs (text), speed in Km/h (double), X coordinate (double), Y coordinate (double), e.g., (A, 8:00, 55.5, X1, Y1), (A, 8:03, 70.3, X2, Y2), (A, 8:07, 34.3, X3, Y3). Let’s assume each source (e.g., vehicle) produces and delivers a timestamp-sorted stream.
  11. 11. Continuous query (or simply query): Directed Acyclic Graph (DAG) of streams and operators. Source operators have 1+ output streams, sink operators have 1+ input streams, and stream operators have 1+ input and 1+ output streams.
  12. 12. Data Streaming Operators Two main types: • Stateless operators • do not maintain any state • one-by-one processing • if they maintain some state, such state does not evolve depending on the tuples being processed • Stateful operators • maintain a state that evolves depending on the tuples being processed • produce output tuples that depend on multiple input tuples 12 OP OP
  13. 13. Stateless Operators. Filter: filter / route tuples based on one (or more) conditions. Map: transform each tuple.
  14. 14. 14
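
A minimal sketch of the two stateless operators in Flink's DataStream API (hedged: class, field, and value names are illustrative and not taken from the tutorial's repository; the tuple layout follows the vehicle-speed example):

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StatelessOperatorsSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Speed reports: (vehicle id, event time in seconds, speed in km/h)
        DataStream<Tuple3<String, Long, Double>> reports = env.fromElements(
                Tuple3.of("A", 28800L, 55.5),   // 8:00
                Tuple3.of("A", 28980L, 70.3),   // 8:03
                Tuple3.of("A", 29220L, 34.3));  // 8:07

        reports
                // Filter: route/drop tuples based on a condition (no state kept)
                .filter(r -> r.f2 > 50.0)
                // Map: transform each tuple one-by-one, e.g., km/h -> m/s
                .map(r -> Tuple3.of(r.f0, r.f1, r.f2 / 3.6))
                .returns(Types.TUPLE(Types.STRING, Types.LONG, Types.DOUBLE))
                .print();

        env.execute("stateless-operators-sketch");
    }
}
```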
  15. 15. Stateful Operators. Aggregate: aggregate information from multiple tuples (e.g., max, min, sum, ...). Join: join tuples coming from 2 streams given a certain predicate.
  16. 16. Windows and Stateful Analysis. Stateful operations are done over windows: • Time-based (e.g., tuples in the last 10 minutes) • Tuple-based (e.g., given the last 50 tuples). Example of a time-based window of size 1 hour and advance 20 minutes: [8:00,9:00), [8:20,9:20), [8:40,9:40). Questions: How many tuples are in a window? Which time period does a window span?
  17. 17. Time-based sliding window aggregation (count). Tuples at 8:05, 8:15, 8:22, and 8:45 fall into window [8:00,9:00) and the counter grows from 1 to 4; when the tuple at 9:05 arrives, window [8:00,9:00) outputs 4 and the computation moves on to [8:20,9:20).
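
A minimal sketch of the 1-hour / 20-minute sliding count in Flink (hedged: assumes the DataStream API with WatermarkStrategy, i.e., Flink 1.11 or later, and timestamp-sorted input as in the slides; all names and values are illustrative):

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class SlidingCountSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // (vehicle id, event time in milliseconds, relative to 8:00)
        DataStream<Tuple2<String, Long>> reports = env.fromElements(
                Tuple2.of("A", 5 * 60_000L),    // 8:05
                Tuple2.of("A", 15 * 60_000L),   // 8:15
                Tuple2.of("A", 22 * 60_000L),   // 8:22
                Tuple2.of("A", 45 * 60_000L),   // 8:45
                Tuple2.of("A", 65 * 60_000L));  // 9:05

        reports
                // Input is assumed timestamp-sorted, so monotonous timestamps suffice here
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy.<Tuple2<String, Long>>forMonotonousTimestamps()
                                .withTimestampAssigner((tuple, ts) -> tuple.f1))
                .keyBy(tuple -> tuple.f0)
                // Time-based sliding window: size 1 hour, advance 20 minutes
                .window(SlidingEventTimeWindows.of(Time.hours(1), Time.minutes(20)))
                // Count the tuples that fall in each window
                .aggregate(new CountPerWindow())
                .print();

        env.execute("sliding-window-count-sketch");
    }

    // Simple counting AggregateFunction: accumulator and result are both a Long count
    static class CountPerWindow implements AggregateFunction<Tuple2<String, Long>, Long, Long> {
        @Override public Long createAccumulator() { return 0L; }
        @Override public Long add(Tuple2<String, Long> value, Long acc) { return acc + 1; }
        @Override public Long getResult(Long acc) { return acc; }
        @Override public Long merge(Long a, Long b) { return a + b; }
    }
}
```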
  18. 18. Time-based sliding window joining. [Figure: two streams R and S, each with its own sliding window of size WR and WS; incoming tuples t1–t4 are matched against the tuples in the opposite window according to a predicate P.]
  19. 19. Windows and stateful analysis 19
  20. 20. Basic operators and user-defined operators. Besides a set of basic operators, SPEs usually allow the user to define ad-hoc operators (e.g., when the existing aggregations are not enough).
  21. 21. SPEs and operators' variants • Each SPE might define its own variants of certain streaming operators (e.g., of the sliding window join over streams R and S, with window sizes WR and WS and predicate P).
  22. 22. Sample Query "every five minutes, of all vehicles that braked significantly, find the one that braked the hardest" 22 time A 8:00 55.5 X1 Y1 ... B 8:07 34.3 X3 Y3 ... B 8:03 70.3 X2 Y2 ...
  23. 23. Sample Query pipeline. Map: remove unused fields (keep vehicle id, time (secs), speed (Km/h)). Aggregate: every minute, compute the average speed of each vehicle during the last 2.5 minutes (vehicle id, time (secs), avg speed (Km/h)). Join: join on vehicle id in the last minute (vehicle id, time (secs), avg speed (Km/h), speed (Km/h)). Filter: high average speed and slow current speed? (vehicle id, time (secs), braking factor). Aggregate: every 5 minutes, produce the vehicle that braked the hardest during the last 5 minutes.
  24. 24. Agenda • Motivation, preliminaries and examples about data streaming and Stream Processing Engines • Causes of out-of-order data and solutions enforcing total-ordering • Pros/Cons of total-ordering • Relaxation of total-ordering and the watermarks 24
  25. 25. From an abstract query … to streaming application run by an SPE 25 OP1 OP2 OP4 OP6OP3 OP5Source Sink OP1 OP2 OP4 OP6OP3 OP5Source Sink Source OP2 OP3 OP3 OP5
  26. 26. OP1 OP2 OP4 OP6OP3 OP5 Source Sink Source OP2 OP3 OP3 OP5 Node Node NodeNode Node Process Processes Process Process Process Process From an abstract query … to streaming application run by an SPE 26
  27. 27. From an abstract query … to streaming application run by Flink - 1/3
  28. 28. From an abstract query … to streaming application run by Flink - 2/3
  29. 29. From an abstract query … to streaming application run by Flink - 3/3
  30. 30. Causes of out-of-order data: • the sources themselves • asynchronous, distributed, parallel executions
  31. 31. Data sources that produce out-of-order data • Discussed in many related works (e.g., Babu, Shivnath, Utkarsh Srivastava, and Jennifer Widom. "Exploiting k-constraints to reduce memory overhead in continuous queries over data streams." ACM Transactions on Database Systems (TODS) 29.3 (2004): 545-580.) • Battery-operated devices, unreliable wireless networks, … 1 trillion sensors by 2030 in the Internet of Things (IoT)
  32. 32. Causes of out-of-order data: • the sources themselves • asynchronous, distributed, parallel executions
  33. 33. The 3-step procedure (sequential stream join). For each incoming tuple t: 1. compare t with all tuples in the opposite window given predicate P 2. add t to its window 3. remove stale tuples from t’s window. [Figure: producers Prod R and Prod S add tuples to R and S, a processing unit (PU) runs the join, and a consumer (Cons) consumes the results.] We assume each producer delivers tuples in timestamp order.
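
A plain-Java sketch of the 3-step procedure (hedged: a sequential, in-memory illustration of the three steps above with a time-based window, not the tutorial's actual implementation; all names are illustrative):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.BiPredicate;

// Minimal sequential stream join over time-based windows of size WINDOW_SIZE.
public class ThreeStepJoinSketch {
    record Tuple(String key, long ts, double value) {}

    static final long WINDOW_SIZE = 60; // seconds

    // Windows of the two input streams R and S (tuples on each stream arrive
    // timestamp-sorted, as assumed in the slides)
    static final Deque<Tuple> windowR = new ArrayDeque<>();
    static final Deque<Tuple> windowS = new ArrayDeque<>();

    // Predicate P: here, match tuples with the same key
    static final BiPredicate<Tuple, Tuple> P = (a, b) -> a.key().equals(b.key());

    // Process one tuple t arriving on stream R (symmetric code handles stream S).
    static List<Tuple[]> processR(Tuple t) {
        List<Tuple[]> results = new ArrayList<>();
        // 1. compare t with all tuples in the opposite window given predicate P
        for (Tuple s : windowS) {
            if (P.test(t, s)) {
                results.add(new Tuple[] {t, s});
            }
        }
        // 2. add t to its window
        windowR.addLast(t);
        // 3. remove stale tuples from t's window
        while (!windowR.isEmpty() && windowR.peekFirst().ts() <= t.ts() - WINDOW_SIZE) {
            windowR.removeFirst();
        }
        return results;
    }

    public static void main(String[] args) {
        windowS.addLast(new Tuple("A", 100, 55.5));
        System.out.println(processR(new Tuple("A", 130, 70.3)).size()); // prints 1
    }
}
```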
  34. 34. The 3-step procedure, is it enough? [Figure: streams R and S with windows WR and WS; tuples t1–t4 arrive interleaved across the two streams.]
  35. 35. Causes of out-of-order data: • asynchronous, distributed, parallel executions. Any operator fed data from multiple logical / physical streams can potentially observe out-of-order data.
  36. 36. Parallel execution • General approach 36 OPA OPB
  37. 37. Parallel execution • General approach R: Router M: Merger OPA OPB OPA RM Thread 1 OPA RM Thread m … OPA RM Thread 1 Thread 2 Thread 3 OPA RM Thread 1 Thread 2 OPA RM Thread 1 Thread 2 … 37
  38. 38. Parallel execution • General approach R: Router M: Merger OPA OPB OPA RM Thread 1 OPA RM Thread m … OPA RM Thread 1 Thread 2 Thread 3 OPA RM Thread 1 Thread 2 OPA RM Thread 1 Thread 2 … 38
  39. 39. Parallel execution • General approach R: Router M: Merger OPA OPB OPA RM Thread 1 OPA RM Thread m … OPB RM Thread 1 OPB RM Thread n … 39
  40. 40. Parallel execution • General approach R: Router M: Merger OPA OPB OPA RM Thread 1 OPA RM Thread m … OPB RM Thread 1 OPB RM Thread n … 40
  41. 41. Parallel execution • Stateful operators: Semantic awareness • Aggregate: count within last hour, group-by vehicle id 41 Previous Subcluster R … R … M Agg1 M Agg2 M Agg3 … … … Vehicle A
  42. 42. Parallel execution • Depending on the stateful operator semantics: • Partition input stream into keys • Each key is processed by 1 thread • # keys >> # threads/nodes 42
  43. 43. Parallel execution • Depending on the stateful operator semantics: • Partition input stream into keys • Each key is processed by 1 thread • # keys >> # threads/nodes 43 Keys domain Agg1 Agg2 Agg3 A D E B C F
  44. 44. Parallel execution • Depending on the stateful operator semantics: • Partition input stream into keys • Each key is processed by 1 thread • # keys >> # threads/nodes 44 Keys domain Agg1 Agg2 Agg3 A D E B C F
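
A small sketch of the key-partitioning idea (hedged: a generic hash-based assignment, not a specific SPE's partitioner; names are illustrative): each key is deterministically mapped to one Aggregate instance, so all tuples of a key are processed by the same thread.

```java
import java.util.List;

public class KeyPartitioningSketch {
    // Map each key deterministically to one of n parallel Aggregate instances.
    static int instanceFor(String key, int numInstances) {
        // Math.floorMod avoids negative indices for negative hash codes
        return Math.floorMod(key.hashCode(), numInstances);
    }

    public static void main(String[] args) {
        int numInstances = 3; // e.g., Agg1, Agg2, Agg3
        for (String vehicleId : List.of("A", "B", "C", "D", "E", "F")) {
            System.out.println(vehicleId + " -> Agg" + (instanceFor(vehicleId, numInstances) + 1));
        }
    }
}
```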
  45. 45. 45
  46. 46. A 8:00 55.5 X1 Y1 A 8:07 34.3 X3 Y3 A 8:03 70.3 X2 Y2 Parallel execution • Stateful operators: Semantic awareness • Aggregate: count within last hour, group-by vehicle id 46 … R … R M Agg1 M Agg2 M Agg3 … … … Map Map Vehicle A Round-robin (stateless)
  47. 47. Parallel execution • Stateful operators: Semantic awareness • Aggregate: count within last hour, group-by vehicle id 47 R… R… M Agg1 M Agg2 M Agg3 … … … Vehicle A Map Map Round-robin (stateless) A 8:00 55.5 X1 Y1 A 8:07 34.3 X3 Y3 A 8:03 70.3 X2 Y2
  48. 48. Inherent disorder 48 Map Aggregate Join Filter Aggregate Print-to-file operator P Map0 Map1 Map2 Map3 P Aggregate Filter Aggregate Join Disorder from parallelism
  49. 49. How to merge several timestamp-sorted streams ... into one timestamp-sorted stream?
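
One way to sketch the Merger (hedged: a simplified, in-memory k-way merge; a real SPE does this incrementally and must wait until every input has a pending tuple before emitting, which is why latency depends on the slowest source):

```java
import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.List;
import java.util.Queue;

public class SortedMergeSketch {
    record Tuple(long ts, String payload) {}

    // Emit tuples in global timestamp order from several timestamp-sorted inputs.
    // In a live system the merger may only emit once every input has a pending tuple
    // (the slowest source dictates latency); in this offline sketch all data is
    // already buffered, so we simply pick the input whose head has the smallest ts.
    static void merge(List<Queue<Tuple>> inputs) {
        while (inputs.stream().anyMatch(q -> !q.isEmpty())) {
            Queue<Tuple> next = inputs.stream()
                    .filter(q -> !q.isEmpty())
                    .min(Comparator.comparingLong((Queue<Tuple> q) -> q.peek().ts()))
                    .orElseThrow();
            System.out.println(next.poll());
        }
    }

    public static void main(String[] args) {
        Queue<Tuple> r = new ArrayDeque<>(List.of(new Tuple(1, "r1"), new Tuple(4, "r2")));
        Queue<Tuple> s = new ArrayDeque<>(List.of(new Tuple(2, "s1"), new Tuple(3, "s2")));
        merge(List.of(r, s)); // prints tuples with ts 1, 2, 3, 4
    }
}
```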
  50. 50. Gulisano, Vincenzo, et al. "Streamcloud: An elastic and scalable data streaming system." IEEE Transactions on Parallel and Distributed Systems 23.12 (2012): 2351-2365. 50
  51. 51. Balazinska, Magdalena, et al. "Fault-tolerance in the Borealis distributed stream processing system." Proceedings of the 2005 ACM SIGMOD international conference on Management of data. 2005. 51
  52. 52. Gulisano, Vincenzo, et al. "Scalejoin: A deterministic, disjoint-parallel and skew-resilient stream join." IEEE Transactions on Big Data (2016). 52
  53. 53. Which one to choose? • Different options have different costs • might require different types of data structures • might require “special tuples” (more on this in the following part of the tutorial) • When the price of merge-sorting is paid by those who provide the tuples rather than by those who receive them, the system can scale better 53
  54. 54. Tutorial: The Role of Event-Time Order in Data Streaming Analysis. Vincenzo Gulisano, Chalmers University of Technology, Gothenburg, Sweden, vincenzo.gulisano@chalmers.se; Dimitris Palyvos-Giannas, Chalmers University of Technology, Gothenburg, Sweden, palyvos@chalmers.se; Bastian Havers, Chalmers University of Technology & Volvo Cars, Gothenburg, Sweden, havers@chalmers.se; Marina Papatriantafilou, Chalmers University of Technology, Gothenburg, Sweden, ptrianta@chalmers.se
  55. 55. Tutorial: The Role of Event-Time Order in Data Streaming Analysis. Vincenzo Gulisano, Chalmers University of Technology, Gothenburg, Sweden, vincenzo.gulisano@chalmers.se; Dimitris Palyvos-Giannas, Chalmers University of Technology, Gothenburg, Sweden, palyvos@chalmers.se; Bastian Havers, Chalmers University of Technology & Volvo Cars, Gothenburg, Sweden, havers@chalmers.se; Marina Papatriantafilou, Chalmers University of Technology, Gothenburg, Sweden, ptrianta@chalmers.se
  56. 56. 56 Agenda • Motivation, preliminaries and examples about data streaming and Stream Processing Engines • Causes of out-of-order data and solutions enforcing total-ordering • Pros/Cons of total-ordering • Relaxation of total-ordering and the watermarks
  57. 57. 57 Agenda • Motivation, preliminaries and examples about data streaming and Stream Processing Engines • Causes of out-of-order data and solutions enforcing total-ordering • Pros/Cons of total-ordering • Relaxation of total-ordering and the watermarks
  58. 58. Pros/Cons of total-ordering • Cons: • Expensive (computation- and latency-wise) • An “overkill” for certain applications (more of this in the following slides) • Pros: • Determinism • Synchronization • Eager purging of stale state 58
  59. 59. Pros/Cons of total-ordering • Cons: • Expensive (computation- and latency-wise) • An “overkill” for certain applications (more of this in the following slides) • Pros: • Determinism • Synchronization • Eager purging of stale state 59
  60. 60. Cost • We need to temporarily maintain tuples • Linear in the number of tuples we receive, which depends on the streams’ rate • We need to sort tuples… O(n log(n)) • (n is the number of sources or tuples, depending on the case) • We need data from all sources; the processing latency depends on the slowest source. • The latency overhead can be estimated based on the sources’ rates1 1Gulisano, Vincenzo, et al. "Performance modeling of stream joins." Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems. 2017. 60
  61. 61. Estimating the latency overhead 1Gulisano, Vincenzo, et al. "Performance modeling of stream joins." Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems. 2017. 61
  62. 62. Pros/Cons of total-ordering • Cons: • Expensive (computation- and latency-wise) • An “overkill” for certain applications (more of this in the following slides) • Pros: • Determinism • Synchronization • Eager purging of stale state 62
  63. 63. Determinism 63 OP1 OP2 OP4 OP6OP3 OP5Source Sink OP1 OP2 OP4 OP6OP3 OP5Source Sink Source OP2 OP3 OP3 OP5 Balazinska, Magdalena, et al. "Fault-tolerance in the Borealis distributed stream processing system." Proceedings of the 2005 ACM SIGMOD international conference on Management of data. 2005. Gulisano, Vincenzo. StreamCloud: an elastic parallel-distributed stream processing engine. Diss. 2012.
  64. 64. Determinism 64 t1 t2 S1 S2 t3 t4 t5 t6 t7 t8 t9 t1 t2 t3 t4 t5 t6 t7 t8 t9 … Tuple-based window, size: 4 / advance: 1 Hwang, Jeong-Hyon, Ugur Cetintemel, and Stan Zdonik. "Fast and reliable stream processing over wide area networks." 2007 IEEE 23rd International Conference on Data Engineering Workshop. IEEE, 2007. Gulisano, Vincenzo, Yiannis Nikolakopoulos, Marina Papatriantafilou, and Philippas Tsigas. "Scalejoin: A deterministic, disjoint-parallel and skew-resilient stream join." IEEE Transactions on Big Data (2016).
  65. 65. Pros/Cons of total-ordering • Cons: • Expensive (computation- and latency-wise) • An “overkill” for certain applications (more of this in the following slides) • Pros: • Determinism • Synchronization • Eager purging of stale state 65
  66. 66. Synchronization OP1 Source Source OP2 OP2 OP4 OP6OP3 OP5 Sink OP3 OP3 OP5 OP3 OP3 OP3 Replicas OP4 OP6 OP5 Sink OP5 Balazinska, Magdalena, et al. "Fault-tolerance in the Borealis distributed stream processing system." Proceedings of the 2005 ACM SIGMOD international conference on Management of data. 2005. 66
  67. 67. Synchronization. [Figure: streams R and S with window WR; tuples t1–t4 are processed in a synchronized order across the join's processing units, as in ScaleJoin.] Gulisano, Vincenzo, Yiannis Nikolakopoulos, Marina Papatriantafilou, and Philippas Tsigas. "Scalejoin: A deterministic, disjoint-parallel and skew-resilient stream join." IEEE Transactions on Big Data (2016). 67
  68. 68. Synchronization 68 • Special tuples act as a barrier • They carry the reconfiguration to be applied • Change the parallelism degree • Change other configuration [Figure: each operator instance waits for the special tuple, makes sure the affected instances are ready, and stops before the reconfiguration is applied.] Najdataei, Hannaneh, et al. "Stretch: Scalable and elastic deterministic streaming analysis with virtual shared-nothing parallelism." Proceedings of the 13th ACM International Conference on Distributed and Event-based Systems. 2019.
  69. 69. 69 Agenda • Motivation, preliminaries and examples about data streaming and Stream Processing Engines • Causes of out-of-order data and solutions enforcing total-ordering • Pros/Cons of total-ordering • Relaxation of total-ordering and the watermarks
  70. 70. Total-Ordering Recap. Merge-sorting the incoming tuples (e.g., ts: 25, 24, 3, 35, 16, 18, 7) before they reach the operator yields a timestamp-sorted sequence (ts: 3, 7, 16, 18, 24, 25, 35), which gives ✓ correct window assignment and ✓ order in each window.
  71. 71. Relaxing Correctness: trading correctness for performance and flexibility. Total ordering: ✓ in-order input streams, ✓ order in each window, ✓ correct window assignment. First relaxation: ✗ in-order input streams, ✓ order in each window, ✓ correct window assignment.
  72. 72. Buffering Approaches. Incoming out-of-order tuples (e.g., ts: 25, 3, 7, 16, 2) are buffered and sorted before being fed to the operator (e.g., ts: 3, 7, 16, 25).
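
One common realization of the buffer-and-sort idea is a slack-based reorder buffer (hedged: a generic K-slack-style sketch, not necessarily the exact mechanism on the slide; names and the slack value are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class ReorderBufferSketch {
    record Tuple(long ts, String payload) {}

    static final long SLACK = 5; // maximum expected disorder, in event-time units

    // Buffered tuples, ordered by timestamp
    static final PriorityQueue<Tuple> buffer =
            new PriorityQueue<>((a, b) -> Long.compare(a.ts(), b.ts()));
    static long maxSeenTs = Long.MIN_VALUE;

    // Buffer the incoming tuple and release, in timestamp order, every buffered
    // tuple that can no longer be preceded by a future arrival (given the slack).
    static List<Tuple> insert(Tuple t) {
        buffer.add(t);
        maxSeenTs = Math.max(maxSeenTs, t.ts());
        List<Tuple> ready = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.peek().ts() <= maxSeenTs - SLACK) {
            ready.add(buffer.poll());
        }
        return ready;
    }

    public static void main(String[] args) {
        // ts=2 arrives after ts=3 has already been released: it exceeds the slack
        // and becomes late data, i.e., it is still emitted out of order.
        for (long ts : new long[] {25, 3, 7, 16, 2, 40}) {
            System.out.println("in ts=" + ts + " -> out " + insert(new Tuple(ts, "t" + ts)));
        }
    }
}
```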
  73. 73. Relaxing Correctness: correctness vs. performance and flexibility. Total ordering: ✓ in-order input streams, ✓ order in each window, ✓ correct window assignment. Buffering: ✗ in-order input streams, ✓ order in each window, ✓ correct window assignment. Next relaxation: ✗ in-order input streams, ✗ order in each window, ✓ correct window assignment.
  74. 74. Disorder + Correct Window Assignment 1. When to create each window? 2. When to close each window (and produce result)? 74 0 10 20 ts: 7ts: 16 ts: 18ts: 3ts: 21
  75. 75. Creating Windows 75 0 10 20 ts: 7 ts: 18ts: 9
  76. 76. Closing Windows 76 0 10 20 ts: 7ts: 18 ts: 9ts: 21ts: 17 CLOSED Operator needs some guarantee it will not receive tuples with ts < W Safely close all windows where right_boundary ≤ W
  77. 77. Watermarks. Assume we can compute a monotonic function F: O → E that returns W ∈ E, the earliest event time of any tuple that can arrive at operator O in the future. We call the value of this monotonic function F* the (low) watermark of operator O. Monotonicity ⇒ no tuples with ts < W will arrive in the future ⇒ this solves the problem of safely closing windows (all windows with right_boundary ≤ W can be closed). (* The watermark of operator O is a function of O and all its upstream peers, but we omit the latter for brevity.)
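
A tiny sketch of how an operator can use W to close windows safely (hedged: illustrative only; tumbling windows of size 10 as on the slide's time axis):

```java
import java.util.Map;
import java.util.TreeMap;

public class WindowCloserSketch {
    static final long WINDOW_SIZE = 10;

    // Open windows, keyed by their left boundary, each holding a running count.
    static final TreeMap<Long, Long> openWindows = new TreeMap<>();

    static void onTuple(long ts) {
        long leftBoundary = (ts / WINDOW_SIZE) * WINDOW_SIZE;
        openWindows.merge(leftBoundary, 1L, Long::sum);
    }

    // The watermark W guarantees no future tuple has ts < W, so every window
    // whose right boundary is <= W can safely be closed and its result emitted.
    static void onWatermark(long w) {
        while (!openWindows.isEmpty() && openWindows.firstKey() + WINDOW_SIZE <= w) {
            Map.Entry<Long, Long> closed = openWindows.pollFirstEntry();
            System.out.println("window [" + closed.getKey() + ", "
                    + (closed.getKey() + WINDOW_SIZE) + ") -> count " + closed.getValue());
        }
    }

    public static void main(String[] args) {
        onTuple(7); onTuple(18); onTuple(9); onTuple(21); onTuple(17);
        onWatermark(20); // closes [0,10) and [10,20), keeps [20,30) open
    }
}
```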
  78. 78. Watermarks in Practice. Watermarks are generated at the sources. They (conceptually) flow through the pipeline. They propagate regardless of data filtering ⇒ all operators have an up-to-date view of time. Input Watermark: earliest ts that O can receive. Output Watermark: earliest ts that O can emit.
  79. 79. Input & Output Timestamps. For a stateless operator, tsOUT = tsIN; for a stateful operator, tsOUT ≠ tsIN in general. Two questions: 1. Which windows are complete? 2. How is the timestamp set for window results?
  80. 80. Computing Watermarks. The input watermark of operator O over input channels carrying watermarks W1, W2, W3 is WO,IN = min(W1, W2, W3). The output watermark is WO,OUT = WO,IN if O is stateless, and g(stateO, semanticsO) otherwise.
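
A hedged sketch of the input-watermark computation (illustrative names): the operator keeps the latest watermark seen on each input channel and takes the minimum, which stays monotonic as long as each channel's watermark is monotonic.

```java
import java.util.Arrays;

public class InputWatermarkSketch {
    // Latest watermark received on each of the operator's input channels.
    private final long[] channelWatermarks;

    InputWatermarkSketch(int numChannels) {
        channelWatermarks = new long[numChannels];
        Arrays.fill(channelWatermarks, Long.MIN_VALUE);
    }

    // W_{O,IN} = min(W_1, ..., W_n): the operator cannot advance past its slowest input.
    long onChannelWatermark(int channel, long watermark) {
        channelWatermarks[channel] = Math.max(channelWatermarks[channel], watermark);
        return Arrays.stream(channelWatermarks).min().orElseThrow();
    }

    public static void main(String[] args) {
        InputWatermarkSketch op = new InputWatermarkSketch(3);
        op.onChannelWatermark(0, 10);
        op.onChannelWatermark(1, 30);
        System.out.println(op.onChannelWatermark(2, 20)); // prints 10: min(10, 30, 20)
    }
}
```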
  81. 81. Flink Example. [Pipeline: Map, Aggregate, Join (4 parallel instances Join0–Join3), Filter, Aggregate, Print-to-file operator P.] 1. Result correctness from correct window assignment (a window is closed once the watermark W passes its right boundary). 2. Watermark propagation through the pipeline (each operator forwards its output watermark downstream).
  82. 82. Generating Watermarks • Perfect Watermarks • Sorted data or very predictable data sources. • Determinism without sorting for order-independent window functions. • Disorder inside windows is still possible! • Sorting possible if needed, but not imposed. • Heuristic Watermarks • When it is impossible to perfectly predict the data (e.g., distributed sources). • Best-effort prediction of event-time progress. • Possibility for late data. • More knowledge about the internals of the sources → fewer mispredictions (late data). 82
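
In Flink, the two flavors roughly correspond to built-in watermark strategies (hedged: API as of Flink 1.11+; the 5-second bound and tuple layout are arbitrary illustrations):

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.java.tuple.Tuple2;

public class WatermarkStrategiesSketch {
    // "Perfect" watermarks: valid only if each source emits timestamp-sorted data.
    static WatermarkStrategy<Tuple2<String, Long>> perfect() {
        return WatermarkStrategy.<Tuple2<String, Long>>forMonotonousTimestamps()
                .withTimestampAssigner((tuple, previousTs) -> tuple.f1);
    }

    // Heuristic watermarks: best-effort bound on disorder; tuples arriving more than
    // 5 seconds late in event time become late data.
    static WatermarkStrategy<Tuple2<String, Long>> heuristic() {
        return WatermarkStrategy.<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                .withTimestampAssigner((tuple, previousTs) -> tuple.f1);
    }

    // Usage: stream.assignTimestampsAndWatermarks(perfect())
    //    or: stream.assignTimestampsAndWatermarks(heuristic())
}
```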
  83. 83. Fast and Slow Watermarks. Over the lifetime of a window (in event time): a perfect watermark arrives exactly when the window is complete; a slow watermark arrives later (✗ performance / latency); a fast watermark arrives earlier (✗ correctness).
  84. 84. Triggering. With a long (e.g., 24-hour) window, waiting for completeness means a latency of 24 hours. Can we do better? Emit (partial) results every 1h instead. Completeness Trigger: emit the result when the window is complete. Repeated Update Trigger: periodically emit a (partial) result (e.g., for every tuple, every hour, etc.). Trade-off between correctness, performance, and flexibility.
  85. 85. Repeated Update Trigger Results. With a repeated update trigger firing every 1h, the partial results over the 24-hour window can be Discarding, Accumulating, or Accumulating + Retracting.
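
In Flink, a repeated update trigger can be sketched with the bundled ContinuousEventTimeTrigger, which repeatedly fires the window in event time instead of waiting only for the watermark (hedged: the pipeline and values are illustrative; the trigger's default firing mode is accumulating):

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.triggers.ContinuousEventTimeTrigger;

public class RepeatedTriggerSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // (vehicle id, event time in milliseconds)
        env.fromElements(Tuple2.of("A", 0L), Tuple2.of("A", 3_600_000L), Tuple2.of("A", 7_200_000L))
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy.<Tuple2<String, Long>>forMonotonousTimestamps()
                                .withTimestampAssigner((t, ts) -> t.f1))
                .keyBy(t -> t.f0)
                // 24-hour window...
                .window(TumblingEventTimeWindows.of(Time.hours(24)))
                // ...but emit an (accumulating) partial count every hour of event time
                .trigger(ContinuousEventTimeTrigger.of(Time.hours(1)))
                .aggregate(new CountPerWindow())
                .print();

        env.execute("repeated-update-trigger-sketch");
    }

    // Counting AggregateFunction: each firing emits the count accumulated so far
    static class CountPerWindow implements AggregateFunction<Tuple2<String, Long>, Long, Long> {
        @Override public Long createAccumulator() { return 0L; }
        @Override public Long add(Tuple2<String, Long> value, Long acc) { return acc + 1; }
        @Override public Long getResult(Long acc) { return acc; }
        @Override public Long merge(Long a, Long b) { return a + b; }
    }
}
```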
  86. 86. Putting It All Together. Over the window lifetime: a fast watermark marks the window as complete too early (✗ correctness), a perfect watermark exactly when it is complete, a slow watermark too late (✗ performance / latency); repeated triggers add early results (✓ performance / latency) and late results (✓ correctness) around the on-time result produced by the watermark trigger.
  87. 87. … the light at the end of the tunnel … 87 • Motivation, preliminaries and examples about data streaming and Stream Processing Engines • Causes of out-of-order data and solutions enforcing total-ordering • Pros/Cons of total-ordering • Relaxation of total-ordering and the watermarks
  88. 88. To summarize, event time advances based on either the tuples themselves or watermarks. Tuples themselves: Pros: determinism, for order-sensitive / insensitive functions. Cons: costly merge-sorting; coupled processing / output of tuples. Watermarks: Pros: decoupled processing / output of tuples; propagation of time passing even in the absence of tuples. Cons: would require special support for order-sensitive functions; latency depends on the frequency of watermarks.
  89. 89. Tutorial: The Role of Event-Time Order in Data Streaming Analysis. Vincenzo Gulisano, Chalmers University of Technology, Gothenburg, Sweden, vincenzo.gulisano@chalmers.se; Dimitris Palyvos-Giannas, Chalmers University of Technology, Gothenburg, Sweden, palyvos@chalmers.se; Bastian Havers, Chalmers University of Technology & Volvo Cars, Gothenburg, Sweden, havers@chalmers.se; Marina Papatriantafilou, Chalmers University of Technology, Gothenburg, Sweden, ptrianta@chalmers.se
