Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS

© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Dr. Steffen Hausmann
Sr. Solutions Architect, Amazon Web Services

Data originates in real-time
Creek 1 by mountainamoeba / cc by 2.0

Analytics is done in batches
Königsee by andresumida / cc by 2.0

Insights are Perishable
Chillis by Lucas Cobb / cc by 2.0

Analyzing Streaming Data on AWS

Challenges of Stream Processing
Lines by FollowYour Nose / cc by 2.0

Comparing Streams and Relations
Relation: R ⊆ Id × Color
Stream: S ⊆ Id × Color × Time

Querying Streams and Relations
Relation: fixed data and ad-hoc queries
Stream: fixed queries and continuously ingested data

Challenges of Querying Infinite Streams
SELECT * FROM S WHERE color = 'black'
SELECT * FROM S JOIN S'
SELECT color, COUNT(1) FROM S GROUP BY color
... NOT EXISTS (SELECT * FROM S WHERE color = 'red')

Analyzing Streaming Data on AWS
Amazon Kinesis Analytics
• Runs standard SQL queries on top of streaming data
• Fully managed and scales automatically
• Only pay for the resources your queries consume
Apache Flink
• Open-source stream processing framework
• Included in Amazon Elastic MapReduce (EMR)
• Flexible APIs with Java and Scala, SQL, and CEP support

Evaluating Queries over Streams
Windows by Brad Greenlee / cc by 2.0

Evaluating Non-monotonic Operators
Tumbling Windows
SELECT STREAM color, COUNT(1)
FROM ...
GROUP BY STEP(rowtime BY INTERVAL '10' SECOND), color;
(Diagram: events t1, t3, t5, t6, t9 on a timeline divided into 10-second windows)
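The grouping performed by a tumbling window can be illustrated outside of any streaming engine. The following is a minimal sketch in plain Java, not the Kinesis Analytics implementation: the class `TumblingWindowSketch`, its method names, and the sample events are hypothetical, and timestamps are assumed to be milliseconds.

```java
import java.util.Map;
import java.util.TreeMap;

class TumblingWindowSketch {
    // Start of the fixed-size, non-overlapping window that contains the timestamp.
    static long windowStart(long timestampMillis, long windowSizeMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, windowSizeMillis);
    }

    // Counts events per (window start, color); timestamps[i] belongs to colors[i].
    static Map<Long, Map<String, Integer>> countPerWindow(
            long[] timestamps, String[] colors, long windowSizeMillis) {
        Map<Long, Map<String, Integer>> counts = new TreeMap<>();
        for (int i = 0; i < timestamps.length; i++) {
            long start = windowStart(timestamps[i], windowSizeMillis);
            counts.computeIfAbsent(start, k -> new TreeMap<>())
                  .merge(colors[i], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] timestamps = {1_000, 3_000, 5_000, 11_000, 19_000};
        String[] colors = {"red", "black", "red", "red", "black"};
        // prints {0={black=1, red=2}, 10000={black=1, red=1}}
        System.out.println(countPerWindow(timestamps, colors, 10_000));
    }
}
```

Because every event falls into exactly one window, each window's count is final once the window closes, which is what makes the aggregate computable on an infinite stream.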

Sliding Windows
SELECT STREAM color, COUNT(1) OVER w
FROM ...
GROUP BY color
WINDOW w AS (RANGE INTERVAL '10' SECOND PRECEDING);
(Diagram: events t1, t3, t5, t6, t9 on a timeline with a window sliding over them)
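In contrast to a tumbling window, a sliding window produces one result per incoming event, computed over the events of the preceding time range. The sketch below shows the idea in plain Java under a few assumptions: events arrive with non-decreasing timestamps, the lower bound of the range is treated as inclusive, and the class `SlidingWindowSketch` and its method are hypothetical names, not an API of Kinesis Analytics or Flink.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class SlidingWindowSketch {
    private final long rangeMillis;
    // Timestamps of recent events, kept separately per color.
    private final Map<String, Deque<Long>> recentByColor = new HashMap<>();

    SlidingWindowSketch(long rangeMillis) {
        this.rangeMillis = rangeMillis;
    }

    // Adds one event and returns how many events of the same color
    // fall into [timestamp - range, timestamp], including the new one.
    int addAndCount(String color, long timestampMillis) {
        Deque<Long> recent = recentByColor.computeIfAbsent(color, k -> new ArrayDeque<>());
        recent.addLast(timestampMillis);
        // Evict events that have slid out of the range.
        while (recent.peekFirst() < timestampMillis - rangeMillis) {
            recent.removeFirst();
        }
        return recent.size();
    }

    public static void main(String[] args) {
        SlidingWindowSketch window = new SlidingWindowSketch(10_000);
        System.out.println(window.addAndCount("red", 1_000));   // prints 1
        System.out.println(window.addAndCount("red", 3_000));   // prints 2
        System.out.println(window.addAndCount("red", 12_000));  // prints 2
    }
}
```

Only the events inside the current range need to be kept in memory, so the state stays bounded even though the stream is not.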

Session Windows
stream
    .keyBy(<key selector>)
    .window(EventTimeSessionWindows.withGap(Time.minutes(10)))
    .<windowed transformation>(<window function>);
(Diagram: events t1, t3, t5, t6, t8, t9 on a timeline, grouped into sessions separated by a session gap)
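How a session gap splits events into windows can be sketched without Flink. The following plain Java example is an illustration only: `SessionWindowSketch` and `sessions` are hypothetical names, and starting a new session only when the gap is strictly exceeded is an assumption of this sketch rather than a statement about Flink's boundary handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SessionWindowSketch {
    // Splits timestamps into sessions. A new session starts when the distance
    // to the previous event is larger than the gap.
    static List<List<Long>> sessions(long[] timestamps, long gap) {
        long[] sorted = timestamps.clone();
        Arrays.sort(sorted);
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long ts : sorted) {
            if (!current.isEmpty() && ts - current.get(current.size() - 1) > gap) {
                result.add(current);
                current = new ArrayList<>();
            }
            current.add(ts);
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        // Timestamps in minutes, session gap of 10 minutes.
        // prints [[1, 3, 5, 6], [28, 29]]
        System.out.println(sessions(new long[] {5, 6, 1, 3, 28, 29}, 10));
    }
}
```

Unlike tumbling and sliding windows, the length of a session window is not known in advance; it depends on when the activity for a key pauses.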

Evaluating Unbounded Queries
SELECT STREAM *
FROM S AS s JOIN S' AS t
ON s.color = t.color;
SELECT STREAM *
FROM S OVER w AS s JOIN S' OVER w AS t
ON s.color = t.color
WINDOW w AS (RANGE INTERVAL '10' SECOND PRECEDING);
(Diagram: two streams S and S' on a timeline with events t1 through t9)

Different Time Semantics

Maintaining Order of Events
(Diagram: the events t1, t3, t7, t8, t11 shown on an event-time axis and on a processing-time axis, where t7 arrives after t8)

Using processing time based windows
(Diagram: events t1, t3, t8, t7, t11 arriving in processing time; result tables with the columns "processing time" and "count" for the windows starting at 0 and 10)

Using multiple time-windows
SELECT STREAM
STEP(rowtime BY INTERVAL '10' SECOND) AS processing_time,
STEP(event_time BY INTERVAL '10' SECOND) AS event_time,
color,
COUNT(1)
FROM ...
GROUP BY processing_time, event_time, color;

Using multiple time-windows
(Diagram: events t1, t3, t8, t7, t11 arriving in processing time; result tables with the columns "processing time", "event time", and "count", first with the row (0, 0), then with the rows (10, 0) and (10, 10))

Using event time and watermarks
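(Diagram: events t1, t3, t8, t7, t11 arriving in processing time, with the values 10 and 20 marked on the timeline; result tables with the columns "event time" and "count" for the event-time windows 0 and 10)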

Adding Watermarks to a Stream
- Periodic watermarks
- Assuming ascending timestamps
- Punctuated watermarks
stream.assignTimestampsAndWatermarks(
    new AscendingTimestampExtractor<MyEvent>() {
        @Override
        public long extractAscendingTimestamp(MyEvent element) {
            return element.getCreationTime();
        }
    });
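The role of a watermark can be illustrated with a small tracker that derives the watermark from the highest event timestamp seen so far. This is a sketch in plain Java, not Flink's implementation: `WatermarkSketch`, its methods, and the `maxOutOfOrderness` parameter are hypothetical, and a value of 0 corresponds roughly to the assumption of ascending timestamps.

```java
class WatermarkSketch {
    private final long maxOutOfOrderness;
    private long maxTimestampSeen = Long.MIN_VALUE;

    WatermarkSketch(long maxOutOfOrderness) {
        this.maxOutOfOrderness = maxOutOfOrderness;
    }

    // Records an event timestamp; a late event never moves the watermark backwards.
    void observe(long eventTimestamp) {
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestamp);
    }

    // The watermark asserts that no event with a timestamp at or below it is still expected.
    long currentWatermark() {
        if (maxTimestampSeen == Long.MIN_VALUE) {
            return Long.MIN_VALUE;
        }
        return maxTimestampSeen - maxOutOfOrderness;
    }

    // A window [start, end) can be evaluated once the watermark covers its last timestamp.
    boolean canFire(long windowEnd) {
        return currentWatermark() >= windowEnd - 1;
    }

    public static void main(String[] args) {
        WatermarkSketch tracker = new WatermarkSketch(2);
        for (long ts : new long[] {1, 3, 8, 7}) {
            tracker.observe(ts);
        }
        System.out.println(tracker.currentWatermark()); // prints 6
        System.out.println(tracker.canFire(10));        // prints false
        tracker.observe(12);
        System.out.println(tracker.canFire(10));        // prints true
    }
}
```

The trade-off sits in `maxOutOfOrderness`: a larger value tolerates more disorder but delays every window result by the same amount.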

Watermarks and Allowed Lateness
stream
    .keyBy(<key selector>)
    .window(<window assigner>)
    .allowedLateness(<time>)
    .sideOutputLateData(lateOutputTag)
(Diagram: events t3, t1, t8, t4, and t5 arriving out of order in processing time, with the value 80 marked)
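The decision that allowed lateness adds to a windowed computation can be sketched as a three-way classification of each incoming event. The example below is plain Java under stated assumptions: tumbling windows of a fixed size, simplified boundary conditions that may differ from Flink's exact rules, and hypothetical names (`AllowedLatenessSketch`, `Outcome`, `classify`).

```java
class AllowedLatenessSketch {
    enum Outcome { ON_TIME, LATE_BUT_ACCEPTED, SIDE_OUTPUT }

    // Classifies an event by comparing the current watermark with the end of
    // the tumbling window the event belongs to.
    static Outcome classify(long eventTimestamp, long watermark,
                            long windowSize, long allowedLateness) {
        long windowEnd = eventTimestamp - Math.floorMod(eventTimestamp, windowSize) + windowSize;
        if (watermark < windowEnd) {
            return Outcome.ON_TIME;            // window has not been evaluated yet
        }
        if (watermark < windowEnd + allowedLateness) {
            return Outcome.LATE_BUT_ACCEPTED;  // window result is updated
        }
        return Outcome.SIDE_OUTPUT;            // too late, routed to the side output
    }

    public static void main(String[] args) {
        // Event at time 4 belongs to window [0, 10); allowed lateness is 5.
        System.out.println(classify(4, 8, 10, 5));   // prints ON_TIME
        System.out.println(classify(4, 12, 10, 5));  // prints LATE_BUT_ACCEPTED
        System.out.println(classify(4, 15, 10, 5));  // prints SIDE_OUTPUT
    }
}
```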

Different Processing Semantics
Kaseki 2010 by Dominic Alves / cc by 2.0

Consuming Data from a Stream
(Diagram: a consumer reads from the stream and writes to an output sink)

At-most-once Semantics
(Diagram: consumer, output sink, and offset store; stream positions 561 and 1105)

At-least-once Semantics
(Diagram: consumer, output sink, and offset store; stream positions 561 and 0)
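The difference between at-most-once and at-least-once delivery comes down to the order in which a consumer stores its position and writes its output. The following plain Java simulation is a sketch under simplifying assumptions: a single consumer, one simulated crash between the two steps, and hypothetical names (`DeliverySemanticsSketch`, `consume`, `crashAt`).

```java
import java.util.ArrayList;
import java.util.List;

class DeliverySemanticsSketch {
    // Consumes the stream and returns what ends up in the sink. The consumer
    // crashes once while handling index crashAt, between its two steps, and
    // then restarts from the position kept in the offset store.
    static List<String> consume(List<String> stream, boolean commitBeforeWrite, int crashAt) {
        List<String> sink = new ArrayList<>();
        int storedOffset = 0;        // durable offset store
        int pos = 0;                 // in-memory read position
        boolean crashPending = true;
        while (pos < stream.size()) {
            String record = stream.get(pos);
            if (commitBeforeWrite) {
                storedOffset = pos + 1;  // step 1: store position
            } else {
                sink.add(record);        // step 1: write output
            }
            if (crashPending && pos == crashAt) {
                crashPending = false;
                pos = storedOffset;      // restart from the offset store
                continue;
            }
            if (commitBeforeWrite) {
                sink.add(record);        // step 2: write output
            } else {
                storedOffset = pos + 1;  // step 2: store position
            }
            pos++;
        }
        return sink;
    }

    public static void main(String[] args) {
        List<String> stream = List.of("a", "b", "c");
        System.out.println(consume(stream, true, 1));   // prints [a, c]
        System.out.println(consume(stream, false, 1));  // prints [a, b, b, c]
    }
}
```

Storing the position first loses the record that was in flight during the crash (at-most-once); writing first causes that record to be written again after the restart (at-least-once).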

Exactly-once Semantics
Message Deduplication
• At-least-once event delivery plus message deduplication
• Keep a transaction log of processed messages
• On failure, replay events and remove duplicated events for every operator
Distributed Snapshots
• State for each operator is periodically checkpointed
• On failure, rewind operator to the previous consistent state
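The message deduplication approach can be sketched as an operator that remembers which message ids it has already applied. This plain Java example is an illustration only: it assumes every message carries a unique id, uses an in-memory set as a stand-in for the durable transaction log, and the names `DeduplicationSketch`, `process`, and `output` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DeduplicationSketch {
    // Stand-in for the transaction log of processed messages (assumed durable in practice).
    private final Set<String> processedIds = new HashSet<>();
    private final List<String> sink = new ArrayList<>();

    // Applies a message only the first time its id is seen;
    // returns whether the message was applied.
    boolean process(String messageId, String payload) {
        if (!processedIds.add(messageId)) {
            return false;  // duplicate from a replay, ignore
        }
        sink.add(payload);
        return true;
    }

    List<String> output() {
        return sink;
    }

    public static void main(String[] args) {
        DeduplicationSketch operator = new DeduplicationSketch();
        operator.process("1", "a");
        operator.process("2", "b");
        operator.process("2", "b");  // replayed after a failure
        operator.process("3", "c");
        System.out.println(operator.output());  // prints [a, b, c]
    }
}
```

With replayed events filtered out this way, at-least-once delivery yields an output in which every message takes effect exactly once.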

Go Build!

Please complete the session survey in
the summit mobile app.

Thank you!
