SlideShare a Scribd company logo
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Dr. Steffen Hausmann
Sr. Solutions Architect, Amazon Web Services
Deep Dive into Concepts and Tools for
Analyzing Streaming Data
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Data originates in real-time
Creek 1 by mountainamoeba / cc by 2.0
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Analytics is done in batches
Königsee by andresumida / cc by 2.0
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Insights are Perishable
Chillis by Lucas Cobb / cc by 2.0
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Analyzing Streaming Data on AWS
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Challenges of Stream Processing
Lines by FollowYour Nose / cc by 2.0
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Comparing Streams and Relations
𝑅 ⊆ 𝐼𝑑 × 𝐶𝑜𝑙𝑜𝑟
Relation
𝑆 ⊆ 𝐼𝑑 × 𝐶𝑜𝑙𝑜𝑟 × 𝑇𝑖𝑚𝑒
Stream
7
now
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Querying Streams and Relations
Relation Stream
Fixed data and ad-hoc queries
Fixed queries and
continuously ingested data
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Challenges of Querying Infinite Streams
SELECT * FROM S WHERE color = ‘black’
SELECT * FROM S JOIN S’
SELECT color, COUNT(1) FROM S GROUP BY color
... NOT EXISTS (SELECT * FROM S WHERE color = ‘red’)
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Analyzing Streaming Data on AWS
• Runs standard SQL queries on
top of streaming data
• Fully managed and scales
automatically
• Only pay for the resources your
queries consume
Amazon Kinesis Analytics
• Open-source stream processing
framework
• Included in Amazon Elastic Map
Reduce (EMR)
• Flexible APIs with Java and
Scalar, SQL, and CEP support
Apache Flink
SQL
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Evaluating Queries over Streams
Windows by Brad Greenlee / cc by 2.0
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Evaluating Non-monotonic Operators
Tumbling Windows
SELECT STREAM color, COUNT(1)
FROM ...
GROUP BY STEP(rowtime BY INTERVAL ‘10’ SECOND), color;
t1 t3 t5 t6 t9
10 sec
SQL
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Evaluating Non-monotonic Operators
Sliding Windows
SELECT STREAM color, COUNT(1) OVER w
FROM ...
GROUP BY color
WINDOW w AS (RANGE INTERVAL ’10’ SECOND PRECEDING);
t1 t3 t5 t6 t9
SQL
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Evaluating Non-monotonic Operators
Session Windows
t5 t6t1 t3 t8 t9
stream
.keyBy(<key selector>)
.window(EventTimeSessionWindows.withGap(Time.minutes(10)))
.<windowed transformation>(<window function>);
session gap
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
SELECT STREAM *
FROM S AS s JOIN S’ AS t
ON s.color = t.color
SELECT STREAM *
FROM S OVER w AS s JOIN S’ OVER w AS t
ON s.color = t.color
WINDOW w AS (RANGE INTERVAL ‘10’ SECOND PRECEDING);
Evaluating Unbounded Queries
t2 t4 t8t7
t1 t3 t5 t6 t9
S
S‘
SQL
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Different Time Semantics
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Maintaining Order of Events
t1 t3 t8t7
Event Time
t1 t3 t8 7
Processing Time
t7
t11
t11
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Maintaining Order of Events
Using processing time based windows
t1 t3 t8 t7
Processing
Time
processing
time
count
0
processing
time
count
10
t11
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Maintaining Order of Events
Using multiple time-windows
SELECT STREAM
STEP(rowtime BY INTERVAL ’10’ SECOND) AS processing_time,
STEP(event_time BY INTERVAL ’10’ SECOND) AS event_time,
color,
COUNT(1)
FROM ...
GROUP BY processing_time, event_time, color;
SQL
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Maintaining Order of Events
Using multiple time-windows
t1 t3 t8 t7
Processing
Time
processing
time
event time count
0 0
processing
time
event time count
10 0
10 10
t11
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Maintaining Order of Events
Using event time and watermarks
t1 t3 t8 t7
10 20
event time count
0
event time count
10
0
Processing
Time
t11
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Adding Watermarks to a Stream
- Periodic watermarks
- Assuming ascending timestamps
- Punctuated watermarks
stream.assignTimestampsAndWatermarks(
new AscendingTimestampExtractor<MyEvent>() {
@Override
public long extractAscendingTimestamp(MyEvent element) {
return element.getCreationTime();
}
});
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Watermarks and Allowed Lateness
t3 t1 t8 t4
80
Processing
Time
stream
.keyBy(<key selector>)
.window(<window assigner>)
.allowedLateness(<time>)
.sideOutputLateData(lateOutputTag)
t5
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Different Processing Semantics
Kaseki 2010 by Dominic Alves / cc by 2.0
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Consuming Data from a Stream
Consumer
Output sink
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Different Processing Semantics
At-most Once Semantics
Consumer
Output sink
Offset store
pos 561
pos 561
pos 1105
pos 1105
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Different Processing Semantics
At-least Once Semantics
Consumer
Output sink
Offset store
pos 561
pos 0
pos 0
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Different Processing Semantics
Exactly-once Semantics
• At-least-once event delivery plus
message deduplication
• Keep a transaction log of
processed messages
• On failure, replay events and
remove duplicated events for
every operator
Message Deduplication
• State for each operator is
periodically checkpointed
• On failure, rewind operator to
the previous consistent state
Distributed Snapshots
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Go Build!
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Please complete the session survey in
the summit mobile app.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Thank you!

More Related Content

What's hot

Data Warehousing and Data Lake Analytics, Together - AWS Online Tech Talks
Data Warehousing and Data Lake Analytics, Together - AWS Online Tech TalksData Warehousing and Data Lake Analytics, Together - AWS Online Tech Talks
Data Warehousing and Data Lake Analytics, Together - AWS Online Tech Talks
Amazon Web Services
 
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Amazon Web Services
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
Amazon Web Services
 
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...
Amazon Web Services
 
Data Warehouses and Data Lakes
Data Warehouses and Data LakesData Warehouses and Data Lakes
Data Warehouses and Data Lakes
Amazon Web Services
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Spark Summit
 
Analyzing Streams
Analyzing StreamsAnalyzing Streams
Analyzing Streams
Amazon Web Services
 
Titan and Cassandra at WellAware
Titan and Cassandra at WellAwareTitan and Cassandra at WellAware
Titan and Cassandra at WellAware
twilmes
 
Loading Data into Amazon Redshift
Loading Data into Amazon RedshiftLoading Data into Amazon Redshift
Loading Data into Amazon Redshift
Amazon Web Services
 
Presto@Netflix Presto Meetup 03-19-15
Presto@Netflix Presto Meetup 03-19-15Presto@Netflix Presto Meetup 03-19-15
Presto@Netflix Presto Meetup 03-19-15
Zhenxiao Luo
 
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
Databricks
 
Google professional data engineer exam dumps
Google professional data engineer exam dumpsGoogle professional data engineer exam dumps
Google professional data engineer exam dumps
TestPrep Training
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
Amazon Web Services
 
ML on Big Data: Real-Time Analysis on Time Series
ML on Big Data: Real-Time Analysis on Time SeriesML on Big Data: Real-Time Analysis on Time Series
ML on Big Data: Real-Time Analysis on Time Series
Sigmoid
 
Using spark for timeseries graph analytics
Using spark for timeseries graph analyticsUsing spark for timeseries graph analytics
Using spark for timeseries graph analytics
Sigmoid
 
Loading Data into Redshift with Lab
Loading Data into Redshift with LabLoading Data into Redshift with Lab
Loading Data into Redshift with Lab
Amazon Web Services
 
Graph Database Prototyping made easy with Graphgen
Graph Database Prototyping made easy with GraphgenGraph Database Prototyping made easy with Graphgen
Graph Database Prototyping made easy with Graphgen
GraphAware
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
Amazon Web Services
 
BDW16 London - William Vambenepe, Google - 3rd Generation Data Platform
BDW16 London - William Vambenepe, Google - 3rd Generation Data PlatformBDW16 London - William Vambenepe, Google - 3rd Generation Data Platform
BDW16 London - William Vambenepe, Google - 3rd Generation Data Platform
Big Data Week
 
Data Warehouses and Data Lakes
Data Warehouses and Data LakesData Warehouses and Data Lakes
Data Warehouses and Data Lakes
Amazon Web Services
 

What's hot (20)

Data Warehousing and Data Lake Analytics, Together - AWS Online Tech Talks
Data Warehousing and Data Lake Analytics, Together - AWS Online Tech TalksData Warehousing and Data Lake Analytics, Together - AWS Online Tech Talks
Data Warehousing and Data Lake Analytics, Together - AWS Online Tech Talks
 
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
Building Serverless Analytics Pipelines with AWS Glue (ANT308) - AWS re:Inven...
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...
 
Data Warehouses and Data Lakes
Data Warehouses and Data LakesData Warehouses and Data Lakes
Data Warehouses and Data Lakes
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
 
Analyzing Streams
Analyzing StreamsAnalyzing Streams
Analyzing Streams
 
Titan and Cassandra at WellAware
Titan and Cassandra at WellAwareTitan and Cassandra at WellAware
Titan and Cassandra at WellAware
 
Loading Data into Amazon Redshift
Loading Data into Amazon RedshiftLoading Data into Amazon Redshift
Loading Data into Amazon Redshift
 
Presto@Netflix Presto Meetup 03-19-15
Presto@Netflix Presto Meetup 03-19-15Presto@Netflix Presto Meetup 03-19-15
Presto@Netflix Presto Meetup 03-19-15
 
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
Expanding Apache Spark Use Cases in 2.2 and Beyond with Matei Zaharia and dem...
 
Google professional data engineer exam dumps
Google professional data engineer exam dumpsGoogle professional data engineer exam dumps
Google professional data engineer exam dumps
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
ML on Big Data: Real-Time Analysis on Time Series
ML on Big Data: Real-Time Analysis on Time SeriesML on Big Data: Real-Time Analysis on Time Series
ML on Big Data: Real-Time Analysis on Time Series
 
Using spark for timeseries graph analytics
Using spark for timeseries graph analyticsUsing spark for timeseries graph analytics
Using spark for timeseries graph analytics
 
Loading Data into Redshift with Lab
Loading Data into Redshift with LabLoading Data into Redshift with Lab
Loading Data into Redshift with Lab
 
Graph Database Prototyping made easy with Graphgen
Graph Database Prototyping made easy with GraphgenGraph Database Prototyping made easy with Graphgen
Graph Database Prototyping made easy with Graphgen
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
BDW16 London - William Vambenepe, Google - 3rd Generation Data Platform
BDW16 London - William Vambenepe, Google - 3rd Generation Data PlatformBDW16 London - William Vambenepe, Google - 3rd Generation Data Platform
BDW16 London - William Vambenepe, Google - 3rd Generation Data Platform
 
Data Warehouses and Data Lakes
Data Warehouses and Data LakesData Warehouses and Data Lakes
Data Warehouses and Data Lakes
 

Similar to Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS

Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Amazon Web Services
 
Analyzing Streams
Analyzing StreamsAnalyzing Streams
Analyzing Streams
Amazon Web Services
 
Analyzing Streams: Data Analytics Week SF
Analyzing Streams: Data Analytics Week SFAnalyzing Streams: Data Analytics Week SF
Analyzing Streams: Data Analytics Week SF
Amazon Web Services
 
Analyzing Streams
Analyzing StreamsAnalyzing Streams
Analyzing Streams
Amazon Web Services
 
Analyzing Streams
Analyzing StreamsAnalyzing Streams
Analyzing Streams
Amazon Web Services
 
Analyzing Streams: Data Analytics Week at the SF Loft
Analyzing Streams: Data Analytics Week at the SF LoftAnalyzing Streams: Data Analytics Week at the SF Loft
Analyzing Streams: Data Analytics Week at the SF Loft
Amazon Web Services
 
Analyzing Streams
Analyzing StreamsAnalyzing Streams
Analyzing Streams
Amazon Web Services
 
Analyzing Streaming Data in Real Time
Analyzing Streaming Data in Real TimeAnalyzing Streaming Data in Real Time
Analyzing Streaming Data in Real Time
Amazon Web Services
 
Building Massively Parallel Event-Driven Architectures (SRV373-R1) - AWS re:I...
Building Massively Parallel Event-Driven Architectures (SRV373-R1) - AWS re:I...Building Massively Parallel Event-Driven Architectures (SRV373-R1) - AWS re:I...
Building Massively Parallel Event-Driven Architectures (SRV373-R1) - AWS re:I...
Amazon Web Services
 
How Amazon.com Migrates Inventory Management Systems (DAT346) - AWS re:Invent...
How Amazon.com Migrates Inventory Management Systems (DAT346) - AWS re:Invent...How Amazon.com Migrates Inventory Management Systems (DAT346) - AWS re:Invent...
How Amazon.com Migrates Inventory Management Systems (DAT346) - AWS re:Invent...
Amazon Web Services
 
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...
Amazon Web Services
 
BDA307 Analyzing Data Streams in Real Time with Amazon Kinesis
BDA307 Analyzing Data Streams in Real Time with Amazon KinesisBDA307 Analyzing Data Streams in Real Time with Amazon Kinesis
BDA307 Analyzing Data Streams in Real Time with Amazon Kinesis
Amazon Web Services
 
Real Time Data Ingestion & Analysis - AWS Summit Sydney 2018
Real Time Data Ingestion & Analysis - AWS Summit Sydney 2018Real Time Data Ingestion & Analysis - AWS Summit Sydney 2018
Real Time Data Ingestion & Analysis - AWS Summit Sydney 2018
Amazon Web Services
 
Serverless Architectural Patterns - ServerlessDays TLV
Serverless Architectural Patterns - ServerlessDays TLVServerless Architectural Patterns - ServerlessDays TLV
Serverless Architectural Patterns - ServerlessDays TLV
Boaz Ziniman
 
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Amazon Web Services
 
Serverless Architectural Patterns - GOTO Amsterdam
Serverless Architectural Patterns - GOTO AmsterdamServerless Architectural Patterns - GOTO Amsterdam
Serverless Architectural Patterns - GOTO Amsterdam
Boaz Ziniman
 
Advanced Serverless application architecture and design considerations
Advanced Serverless application architecture and design considerationsAdvanced Serverless application architecture and design considerations
Advanced Serverless application architecture and design considerations
Dilip Kola
 
What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018
What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018
What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018
Amazon Web Services
 
Configure Your Cloud to Make It Rain on Threats (SEC335-R1) - AWS re:Invent 2018
Configure Your Cloud to Make It Rain on Threats (SEC335-R1) - AWS re:Invent 2018Configure Your Cloud to Make It Rain on Threats (SEC335-R1) - AWS re:Invent 2018
Configure Your Cloud to Make It Rain on Threats (SEC335-R1) - AWS re:Invent 2018
Amazon Web Services
 
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...
Amazon Web Services
 

Similar to Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS (20)

Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
 
Analyzing Streams
Analyzing StreamsAnalyzing Streams
Analyzing Streams
 
Analyzing Streams: Data Analytics Week SF
Analyzing Streams: Data Analytics Week SFAnalyzing Streams: Data Analytics Week SF
Analyzing Streams: Data Analytics Week SF
 
Analyzing Streams
Analyzing StreamsAnalyzing Streams
Analyzing Streams
 
Analyzing Streams
Analyzing StreamsAnalyzing Streams
Analyzing Streams
 
Analyzing Streams: Data Analytics Week at the SF Loft
Analyzing Streams: Data Analytics Week at the SF LoftAnalyzing Streams: Data Analytics Week at the SF Loft
Analyzing Streams: Data Analytics Week at the SF Loft
 
Analyzing Streams
Analyzing StreamsAnalyzing Streams
Analyzing Streams
 
Analyzing Streaming Data in Real Time
Analyzing Streaming Data in Real TimeAnalyzing Streaming Data in Real Time
Analyzing Streaming Data in Real Time
 
Building Massively Parallel Event-Driven Architectures (SRV373-R1) - AWS re:I...
Building Massively Parallel Event-Driven Architectures (SRV373-R1) - AWS re:I...Building Massively Parallel Event-Driven Architectures (SRV373-R1) - AWS re:I...
Building Massively Parallel Event-Driven Architectures (SRV373-R1) - AWS re:I...
 
How Amazon.com Migrates Inventory Management Systems (DAT346) - AWS re:Invent...
How Amazon.com Migrates Inventory Management Systems (DAT346) - AWS re:Invent...How Amazon.com Migrates Inventory Management Systems (DAT346) - AWS re:Invent...
How Amazon.com Migrates Inventory Management Systems (DAT346) - AWS re:Invent...
 
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...
 
BDA307 Analyzing Data Streams in Real Time with Amazon Kinesis
BDA307 Analyzing Data Streams in Real Time with Amazon KinesisBDA307 Analyzing Data Streams in Real Time with Amazon Kinesis
BDA307 Analyzing Data Streams in Real Time with Amazon Kinesis
 
Real Time Data Ingestion & Analysis - AWS Summit Sydney 2018
Real Time Data Ingestion & Analysis - AWS Summit Sydney 2018Real Time Data Ingestion & Analysis - AWS Summit Sydney 2018
Real Time Data Ingestion & Analysis - AWS Summit Sydney 2018
 
Serverless Architectural Patterns - ServerlessDays TLV
Serverless Architectural Patterns - ServerlessDays TLVServerless Architectural Patterns - ServerlessDays TLV
Serverless Architectural Patterns - ServerlessDays TLV
 
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
Under the Hood: How Amazon Uses AWS Services for Analytics at a Massive Scale...
 
Serverless Architectural Patterns - GOTO Amsterdam
Serverless Architectural Patterns - GOTO AmsterdamServerless Architectural Patterns - GOTO Amsterdam
Serverless Architectural Patterns - GOTO Amsterdam
 
Advanced Serverless application architecture and design considerations
Advanced Serverless application architecture and design considerationsAdvanced Serverless application architecture and design considerations
Advanced Serverless application architecture and design considerations
 
What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018
What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018
What Can Your Logs Tell You? (ANT215) - AWS re:Invent 2018
 
Configure Your Cloud to Make It Rain on Threats (SEC335-R1) - AWS re:Invent 2018
Configure Your Cloud to Make It Rain on Threats (SEC335-R1) - AWS re:Invent 2018Configure Your Cloud to Make It Rain on Threats (SEC335-R1) - AWS re:Invent 2018
Configure Your Cloud to Make It Rain on Threats (SEC335-R1) - AWS re:Invent 2018
 
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...
Connecting the dots - How Amazon Neptune and Graph Databases can transform yo...
 

More from AWS Germany

Analytics Web Day | From Theory to Practice: Big Data Stories from the Field
Analytics Web Day | From Theory to Practice: Big Data Stories from the FieldAnalytics Web Day | From Theory to Practice: Big Data Stories from the Field
Analytics Web Day | From Theory to Practice: Big Data Stories from the Field
AWS Germany
 
Analytics Web Day | Query your Data in S3 with SQL and optimize for Cost and ...
Analytics Web Day | Query your Data in S3 with SQL and optimize for Cost and ...Analytics Web Day | Query your Data in S3 with SQL and optimize for Cost and ...
Analytics Web Day | Query your Data in S3 with SQL and optimize for Cost and ...
AWS Germany
 
Modern Applications Web Day | Impress Your Friends with Your First Serverless...
Modern Applications Web Day | Impress Your Friends with Your First Serverless...Modern Applications Web Day | Impress Your Friends with Your First Serverless...
Modern Applications Web Day | Impress Your Friends with Your First Serverless...
AWS Germany
 
Modern Applications Web Day | Manage Your Infrastructure and Configuration on...
Modern Applications Web Day | Manage Your Infrastructure and Configuration on...Modern Applications Web Day | Manage Your Infrastructure and Configuration on...
Modern Applications Web Day | Manage Your Infrastructure and Configuration on...
AWS Germany
 
Modern Applications Web Day | Container Workloads on AWS
Modern Applications Web Day | Container Workloads on AWSModern Applications Web Day | Container Workloads on AWS
Modern Applications Web Day | Container Workloads on AWS
AWS Germany
 
Modern Applications Web Day | Continuous Delivery to Amazon EKS with Spinnaker
Modern Applications Web Day | Continuous Delivery to Amazon EKS with SpinnakerModern Applications Web Day | Continuous Delivery to Amazon EKS with Spinnaker
Modern Applications Web Day | Continuous Delivery to Amazon EKS with Spinnaker
AWS Germany
 
Building Smart Home skills for Alexa
Building Smart Home skills for AlexaBuilding Smart Home skills for Alexa
Building Smart Home skills for Alexa
AWS Germany
 
Hotel or Taxi? "Sorting hat" for travel expenses with AWS ML infrastructure
Hotel or Taxi? "Sorting hat" for travel expenses with AWS ML infrastructureHotel or Taxi? "Sorting hat" for travel expenses with AWS ML infrastructure
Hotel or Taxi? "Sorting hat" for travel expenses with AWS ML infrastructure
AWS Germany
 
Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless WorkshopWild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
AWS Germany
 
Log Analytics with AWS
Log Analytics with AWSLog Analytics with AWS
Log Analytics with AWS
AWS Germany
 
AWS Programme für Nonprofits
AWS Programme für NonprofitsAWS Programme für Nonprofits
AWS Programme für Nonprofits
AWS Germany
 
Microservices and Data Design
Microservices and Data DesignMicroservices and Data Design
Microservices and Data Design
AWS Germany
 
Serverless vs. Developers – the real crash
Serverless vs. Developers – the real crashServerless vs. Developers – the real crash
Serverless vs. Developers – the real crash
AWS Germany
 
Query your data in S3 with SQL and optimize for cost and performance
Query your data in S3 with SQL and optimize for cost and performanceQuery your data in S3 with SQL and optimize for cost and performance
Query your data in S3 with SQL and optimize for cost and performance
AWS Germany
 
Secret Management with Hashicorp’s Vault
Secret Management with Hashicorp’s VaultSecret Management with Hashicorp’s Vault
Secret Management with Hashicorp’s Vault
AWS Germany
 
EKS Workshop
 EKS Workshop EKS Workshop
EKS Workshop
AWS Germany
 
Scale to Infinity with ECS
Scale to Infinity with ECSScale to Infinity with ECS
Scale to Infinity with ECS
AWS Germany
 
Containers on AWS - State of the Union
Containers on AWS - State of the UnionContainers on AWS - State of the Union
Containers on AWS - State of the Union
AWS Germany
 
Deploying and Scaling Your First Cloud Application with Amazon Lightsail
Deploying and Scaling Your First Cloud Application with Amazon LightsailDeploying and Scaling Your First Cloud Application with Amazon Lightsail
Deploying and Scaling Your First Cloud Application with Amazon Lightsail
AWS Germany
 
Building Personalized Data Products - From Idea to Product
Building Personalized Data Products - From Idea to ProductBuilding Personalized Data Products - From Idea to Product
Building Personalized Data Products - From Idea to Product
AWS Germany
 

More from AWS Germany (20)

Analytics Web Day | From Theory to Practice: Big Data Stories from the Field
Analytics Web Day | From Theory to Practice: Big Data Stories from the FieldAnalytics Web Day | From Theory to Practice: Big Data Stories from the Field
Analytics Web Day | From Theory to Practice: Big Data Stories from the Field
 
Analytics Web Day | Query your Data in S3 with SQL and optimize for Cost and ...
Analytics Web Day | Query your Data in S3 with SQL and optimize for Cost and ...Analytics Web Day | Query your Data in S3 with SQL and optimize for Cost and ...
Analytics Web Day | Query your Data in S3 with SQL and optimize for Cost and ...
 
Modern Applications Web Day | Impress Your Friends with Your First Serverless...
Modern Applications Web Day | Impress Your Friends with Your First Serverless...Modern Applications Web Day | Impress Your Friends with Your First Serverless...
Modern Applications Web Day | Impress Your Friends with Your First Serverless...
 
Modern Applications Web Day | Manage Your Infrastructure and Configuration on...
Modern Applications Web Day | Manage Your Infrastructure and Configuration on...Modern Applications Web Day | Manage Your Infrastructure and Configuration on...
Modern Applications Web Day | Manage Your Infrastructure and Configuration on...
 
Modern Applications Web Day | Container Workloads on AWS
Modern Applications Web Day | Container Workloads on AWSModern Applications Web Day | Container Workloads on AWS
Modern Applications Web Day | Container Workloads on AWS
 
Modern Applications Web Day | Continuous Delivery to Amazon EKS with Spinnaker
Modern Applications Web Day | Continuous Delivery to Amazon EKS with SpinnakerModern Applications Web Day | Continuous Delivery to Amazon EKS with Spinnaker
Modern Applications Web Day | Continuous Delivery to Amazon EKS with Spinnaker
 
Building Smart Home skills for Alexa
Building Smart Home skills for AlexaBuilding Smart Home skills for Alexa
Building Smart Home skills for Alexa
 
Hotel or Taxi? "Sorting hat" for travel expenses with AWS ML infrastructure
Hotel or Taxi? "Sorting hat" for travel expenses with AWS ML infrastructureHotel or Taxi? "Sorting hat" for travel expenses with AWS ML infrastructure
Hotel or Taxi? "Sorting hat" for travel expenses with AWS ML infrastructure
 
Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless WorkshopWild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
Wild Rydes with Big Data/Kinesis focus: AWS Serverless Workshop
 
Log Analytics with AWS
Log Analytics with AWSLog Analytics with AWS
Log Analytics with AWS
 
AWS Programme für Nonprofits
AWS Programme für NonprofitsAWS Programme für Nonprofits
AWS Programme für Nonprofits
 
Microservices and Data Design
Microservices and Data DesignMicroservices and Data Design
Microservices and Data Design
 
Serverless vs. Developers – the real crash
Serverless vs. Developers – the real crashServerless vs. Developers – the real crash
Serverless vs. Developers – the real crash
 
Query your data in S3 with SQL and optimize for cost and performance
Query your data in S3 with SQL and optimize for cost and performanceQuery your data in S3 with SQL and optimize for cost and performance
Query your data in S3 with SQL and optimize for cost and performance
 
Secret Management with Hashicorp’s Vault
Secret Management with Hashicorp’s VaultSecret Management with Hashicorp’s Vault
Secret Management with Hashicorp’s Vault
 
EKS Workshop
 EKS Workshop EKS Workshop
EKS Workshop
 
Scale to Infinity with ECS
Scale to Infinity with ECSScale to Infinity with ECS
Scale to Infinity with ECS
 
Containers on AWS - State of the Union
Containers on AWS - State of the UnionContainers on AWS - State of the Union
Containers on AWS - State of the Union
 
Deploying and Scaling Your First Cloud Application with Amazon Lightsail
Deploying and Scaling Your First Cloud Application with Amazon LightsailDeploying and Scaling Your First Cloud Application with Amazon Lightsail
Deploying and Scaling Your First Cloud Application with Amazon Lightsail
 
Building Personalized Data Products - From Idea to Product
Building Personalized Data Products - From Idea to ProductBuilding Personalized Data Products - From Idea to Product
Building Personalized Data Products - From Idea to Product
 

Recently uploaded

Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
Rohit Gautam
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 

Recently uploaded (20)

Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 

Deep Dive into Concepts and Tools for Analyzing Streaming Data on AWS

  • 1. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Dr. Steffen Hausmann Sr. Solutions Architect, Amazon Web Services Deep Dive into Concepts and Tools for Analyzing Streaming Data
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Data originates in real-time Creek 1 by mountainamoeba / cc by 2.0
  • 3. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Analytics is done in batches Königsee by andresumida / cc by 2.0
  • 4. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Insights are Perishable Chillis by Lucas Cobb / cc by 2.0
  • 5. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Analyzing Streaming Data on AWS
  • 6. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Challenges of Stream Processing Lines by FollowYour Nose / cc by 2.0
  • 7. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Comparing Streams and Relations 𝑅 ⊆ 𝐼𝑑 × 𝐶𝑜𝑙𝑜𝑟 Relation 𝑆 ⊆ 𝐼𝑑 × 𝐶𝑜𝑙𝑜𝑟 × 𝑇𝑖𝑚𝑒 Stream 7 now
  • 8. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Querying Streams and Relations Relation Stream Fixed data and ad-hoc queries Fixed queries and continuously ingested data
  • 9. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Challenges of Querying Infinite Streams SELECT * FROM S WHERE color = ‘black’ SELECT * FROM S JOIN S’ SELECT color, COUNT(1) FROM S GROUP BY color ... NOT EXISTS (SELECT * FROM S WHERE color = ‘red’)
  • 10. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 11. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Analyzing Streaming Data on AWS • Runs standard SQL queries on top of streaming data • Fully managed and scales automatically • Only pay for the resources your queries consume Amazon Kinesis Analytics • Open-source stream processing framework • Included in Amazon Elastic Map Reduce (EMR) • Flexible APIs with Java and Scalar, SQL, and CEP support Apache Flink SQL
  • 12. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Evaluating Queries over Streams Windows by Brad Greenlee / cc by 2.0
  • 13. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Evaluating Non-monotonic Operators Tumbling Windows SELECT STREAM color, COUNT(1) FROM ... GROUP BY STEP(rowtime BY INTERVAL ‘10’ SECOND), color; t1 t3 t5 t6 t9 10 sec SQL
  • 14. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Evaluating Non-monotonic Operators Sliding Windows SELECT STREAM color, COUNT(1) OVER w FROM ... GROUP BY color WINDOW w AS (RANGE INTERVAL ’10’ SECOND PRECEDING); t1 t3 t5 t6 t9 SQL
  • 15. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Evaluating Non-monotonic Operators Session Windows t5 t6t1 t3 t8 t9 stream .keyBy(<key selector>) .window(EventTimeSessionWindows.withGap(Time.minutes(10))) .<windowed transformation>(<window function>); session gap
  • 16. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. SELECT STREAM * FROM S AS s JOIN S’ AS t ON s.color = t.color SELECT STREAM * FROM S OVER w AS s JOIN S’ OVER w AS t ON s.color = t.color WINDOW w AS (RANGE INTERVAL ‘10’ SECOND PRECEDING); Evaluating Unbounded Queries t2 t4 t8t7 t1 t3 t5 t6 t9 S S‘ SQL
  • 17. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Different Time Semantics
  • 18. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Maintaining Order of Events t1 t3 t8t7 Event Time t1 t3 t8 7 Processing Time t7 t11 t11
  • 19. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Maintaining Order of Events Using processing time based windows t1 t3 t8 t7 Processing Time processing time count 0 processing time count 10 t11
  • 20. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Maintaining Order of Events Using multiple time-windows SELECT STREAM STEP(rowtime BY INTERVAL ’10’ SECOND) AS processing_time, STEP(event_time BY INTERVAL ’10’ SECOND) AS event_time, color, COUNT(1) FROM ... GROUP BY processing_time, event_time, color; SQL
  • 21. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Maintaining Order of Events Using multiple time-windows t1 t3 t8 t7 Processing Time processing time event time count 0 0 processing time event time count 10 0 10 10 t11
  • 22. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Maintaining Order of Events Using event time and watermarks t1 t3 t8 t7 10 20 event time count 0 event time count 10 0 Processing Time t11
  • 23. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Adding Watermarks to a Stream - Periodic watermarks - Assuming ascending timestamps - Punctuated watermarks stream.assignTimestampsAndWatermarks( new AscendingTimestampExtractor<MyEvent>() { @Override public long extractAscendingTimestamp(MyEvent element) { return element.getCreationTime(); } });
  • 24. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Watermarks and Allowed Lateness t3 t1 t8 t4 80 Processing Time stream .keyBy(<key selector>) .window(<window assigner>) .allowedLateness(<time>) .sideOutputLateData(lateOutputTag) t5
  • 25. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Different Processing Semantics Kaseki 2010 by Dominic Alves / cc by 2.0
  • 26. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Consuming Data from a Stream Consumer Output sink
  • 27. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Different Processing Semantics At-most Once Semantics Consumer Output sink Offset store pos 561 pos 561 pos 1105 pos 1105
  • 28. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Different Processing Semantics At-least Once Semantics Consumer Output sink Offset store pos 561 pos 0 pos 0
  • 29. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Different Processing Semantics Exactly-once Semantics • At-least-once event delivery plus message deduplication • Keep a transaction log of processed messages • On failure, replay events and remove duplicated events for every operator Message Deduplication • State for each operator is periodically checkpointed • On failure, rewind operator to the previous consistent state Distributed Snapshots
  • 30. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Go Build!
  • 31. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Please complete the session survey in the summit mobile app.
  • 32. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Thank you!