© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
High Performance Data Streaming
with Amazon Kinesis: Best Practices
Allan MacInnis
Principal Solutions Architect
AWS
Gabriel Commeau
Data Platforms Architect
Comcast
ANT322-R1
Agenda
• Streaming data overview
• Interactive demo
• Introduction to Amazon Kinesis
• Standard consumers vs. enhanced fan-out consumers
• Headwaters: Comcast streaming data platform
• Five considerations to scale Amazon Kinesis Data Streams with standard consumers
• Impacts of enhanced fan-out
Timely decisions require new data in minutes
Data loses value quickly over time
[Figure: value of data to decision-making (y-axis) against time (x-axis: real time, seconds, minutes, hours, days, months). Regions along the curve: preventive/predictive, actionable, reactive, historical. Annotations: time-critical decisions; traditional "batch" business intelligence; information half-life in decision-making.]
Source: Perishable insights, Mike Gualtieri, Forrester
Stream new data in seconds
Get actionable insights quickly
Streaming:
• Ingest data as it’s generated
• Real-time analytics/ML, alerts, actions
• Process data on the fly
Most common uses of streaming
• Industrial Automation
• Smart Home
• Smart City
• Data Lakes
• IoT Analytics
• Log Analytics
Streaming with Amazon Kinesis
Easily collect, process, and analyze video and data streams in real-time
• Amazon Kinesis Video Streams: capture, process, and store video streams
• Amazon Kinesis Data Firehose: load data streams into data stores
• Amazon Kinesis Data Analytics: analyze data streams with SQL
• Amazon Kinesis Data Streams: capture, process, and store data streams
Demo
amzn.to/bigdata
Demo architecture
Amazon Kinesis Data Streams producers and consumers
Producers:
• Kinesis Agent
• Apache Kafka
• AWS SDK
• LOG4J
• Flume
• Fluentd
• AWS Mobile SDK for iOS
• Amazon Kinesis Producer Library
Consumers:
• Get* APIs
• Amazon Kinesis Client Library + Connector Library
• Apache Storm
• Amazon EMR
• AWS Lambda
• Apache Spark
Amazon Kinesis Data Streams: Standard consumers
[Diagram: a data producer writes to a Kinesis Data Stream (shards 1 to n); consumer application A reads the data with GetRecods().]
• Data producer: up to 1 MB or 1000 records per second, per shard
• GetRecords(): five transactions per second, per shard
• Data: 2 MB per second, per shard
• With only one consumer application, records can be retrieved every 200 ms
Amazon Kinesis Data Streams: Standard consumers
[Diagram: a data producer writes to a Kinesis Data Stream (shards 1 to n); consumer applications A, B, C, D, and E all read from it.]
• With more consumer applications, propagation delay increases
• For example, with five consumer applications, each can only retrieve records once per second, at 400 KB/s or less
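The figures on the two standard-consumer slides follow from dividing the per-shard read limits among the consumer applications. A minimal sketch of that arithmetic, assuming the limits are shared evenly and taking 1 MB as 1000 KB (the function name is illustrative, not part of any AWS SDK):

```python
# Per-shard limits for standard (polling) consumers, as quoted on the slides.
GET_RECORDS_PER_SEC_PER_SHARD = 5   # GetRecords() transactions per second
READ_KB_PER_SEC_PER_SHARD = 2000    # 2 MB per second, assuming 1 MB = 1000 KB


def standard_consumer_share(nb_consumers: int) -> tuple[float, float]:
    """Return (polling interval in ms, read bandwidth in KB/s) per consumer, per shard.

    Assumption: the shard's read limits are split evenly among the consumers.
    """
    if nb_consumers < 1:
        raise ValueError("nb_consumers must be at least 1")
    poll_interval_ms = 1000 * nb_consumers / GET_RECORDS_PER_SEC_PER_SHARD
    bandwidth_kb_per_sec = READ_KB_PER_SEC_PER_SHARD / nb_consumers
    return poll_interval_ms, bandwidth_kb_per_sec


for count in (1, 5):
    interval, bandwidth = standard_consumer_share(count)
    print(f"{count} consumer(s): one poll every {interval:.0f} ms, up to {bandwidth:.0f} KB/s each")
```

With one consumer this gives the 200 ms polling interval from the first slide; with five it gives one poll per second and 400 KB/s each.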
Amazon Kinesis Data Streams: Enhanced fan-out consumers
Consumers do not poll. Messages are pushed to the consumer as they arrive
[Diagram: a data producer writes to shard 1 of a Kinesis Data Stream; consumer application A calls SubscribeToShard().]
SubscribeToShard() uses HTTP/2
• Connection persists for up to five minutes
• Data pushed to consumer
Amazon Kinesis Data Streams: Enhanced fan-out consumers
Each consumer application gets dedicated 2 MB per second egress, per shard
[Diagram: a data producer writes to shard 1 of a Kinesis Data Stream; consumer applications A and B each call RegisterStreamConsumer() and each have their own EFO pipe.]
Amazon Kinesis Data Streams: Enhanced fan-out
When to use standard consumers:
• Total number of consuming applications is low (< 3)
• Consumers are not latency-sensitive
• Minimize cost
When to use enhanced fan-out consumers:
• Multiple consumer applications for the same Kinesis Data Stream
  • Default limit of five registered consuming applications. More can be supported with a service limit increase request
• Low-latency requirements for data processing
  • Messages are typically delivered to a consumer in less than 70 ms
Over time we came to call […] the concept of managing this [stream data] centrally a “streaming platform.”
Jay Kreps
Headwaters: Comcast streaming data platform
• Decouple data producer from its consumers
• Scale data systems independently
• Act as buffer
• Formalize data exchanges
• Assist with data stream management
• Scale data stream
• Data retention period management
• Consumer on-boarding governance
• Foster real-time data exchange
• Stream metadata
• Data schema management
• Cost model
Headwaters architecture
[Diagram: a CI/CD pipeline, the Headwaters control plane, and several Headwaters data streams, connected by arrows labeled "Creates" and "Manages".]
Headwaters producer on-boarding
[Diagram: the data producer, the Headwaters control plane, and a Headwaters data stream. Numbered steps: (1) Registers, (2) Creates, (3) Grants, (4) Welcome Email. Other arrows: Writes, Monitors, Alerts, Communicates using.]
#1: Amazon Kinesis Data Streams producer limits
Kinesis producer limits, per shard:
• 1 MB/sec
• 1000 PUT/sec
#1: Amazon Kinesis Data Streams producer limits
IP video analytics use case:
[Diagram: an HTTP Gateway and the Headwaters Streaming Data Platform.]
#1: Amazon Kinesis Data Streams producer limits
[Chart labels: bandwidth surge, average bandwidth, maximum bandwidth.]
#1: Amazon Kinesis Data Streams producer limits
Bandwidth limitation:
Average bandwidth
Maximum bandwidth
Cost
Data latency
Back-pressure scale
#1: Amazon Kinesis Data Streams producer limits
n: Number of shards in Kinesis Data Streams, n ∈ ℕ*
• 1 MB/sec/shard:
\[ n \ge \frac{\text{MaxBandwidth}}{1\ \text{MB/sec}} \]
• 1000 PUT/sec/shard:
\[ n \ge \frac{\text{MaxThroughput}}{1000\ \text{messages/sec}} \]
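The two producer-side bounds can be combined by taking the larger of the two and rounding up to a whole shard. A minimal sketch, in which the function name and the sample workload are hypothetical:

```python
import math


def producer_min_shards(max_bandwidth_mb_per_sec: float, max_messages_per_sec: float) -> int:
    """Smallest shard count that satisfies both per-shard write limits."""
    by_bandwidth = math.ceil(max_bandwidth_mb_per_sec / 1.0)   # 1 MB/sec per shard
    by_throughput = math.ceil(max_messages_per_sec / 1000.0)   # 1000 PUT/sec per shard
    return max(1, by_bandwidth, by_throughput)                 # n is a positive integer


# Hypothetical workload: a 20 MB/s peak made of 30,000 small messages per second.
print(producer_min_shards(20, 30_000))
```

In this made-up workload the message rate, not the bandwidth, sets the shard count, which is why both bounds need checking.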
#2: Amazon Kinesis Data Streams consumer limits
Kinesis limitations:
• 2 MB / sec / shard
• 5 GET / sec / shard
\[ n \ge \frac{\text{MaxBandwidth} \times \text{NbConsumers}}{2\ \text{MB/sec}} \]
[Diagram labels: Master, Copy, Slave, Consumers.]
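The consumer-side bound reflects that all standard consumers share each shard's 2 MB/sec read limit, so the shard count grows with the number of consumers. A minimal sketch, with an illustrative function name and sample values taken from the deck's later example (20 MB/s input, four consumers):

```python
import math


def consumer_min_shards(max_bandwidth_mb_per_sec: float, nb_consumers: int) -> int:
    """Shards needed so that every standard consumer can read the full input.

    All standard consumers share the 2 MB/sec read limit of each shard.
    """
    return max(1, math.ceil(max_bandwidth_mb_per_sec * nb_consumers / 2.0))


# 20 MB/s of input read by four standard consumers: 20 * 4 / 2 = 40 shards.
print(consumer_min_shards(20, 4))
```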
#3: Split/merge shards
[Chart: "Number of Shards" (ticks 0 to 6) against an axis running from 1 to 5, for a data stream with 1-day retention.]
#3: Update shard count
[Diagram: the IDs of the shards in the stream after each successive shard-count update.]
#4: Consumption speed
[Diagram: a producer (1500 msg/sec) writes to a Kinesis stream (1000 msg/sec) that is read by Consumer 1 (10000 msg/sec) and Consumer 2 (500 msg/sec). Other rate labels: 3000 msg/sec, 3000 msg/sec, 1500 msg/sec.]
#4: Consumption speed
n: Number of shards in Kinesis Data Streams
N: Number of consumers
AverageThroughput: Average throughput of the producer
Ci(MaxThroughput): Maximum throughput of consumer Ci
\[ n \ge \max_{i=1}^{N} \frac{\text{AverageThroughput}}{C_i(\text{MaxThroughput})} \]
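The bound above is driven by the slowest consumer. The sketch below evaluates it with the rates shown on the consumption-speed diagram; it assumes Ci(MaxThroughput) is what a consumer can process per shard, so that adding shards adds parallelism, and the function name is illustrative:

```python
import math


def shards_for_consumption_speed(average_throughput: float,
                                 consumer_max_throughputs: list[float]) -> int:
    """n >= max over consumers of AverageThroughput / Ci(MaxThroughput).

    Assumption: each Ci(MaxThroughput) is a per-shard processing rate.
    """
    slowest = min(consumer_max_throughputs)  # the slowest consumer gives the largest ratio
    return max(1, math.ceil(average_throughput / slowest))


# Rates from the diagram: producer 1500 msg/sec,
# Consumer 1 at 10000 msg/sec, Consumer 2 at 500 msg/sec.
print(shards_for_consumption_speed(1500, [10_000, 500]))
```

Under that assumption the 500 msg/sec consumer needs three shards to keep up with a 1500 msg/sec producer, which matches the 3000 msg/sec stream capacity labeled on the diagram.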
#5: Max acceptable latency
Producer:
• Constant 5 MB/s
• Surge @ 20 MB/s
→ 20-shard data stream
Four consumers
Each consumer has:
\[ \frac{20 \times 2\ \text{MB/s}}{4} = 10\ \text{MB/s} \]
[Chart: bandwidth (MB/s) over time.]
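#5: Max acceptable latency
On the producer side:
\[ \text{SurgeDur} \times \text{SurgeBw} + \text{CiMaxLat} \times \text{MaxInBw} \]
On the consumer side:
\[ \text{CiMaxBw} \times (\text{SurgeDur} + \text{CiMaxLat}) \]
Also:
\[ \text{CiMaxBw} = \frac{n \times 2\ \text{MB/sec}}{\text{NbConsumers}} \]
[Chart: bandwidth (MB/s) over time.]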
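#5: Max acceptable latency
Therefore:
\[ \frac{n \times 2\ \text{MB/sec}}{\text{NbConsumers}} \times (\text{SurgeDur} + \text{CiMaxLat}) \ge \text{SurgeDur} \times \text{SurgeBw} + \text{CiMaxLat} \times \text{MaxInBw} \]
Yields:
\[ n \ge \frac{\text{NbConsumers} \times (\text{SurgeDur} \times \text{SurgeBw} + \text{CiMaxLat} \times \text{MaxInBw})}{2\ \text{MB/sec} \times (\text{SurgeDur} + \text{CiMaxLat})} \]
Example: Surge: 10 s, CiMaxLat: 10 s
\[ n \ge \frac{4 \times (10 \times 20 + 10 \times 5)}{2 \times (10 + 10)} = 25\ \text{shards} \]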
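The latency bound can be checked numerically. A minimal sketch that plugs the slide's example values into the formula; the function and parameter names are illustrative and simply mirror the symbols SurgeDur, SurgeBw, CiMaxLat, and MaxInBw:

```python
import math


def shards_for_max_latency(nb_consumers: int, surge_dur_s: float, surge_bw_mb_s: float,
                           max_lat_s: float, max_in_bw_mb_s: float) -> int:
    """n >= NbConsumers * (SurgeDur*SurgeBw + CiMaxLat*MaxInBw) / (2 MB/sec * (SurgeDur + CiMaxLat))."""
    # Data written during the surge plus during the allowed catch-up window, in MB.
    data_to_drain_mb = surge_dur_s * surge_bw_mb_s + max_lat_s * max_in_bw_mb_s
    # What one shard can deliver to all standard consumers over the same period, in MB.
    per_shard_capacity_mb = 2.0 * (surge_dur_s + max_lat_s)
    return max(1, math.ceil(nb_consumers * data_to_drain_mb / per_shard_capacity_mb))


# Slide example: four consumers, a 10 s surge at 20 MB/s, 10 s maximum latency, 5 MB/s otherwise.
print(shards_for_max_latency(4, 10, 20, 10, 5))
```

Accepting 10 s of lag brings the requirement down to 25 shards, compared with the 40 that the no-lag consumer bound gives for four consumers at 20 MB/s.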
Introducing enhanced fan-out consumers
Fan-out consumers:
- Dedicated 2MB/sec/shard
- Isolate a consumer from the other consumers
Unchanged considerations:
#1: Kinesis Data Streams producer limits
#3: Split/merge operations
#4: Consumption speed
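#2: Amazon Kinesis Data Streams consumer limits
Kinesis limitations, with standard consumers only:
\[ n \ge \frac{\text{MaxInputBandwidth} \times \text{NbConsumers}}{2\ \text{MB/sec}} \]
With enhanced fan-out, only the regular (standard) consumers count:
\[ n \ge \frac{\text{MaxInputBandwidth} \times \text{NbRegularConsumers}}{2\ \text{MB/sec}} \]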
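Because each enhanced fan-out consumer has its own dedicated egress, moving consumers to enhanced fan-out removes them from the consumer-side bound. A minimal sketch of the before and after; the function name and the split of consumers are hypothetical:

```python
import math


def consumer_min_shards_with_efo(max_input_bandwidth_mb_s: float,
                                 nb_consumers: int, nb_efo_consumers: int) -> int:
    """Consumer-side shard count when some consumers use enhanced fan-out."""
    nb_regular = nb_consumers - nb_efo_consumers
    if nb_regular < 0:
        raise ValueError("nb_efo_consumers cannot exceed nb_consumers")
    # Only regular consumers share the 2 MB/sec per-shard read limit.
    return max(1, math.ceil(max_input_bandwidth_mb_s * nb_regular / 2.0))


# Hypothetical: 20 MB/s of input and four consumers, with 0, 2, or 4 of them on enhanced fan-out.
for efo in (0, 2, 4):
    print(efo, consumer_min_shards_with_efo(20, 4, efo))
```

The producer limits from consideration #1 still apply, so the stream cannot shrink below what the input rate requires even when no regular consumers remain.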
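#5: Max acceptable latency
Fan-out consumers: No lag
For regular consumers, the bound
\[ n \ge \frac{\text{NbConsumers} \times (\text{SurgeDur} \times \text{SurgeBw} + \text{CiMaxLat} \times \text{MaxInBw})}{2\ \text{MB/sec} \times (\text{SurgeDur} + \text{CiMaxLat})} \]
becomes
\[ n \ge \frac{\text{NbRegularConsumers} \times (\text{SurgeDur} \times \text{SurgeBw} + \text{CiMaxLat} \times \text{MaxInBw})}{2\ \text{MB/sec} \times (\text{SurgeDur} + \text{CiMaxLat})} \]
CiMaxLat = Maximum latency for regular consumers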
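With enhanced fan-out, the latency bound shrinks with the number of regular consumers, while the producer limit (consideration #1) stays in force. A minimal sketch that takes the larger of the two, reusing the deck's example values as defaults; the function names and the choice to combine the two bounds with `max` are assumptions for illustration:

```python
import math


def latency_bound(nb_regular_consumers: int, surge_dur_s: float, surge_bw_mb_s: float,
                  max_lat_s: float, max_in_bw_mb_s: float) -> int:
    """Latency bound from the slide, counting regular consumers only."""
    backlog_mb = surge_dur_s * surge_bw_mb_s + max_lat_s * max_in_bw_mb_s
    return math.ceil(nb_regular_consumers * backlog_mb / (2.0 * (surge_dur_s + max_lat_s)))


def required_shards(nb_regular_consumers: int, surge_dur_s: float = 10, surge_bw_mb_s: float = 20,
                    max_lat_s: float = 10, max_in_bw_mb_s: float = 5) -> int:
    """Larger of the producer bound (1 MB/sec per shard at surge) and the latency bound."""
    producer_bound = math.ceil(surge_bw_mb_s / 1.0)  # consideration #1 is unchanged
    return max(1, producer_bound,
               latency_bound(nb_regular_consumers, surge_dur_s, surge_bw_mb_s,
                             max_lat_s, max_in_bw_mb_s))


# Four consumers in total; vary how many remain regular (the rest use enhanced fan-out).
for regular in (4, 2, 0):
    print(regular, required_shards(regular))
```

Once two of the four consumers move to enhanced fan-out, the 20 MB/s surge on the producer side becomes the binding constraint rather than consumer latency.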
Amazon Kinesis Data Streams Shard Calculator
https://t2m.io/vcAFLW8U
Thank you!
Allan MacInnis
ajmac@amazon.com
Gabriel Commeau
Gabriel_Commeau@comcast.com
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

High Performance Data Streaming with Amazon Kinesis: Best Practices (ANT322-R1) - AWS re:Invent 2018

  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. High Performance Data Streaming with Amazon Kinesis: Best Practices. Allan MacInnis, Principal Solutions Architect, AWS. Gabriel Commeau, Data Platforms Architect, Comcast. ANT322-R1
  • 3. Agenda: Streaming data overview; Interactive demo; Introduction to Amazon Kinesis; Standard consumers vs. enhanced fan-out consumers; Headwaters: Comcast streaming data platform; Five considerations to scale an Amazon Kinesis data stream with standard consumers; Impacts of enhanced fan-out
  • 4. Timely decisions require new data in minutes. Data loses value quickly over time (source: Perishable insights, Mike Gualtieri, Forrester). Chart: value of data to decision-making (preventive/predictive, actionable, reactive, historical) against time (real time, seconds, minutes, hours, days, months), spanning time-critical decisions to traditional “batch” business intelligence. Information half-life in decision-making.
  • 5. Stream new data in seconds; get actionable insights quickly. Streaming: ingest data as it’s generated; process data on the fly; real-time analytics/ML, alerts, actions
  • 6. Most common uses of streaming: industrial automation, smart home, smart city, data lakes, IoT analytics, log analytics
  • 7. Streaming with Amazon Kinesis: easily collect, process, and analyze video and data streams in real time. Amazon Kinesis Video Streams: capture, process, and store video streams. Amazon Kinesis Data Firehose: load data streams into data stores. Amazon Kinesis Data Analytics: analyze data streams with SQL. Amazon Kinesis Data Streams: capture, process, and store data streams.
  • 9. Demo: amzn.to/bigdata
  • 10. Demo architecture
  • 12. Amazon Kinesis Data Streams producers and consumers. Producers: Kinesis Agent, Apache Kafka, AWS SDK, LOG4J, Flume, Fluentd, AWS Mobile SDK for iOS, Amazon Kinesis Producer Library. Consumers: Get* APIs, Amazon Kinesis Client Library + Connector Library, Apache Storm, Amazon EMR, AWS Lambda, Apache Spark. (Diagram center: Amazon Kinesis.)
  • 13. Amazon Kinesis Data Streams: Standard consumers. Diagram: a data producer writes to a Kinesis data stream (shards 1 to n) at up to 1 MB or 1000 records per second, per shard; consumer application A reads with GetRecords(). GetRecords(): five transactions per second, per shard. Data: 2 MB per second, per shard. With only one consumer application, records can be retrieved every 200 ms.
  • 14. Amazon Kinesis Data Streams: Standard consumers. Diagram: a data producer writes to a Kinesis data stream (shards 1 to n), read by consumer applications A, B, C, D, and E. With more consumer applications, propagation delay increases. For example, with five consumer applications, each can retrieve records only once per second, and at no more than 400 KBps, per shard.
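The per-consumer figures on the two standard-consumer slides follow from dividing the per-shard read limits across the consumer applications. A minimal Python sketch of that arithmetic, assuming the limits are shared evenly and taking 2 MB as 2000 KB so that the result matches the 400 KBps on the slide; the constant and function names are illustrative and not part of any AWS API:

```python
from typing import Tuple

# Per-shard read limits for standard consumers, as stated on the slides.
GET_RECORDS_CALLS_PER_SEC = 5      # GetRecords(): five transactions per second, per shard
READ_BANDWIDTH_KB_PER_SEC = 2000   # 2 MB per second, per shard (1 MB taken as 1000 KB)


def standard_consumer_share(num_consumers: int) -> Tuple[float, float]:
    """Return (polling interval in ms, read bandwidth in KB/s) available to
    each consumer application on one shard, assuming an even split."""
    if num_consumers < 1:
        raise ValueError("num_consumers must be at least 1")
    polling_interval_ms = 1000 * num_consumers / GET_RECORDS_CALLS_PER_SEC
    bandwidth_kb_per_sec = READ_BANDWIDTH_KB_PER_SEC / num_consumers
    return polling_interval_ms, bandwidth_kb_per_sec
```

With one consumer this gives a 200 ms polling interval at the full 2000 KB/s; with five consumers it gives one poll per second at 400 KB/s each.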
  • 15. Amazon Kinesis Data Streams: Enhanced fan-out consumers. Consumers do not poll; messages are pushed to the consumer as they arrive. Diagram: data producer, Kinesis data stream (shard 1), consumer application A calling SubscribeToShard(). Uses HTTP/2: persistent connection of up to five minutes; data is pushed to the consumer.
  • 16. Amazon Kinesis Data Streams: Enhanced fan-out consumers. Each consumer application gets a dedicated 2 MB per second of egress, per shard. Diagram: data producer, Kinesis data stream (shard 1); consumer applications A and B each call RegisterStreamConsumer() and each get their own EFO pipe.
  • 17. Amazon Kinesis Data Streams: Enhanced fan-out. When to use standard consumers: the total number of consuming applications is low (< 3); consumers are not latency-sensitive; you want to minimize cost. When to use enhanced fan-out consumers: multiple consumer applications read the same Kinesis data stream (default limit of five registered consuming applications; more can be supported with a service limit increase request); data processing has low-latency requirements (messages are typically delivered to a consumer in less than 70 ms).
  • 19. Over time we came to call […] the concept of managing this [stream data] centrally a “streaming platform.” Jay Kreps
  • 20. Headwaters: Comcast streaming data platform. Decouple data producer from its consumers; scale data systems independently; act as buffer; formalize data exchanges; assist with data stream management; scale data stream; data retention period management; consumer on-boarding governance; foster real-time data exchange; stream metadata; data schema management; cost model.
  • 21. Headwaters architecture. Diagram labels: CI/CD pipeline, Headwaters control plane, Headwaters data streams. Arrows: Creates, Manages.
  • 22. Headwaters producer on-boarding. Diagram labels: data producer, Headwaters control plane, Headwaters data stream. Numbered steps: (1) Registers, (2) Creates, (3) Grants, (4) Welcome email. Other arrows: Writes, Monitors, Alerts, Communicates using.
  • 24. #1: Amazon Kinesis Data Streams producer limits. Kinesis producer limits, per shard: 1 MB/sec and 1000 PUT/sec.
  • 25. #1: Amazon Kinesis Data Streams producer limits. IP video analytics use case. Diagram: HTTP gateway, Headwaters streaming data platform.
  • 26. #1: Amazon Kinesis Data Streams producer limits. [Chart: bandwidth surge, with average bandwidth and maximum bandwidth marked]
  • 27. #1: Amazon Kinesis Data Streams producer limits. Bandwidth limitation. [Chart: average bandwidth and maximum bandwidth; labels: Cost, Data latency, Back-pressure scale]
  • 28. #1: Amazon Kinesis Data Streams producer limits. $n$: number of shards in the Kinesis data stream, $n \in \mathbb{N}^*$. Limits: 1 MB/sec/shard and 1000 PUT/sec/shard. $n \ge \frac{\text{MaxBandwidth}}{1\ \text{MB/sec}}$ and $n \ge \frac{\text{MaxThroughput}}{1000\ \text{messages/sec}}$.
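The two producer-side inequalities can be evaluated together by taking the larger of the two lower bounds and rounding up to a whole shard. A minimal Python sketch, assuming bandwidth is given in MB/sec and throughput in messages/sec; the function name and parameters are illustrative:

```python
import math

PRODUCER_MB_PER_SEC_PER_SHARD = 1.0      # 1 MB/sec per shard
PRODUCER_PUTS_PER_SEC_PER_SHARD = 1000   # 1000 PUT/sec per shard


def shards_for_producer(max_bandwidth_mb_s: float, max_throughput_msg_s: float) -> int:
    """Smallest shard count n satisfying both producer limits (consideration #1)."""
    by_bandwidth = math.ceil(max_bandwidth_mb_s / PRODUCER_MB_PER_SEC_PER_SHARD)
    by_throughput = math.ceil(max_throughput_msg_s / PRODUCER_PUTS_PER_SEC_PER_SHARD)
    # n is a positive integer, so a stream always has at least one shard.
    return max(1, by_bandwidth, by_throughput)
```

For example, a producer peaking at 20 MB/sec and 1500 messages/sec is bandwidth-bound and needs 20 shards, while one peaking at 0.5 MB/sec and 2500 messages/sec is bound by the PUT limit and needs 3.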
  • 29. #2: Amazon Kinesis Data Streams consumer limits. Kinesis limitations: 2 MB/sec/shard and 5 GET/sec/shard. $n \ge \frac{\text{MaxBandwidth} \times \text{NbConsumers}}{2\ \text{MB/sec}}$. Diagram labels: Consumers, Master, Copy, Slave Consumers.
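The consumer-side bound scales the shard count with the number of standard consumers, because they all share each shard's 2 MB/sec of read bandwidth. A minimal Python sketch, assuming every consumer reads the full stream and bandwidth is in MB/sec; the names are illustrative:

```python
import math

CONSUMER_MB_PER_SEC_PER_SHARD = 2.0  # 2 MB/sec per shard, shared by standard consumers


def shards_for_standard_consumers(max_bandwidth_mb_s: float, nb_consumers: int) -> int:
    """Smallest n with n >= MaxBandwidth * NbConsumers / (2 MB/sec) (consideration #2)."""
    if nb_consumers < 0:
        raise ValueError("nb_consumers must not be negative")
    needed = math.ceil(max_bandwidth_mb_s * nb_consumers / CONSUMER_MB_PER_SEC_PER_SHARD)
    return max(1, needed)
```

For example, four standard consumers reading a stream that peaks at 20 MB/sec need 40 shards under this bound, twice what the producer limit alone would require.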
  • 30. #3: Split/merge shards. [Chart: number of shards in a data stream with 1-day retention]
  • 31. #3: Update shard count. [Diagram: numbered shards across successive shard-count updates]
  • 32. #4: Consumption speed. Diagram: producer (1500 msg/sec), Kinesis stream (1000 msg/sec), consumer 1 (10000 msg/sec), consumer 2 (500 msg/sec). Other values shown: 3000 msg/sec, 3000 msg/sec, 1500 msg/sec.
  • 33. #4: Consumption speed. $n$: number of shards in the Kinesis data stream. $N$: number of consumers. $\text{AverageThroughput}$: average throughput of the producer. $C_i(\text{MaxThroughput})$: maximum throughput of consumer $C_i$. $n \ge \max_{i=1}^{N} \frac{\text{AverageThroughput}}{C_i(\text{MaxThroughput})}$
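The consumption-speed bound is driven by the slowest consumer. A minimal Python sketch, assuming $C_i(\text{MaxThroughput})$ is the rate at which consumer $i$ can process a single shard (an interpretation, since the slides do not spell this out); the function name is illustrative:

```python
import math
from typing import Sequence


def shards_for_consumption_speed(
    average_throughput: float,
    consumer_max_throughputs: Sequence[float],
) -> int:
    """Smallest n with n >= AverageThroughput / Ci(MaxThroughput) for every consumer i
    (consideration #4). Throughputs are in messages/sec."""
    if not consumer_max_throughputs:
        return 1  # no consumers, so this consideration imposes no constraint
    if min(consumer_max_throughputs) <= 0:
        raise ValueError("consumer throughputs must be positive")
    needed = max(math.ceil(average_throughput / c) for c in consumer_max_throughputs)
    return max(1, needed)
```

With the figures from the preceding diagram (producer at 1500 msg/sec, consumers at 10000 and 500 msg/sec), the slower consumer sets the bound at 3 shards, which matches the 3000 msg/sec stream capacity shown there.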
  • 34. #5: Max acceptable latency. Producer: constant 5 MB/s, surge at 20 MB/s, hence a 20-shard data stream. Four consumers; each consumer has $\frac{20 \times 2\ \text{MB/s}}{4} = 10\ \text{MB/s}$. [Chart: bandwidth (MB/s) over time]
  • 35. #5: Max acceptable latency. On the producer side: $\text{SurgeDur} \times \text{SurgeBw} + \text{C}_i\text{MaxLat} \times \text{MaxInBw}$. On the consumer side: $\text{C}_i\text{MaxBw} \times (\text{SurgeDur} + \text{C}_i\text{MaxLat})$. Also: $\text{C}_i\text{MaxBw} = \frac{n \times 2\ \text{MB/sec}}{\text{NbConsumers}}$. [Chart: bandwidth (MB/s) over time]
  • 36. #5: Max acceptable latency. Therefore: $\frac{n \times 2\ \text{MB/sec}}{\text{NbConsumers}} \times (\text{SurgeDur} + \text{C}_i\text{MaxLat}) \ge \text{SurgeDur} \times \text{SurgeBw} + \text{C}_i\text{MaxLat} \times \text{MaxInBw}$. Yields: $n \ge \frac{\text{NbConsumers} \times (\text{SurgeDur} \times \text{SurgeBw} + \text{C}_i\text{MaxLat} \times \text{MaxInBw})}{2\ \text{MB/sec} \times (\text{SurgeDur} + \text{C}_i\text{MaxLat})}$. Example: with a surge of 10 s and $\text{C}_i\text{MaxLat}$ of 10 s, $n \ge \frac{4 \times (10 \times 20 + 10 \times 5)}{2 \times (10 + 10)} = 25$ shards.
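The latency bound can be checked numerically against the example on the slide. A minimal Python sketch, assuming durations in seconds and bandwidths in MB/sec; the function and parameter names are illustrative renderings of the slide's symbols:

```python
import math

CONSUMER_MB_PER_SEC_PER_SHARD = 2.0  # 2 MB/sec per shard, shared by standard consumers


def shards_for_max_latency(
    nb_consumers: int,
    surge_dur_s: float,   # SurgeDur
    surge_bw: float,      # SurgeBw
    max_lat_s: float,     # CiMaxLat
    max_in_bw: float,     # MaxInBw
) -> int:
    """Smallest n such that each consumer can drain the surge within its
    maximum acceptable latency (consideration #5)."""
    # Data written during the surge plus the allowed catch-up window.
    produced_mb = surge_dur_s * surge_bw + max_lat_s * max_in_bw
    # Data one shard can deliver to consumers over the same period.
    per_shard_mb = CONSUMER_MB_PER_SEC_PER_SHARD * (surge_dur_s + max_lat_s)
    return max(1, math.ceil(nb_consumers * produced_mb / per_shard_mb))
```

Calling `shards_for_max_latency(4, 10, 20, 10, 5)` reproduces the 25 shards of the slide's example.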
  • 38. Introducing enhanced fan-out consumers. Fan-out consumers: dedicated 2 MB/sec/shard; isolate a consumer from the other consumers. Unchanged considerations: #1: Kinesis Data Streams producer limits; #3: Split/merge operations; #4: Consumption speed.
  • 39. #2: Amazon Kinesis Data Streams consumer limits. Kinesis limitations: $n \ge \frac{\text{MaxInputBandwidth} \times \text{NbConsumers}}{2\ \text{MB/sec}}$ becomes $n \ge \frac{\text{MaxInputBandwidth} \times \text{NbRegularConsumers}}{2\ \text{MB/sec}}$.
  • 40. #5: Max acceptable latency. Fan-out consumers: no lag. For regular consumers: $n \ge \frac{\text{NbConsumers} \times (\text{SurgeDur} \times \text{SurgeBw} + \text{C}_i\text{MaxLat} \times \text{MaxBw})}{2\ \text{MB/sec} \times (\text{SurgeDur} + \text{C}_i\text{MaxLat})}$ becomes $n \ge \frac{\text{NbRegularConsumers} \times (\text{SurgeDur} \times \text{SurgeBw} + \text{C}_i\text{MaxLat} \times \text{MaxBw})}{2\ \text{MB/sec} \times (\text{SurgeDur} + \text{C}_i\text{MaxLat})}$, where $\text{C}_i\text{MaxLat}$ is the maximum latency for regular consumers.
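Since only regular (standard) consumers count toward the latency bound once some consumers use enhanced fan-out, the effect of moving consumers over can be compared directly. A minimal Python sketch, assuming durations in seconds and bandwidths in MB/sec; the function and parameter names are illustrative:

```python
import math
from typing import Tuple


def latency_shards(nb_consumers: int, surge_dur_s: float, surge_bw: float,
                   max_lat_s: float, max_bw: float) -> int:
    """Latency bound (consideration #5) for a given number of regular consumers."""
    produced_mb = surge_dur_s * surge_bw + max_lat_s * max_bw
    per_shard_mb = 2.0 * (surge_dur_s + max_lat_s)  # 2 MB/sec per shard
    return max(1, math.ceil(nb_consumers * produced_mb / per_shard_mb))


def shards_before_and_after_fan_out(
    nb_consumers: int, nb_fan_out: int, surge_dur_s: float,
    surge_bw: float, max_lat_s: float, max_bw: float,
) -> Tuple[int, int]:
    """Return the latency bound with all consumers regular, and the bound
    after nb_fan_out of them are switched to enhanced fan-out."""
    if not 0 <= nb_fan_out <= nb_consumers:
        raise ValueError("nb_fan_out must be between 0 and nb_consumers")
    before = latency_shards(nb_consumers, surge_dur_s, surge_bw, max_lat_s, max_bw)
    nb_regular = nb_consumers - nb_fan_out  # NbRegularConsumers
    after = latency_shards(nb_regular, surge_dur_s, surge_bw, max_lat_s, max_bw)
    return before, after
```

Reusing the earlier example (four consumers, 10 s surge at 20 MB/sec, 10 s maximum latency, 5 MB/sec otherwise), moving two consumers to enhanced fan-out lowers this bound from 25 to 13 shards. The other considerations, such as the producer limits, still apply independently.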
  • 41. Amazon Kinesis Data Streams Shard Calculator: https://t2m.io/vcAFLW8U
  • 42. Thank you! Allan MacInnis, ajmac@amazon.com. Gabriel Commeau, Gabriel_Commeau@comcast.com