Choose Right Stream Storage:
Kinesis Data Streams vs MSK
Sungmin, Kim
Solutions Architect, AWS
2020-10-07
Agenda
• Key Components of Real-time Analytics
• Anatomy of Amazon Kinesis Data Streams and MSK
• Comparing Amazon Kinesis Data Streams to MSK
• Monitoring Metrics
• Reference Architecture
• Key Takeaways
Key Components of Real-time Analytics
From Batch to Real-time: Lambda Architecture
[Diagram. Components: Data Source, Stream Ingestion, Stream Storage (Streaming Data), Stream Delivery, Raw Data Storage. Speed Layer: Stream Process, Real-time View. Batch Layer: Batch Process, Batch View. Service Layer: Query & Merge Results, Consumer.]
Lambda Architecture
[Diagram. Streaming Data feeds the Batch Layer (Raw Data, Batch Process) and the Speed Layer (Stream Process). The Serving Layer holds the Batch View and the Real-time View, and each view answers a Query.]
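A minimal sketch of the "Query & Merge Results" step, assuming both views are simple per-key counters. The function name `merge_views` and the sample page counts are hypothetical and only illustrate the idea.

```python
from collections import Counter

def merge_views(batch_view, realtime_view):
    """Serving-layer query: combine the batch view with the real-time view."""
    merged = Counter(batch_view)
    # Add counts for events that arrived after the last batch run.
    merged.update(realtime_view)
    return dict(merged)

# Hypothetical page-view counts.
batch_view = {"page_a": 100, "page_b": 40}   # produced by the batch layer
realtime_view = {"page_a": 3, "page_c": 1}   # produced by the speed layer

print(merge_views(batch_view, realtime_view))
```

The batch view stays authoritative for older data, and the real-time view only covers the gap since the last batch run.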
Key Components of Real-time Analytics
Data Source → Stream Ingestion → Stream Storage → Stream Process → Data Sink
• Data Source: Devices and/or applications that produce real-time data at high velocity
• Stream Ingestion: Data from tens of thousands of data sources can be written to a single stream
• Stream Storage: Data are stored in the order they were received for a set duration of time and can be replayed indefinitely during that time
• Stream Process: Records are read in the order they are produced, enabling real-time analytics or streaming ETL
• Data Sink: Data lake (most common) or database (least common)
Stream Storage
Data Source → Stream Ingestion → Stream Storage → Stream Process → Data Sink
• Amazon Kinesis Data Streams
• Amazon Managed Streaming for Kafka
Anatomy of Amazon Kinesis Data Streams and MSK
Key Features of Kinesis Data Streams and MSK
• Distributed Queue
• Stream Storage
#Queue #Distributed #Storage
#Queue: FIFO, Scale-Up vs Scale-Out
[Diagram: Producers append the newest data to the queue; the Consumer reads the oldest data first (records 0 to 5).]
#Distributed: Scale-Out
[Diagram: Producers send records with a partition key (PK) through a Hash Function to shard/partition-1, shard/partition-2, and shard/partition-3. Each shard/partition keeps its records ordered from oldest data to newest data and is read by a Consumer in a Consumer Group.]
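A minimal sketch of key-based routing, assuming a simple "hash modulo shard count" rule. This is a simplification: Kinesis Data Streams maps the MD5 hash of the partition key onto per-shard hash key ranges, and the default Kafka partitioner uses a murmur2 hash. The function name `shard_for` and the sample keys are hypothetical.

```python
import hashlib

def shard_for(partition_key: str, num_shards: int) -> int:
    """Map a partition key to a shard/partition index (simplified)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Interpret the 128-bit digest as an integer and fold it onto the shards.
    return int.from_bytes(digest, "big") % num_shards

keys = ["user-1", "user-2", "user-3", "user-1"]
assignments = [shard_for(k, 3) for k in keys]
print(assignments)
```

Records with the same key always land on the same shard/partition, which is what preserves per-key ordering while the stream scales out.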
#Storage: Stream Buffer
[Diagram: Producers send records with a partition key (PK) through a Hash Function to shard/partition-1, shard/partition-2, and shard/partition-3. Each shard/partition retains its records from oldest data to newest data, and a marker shows the next consumer offset for each Consumer in the Consumer Group.]
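A minimal sketch of a single shard/partition acting as a persistent buffer, assuming an in-memory list and one "next offset" per consumer group. The class `PartitionLog` and its methods are hypothetical, and retention (expiry of old records) is not modeled.

```python
class PartitionLog:
    """Append-only log for one shard/partition with per-group offsets."""

    def __init__(self):
        self.records = []   # index in this list == record offset
        self.offsets = {}   # consumer group -> next offset to read

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1

    def poll(self, group, max_records=10):
        start = self.offsets.get(group, 0)
        batch = self.records[start:start + max_records]
        # Reading does not delete records; it only advances the group's offset.
        self.offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        """Rewind (or skip ahead) so records can be replayed."""
        self.offsets[group] = offset

log = PartitionLog()
for i in range(5):
    log.append(f"record-{i}")
print(log.poll("group-a", max_records=2))
print(log.poll("group-b"))
```

Because reads only move an offset, several consumer groups can read the same data independently, and any group can replay by seeking back.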
Anatomy of Amazon Kinesis Data Streams and Amazon Managed Streaming for Kafka
[Diagram: Producers send records with a partition key (PK) through a Hash Function to shard/partition-1, shard/partition-2, and shard/partition-3. Consumers in a Consumer Group read each shard/partition from oldest data to newest data, each tracking its next consumer offset.]
Benefits of Stream Storage
• Decouple producers & consumers
• Persistent buffer
• Collect multiple streams
• Preserve client ordering
• Parallel consumption
• Streaming MapReduce
Comparing Amazon Kinesis Data Streams to MSK
Comparing Kinesis Data Streams to MSK
[Diagram labels: Topic, Amazon Kinesis Data Streams, Amazon Managed Streaming for Kafka]
Amazon Kinesis Data Streams
• Operational Perspective
  • Number of Kinesis Data Streams?
  • Number of shards per stream?
• Throughput provisioning model
• Increase/decrease number of shards
• Full integration with AWS services such as Lambda functions, Kinesis Data Analytics, etc.
Amazon Managed Streaming for Kafka
• Operational Perspective
  • Number of clusters?
  • Number of brokers per cluster?
  • Number of topics per broker?
  • Number of partitions per topic?
• Cluster provisioning model
• Only increase number of partitions; can’t decrease
• Integration with a few AWS services such as Kinesis Data Analytics for Java
Monitoring Metrics
Metrics to Monitor: MSK (Kafka)
• RequestQueue: Length, WaitTime
• ResponseQueue: Length, WaitTime
• Network: Packet Drop?
• Produce/Consume Rate Imbalance
• Who is Leader?
• Disk Full?
• Too many topics?
Metrics to Monitor: MSK (Kafka)
Metric | Level | Description
ActiveControllerCount | DEFAULT | Only one controller per cluster should be active at any given time.
OfflinePartitionsCount | DEFAULT | Total number of partitions that are offline in the cluster.
GlobalPartitionCount | DEFAULT | Total number of partitions across all brokers in the cluster.
GlobalTopicCount | DEFAULT | Total number of topics across all brokers in the cluster.
KafkaAppLogsDiskUsed | DEFAULT | The percentage of disk space used for application logs.
KafkaDataLogsDiskUsed | DEFAULT | The percentage of disk space used for data logs.
RootDiskUsed | DEFAULT | The percentage of the root disk used by the broker.
PartitionCount | PER_BROKER | The number of partitions for the broker.
LeaderCount | PER_BROKER | The number of leader replicas.
UnderMinIsrPartitionCount | PER_BROKER | The number of under minIsr partitions for the broker.
UnderReplicatedPartitions | PER_BROKER | The number of under-replicated partitions for the broker.
FetchConsumerTotalTimeMsMean | PER_BROKER | The mean total time in milliseconds that consumers spend on fetching data from the broker.
ProduceTotalTimeMsMean | PER_BROKER | The mean produce time in milliseconds.
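A minimal sketch of turning a few of these metrics into alerts, assuming the latest values have already been fetched into a plain dictionary. The function name `check_msk_metrics`, the input shape, and the 80% disk threshold are assumptions for illustration, not recommended values.

```python
def check_msk_metrics(metrics, disk_threshold_pct=80.0):
    """Return a list of alert messages for a snapshot of MSK metric values."""
    alerts = []
    # Exactly one controller should be active per cluster.
    if metrics.get("ActiveControllerCount") != 1:
        alerts.append("ActiveControllerCount should be exactly 1")
    # These counts are expected to stay at zero on a healthy cluster.
    for name in ("OfflinePartitionsCount",
                 "UnderReplicatedPartitions",
                 "UnderMinIsrPartitionCount"):
        if metrics.get(name, 0) > 0:
            alerts.append(f"{name} is above 0")
    # Disk metrics are percentages; the threshold is an assumed value.
    for name in ("KafkaDataLogsDiskUsed", "KafkaAppLogsDiskUsed", "RootDiskUsed"):
        if metrics.get(name, 0.0) >= disk_threshold_pct:
            alerts.append(f"{name} is at or above {disk_threshold_pct}%")
    return alerts

snapshot = {  # hypothetical values
    "ActiveControllerCount": 1,
    "OfflinePartitionsCount": 0,
    "UnderReplicatedPartitions": 2,
    "KafkaDataLogsDiskUsed": 91.5,
}
for alert in check_msk_metrics(snapshot):
    print(alert)
```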
How about monitoring Kinesis Data Streams?
How long does a record stay in a shard?
• Writes: up to 1 MB or 1,000 records per second, per shard
• Reads (GetRecords()): 5 transactions per second, per shard
  • up to 10 MB per call
  • up to 10,000 records per call
• With only one consumer application, records can be retrieved every 200 ms
[Diagram: Data → shard → GetRecords() → Consumer Application]
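The per-shard limits above lend themselves to simple arithmetic. This is a minimal sketch, assuming the write limits (1 MB or 1,000 records per second, per shard) and the read limit (5 GetRecords transactions per second, per shard) stated on this slide, and assuming consumers poll a shard with GetRecords and share that read limit. The function names and the sample workload are hypothetical.

```python
import math

WRITE_MB_PER_SEC_PER_SHARD = 1.0
WRITE_RECORDS_PER_SEC_PER_SHARD = 1000
READ_TPS_PER_SHARD = 5

def shards_needed(mb_per_sec, records_per_sec):
    """Smallest shard count that satisfies both write limits."""
    by_bytes = math.ceil(mb_per_sec / WRITE_MB_PER_SEC_PER_SHARD)
    by_records = math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC_PER_SHARD)
    return max(1, by_bytes, by_records)

def min_poll_interval_ms(num_consumers):
    """Shortest polling interval per consumer when all share 5 reads/sec."""
    return 1000.0 * num_consumers / READ_TPS_PER_SHARD

print(shards_needed(mb_per_sec=3.5, records_per_sec=2000))  # 4
print(min_poll_interval_ms(1))  # 200.0
print(min_poll_interval_ms(2))  # 400.0
```

With one consumer application the 5 transactions per second translate into the 200 ms interval quoted on the slide; each additional polling consumer on the same shard stretches that interval.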
Metrics to Monitor: Kinesis Data Streams
Metric | Description
GetRecords.IteratorAgeMilliseconds | Age of the last record in all GetRecords calls
ReadProvisionedThroughputExceeded | Number of GetRecords calls throttled
WriteProvisionedThroughputExceeded | Number of PutRecord(s) calls throttled
PutRecord.Success, PutRecords.Success | Number of successful PutRecord(s) operations
GetRecords.Success | Number of successful GetRecords operations
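A minimal sketch of interpreting GetRecords.IteratorAgeMilliseconds, assuming the stream uses the default 24-hour retention period. The function name, the status labels, and the 50% warning fraction are assumptions for illustration.

```python
def iterator_age_status(iterator_age_ms, retention_hours=24.0, warn_fraction=0.5):
    """Classify consumer lag relative to the stream's retention period."""
    retention_ms = retention_hours * 3600 * 1000
    ratio = iterator_age_ms / retention_ms
    if ratio >= 1.0:
        # Records may expire before the consumer reads them.
        return "DATA_LOSS_RISK"
    if ratio >= warn_fraction:
        return "WARN"
    return "OK"

print(iterator_age_status(0))
print(iterator_age_status(13 * 3600 * 1000))
print(iterator_age_status(24 * 3600 * 1000))
```

An iterator age that keeps growing means consumers read more slowly than producers write, which ties back to the question of how long a record stays in a shard.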
Choosing Right Metrics
Too Much = Useless = Too Little
Kafka vs MSK vs Kinesis Data Streams
[Chart: Operational Excellence plotted against Degree of Freedom ≈ Complexity for Kinesis Data Streams, Amazon MSK, and Kafka]
Comparison Summary
Attribute | Apache Kafka | Kinesis Streams | Managed Streaming for Kafka
Cost | $$$ | $ (pay for what you use) | $$ (pay for infrastructure)
Ease of use | Advanced setup required | Get started in minutes | Get started in minutes
Management Overhead | High | Low | Low
Scalability | Difficult to scale | Scale in seconds with one click | Scale in minutes with one click
Throughput | Infinite | Scales with shards, supports up to 1 MB payloads | Infinite
Durability | Configurable | 3x by default | Configurable
Infrastructure | You manage | AWS manages | AWS manages
Write-to-Read Latency | <100 ms is achievable | <100 ms (with HTTP/2) | <100 ms is achievable
Open Sourced? | Yes | No | Yes
Reference Architecture
Data Hub: (Asynchronous) Event-Bus
Example Usage Pattern 1: Data Hub
[Diagram components: Kinesis Data Streams, Amazon MSK, Kinesis Data Firehose, Amazon S3, Amazon EC2, AWS Lambda, Amazon ECS, Kinesis Data Analytics, Amazon ES, Amazon Athena, Amazon CloudWatch]
https://aws.amazon.com/solutions/case-studies/autodesk-log-analytics/
Log Aggregation
Web servers access log
Aggregated logs
Example Usage Pattern 2: Web Analytics and Leaderboards
[Diagram components: Lightweight JS client code with Amazon Cognito OR Web server on Amazon EC2, Amazon Kinesis Data Streams, Amazon MSK, Amazon Kinesis Data Analytics, Lambda function, Amazon DynamoDB]
• Ingest web app data
• Compute top 10 users
• Persist to feed live apps
https://aws.amazon.com/solutions/implementations/real-time-web-analytics-with-kinesis/
IoT
[Diagram labels: IoT Things, Events, Analytics, AI/ML, Remote Control, Prediction/Fraud Detection, Device Monitoring, Quality Control, Data Visualization]
Example Usage Pattern 3: Monitoring IoT Devices
[Diagram components: Raspberry Pi + Sense HAT, MQTT Topic, and inside the AWS Cloud: AWS IoT Core, IoT rule, Amazon Kinesis Data Streams, AWS Glue Streaming Job, S3 Bucket, Glue Crawler, Glue Data Catalog, Amazon Athena]
• Ingest sensor data
• Convert JSON to Parquet
• Store all data points in an S3 data lake
https://aws.amazon.com/blogs/aws/new-serverless-streaming-etl-with-aws-glue/
Event Sourcing and CQRS
[Diagram 1: App Write Interface, Event Queue, Event Handler, Application State, App Read Interface]
[Diagram 2: App Write Interface, Event Store (Kafka Topic), Event Handler + App State (Kafka Streams Topology, Kafka Streams State Store), App Read Interface]
https://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection/
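A minimal sketch of the write path, event handler, and read path, assuming an in-memory list stands in for the event store (the Kafka topic in the diagram). The account events and all function names are hypothetical.

```python
from collections import defaultdict

event_store = []  # append-only log; events are never updated in place

def write(event):
    """App Write Interface: record what happened."""
    event_store.append(event)

def build_state(events):
    """Event Handler: fold the event log into application state."""
    state = defaultdict(int)
    for e in events:
        if e["type"] == "deposited":
            state[e["account"]] += e["amount"]
        elif e["type"] == "withdrawn":
            state[e["account"]] -= e["amount"]
    return dict(state)

def read(account):
    """App Read Interface: query state derived from the events."""
    return build_state(event_store).get(account, 0)

write({"type": "deposited", "account": "alice", "amount": 100})
write({"type": "withdrawn", "account": "alice", "amount": 30})
write({"type": "deposited", "account": "bob", "amount": 5})
print(read("alice"))
```

This sketch rebuilds state on every read to keep it short; a stream processor would instead keep the state store updated incrementally as events arrive.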
Example Usage Pattern 4: Streaming SQL
[Diagram: App Write Interface, Amazon Kinesis Data Streams (Event Store), Amazon Kinesis Data Analytics (SQL) (Event Handler + App State), App Read Interface; reference data for the join is read from an S3 Bucket]
• Continuous filter
• Aggregate function
• Data enrichment (join)
• Anomaly Detection
Reference data (S3 Bucket):
Ticker, Company
AMZN, Amazon
ASD, SomeCompanyA
BAC, SomeCompanyB
CRM, SomeCompanyC
Sample records:
{"TICKER_SYMBOL": "CVB",
"SECTOR": "TECHNOLOGY",
"CHANGE": 0.81,
"PRICE": 53.63}
{"TICKER_SYMBOL": "ABC",
"SECTOR": "RETAIL",
"CHANGE": -1.14,
"PRICE": 23.64}
{"TICKER_SYMBOL": "JKL",
"SECTOR": "TECHNOLOGY",
"CHANGE": 0.22,
"PRICE": 15.32}
https://docs.aws.amazon.com/kinesisanalytics/latest/dev/examples.html
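The continuous filter, aggregate function, and join named in this pattern can be sketched in plain Python over a small batch of records. This is not Kinesis Data Analytics SQL; it only illustrates the three operations. The first three records and the company table come from the slide, the AMZN record is hypothetical (added so the join has a match), and the aggregate covers one finite batch rather than a time window.

```python
from collections import defaultdict

COMPANIES = {"AMZN": "Amazon", "ASD": "SomeCompanyA",
             "BAC": "SomeCompanyB", "CRM": "SomeCompanyC"}

records = [
    {"TICKER_SYMBOL": "CVB", "SECTOR": "TECHNOLOGY", "CHANGE": 0.81, "PRICE": 53.63},
    {"TICKER_SYMBOL": "ABC", "SECTOR": "RETAIL", "CHANGE": -1.14, "PRICE": 23.64},
    {"TICKER_SYMBOL": "JKL", "SECTOR": "TECHNOLOGY", "CHANGE": 0.22, "PRICE": 15.32},
    # Hypothetical record, not from the slide.
    {"TICKER_SYMBOL": "AMZN", "SECTOR": "TECHNOLOGY", "CHANGE": 1.5, "PRICE": 100.0},
]

def continuous_filter(stream, sector):
    """Pass through only the records of one sector."""
    return (r for r in stream if r["SECTOR"] == sector)

def average_price_by_sector(stream):
    """Aggregate: average PRICE per SECTOR over the given records."""
    totals = defaultdict(lambda: [0.0, 0])
    for r in stream:
        totals[r["SECTOR"]][0] += r["PRICE"]
        totals[r["SECTOR"]][1] += 1
    return {sector: total / n for sector, (total, n) in totals.items()}

def enrich(stream, reference):
    """Join: add COMPANY from the reference table (None when no match)."""
    for r in stream:
        yield {**r, "COMPANY": reference.get(r["TICKER_SYMBOL"])}

tech = list(continuous_filter(records, "TECHNOLOGY"))
print([r["TICKER_SYMBOL"] for r in tech])
print(average_price_by_sector(records))
print([(r["TICKER_SYMBOL"], r["COMPANY"]) for r in enrich(records, COMPANIES)])
```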
Takeaways
Lambda vs Kappa Architecture
[Diagram labels: Lambda, Kappa]
Key Takeaways
• Distributed Queue as Stream Storage
  • Preserve Ordering
  • Parallel Consumption
  • Persistent Buffer
  • Decouple producers & consumers
• Trade-off: Operational Excellence vs Degree of Freedom
• MUST keep an eye on the right monitoring metrics
• Architectural Patterns
  • Data Hub: (Asynchronous) Event-Bus
  • Log Aggregation
  • IoT
  • Event Sourcing and CQRS
Where To Go Next?
• Amazon MSK Labs
https://amazonmsk-labs.workshop.aws/
• Amazon Managed Streaming for Kafka: Best Practices
https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices.html
• Monitoring Kafka performance metrics (2020-04-16)
https://tinyurl.com/y6hrhwbq
• Understanding and Optimizing Metrics for Apache Kafka Monitoring (2018-11, in Korean)
https://tinyurl.com/y4uwyenx
• AWS Analytics Immersion Day - Build BI System from Scratch
• Workshop - https://tinyurl.com/yapgwv77
• Slides - https://tinyurl.com/ybxkb74b
• Realtime Analytics on AWS
https://tinyurl.com/y3evwm3v
• Writing SQL on Streaming Data with Amazon Kinesis Analytics – Part 1, 2
• Part1 - https://tinyurl.com/y8vo8q7o
• Part2 - https://tinyurl.com/ycbv7wel
