SlideShare a Scribd company logo
Choose Right Stream Storage:
Kinesis Data Streams vs MSK
Sungmin, Kim
Solutions Architect, AWS
Agenda
• Key Components of Real-time Analytics
• Anatomy of Amazon Kinesis Data Streams and MSK
• Comparing Amazon Kinesis Data Streams to MSK
• Monitoring Metrics
• Reference Architecture
• Key Takeaways
Key Components of Real-time
Analytics
From Batch to Real-time:
Lambda Architecture
Data
Source
Stream
Storage
Speed Layer
Batch Layer
Batch
Process
Batch
View
Real-
time
View
Consumer
Query & Merge
Results
Service Layer
Stream
Ingestion
Raw Data
Storage
Streaming Data
Stream
Delivery
Stream
Process
Lambda Architecture
Streaming
Data
Batch View
Stream Process
Real-time
View
Query
Query
Batch View
Real-time
View
Raw Data
Batch Process
Batch Layer Serving Layer
Speed Layer
Key Components of Real-time Analytics
Data
Source
Stream
Storage
Stream
Process
Stream
Ingestion
Data
Sink
Devices and/or
applications that
produce real-time
data at high
velocity
Data from tens of
thousands of data
sources can be
written to a single
stream
Data are stored in the
order they were
received for a set
duration of time and
can be replayed
indefinitely during that
time
Records are read in
the order they are
produced, enabling
real-time analytics
or streaming ETL
Data lake
(most common)
Database
(least common)
Stream Storage
Data
Source
Stream
Storage
Stream
Process
Stream
Ingestion
Data
Sink
Amazon Kinesis Data
Streams
Amazon Managed
Streaming for Kafka
Anatomy of Amazon Kinesis
Data Streams and MSK
Key Features of Kinesis Data Streams and MSK
• Distributed Queue • Stream Storage
#Queue #Distributed #Storage
Consumer
oldest data
newest data
5 4 3 2 1 0
3 2 1 0
#Queue: FIFO, Scale-Up vs Scale-Out
5 4
4 3 2 1 0
5
Producers
Hash
Function
Consumer
oldest data
newest data
Producers
shard/partition-1
shard/partition-2
3 2 1 0
5 4 3 2 1 0
4 3 2 1 0
shard/partition-3
#Distributed: Scale-Out
Consumer
Consumer
0
Consumer Group
4 3 2 1 0
Hash
Function
Consumer
Consumer
Consumer
Consumer Group
= next consumer offset oldest data
newest data
Producers
shard/partition-1
shard/partition-2
5 4 3 2 1 0
3 2 1 0
4 3 2 1 0
shard/partition-3
#Storage: Stream Buffer
2 1 0
4 3 2 1 0
0
Hash
Function
Consumer
Consumer
Consumer
Consumer Group
= next consumer offset oldest data
newest data
Amazon Kinesis
Data Streams
Amazon Managed
Streaming for Kafka
Producers
shard/partition-1
shard/partition-2
5 4 3 2 1 0
3 2 1 0
4 3 2 1 0
shard/partition-3
Anatomy of
Benefits of Stream Storage
• Decouple producers &
consumers
• Persistent buffer
• Collect multiple streams
• Preserve client ordering
• Parallel consumption
• Streaming MapReduce
Comparing Amazon Kinesis
Data Streams to MSK
Topi
c
Amazon Kinesis Data
Streams
Amazon Managed Streaming
for Kafka
Comparing Kinesis Data Streams to MSK
Amazon Kinesis Data
Streams
Amazon Managed Streaming
for Kafka
• Operational Perspective
• Number of clusters?
• Number of brokers per cluster?
• Number of topics per broker?
• Number of partitions per topic?
• Cluster provisioning model
• Only increase number of partitions;
can’t decrease
• Integration with a few of AWS Services
such as Kinesis Data Analytics for
Apache Flink
• Operational Perspective
• Number of Kinesis Data Streams?
• Number of shards per stream?
• Throughput provisioning model
• Increase/Decrease number of shards
• Fully Integration with AWS Services
such as Lambda function, Kinesis Data
Analytics, etc
Monitoring Metrics
RequestQueue
- Length
- WaitTime
ResponseQueue
- Length
- WaitTime
Network
- Packet Drop?
Produce/Consume Rate Unbalance
Who is Leader? Disk Full?
Too many topics?
Metrics to Monitor: MSK (Kafka)
Metrics to Monitor: MSK (Kafka)
Metric Level Description
ActiveControllerCount DEFAULT Only one controller per cluster should be active at any given time.
OfflinePartitionsCount DEFAULT Total number of partitions that are offline in the cluster.
GlobalPartitionCount DEFAULT Total number of partitions across all brokers in the cluster.
GlobalTopicCount DEFAULT Total number of topics across all brokers in the cluster.
KafkaAppLogsDiskUsed DEFAULT The percentage of disk space used for application logs.
KafkaDataLogsDiskUsed DEFAULT The percentage of disk space used for data logs.
RootDiskUsed DEFAULT The percentage of the root disk used by the broker.
PartitionCount PER_BROKER The number of partitions for the broker.
LeaderCount PER_BROKER The number of leader replicas.
UnderMinIsrPartitionCount PER_BROKER The number of under minIsr partitions for the broker.
UnderReplicatedPartitions PER_BROKER The number of under-replicated partitions for the broker.
FetchConsumerTotalTimeMsMean PER_BROKER The mean total time in milliseconds that consumers spend on
fetching data from the broker.
ProduceTotalTimeMsMean PER_BROKER The mean produce time in milliseconds.
How about monitoring Kinesis Data Streams?
How long time does a record stay in a shard?
5 transactions
per second,
per shard
With only one
consumer application,
records can be
retrieved every 200 ms
up to 1MB or 1,000
records per seconds,
per shard for writes
• 10MB per second, per shard
• up to 10,000 records per call
Consumer
Application
GetRecords()
Data
Metrics to Monitor: Kinesis Data Streams
Metric Description
GetRecords.IteratorAgeMilliseconds Age of the last record in all GetRecords
ReadProvisionedThroughputExceeded Number of GetRecords calls throttled
WriteProvisionedThroughputExceeded Number of PutRecord(s) calls throttled
PutRecord.Success, PutRecords.Success Number of successful PutRecord(s) operations
GetRecords.Success Number of successful GetRecords operations
Choosing Right Metrics
Too Much = Useless = Too Little
Kafka vs MSK vs Kinesis Data Streams
Operational
Excellence
Kinesis Data
Streams
Kafka
Amazon MSK
Degree of Freedom
≈ Complexity
Comparison Summary
Attribute Apache Kafka Kinesis Streams Managed Streaming for Kafka
Cost $$$ $ (pay for what you use) $$ (pay for infrastructure)
Ease of use Advanced setup required Get started in minutes Get started in minutes
Management Overhead High Low Low
Scalability Difficult to scale
Scale in seconds with one
click
Scale in minutes with one click
Throughput Infinite
Scales with shards, supports
up to 1mb payloads
Infinite
Durability Configurable 3x by default Configurable
Infrastructure You manage AWS manages AWS manages
Write-to-Read Latency <100 ms is achievable <100 ms (with HTTP/2) <100 ms is achievable
Open Sourced? Yes No Yes
Reference Architecture
Data Hub: (Asynchronous) Event-Bus
Kinesis
Data Streams
Kinesis
Data Firehose
Amazon S3
Amazon EC2
AWS Lambda
Amazon ECS
Kinesis
Data Analytics
Amazon ES
Amazon Athena
Amazon CloudWatch
https://aws.amazon.com/solutions/case-studies/autodesk-log-analytics/
Example Usage Pattern 1: Data Hub
AmazonM
SK
Log Aggregation
Web servers access log
Aggregated logs
Example Usage Pattern 2: Web Analytics
and Leaderboards
Amazon
DynamoDB
Amazon Kinesis
Data Analytics
Amazon Kinesis
Data Streams
Amazon
Cognito
Lightweight JS
client code
Web server on
Amazon EC2
OR
Compute top 10 users
Ingest web app data Persist to feed live apps
Lambda
function
https://aws.amazon.com/solutions/implementations/real-time-web-analytics-with-kinesis/
Amazon MSK
IoT
IoT
Things
Remote Control
Prediction/
Fraud Detection
Device Monitoring
Quality Control
https://aws.amazon.com/blogs/aws/new-serverless-streaming-etl-with-aws-glue/
Example Usage Pattern 3: Monitoring IoT
Devices
Ingest sensor data
Convert json
to parquet
Store all data points
in an S3 data lake
AWS IoT
Core
IoT rule
AWS Glue
Streaming Job
Amazon Athena
Glue
Crawler
Glue Data
Catalog
S3
Bucket
AWS Cloud
MQTT
Topic
Amazon Kinesis
Data Streams
Raspberry PI
+ Sense HAT
Event Sourcing and CQRS
https://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection/
App Write Interface App Read Interface
Event Queue
Application
State
Kafka Streams
Topology
Kafka Topic
Event Handler
App Write Interface App Read Interface
Kafka
Streams
State Store
Event Store
Event Handler + App State
Event Store
Amazon Kinesis
Data Streams
Amazon Kinesis
Data Analytics
(SQL)
Example Usage Pattern 4: Streaming SQL
Continuous filter
Aggregate function
Data enrichment (join)
S3 Bucket
Anomaly Detection
Ticker, Company
AMZN, Amazon
ASD, SomeCompanyA
BAC, SomeCompanyB
CRM, SomeCompanyC
Event Store
https://docs.aws.amazon.com/kinesisanalytics/latest/dev/examples.html
App Write Interface App Read Interface
{"TICKER_SYMBOL": "CVB",
"SECTOR": "TECHNOLOGY",
"CHANGE": 0.81,
"PRICE": 53.63}
{"TICKER_SYMBOL": "ABC",
"SECTOR": "RETAIL",
"CHANGE": -1.14,
"PRICE": 23.64}
{"TICKER_SYMBOL": "JKL",
"SECTOR": "TECHNOLOGY",
"CHANGE": 0.22,
"PRICE": 15.32}
Event Handler
+ App State join
Takeaways
Lambda
Kappa
Lambda vs Kappa Architecture
Key Takeaways
• Distributed Queue as Stream Storage
• Preserve Ordering
• Parallel Consumption
• Persistent Buffer
• Decouple producers & consumers
• Trade-off: Operational Excellence vs Degree of Freedom
• MUST keep an eye on the right monitoring metrics
• Architectural Patterns
• Data Hub: (Asynchronous) Event-Bus
• Log Aggregation
• IoT
• Event Sourcing and CQRS
Where To Go Next?
• Amazon MSK Labs
https://amazonmsk-labs.workshop.aws/
• Amazon Managed Streaming for Kafka: Best Practices
https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices.html
• Monitoring Kafka performance metrics (2020-04-16)
https://tinyurl.com/y6hrhwbq
• Apache Kafka 모니터링을 위한 Metrics 이해 및 최적화 방안 (2018-11)
https://tinyurl.com/y4uwyenx
• AWS Analytics Immersion Day - Build BI System from Scratch
• Workshop - https://tinyurl.com/yapgwv77
• Slides - https://tinyurl.com/ybxkb74b
• Realtime Analytics on AWS
https://tinyurl.com/y3evwm3v
• Writing SQL on Streaming Data with Amazon Kinesis Analytics – Part 1, 2
• Part1 - https://tinyurl.com/y8vo8q7o
• Part2 - https://tinyurl.com/ycbv7wel

More Related Content

Similar to Amazon Kinesis Data Streams Vs Msk (1).pptx

AWS Webcast - AWS Kinesis Webinar
AWS Webcast - AWS Kinesis WebinarAWS Webcast - AWS Kinesis Webinar
AWS Webcast - AWS Kinesis Webinar
Amazon Web Services
 
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
Amazon Web Services
 
Real Time Data Processing Using AWS Lambda - DevDay Austin 2017
Real Time Data Processing Using AWS Lambda - DevDay Austin 2017Real Time Data Processing Using AWS Lambda - DevDay Austin 2017
Real Time Data Processing Using AWS Lambda - DevDay Austin 2017Amazon Web Services
 
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
Amazon Web Services
 
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
Amazon Web Services
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time Analytics
Amazon Web Services
 
Real Time Data Processing Using AWS Lambda
Real Time Data Processing Using AWS LambdaReal Time Data Processing Using AWS Lambda
Real Time Data Processing Using AWS Lambda
Amazon Web Services
 
Processamento em tempo real usando AWS - padrões e casos de uso
Processamento em tempo real usando AWS - padrões e casos de usoProcessamento em tempo real usando AWS - padrões e casos de uso
Processamento em tempo real usando AWS - padrões e casos de uso
Amazon Web Services LATAM
 
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
Amazon Web Services Korea
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS Lambda
Amazon Web Services
 
SMC303 Real-time Data Processing Using AWS Lambda
SMC303 Real-time Data Processing Using AWS LambdaSMC303 Real-time Data Processing Using AWS Lambda
SMC303 Real-time Data Processing Using AWS Lambda
Amazon Web Services
 
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Amazon Web Services
 
Barga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 KeynoteBarga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 KeynoteRoger Barga
 
Em tempo real: Ingestão, processamento e analise de dados
Em tempo real: Ingestão, processamento e analise de dadosEm tempo real: Ingestão, processamento e analise de dados
Em tempo real: Ingestão, processamento e analise de dados
Amazon Web Services LATAM
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best Practices
Amazon Web Services
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best Practices
Amazon Web Services
 
(BDT403) Best Practices for Building Real-time Streaming Applications with Am...
(BDT403) Best Practices for Building Real-time Streaming Applications with Am...(BDT403) Best Practices for Building Real-time Streaming Applications with Am...
(BDT403) Best Practices for Building Real-time Streaming Applications with Am...
Amazon Web Services
 
Streaming data for real time analysis
Streaming data for real time analysisStreaming data for real time analysis
Streaming data for real time analysis
Amazon Web Services
 
Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017
Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017
Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017
Amazon Web Services
 
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
Sungmin Kim
 

Similar to Amazon Kinesis Data Streams Vs Msk (1).pptx (20)

AWS Webcast - AWS Kinesis Webinar
AWS Webcast - AWS Kinesis WebinarAWS Webcast - AWS Kinesis Webinar
AWS Webcast - AWS Kinesis Webinar
 
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
 
Real Time Data Processing Using AWS Lambda - DevDay Austin 2017
Real Time Data Processing Using AWS Lambda - DevDay Austin 2017Real Time Data Processing Using AWS Lambda - DevDay Austin 2017
Real Time Data Processing Using AWS Lambda - DevDay Austin 2017
 
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
 
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time Analytics
 
Real Time Data Processing Using AWS Lambda
Real Time Data Processing Using AWS LambdaReal Time Data Processing Using AWS Lambda
Real Time Data Processing Using AWS Lambda
 
Processamento em tempo real usando AWS - padrões e casos de uso
Processamento em tempo real usando AWS - padrões e casos de usoProcessamento em tempo real usando AWS - padrões e casos de uso
Processamento em tempo real usando AWS - padrões e casos de uso
 
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS Lambda
 
SMC303 Real-time Data Processing Using AWS Lambda
SMC303 Real-time Data Processing Using AWS LambdaSMC303 Real-time Data Processing Using AWS Lambda
SMC303 Real-time Data Processing Using AWS Lambda
 
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
Real-time Streaming and Querying with Amazon Kinesis and Amazon Elastic MapRe...
 
Barga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 KeynoteBarga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 Keynote
 
Em tempo real: Ingestão, processamento e analise de dados
Em tempo real: Ingestão, processamento e analise de dadosEm tempo real: Ingestão, processamento e analise de dados
Em tempo real: Ingestão, processamento e analise de dados
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best Practices
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best Practices
 
(BDT403) Best Practices for Building Real-time Streaming Applications with Am...
(BDT403) Best Practices for Building Real-time Streaming Applications with Am...(BDT403) Best Practices for Building Real-time Streaming Applications with Am...
(BDT403) Best Practices for Building Real-time Streaming Applications with Am...
 
Streaming data for real time analysis
Streaming data for real time analysisStreaming data for real time analysis
Streaming data for real time analysis
 
Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017
Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017
Real Time Data Processing Using AWS Lambda - DevDay Los Angeles 2017
 
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
 

Recently uploaded

Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 

Recently uploaded (20)

Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 

Amazon Kinesis Data Streams Vs Msk (1).pptx

  • 1. Choose Right Stream Storage: Kinesis Data Streams vs MSK Sungmin, Kim Solutions Architect, AWS
  • 2. Agenda • Key Components of Real-time Analytics • Anatomy of Amazon Kinesis Data Streams and MSK • Comparing Amazon Kinesis Data Streams to MSK • Monitoring Metrics • Reference Architecture • Key Takeaways
  • 3. Key Components of Real-time Analytics
  • 4. From Batch to Real-time: Lambda Architecture Data Source Stream Storage Speed Layer Batch Layer Batch Process Batch View Real- time View Consumer Query & Merge Results Service Layer Stream Ingestion Raw Data Storage Streaming Data Stream Delivery Stream Process
  • 5. Lambda Architecture Streaming Data Batch View Stream Process Real-time View Query Query Batch View Real-time View Raw Data Batch Process Batch Layer Serving Layer Speed Layer
  • 6. Key Components of Real-time Analytics Data Source Stream Storage Stream Process Stream Ingestion Data Sink Devices and/or applications that produce real-time data at high velocity Data from tens of thousands of data sources can be written to a single stream Data are stored in the order they were received for a set duration of time and can be replayed indefinitely during that time Records are read in the order they are produced, enabling real-time analytics or streaming ETL Data lake (most common) Database (least common)
  • 8. Anatomy of Amazon Kinesis Data Streams and MSK
  • 9. Key Features of Kinesis Data Streams and MSK • Distributed Queue • Stream Storage #Queue #Distributed #Storage
  • 10. Consumer oldest data newest data 5 4 3 2 1 0 3 2 1 0 #Queue: FIFO, Scale-Up vs Scale-Out 5 4 4 3 2 1 0 5 Producers
  • 11. Hash Function Consumer oldest data newest data Producers shard/partition-1 shard/partition-2 3 2 1 0 5 4 3 2 1 0 4 3 2 1 0 shard/partition-3 #Distributed: Scale-Out Consumer Consumer 0 Consumer Group 4 3 2 1 0
  • 12. Hash Function Consumer Consumer Consumer Consumer Group = next consumer offset oldest data newest data Producers shard/partition-1 shard/partition-2 5 4 3 2 1 0 3 2 1 0 4 3 2 1 0 shard/partition-3 #Storage: Stream Buffer 2 1 0 4 3 2 1 0 0
  • 13. Hash Function Consumer Consumer Consumer Consumer Group = next consumer offset oldest data newest data Amazon Kinesis Data Streams Amazon Managed Streaming for Kafka Producers shard/partition-1 shard/partition-2 5 4 3 2 1 0 3 2 1 0 4 3 2 1 0 shard/partition-3 Anatomy of
  • 14. Benefits of Stream Storage • Decouple producers & consumers • Persistent buffer • Collect multiple streams • Preserve client ordering • Parallel consumption • Streaming MapReduce
  • 16. Topi c Amazon Kinesis Data Streams Amazon Managed Streaming for Kafka Comparing Kinesis Data Streams to MSK
  • 17. Amazon Kinesis Data Streams Amazon Managed Streaming for Kafka • Operational Perspective • Number of clusters? • Number of brokers per cluster? • Number of topics per broker? • Number of partitions per topic? • Cluster provisioning model • Only increase number of partitions; can’t decrease • Integration with a few of AWS Services such as Kinesis Data Analytics for Apache Flink • Operational Perspective • Number of Kinesis Data Streams? • Number of shards per stream? • Throughput provisioning model • Increase/Decrease number of shards • Fully Integration with AWS Services such as Lambda function, Kinesis Data Analytics, etc
  • 19. RequestQueue - Length - WaitTime ResponseQueue - Length - WaitTime Network - Packet Drop? Produce/Consume Rate Unbalance Who is Leader? Disk Full? Too many topics? Metrics to Monitor: MSK (Kafka)
  • 20. Metrics to Monitor: MSK (Kafka) Metric Level Description ActiveControllerCount DEFAULT Only one controller per cluster should be active at any given time. OfflinePartitionsCount DEFAULT Total number of partitions that are offline in the cluster. GlobalPartitionCount DEFAULT Total number of partitions across all brokers in the cluster. GlobalTopicCount DEFAULT Total number of topics across all brokers in the cluster. KafkaAppLogsDiskUsed DEFAULT The percentage of disk space used for application logs. KafkaDataLogsDiskUsed DEFAULT The percentage of disk space used for data logs. RootDiskUsed DEFAULT The percentage of the root disk used by the broker. PartitionCount PER_BROKER The number of partitions for the broker. LeaderCount PER_BROKER The number of leader replicas. UnderMinIsrPartitionCount PER_BROKER The number of under minIsr partitions for the broker. UnderReplicatedPartitions PER_BROKER The number of under-replicated partitions for the broker. FetchConsumerTotalTimeMsMean PER_BROKER The mean total time in milliseconds that consumers spend on fetching data from the broker. ProduceTotalTimeMsMean PER_BROKER The mean produce time in milliseconds.
  • 21. How about monitoring Kinesis Data Streams? How long time does a record stay in a shard? 5 transactions per second, per shard With only one consumer application, records can be retrieved every 200 ms up to 1MB or 1,000 records per seconds, per shard for writes • 10MB per second, per shard • up to 10,000 records per call Consumer Application GetRecords() Data
  • 22. Metrics to Monitor: Kinesis Data Streams Metric Description GetRecords.IteratorAgeMilliseconds Age of the last record in all GetRecords ReadProvisionedThroughputExceeded Number of GetRecords calls throttled WriteProvisionedThroughputExceeded Number of PutRecord(s) calls throttled PutRecord.Success, PutRecords.Success Number of successful PutRecord(s) operations GetRecords.Success Number of successful GetRecords operations
  • 23. Choosing Right Metrics Too Much = Useless = Too Little
  • 24. Kafka vs MSK vs Kinesis Data Streams Operational Excellence Kinesis Data Streams Kafka Amazon MSK Degree of Freedom ≈ Complexity
  • 25. Comparison Summary Attribute Apache Kafka Kinesis Streams Managed Streaming for Kafka Cost $$$ $ (pay for what you use) $$ (pay for infrastructure) Ease of use Advanced setup required Get started in minutes Get started in minutes Management Overhead High Low Low Scalability Difficult to scale Scale in seconds with one click Scale in minutes with one click Throughput Infinite Scales with shards, supports up to 1mb payloads Infinite Durability Configurable 3x by default Configurable Infrastructure You manage AWS manages AWS manages Write-to-Read Latency <100 ms is achievable <100 ms (with HTTP/2) <100 ms is achievable Open Sourced? Yes No Yes
  • 28. Kinesis Data Streams Kinesis Data Firehose Amazon S3 Amazon EC2 AWS Lambda Amazon ECS Kinesis Data Analytics Amazon ES Amazon Athena Amazon CloudWatch https://aws.amazon.com/solutions/case-studies/autodesk-log-analytics/ Example Usage Pattern 1: Data Hub AmazonM SK
  • 29. Log Aggregation Web servers access log Aggregated logs
  • 30. Example Usage Pattern 2: Web Analytics and Leaderboards Amazon DynamoDB Amazon Kinesis Data Analytics Amazon Kinesis Data Streams Amazon Cognito Lightweight JS client code Web server on Amazon EC2 OR Compute top 10 users Ingest web app data Persist to feed live apps Lambda function https://aws.amazon.com/solutions/implementations/real-time-web-analytics-with-kinesis/ Amazon MSK
  • 32. https://aws.amazon.com/blogs/aws/new-serverless-streaming-etl-with-aws-glue/ Example Usage Pattern 3: Monitoring IoT Devices Ingest sensor data Convert json to parquet Store all data points in an S3 data lake AWS IoT Core IoT rule AWS Glue Streaming Job Amazon Athena Glue Crawler Glue Data Catalog S3 Bucket AWS Cloud MQTT Topic Amazon Kinesis Data Streams Raspberry PI + Sense HAT
  • 33. Event Sourcing and CQRS https://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection/ App Write Interface App Read Interface Event Queue Application State Kafka Streams Topology Kafka Topic Event Handler App Write Interface App Read Interface Kafka Streams State Store Event Store Event Handler + App State Event Store
  • 34. Amazon Kinesis Data Streams Amazon Kinesis Data Analytics (SQL) Example Usage Pattern 4: Streaming SQL Continuous filter Aggregate function Data enrichment (join) S3 Bucket Anomaly Detection Ticker, Company AMZN, Amazon ASD, SomeCompanyA BAC, SomeCompanyB CRM, SomeCompanyC Event Store https://docs.aws.amazon.com/kinesisanalytics/latest/dev/examples.html App Write Interface App Read Interface {"TICKER_SYMBOL": "CVB", "SECTOR": "TECHNOLOGY", "CHANGE": 0.81, "PRICE": 53.63} {"TICKER_SYMBOL": "ABC", "SECTOR": "RETAIL", "CHANGE": -1.14, "PRICE": 23.64} {"TICKER_SYMBOL": "JKL", "SECTOR": "TECHNOLOGY", "CHANGE": 0.22, "PRICE": 15.32} Event Handler + App State join
  • 37. Key Takeaways • Distributed Queue as Stream Storage • Preserve Ordering • Parallel Consumption • Persistent Buffer • Decouple producers & consumers • Trade-off: Operational Excellence vs Degree of Freedom • MUST keep an eye on the right monitoring metrics • Architectural Patterns • Data Hub: (Asynchronous) Event-Bus • Log Aggregation • IoT • Event Sourcing and CQRS
  • 38. Where To Go Next? • Amazon MSK Labs https://amazonmsk-labs.workshop.aws/ • Amazon Managed Streaming for Kafka: Best Practices https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices.html • Monitoring Kafka performance metrics (2020-04-16) https://tinyurl.com/y6hrhwbq • Apache Kafka 모니터링을 위한 Metrics 이해 및 최적화 방안 (2018-11) https://tinyurl.com/y4uwyenx • AWS Analytics Immersion Day - Build BI System from Scratch • Workshop - https://tinyurl.com/yapgwv77 • Slides - https://tinyurl.com/ybxkb74b • Realtime Analytics on AWS https://tinyurl.com/y3evwm3v • Writing SQL on Streaming Data with Amazon Kinesis Analytics – Part 1, 2 • Part1 - https://tinyurl.com/y8vo8q7o • Part2 - https://tinyurl.com/ycbv7wel

Editor's Notes

  1. [Kinesis Data Streams] ------------------ Streams and shards AWS API experience Throughput provisioning model Seamless scaling Typically lower costs Deep AWS integrations [Amazon MSK] ------------ Topics and partitions Open-source compatibility Strong third-party tooling Cluster provisioning model Kafka scaling isn’t seamless to clients Raw performance ----------- https://aws.highspot.com/items/5cb120e7429d7b4ed26391e3?lfrm=irel.7#29
  2. Shard당 최대 처리량: 10,000/200ms = 50,000 records/sec (5만)
  3. Shared Responsibility Model 관점에서 메시지를 주자 Under the hood: Scaling your Kinesis data streams https://aws.amazon.com/ko/blogs/big-data/under-the-hood-scaling-your-kinesis-data-streams/
  4. Too much information can be just as useless as too little
  5. https://aws.highspot.com/items/5cb120e7429d7b4ed26391e3?lfrm=irel.7#30
  6. Command Query Responsibility Segregation (CQRS) Event sourcing involves modeling the state changes made by applications as an immutable sequence or “log” of events. Instead of modifying the state of the application in-place, event sourcing involves storing the event that triggers the state change in an immutable log and modeling the state changes as responses to the events in the log.  Event Sourcing and CQRS Furthermore, the event sourcing and CQRS application architecture patterns are also related. Command Query Responsibility Segregation (CQRS) is an application architecture pattern most commonly used with event sourcing. CQRS involves splitting an application into two parts internally — the command side ordering the system to update state and the query side that gets information without changing state. CQRS provides separation of concerns – The command or write side is all about the business; it does not care about the queries, different materialized views over the data, optimal storage of the materialized views for performance and so on. On the other hand, the query or read side is all about the read access; its main purpose is making queries fast and efficient.