SlideShare a Scribd company logo
1 of 74
Download to read offline
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Adi Krishnan, Principal PM Amazon Kinesis
Nick Barrash, SDE, AdRoll
10/08/2015
Amazon Kinesis
Capture, Deliver, and Process
Real-time Data Streams on AWS
What to Expect from the Session
• Amazon Kinesis: System Overview
• Amazon Kinesis Streams (this session)
• Amazon Kinesis Firehose (available now)
• Amazon Kinesis Analytics (announced)
• Top 10 things you need to know about Kinesis Streams!
• Amazon Kinesis Streaming Data Ingestion Model (5 things)
• Streaming applications with Kinesis Client Library (5 things)
• AdRoll: Re-architecting our real-time data pipeline
• Featuring Nick Barrash, SDE AdRoll
Amazon Kinesis
Streams
Build your own custom
applications that
process or analyze
streaming data
Amazon Kinesis
Firehose
Easily load massive
volumes of streaming
data into Amazon S3
and Redshift
Amazon Kinesis
Analytics
Easily analyze data
streams using
standard SQL queries
Amazon Kinesis: Streaming data done the AWS way
Services that make it easy to work with real-time data streams on AWS
Amazon Kinesis Streams
Amazon Kinesis Streams
Build your own data streaming applications
Easy Administration: Create a new stream, set desired capacity and partitioning
to match your data throughput rate and volume.
Build real-time applications: Perform custom record processing on streaming
data using Kinesis Client Library, Apache Spark/ Storm, AWS Lambda, and more.
Low cost: Cost-efficient for workloads of any scale.
Amazon Confidential
Send clickstream data
to Kinesis Streams
Kinesis Streams stores
and exposes
clickstream data for
processing
Custom application built on
Kinesis Client Library makes real-
time content recommendations
Readers see personalized
content suggestions
Amazon Web Services
AZ AZ AZ
Durable, highly consistent storage replicates data
across three data centers (availability zones)
Aggregate and
archive to S3
Millions of
sources producing
100s of terabytes
per hour
Front
End
Authentication
Authorization
Ordered stream
of events supports
multiple readers
Real-time
dashboards
and alarms
Machine learning
algorithms or
sliding window
analytics
Aggregate analysis
in Hadoop or a
data warehouse
Inexpensive: $0.028 per million puts
Real-Time Streaming Data Ingestion
Custom-built
Streaming
Applications
(KCL)
Inexpensive: $0.014 per 1,000,000 PUT Payload Units
Amazon Kinesis Streams (re:Invent 2013)
Fully managed service for real-time processing of streaming data
Amazon Kinesis Streams:
Streaming Data Ingestion
• A provisioned entity called a Stream composed of Shards
• Producers use a PUT call to store data in a Stream.
• Each record <= 1 MB. PutRecord or PutRecords
• A partition key is supplied by producer and used to distribute (MD5 hash)
the PUTs across (hash key range) of Shards
• Unique Sequence# returned upon successful PUT call
• Approximate arrival timestamp affixed to each record
Putting Data into Kinesis
Simple Put* interface to capture and store data in Streams
Managed Buffer
• Care about a reliable, scalable
way to capture data
• Defer all other aggregation to
consumer
• Generate Random Partition
Keys
• Ensure a high cardinality for
Partition Keys with respect to
shards, to spray evenly across
available shards
Topic #1: Thinking about ingestion model
Workload determines partition key strategy
Streaming Map-Reduce
• Streaming Map-Reduce: leverage
partition keys as a natural way to
aggregate data
• For e.g. Partition Keys per
billing customer, per DeviceId,
per stock symbol
• Design partition keys to scale
• Be aware of “hot partition keys or
shards ”
Topic #2: Kinesis PutRecords API
High throughput API for efficient writes to Kinesis
• PutRecords {Records {Data,PartitionKey}, StreamName}
• Supports 500 records.
• Record can be =<1 MB, and up to 5 MB for whole request
• Can include records with different partition keys
• Ordering not guaranteed
• Successful response includes ShardID and SeqNumber values
• Unsuccessful Response
Topic #2: Kinesis PutRecords API (Continued)
Dealing with unsuccessful records
• Example Failed Response
• Examines the PutRecordsResult objects to resend
• Check FailedRecordCount parameter to confirm failed records.
• If so, each putRecordsEntry with ErrorCode != NULL should be added
to a subsequent request
Topic #3: Kinesis Producer Library
Highly configurable library to write to Kinesis
• Collects records and uses PutRecords for high throughput writes
• Integrates seamlessly with the Amazon Kinesis Client Library (KCL) to
de-aggregate batched records
• Submits Amazon CloudWatch metrics on your behalf to provide
visibility into producer performance
• https://github.com/awslabs/amazon-kinesis-producer
Topic #4: Extended Retention in Kinesis
Default 24 Hours but configurable to 7 days
• 2 New APIs
• IncreaseStreamRetentionPeriod(String StreamName, int
RetentionPeriodHours)
• DecreaseStreamRetentionPeriod(String StreamName, int
RetentionPeriodHours)
• Use it in one of two-modes
• as always-on extended retention
• Raise retention period in response to operational event and/
planned maintenance
• Extended Retention is priced at US$ 0.020/ Shard-Hour
New!
• Keep track of your metrics
• Monitor CloudWatch metrics:
PutRecord.Bytes + GetRecords.Bytes
metrics keep track of shard usage
• Retry if rise in input rate is temporary
• Reshard to increase number of shards
• SplitShard – Adds more shards
• MergeShard – Removes shards
• Use the Kinesis Scaling Utility -
https://github.com/awslabs/amazon-
kinesis-scaling-utils
Metric Units
PutRecords.Bytes Bytes
PutRecords.Latency Milliseconds
PutRecords.Success Count
PutRecords.Records Count
Incoming Bytes Bytes
Incoming Records Count
Topic #5: Dealing with provisioned throughput exceeded
Metrics and Re-sharding (SplitShard/ MergeShard)
Amazon Kinesis Streams:
Build Apps w/ Kinesis Client Library
• Open Source Java client library, also available for Python, Ruby,
Node.JS, .NET. Available on Github
• Connects to the stream and enumerates the shards
• Coordinates shard associations with other workers (if any)
• Instantiates a record processor for every shard it manages
• Pulls data records from the stream
• Pushes the records to the corresponding record processor
• Checkpoints processed records
• Balances shard-worker associations when the worker instance count
changes
• Balances shard-worker associations when shards are split or merged
Building Applications: Kinesis Client Library
For fault-tolerant stream processing applications
State Management with Kinesis Client Library
• A named KCL application has 1
Worker per EC2 instance
• One worker maps to one or more
record processors
• One record processor maps to one
shard and processes data records
from that shard
• The KCL uses the IRecordProcessor
interface to communicate with your
application
• ProcessRecords() contains the
business logic
Topic #1: Empty records returned even with data in stream
• App calls GetRecords in a loop with no back-offs.
• Each call to GetRecords also returns a ShardIterator value, which must be used
in the next iteration.
• GetRecords operation is non-blocking. Returns with either relevant data records
or with an empty Records element.
• An empty Records element is returned if:
•There is no more data currently in the shard
•There is no data near the part of the shard pointed to by the ShardIterator
• In production, the only time the continuous loop should be exited is
when the NextShardIterator value is NULL.
•When NextShardIterator is NULL, current shard has been closed and
ShardIteratorvalue would otherwise point past the last record.
•If app never calls SplitShard or MergeShards, the shard remains open and
the calls to GetRecords never return a NextShardIterator value that is NULL.
Topic #2: Some records are skipped
Handle exceptions with processRecords
• Most common cause: Unhandled exception thrown from
processRecords.
• Any exception thrown from processRecords is absorbed by KCL.
• To avoid infinite retries on recurring failures, KCL does not resend
the batch of records processed at the time of the exception.
• It calls processRecords for the next batch of data records without
restarting the record processor. This effectively results in
consumer applications observing skipped records.
• Handle all exceptions within processRecords appropriately.
Metric Name Description
KinesisDataFetcher.get
Records.Success
Number of successful GetRecords operations per Amazon Kinesis stream shard.
KinesisDataFetcher.get
Records.Time
Time taken per GetRecords operation for the Amazon Kinesis stream shard.
UpdateLease.Success Number of successful checkpoints made by the record processor for the given
shard.
UpdateLease.Time Time taken for each checkpoint operation for the given shard.
DataBytesProcessed Total size of records processed in bytes on each ProcessTask invocation.
RecordsProcessed Number of records processed on each ProcessTask invocation.
ExpiredIterator Number of ExpiredIteratorException received when calling GetRecords.
MillisBehindLatest Time the current iterator is behind from the latest record in the shard.
RecordProcessor.proc
essRecords.Time
Time taken by the record processor’s processRecords method.
Shard-level metrics on KCL are your best friend
KCL Topic #3: Consumer is reading slower than expected
• Multiple applications have total reads exceeding the per-shard limits.
• In this case, increase shards in the Amazon Kinesis stream.
• The maximum number of GetRecords per call limit may have been
configured with a low value.
• May have configured the worker with a low value for the maxRecords
property. Recommend system defaults for this property.
• The logic inside processRecords call may be taking longer than
expected for a number of possible reasons; the logic may be CPU
intensive, I/O blocking, or bottlenecked on synchronization.
• Test run empty record processors and compare the read throughput.
KCL Topic #4: Records of same shard are processed by
different record processors at the same time
• One record processor on a particular shard BUT multiple record
processors may temporarily process the same shard.
• A worker instance loses connectivity
• New and original record processors from the unreachable worker process data
from the same shard
• If record processor has shards taken by another record processor, it
must handle two cases to perform graceful shutdown:
• After current call to processRecords is completed, KCL invokes shutdown
method on record processor with shutdown reason 'ZOMBIE'. Record
processors are expected to clean up any resources and then exit.
• When checkpointing from a 'zombie' worker, the KCL throws a Shutdown
Exception. After receiving, code is expected to exit the current method cleanly.
KCL Topic #5: Duplicates in processing
Consumer retries can cause duplicates
• Consumer (data processing application) retries happen when record
processors restart. Record processors for the same shard restart in
the following cases:
• A worker terminates unexpectedly
• Worker instances are added or removed
• Shards are merged or split
• In all these cases, the shards-to-worker-to-record-processor
mapping is continuously updated to load balance processing. Shard
processors that were migrated to other instances restart processing
records from the last checkpoint.
• In KCL app, ensure data being processed is persisted to durable store like
DynamoDB, or S3, prior to check-pointing.
• Duplicates: Make the authoritative data repository (usually at the end of the
data flow) resilient to duplicates. That way the rest of the system has a simple
policy – keep retrying until you succeed.
• Idempotent Processing: Use number of records since previous checkpoint, to
get repeatable results when the record processors fail over.
• Record Processor uses a fixed number of records per Amazon S3 file, such as 5000.
• The file name uses this schema: Amazon S3 prefix, shard-id, and First-Sequence-
Num. Something like sample-shard000001-10001.
• After uploading the S3 file, checkpoint by specifying Last-Sequence-Num. In this
case, checkpoint at record number 15000.
KCL Topic #5 Continued …
Make final destination resilient to duplicates
Sending & Reading Data from Kinesis Streams
AWS SDK
LOG4J
Flume
Fluentd
Get* APIs
Kinesis Client Library
+
Connector Library
Apache
Storm
Amazon Elastic
MapReduce
Sending Consuming
AWS Mobile
SDK
Kinesis
Producer
Library
AWS Lambda
Apache
Spark
Kinesis at Adroll
Nick Barrash
► AdRoll’s real-time data pipeline
• Old architecture
• Motivation to move to Kinesis
► Producer applications
► Consumers applications
► Kinesis pipeline architectures
Menu
AdRoll is a performance marketing platform that enables
brands to leverage their data to run intelligent, ROI-positive
marketing campaigns
AdRoll
Our Mission
• Real-time data at AdRoll
• Advertiser’s website clickstream data
• Serving ads
• Impressions / Clicks / Conversions
• Real-time bidding (RTB) data
• Recency directly correlated with performance
AdRoll
Our Data
AdRoll
Our Scale
• Daily impression data: 200 million events / 240GB
• Daily user data: 1.6 billion events / 700GB
• Daily RTB data: 60 billion events / 80TB
• Daily impression data: 200 million events / 240GB
• Daily user data:1.6 billion events / 700GB
• Daily RTB data: 60 billion events / 80TB
AdRoll
Our Scale
Data Pipeline Architecture
Pre-Kinesis
Dynamo
DBS3 Lambda
S3
HBase
EC2
EC2
EC2
...
SQS
SQS
SQS
...
Real-time
Processing
Application
s
EC2
EC2
EC2
...
Log
Producers
10 min ~5 min
~15 min
Dynamo
DBS3 Lambda
S3
HBase
EC2
EC2
EC2
...
SQS
SQS
SQS
...
Real-time
Processing
Application
s
EC2
EC2
EC2
...
Log
Producers
Data Pipeline Architecture
Pre-Kinesis
...streaming service?
Data Pipeline Architecture
Pre-Kinesis
Dynamo
DBS3 Lambda
S3
HBase
EC2
EC2
EC2
...
SQS
SQS
SQS
...
Real-time
Processing
Application
s
EC2
EC2
EC2
...
Log
Producers
• Low latency
• Horizontally scalable
• Durable
• High availability
• Globally deployable
Streaming Service Requirements
Streaming Service Requirements
And so we tried...
► Low latency
► Horizontally scalable
► Durable
► High availability
► Globally deployable
Solution?
Kafka
► Had Issues with
• Allocating appropriate amount of disk space per topic
• Setting up cross datacenter mirroring
► Couldn’t spare manpower required to manage Kafka
ourselves
► Low ops burden
► Low latency
► Horizontally scalable
► Durable
► High availability
► Globally deployable
Streaming Service Requirements
Streaming Service Requirements
And so next we tried...
► Low ops burden
► Low latency
► Horizontally scalable
► Durable
► High availability
► Globally deployable
Better Solution!
Dynamo
DBS3 Lambda
S3
HBase
EC2
EC2
EC2
...
SQS
SQS
SQS
...
Real-time
Processing
Applications
EC2
EC2
EC2
...
Log Producers
Data Pipeline Architecture
Pre-Kinesis
Dynamo
DB
S3
HBase
EC2
EC2
EC2
...
Real-time
Processing
Applications
EC2
EC2
EC2
...
Kinesis
Log
Producers
Data Pipeline Architecture
Post-Kinesis
~3 sec
Dynamo
DB
S3
HBase
EC2
EC2
EC2
...
Real-time
Processing
Applications
EC2
EC2
EC2
...
Kinesis
Log
Producers
Data Pipeline Architecture
Post-Kinesis
► Across 5 regions...
► 155 streams
► 264 shards
► 2.5 billion events (100 million Kinesis records) / day
Kinesis Usage
► Across 5 regions...
► 155 streams
► 264 shards
► 2.5 billion events (100 million Kinesis records) / day
Kinesis Usage
► 15 minutes -> 3 seconds
► While simultaneously...
Kinesis Wins
► 15 minutes -> 3 seconds
► While simultaneously...
Kinesis Wins
Saving money
► 15 minutes -> 3 seconds
► While simultaneously...
Kinesis Wins
Improving stability
Saving money
► 300+ servers generating logs
• 700 records per second per host
► Primarily written in Erlang
• github.com/AdRoll/kinetic
Kinesis Producers
Our Setup
► Concatenate logs
• Flush at max record size
• Or 1 second has passed
• Big cost saver
► Retries + Exponential Backoff
(+ Jitter?)
► Randomized partition keys
Kinesis Producers
Best Practices
log
1 log
2 log
N
logN+
1logN
+2 logN+
M
Record (1 MB)
logN+1,logN+2,...,lo
gN+M
Record (1 MB)
log1,log2,...,
logN
► 12 major Kinesis consumer applications
► github.com/awslabs/kinesis-storm-spout (storm)
► github.com/AdRoll/kinetic (erlang)
Kinesis Consumers
Our Setup
Kinesis Consumers
Storm spout library (Amazon’s version)
Kinesis Consumers
Storm spout library (Amazon’s version)
Kinesis Consumers
Storm spout library (Amazon’s version)
Kinesis Consumers
Storm spout library (Amazon’s version)
Kinesis Consumers
Storm spout library (AdRoll’s version)
Kinesis Consumers
Storm spout library (AdRoll’s version)
Kinesis Consumers
Storm spout library (AdRoll’s version)
Kinesis Consumers
Storm spout library (AdRoll’s version)
Kinesis Consumers
Storm spout library (AdRoll’s version)
Kinesis Consumers
Storm spout library (AdRoll’s version)
Kinesis Pipeline Architecture
Kinesis limitations
Kinesis Pipeline Architecture
Kinesis limitations
► Tough to have large number of applications per stream
• Coordinate 0.2MB reads every N/5 seconds?
• New applications?
• Testing?
Kinesis Pipeline Architecture
Kinesis limitations
Kinesis Pipeline Architecture
Naive
Dynamo
DB
S3
HBas
e
EC2
EC2
EC2
...
Real-time
Processing
Application
s
EC2
EC2
EC2
...
Kinesis
Log
Producers
Kinesis Pipeline Architecture
Application Kinesis streams
Dynamo
DB
S3
HBase
EC2
EC2
EC2
...
Real-time
Processing
Applications
EC2
EC2
EC2
...
Kinesis
Kinesis
Kinesis
...
Kinesis
KCL
Log Producers
Dynamo
DB
S3
HBase
EC2
EC2
EC2
...
Real-time
Processing
Applications
EC2
EC2
EC2
...
Kinesis S3KCL
Log Producers
Kinesis Pipeline Architecture
Flush to S3
Dynamo
DB
S3
HBase
EC2
EC2
EC2
...
Kinesis
S3
Kinesis
Kinesis
...
Kinesis
KCL
Realtime
Processing
Applications
EC2
EC2
EC2
...
EC2
EC2
...
Log Producers
Kinesis Pipeline Architecture
Application Streams + S3
Summary
► Consider moving Kinesis near your data producers
► Better (cheaper?) to work with smooth, constant load
instead of bursty batch load
► Batch data inside Kinesis records
► Try to make asynchronous requests
► Put a KCL application in the middle of your streams to
allow for multiple real-time applications
Amazon Kinesis
Streams
Build your own custom
applications that
process or analyze
streaming data
Amazon Kinesis
Firehose
Easily load massive
volumes of streaming
data into Amazon S3
and Redshift
Amazon Kinesis
Analytics
Easily analyze data
streams using
standard SQL queries
Amazon Kinesis: Streaming data made easy
Services make it easy to capture, deliver, and process streams on AWS
Amazon Confidential
Thank you!
adad
Appendix
Remember to complete
your evaluations!

More Related Content

What's hot

Performance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State StoresPerformance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State Storesconfluent
 
게임서비스를 위한 ElastiCache 활용 전략 :: 구승모 솔루션즈 아키텍트 :: Gaming on AWS 2016
게임서비스를 위한 ElastiCache 활용 전략 :: 구승모 솔루션즈 아키텍트 :: Gaming on AWS 2016게임서비스를 위한 ElastiCache 활용 전략 :: 구승모 솔루션즈 아키텍트 :: Gaming on AWS 2016
게임서비스를 위한 ElastiCache 활용 전략 :: 구승모 솔루션즈 아키텍트 :: Gaming on AWS 2016Amazon Web Services Korea
 
AWS System Manager: Parameter Store를 사용한 AWS 구성 데이터 관리 기법 - 정창훈, 당근마켓 / 김대권, ...
AWS System Manager: Parameter Store를 사용한 AWS 구성 데이터 관리 기법 - 정창훈, 당근마켓 / 김대권, ...AWS System Manager: Parameter Store를 사용한 AWS 구성 데이터 관리 기법 - 정창훈, 당근마켓 / 김대권, ...
AWS System Manager: Parameter Store를 사용한 AWS 구성 데이터 관리 기법 - 정창훈, 당근마켓 / 김대권, ...Amazon Web Services Korea
 
Realtime Analytics on AWS
Realtime Analytics on AWSRealtime Analytics on AWS
Realtime Analytics on AWSSungmin Kim
 
Deep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming ApplicationsDeep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming ApplicationsAmazon Web Services
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudNoritaka Sekiyama
 
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLTop 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLJim Mlodgenski
 
[NEW LAUNCH!] Introducing Amazon Managed Streaming for Kafka (Amazon MSK) (AN...
[NEW LAUNCH!] Introducing Amazon Managed Streaming for Kafka (Amazon MSK) (AN...[NEW LAUNCH!] Introducing Amazon Managed Streaming for Kafka (Amazon MSK) (AN...
[NEW LAUNCH!] Introducing Amazon Managed Streaming for Kafka (Amazon MSK) (AN...Amazon Web Services
 
AWS EMR Cost optimization
AWS EMR Cost optimizationAWS EMR Cost optimization
AWS EMR Cost optimizationSANG WON PARK
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon Web Services
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka StreamsGuozhang Wang
 
Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)Amazon Web Services
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingTill Rohrmann
 
Cloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera clusterCloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera clusterCloudera, Inc.
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...confluent
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Jean-Paul Azar
 
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy MonitoringApache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy MonitoringDatabricks
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
 

What's hot (20)

Performance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State StoresPerformance Tuning RocksDB for Kafka Streams’ State Stores
Performance Tuning RocksDB for Kafka Streams’ State Stores
 
게임서비스를 위한 ElastiCache 활용 전략 :: 구승모 솔루션즈 아키텍트 :: Gaming on AWS 2016
게임서비스를 위한 ElastiCache 활용 전략 :: 구승모 솔루션즈 아키텍트 :: Gaming on AWS 2016게임서비스를 위한 ElastiCache 활용 전략 :: 구승모 솔루션즈 아키텍트 :: Gaming on AWS 2016
게임서비스를 위한 ElastiCache 활용 전략 :: 구승모 솔루션즈 아키텍트 :: Gaming on AWS 2016
 
AWS System Manager: Parameter Store를 사용한 AWS 구성 데이터 관리 기법 - 정창훈, 당근마켓 / 김대권, ...
AWS System Manager: Parameter Store를 사용한 AWS 구성 데이터 관리 기법 - 정창훈, 당근마켓 / 김대권, ...AWS System Manager: Parameter Store를 사용한 AWS 구성 데이터 관리 기법 - 정창훈, 당근마켓 / 김대권, ...
AWS System Manager: Parameter Store를 사용한 AWS 구성 데이터 관리 기법 - 정창훈, 당근마켓 / 김대권, ...
 
Realtime Analytics on AWS
Realtime Analytics on AWSRealtime Analytics on AWS
Realtime Analytics on AWS
 
Deep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming ApplicationsDeep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming Applications
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLTop 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
 
[NEW LAUNCH!] Introducing Amazon Managed Streaming for Kafka (Amazon MSK) (AN...
[NEW LAUNCH!] Introducing Amazon Managed Streaming for Kafka (Amazon MSK) (AN...[NEW LAUNCH!] Introducing Amazon Managed Streaming for Kafka (Amazon MSK) (AN...
[NEW LAUNCH!] Introducing Amazon Managed Streaming for Kafka (Amazon MSK) (AN...
 
AWS EMR Cost optimization
AWS EMR Cost optimizationAWS EMR Cost optimization
AWS EMR Cost optimization
 
Dive into PySpark
Dive into PySparkDive into PySpark
Dive into PySpark
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best Practices
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
Cloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera clusterCloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera cluster
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
 
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy MonitoringApache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 

Similar to (BDT403) Best Practices for Building Real-time Streaming Applications with Amazon Kinesis

AWS Webcast - AWS Kinesis Webinar
AWS Webcast - AWS Kinesis WebinarAWS Webcast - AWS Kinesis Webinar
AWS Webcast - AWS Kinesis WebinarAmazon Web Services
 
AWS Kinesis - Streams, Firehose, Analytics
AWS Kinesis - Streams, Firehose, AnalyticsAWS Kinesis - Streams, Firehose, Analytics
AWS Kinesis - Streams, Firehose, AnalyticsSerhat Can
 
Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Amazon Web Services
 
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014Amazon Web Services
 
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015Amazon Web Services Korea
 
AWS APAC Webinar Week - Real Time Data Processing with Kinesis
AWS APAC Webinar Week - Real Time Data Processing with KinesisAWS APAC Webinar Week - Real Time Data Processing with Kinesis
AWS APAC Webinar Week - Real Time Data Processing with KinesisAmazon Web Services
 
Amazon Kinesis Data Streams Vs Msk (1).pptx
Amazon Kinesis Data Streams Vs Msk (1).pptxAmazon Kinesis Data Streams Vs Msk (1).pptx
Amazon Kinesis Data Streams Vs Msk (1).pptxRenjithPillai26
 
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...Amazon Web Services
 
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSKChoose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSKSungmin Kim
 
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)Amazon Web Services Korea
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time AnalyticsAmazon Web Services
 
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteStructure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteGigaom
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924Amazon Web Services
 
Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...
Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...
Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...Amazon Web Services
 
AWS re:Invent 2016: Beeswax: Building a Real-Time Streaming Data Platform on ...
AWS re:Invent 2016: Beeswax: Building a Real-Time Streaming Data Platform on ...AWS re:Invent 2016: Beeswax: Building a Real-Time Streaming Data Platform on ...
AWS re:Invent 2016: Beeswax: Building a Real-Time Streaming Data Platform on ...Amazon Web Services
 
AWS Community Nordics Virtual Meetup
AWS Community Nordics Virtual MeetupAWS Community Nordics Virtual Meetup
AWS Community Nordics Virtual MeetupAnahit Pogosova
 
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceAmazon Web Services
 
Barga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 KeynoteBarga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 KeynoteRoger Barga
 
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon KinesisDay 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon KinesisAmazon Web Services
 
Building Big Data Applications with Serverless Architectures - June 2017 AWS...
Building Big Data Applications with Serverless Architectures -  June 2017 AWS...Building Big Data Applications with Serverless Architectures -  June 2017 AWS...
Building Big Data Applications with Serverless Architectures - June 2017 AWS...Amazon Web Services
 

Similar to (BDT403) Best Practices for Building Real-time Streaming Applications with Amazon Kinesis (20)

AWS Webcast - AWS Kinesis Webinar
AWS Webcast - AWS Kinesis WebinarAWS Webcast - AWS Kinesis Webinar
AWS Webcast - AWS Kinesis Webinar
 
AWS Kinesis - Streams, Firehose, Analytics
AWS Kinesis - Streams, Firehose, AnalyticsAWS Kinesis - Streams, Firehose, Analytics
AWS Kinesis - Streams, Firehose, Analytics
 
Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...
 
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
(SDD405) Amazon Kinesis Deep Dive | AWS re:Invent 2014
 
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
 
AWS APAC Webinar Week - Real Time Data Processing with Kinesis
AWS APAC Webinar Week - Real Time Data Processing with KinesisAWS APAC Webinar Week - Real Time Data Processing with Kinesis
AWS APAC Webinar Week - Real Time Data Processing with Kinesis
 
Amazon Kinesis Data Streams Vs Msk (1).pptx
Amazon Kinesis Data Streams Vs Msk (1).pptxAmazon Kinesis Data Streams Vs Msk (1).pptx
Amazon Kinesis Data Streams Vs Msk (1).pptx
 
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
 
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSKChoose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
 
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time Analytics
 
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteStructure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
 
Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...
Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...
Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis - March 20...
 
AWS re:Invent 2016: Beeswax: Building a Real-Time Streaming Data Platform on ...
AWS re:Invent 2016: Beeswax: Building a Real-Time Streaming Data Platform on ...AWS re:Invent 2016: Beeswax: Building a Real-Time Streaming Data Platform on ...
AWS re:Invent 2016: Beeswax: Building a Real-Time Streaming Data Platform on ...
 
AWS Community Nordics Virtual Meetup
AWS Community Nordics Virtual MeetupAWS Community Nordics Virtual Meetup
AWS Community Nordics Virtual Meetup
 
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch ServiceBDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
BDA402 Deep Dive: Log Analytics with Amazon Elasticsearch Service
 
Barga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 KeynoteBarga IC2E & IoTDI'16 Keynote
Barga IC2E & IoTDI'16 Keynote
 
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon KinesisDay 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
 
Building Big Data Applications with Serverless Architectures - June 2017 AWS...
Building Big Data Applications with Serverless Architectures -  June 2017 AWS...Building Big Data Applications with Serverless Architectures -  June 2017 AWS...
Building Big Data Applications with Serverless Architectures - June 2017 AWS...
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 

Recently uploaded (20)

JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 

(BDT403) Best Practices for Building Real-time Streaming Applications with Amazon Kinesis

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Adi Krishnan, Principal PM Amazon Kinesis Nick Barrash, SDE, AdRoll 10/08/2015 Amazon Kinesis Capture, Deliver, and Process Real-time Data Streams on AWS
  • 2. What to Expect from the Session • Amazon Kinesis: System Overview • Amazon Kinesis Streams (this session) • Amazon Kinesis Firehose (available now) • Amazon Kinesis Analytics (announced) • Top 10 things you need to know about Kinesis Streams! • Amazon Kinesis Streaming Data Ingestion Model (5 things) • Streaming applications with Kinesis Client Library (5 things) • AdRoll: Re-architecting our real-time data pipeline • Featuring Nick Barrash, SDE AdRoll
  • 3. Amazon Kinesis Streams Build your own custom applications that process or analyze streaming data Amazon Kinesis Firehose Easily load massive volumes of streaming data into Amazon S3 and Redshift Amazon Kinesis Analytics Easily analyze data streams using standard SQL queries Amazon Kinesis: Streaming data done the AWS way Services that make it easy to work with real-time data streams on AWS
  • 5. Amazon Kinesis Streams Build your own data streaming applications Easy Administration: Create a new stream, set desired capacity and partitioning to match your data throughput rate and volume. Build real-time applications: Perform custom record processing on streaming data using Kinesis Client Library, Apache Spark/ Storm, AWS Lambda, and more. Low cost: Cost-efficient for workloads of any scale. Amazon Confidential Send clickstream data to Kinesis Streams Kinesis Streams stores and exposes clickstream data for processing Custom application built on Kinesis Client Library makes real- time content recommendations Readers see personalized content suggestions
  • 6. Amazon Web Services AZ AZ AZ Durable, highly consistent storage replicates data across three data centers (availability zones) Aggregate and archive to S3 Millions of sources producing 100s of terabytes per hour Front End Authentication Authorization Ordered stream of events supports multiple readers Real-time dashboards and alarms Machine learning algorithms or sliding window analytics Aggregate analysis in Hadoop or a data warehouse Inexpensive: $0.028 per million puts Real-Time Streaming Data Ingestion Custom-built Streaming Applications (KCL) Inexpensive: $0.014 per 1,000,000 PUT Payload Units Amazon Kinesis Streams (re:Invent 2013) Fully managed service for real-time processing of streaming data
  • 8. • A provisioned entity called a Stream composed of Shards • Producers use a PUT call to store data in a Stream. • Each record <= 1 MB. PutRecord or PutRecords • A partition key is supplied by producer and used to distribute (MD5 hash) the PUTs across (hash key range) of Shards • Unique Sequence# returned upon successful PUT call • Approximate arrival timestamp affixed to each record Putting Data into Kinesis Simple Put* interface to capture and store data in Streams
  • 9. Managed Buffer • Care about a reliable, scalable way to capture data • Defer all other aggregation to consumer • Generate Random Partition Keys • Ensure a high cardinality for Partition Keys with respect to shards, to spray evenly across available shards Topic #1: Thinking about ingestion model Workload determines partition key strategy Streaming Map-Reduce • Streaming Map-Reduce: leverage partition keys as a natural way to aggregate data • For e.g. Partition Keys per billing customer, per DeviceId, per stock symbol • Design partition keys to scale • Be aware of “hot partition keys or shards ”
  • 10. Topic #2: Kinesis PutRecords API High throughput API for efficient writes to Kinesis • PutRecords {Records {Data,PartitionKey}, StreamName} • Supports 500 records. • Record can be =<1 MB, and up to 5 MB for whole request • Can include records with different partition keys • Ordering not guaranteed • Successful response includes ShardID and SeqNumber values • Unsuccessful Response
  • 11. Topic #2: Kinesis PutRecords API (Continued) Dealing with unsuccessful records • Example Failed Response • Examines the PutRecordsResult objects to resend • Check FailedRecordCount parameter to confirm failed records. • If so, each putRecordsEntry with ErrorCode != NULL should be added to a subsequent request
  • 12. Topic #3: Kinesis Producer Library Highly configurable library to write to Kinesis • Collects records and uses PutRecords for high throughput writes • Integrates seamlessly with the Amazon Kinesis Client Library (KCL) to de-aggregate batched records • Submits Amazon CloudWatch metrics on your behalf to provide visibility into producer performance • https://github.com/awslabs/amazon-kinesis-producer
  • 13. Topic #4: Extended Retention in Kinesis Default 24 Hours but configurable to 7 days • 2 New APIs • IncreaseStreamRetentionPeriod(String StreamName, int RetentionPeriodHours) • DecreaseStreamRetentionPeriod(String StreamName, int RetentionPeriodHours) • Use it in one of two-modes • as always-on extended retention • Raise retention period in response to operational event and/ planned maintenance • Extended Retention is priced at US$ 0.020/ Shard-Hour New!
  • 14. • Keep track of your metrics • Monitor CloudWatch metrics: PutRecord.Bytes + GetRecords.Bytes metrics keep track of shard usage • Retry if rise in input rate is temporary • Reshard to increase number of shards • SplitShard – Adds more shards • MergeShard – Removes shards • Use the Kinesis Scaling Utility - https://github.com/awslabs/amazon- kinesis-scaling-utils Metric Units PutRecords.Bytes Bytes PutRecords.Latency Milliseconds PutRecords.Success Count PutRecords.Records Count Incoming Bytes Bytes Incoming Records Count Topic #5: Dealing with provisioned throughput exceeded Metrics and Re-sharding (SplitShard/ MergeShard)
  • 15. Amazon Kinesis Streams: Build Apps w/ Kinesis Client Library
  • 16. • Open Source Java client library, also available for Python, Ruby, Node.JS, .NET. Available on Github • Connects to the stream and enumerates the shards • Coordinates shard associations with other workers (if any) • Instantiates a record processor for every shard it manages • Pulls data records from the stream • Pushes the records to the corresponding record processor • Checkpoints processed records • Balances shard-worker associations when the worker instance count changes • Balances shard-worker associations when shards are split or merged Building Applications: Kinesis Client Library For fault-tolerant stream processing applications
  • 17. State Management with Kinesis Client Library • A named KCL application has 1 Worker per EC2 instance • One worker maps to one or more record processors • One record processor maps to one shard and processes data records from that shard • The KCL uses the IRecordProcessor interface to communicate with your application • ProcessRecords() contains the business logic
  • 18. Topic #1: Empty records returned even with data in stream • App calls GetRecords in a loop with no back-offs. • Each call to GetRecords also returns a ShardIterator value, which must be used in the next iteration. • GetRecords operation is non-blocking. Returns with either relevant data records or with an empty Records element. • An empty Records element is returned if: •There is no more data currently in the shard •There is no data near the part of the shard pointed to by the ShardIterator • In production, the only time the continuous loop should be exited is when the NextShardIterator value is NULL. •When NextShardIterator is NULL, current shard has been closed and ShardIteratorvalue would otherwise point past the last record. •If app never calls SplitShard or MergeShards, the shard remains open and the calls to GetRecords never return a NextShardIterator value that is NULL.
  • 19. Topic #2: Some records are skipped Handle exceptions with processRecords • Most common cause: Unhandled exception thrown from processRecords. • Any exception thrown from processRecords is absorbed by KCL. • To avoid infinite retries on recurring failures, KCL does not resend the batch of records processed at the time of the exception. • It calls processRecords for the next batch of data records without restarting the record processor. This effectively results in consumer applications observing skipped records. • Handle all exceptions within processRecords appropriately.
  • 20. Metric Name Description KinesisDataFetcher.get Records.Success Number of successful GetRecords operations per Amazon Kinesis stream shard. KinesisDataFetcher.get Records.Time Time taken per GetRecords operation for the Amazon Kinesis stream shard. UpdateLease.Success Number of successful checkpoints made by the record processor for the given shard. UpdateLease.Time Time taken for each checkpoint operation for the given shard. DataBytesProcessed Total size of records processed in bytes on each ProcessTask invocation. RecordsProcessed Number of records processed on each ProcessTask invocation. ExpiredIterator Number of ExpiredIteratorException received when calling GetRecords. MillisBehindLatest Time the current iterator is behind from the latest record in the shard. RecordProcessor.proc essRecords.Time Time taken by the record processor’s processRecords method. Shard-level metrics on KCL are your best friend
  • 21. KCL Topic #3: Consumer is reading slower than expected • Multiple applications have total reads exceeding the per-shard limits. • In this case, increase shards in the Amazon Kinesis stream. • The maximum number of GetRecords per call limit may have been configured with a low value. • May have configured the worker with a low value for the maxRecords property. Recommend system defaults for this property. • The logic inside processRecords call may be taking longer than expected for a number of possible reasons; the logic may be CPU intensive, I/O blocking, or bottlenecked on synchronization. • Test run empty record processors and compare the read throughput.
  • 22. KCL Topic #4: Records of same shard are processed by different record processors at the same time • One record processor on a particular shard BUT multiple record processors may temporarily process the same shard. • A worker instance loses connectivity • New and original record processors from the unreachable worker process data from the same shard • If record processor has shards taken by another record processor, it must handle two cases to perform graceful shutdown: • After current call to processRecords is completed, KCL invokes shutdown method on record processor with shutdown reason 'ZOMBIE'. Record processors are expected to clean up any resources and then exit. • When checkpointing from a 'zombie' worker, the KCL throws a Shutdown Exception. After receiving, code is expected to exit the current method cleanly.
  • 23. KCL Topic #5: Duplicates in processing Consumer retries can cause duplicates • Consumer (data processing application) retries happen when record processors restart. Record processors for the same shard restart in the following cases: • A worker terminates unexpectedly • Worker instances are added or removed • Shards are merged or split • In all these cases, the shards-to-worker-to-record-processor mapping is continuously updated to load balance processing. Shard processors that were migrated to other instances restart processing records from the last checkpoint.
  • 24. • In KCL app, ensure data being processed is persisted to durable store like DynamoDB, or S3, prior to check-pointing. • Duplicates: Make the authoritative data repository (usually at the end of the data flow) resilient to duplicates. That way the rest of the system has a simple policy – keep retrying until you succeed. • Idempotent Processing: Use number of records since previous checkpoint, to get repeatable results when the record processors fail over. • Record Processor uses a fixed number of records per Amazon S3 file, such as 5000. • The file name uses this schema: Amazon S3 prefix, shard-id, and First-Sequence- Num. Something like sample-shard000001-10001. • After uploading the S3 file, checkpoint by specifying Last-Sequence-Num. In this case, checkpoint at record number 15000. KCL Topic #5 Continued … Make final destination resilient to duplicates
  • 25. Sending & Reading Data from Kinesis Streams AWS SDK LOG4J Flume Fluentd Get* APIs Kinesis Client Library + Connector Library Apache Storm Amazon Elastic MapReduce Sending Consuming AWS Mobile SDK Kinesis Producer Library AWS Lambda Apache Spark
  • 27. ► AdRoll’s real-time data pipeline • Old architecture • Motivation to move to Kinesis ► Producer applications ► Consumers applications ► Kinesis pipeline architectures Menu
  • 28. AdRoll is a performance marketing platform that enables brands to leverage their data to run intelligent, ROI-positive marketing campaigns AdRoll Our Mission
  • 29. • Real-time data at AdRoll • Advertiser’s website clickstream data • Serving ads • Impressions / Clicks / Conversions • Real-time bidding (RTB) data • Recency directly correlated with performance AdRoll Our Data
  • 30. AdRoll Our Scale • Daily impression data: 200 million events / 240GB • Daily user data: 1.6 billion events / 700GB • Daily RTB data: 60 billion events / 80TB
  • 31. • Daily impression data: 200 million events / 240GB • Daily user data:1.6 billion events / 700GB • Daily RTB data: 60 billion events / 80TB AdRoll Our Scale
  • 32. Data Pipeline Architecture Pre-Kinesis Dynamo DBS3 Lambda S3 HBase EC2 EC2 EC2 ... SQS SQS SQS ... Real-time Processing Application s EC2 EC2 EC2 ... Log Producers
  • 33. 10 min ~5 min ~15 min Dynamo DBS3 Lambda S3 HBase EC2 EC2 EC2 ... SQS SQS SQS ... Real-time Processing Application s EC2 EC2 EC2 ... Log Producers Data Pipeline Architecture Pre-Kinesis
  • 34. ...streaming service? Data Pipeline Architecture Pre-Kinesis Dynamo DBS3 Lambda S3 HBase EC2 EC2 EC2 ... SQS SQS SQS ... Real-time Processing Application s EC2 EC2 EC2 ... Log Producers
  • 35. • Low latency • Horizontally scalable • Durable • High availability • Globally deployable Streaming Service Requirements
  • 36. Streaming Service Requirements And so we tried... ► Low latency ► Horizontally scalable ► Durable ► High availability ► Globally deployable
  • 38. Kafka ► Had Issues with • Allocating appropriate amount of disk space per topic • Setting up cross datacenter mirroring ► Couldn’t spare manpower required to manage Kafka ourselves
  • 39. ► Low ops burden ► Low latency ► Horizontally scalable ► Durable ► High availability ► Globally deployable Streaming Service Requirements
  • 40. Streaming Service Requirements And so next we tried... ► Low ops burden ► Low latency ► Horizontally scalable ► Durable ► High availability ► Globally deployable
  • 45. ► Across 5 regions... ► 155 streams ► 264 shards ► 2.5 billion events (100 million Kinesis records) / day Kinesis Usage
  • 46. ► Across 5 regions... ► 155 streams ► 264 shards ► 2.5 billion events (100 million Kinesis records) / day Kinesis Usage
  • 47. ► 15 minutes -> 3 seconds ► While simultaneously... Kinesis Wins
  • 48. ► 15 minutes -> 3 seconds ► While simultaneously... Kinesis Wins Saving money
  • 49. ► 15 minutes -> 3 seconds ► While simultaneously... Kinesis Wins Improving stability Saving money
  • 50. ► 300+ servers generating logs • 700 records per second per host ► Primarily written in Erlang • github.com/AdRoll/kinetic Kinesis Producers Our Setup
  • 51. ► Concatenate logs • Flush at max record size • Or 1 second has passed • Big cost saver ► Retries + Exponential Backoff (+ Jitter?) ► Randomized partition keys Kinesis Producers Best Practices log 1 log 2 log N logN+ 1logN +2 logN+ M Record (1 MB) logN+1,logN+2,...,lo gN+M Record (1 MB) log1,log2,..., logN
  • 52. ► 12 major Kinesis consumer applications ► github.com/awslabs/kinesis-storm-spout (storm) ► github.com/AdRoll/kinetic (erlang) Kinesis Consumers Our Setup
  • 53. Kinesis Consumers Storm spout library (Amazon’s version)
  • 54. Kinesis Consumers Storm spout library (Amazon’s version)
  • 55. Kinesis Consumers Storm spout library (Amazon’s version)
  • 56. Kinesis Consumers Storm spout library (Amazon’s version)
  • 57. Kinesis Consumers Storm spout library (AdRoll’s version)
  • 58. Kinesis Consumers Storm spout library (AdRoll’s version)
  • 59. Kinesis Consumers Storm spout library (AdRoll’s version)
  • 60. Kinesis Consumers Storm spout library (AdRoll’s version)
  • 61. Kinesis Consumers Storm spout library (AdRoll’s version)
  • 62. Kinesis Consumers Storm spout library (AdRoll’s version)
  • 65. ► Tough to have large number of applications per stream • Coordinate 0.2MB reads every N/5 seconds? • New applications? • Testing? Kinesis Pipeline Architecture Kinesis limitations
  • 67. Kinesis Pipeline Architecture Application Kinesis streams Dynamo DB S3 HBase EC2 EC2 EC2 ... Real-time Processing Applications EC2 EC2 EC2 ... Kinesis Kinesis Kinesis ... Kinesis KCL Log Producers
  • 70. Summary ► Consider moving Kinesis near your data producers ► Better (cheaper?) to work with smooth, constant load instead of bursty batch load ► Batch data inside Kinesis records ► Try to make asynchronous requests ► Put a KCL application in the middle of your streams to allow for multiple real-time applications
  • 71. Amazon Kinesis Streams Build your own custom applications that process or analyze streaming data Amazon Kinesis Firehose Easily load massive volumes of streaming data into Amazon S3 and Redshift Amazon Kinesis Analytics Easily analyze data streams using standard SQL queries Amazon Kinesis: Streaming data made easy Services make it easy to capture, deliver, and process streams on AWS Amazon Confidential