SERHAT CAN • @SRHTCN
AWS Kinesis
Table of Contents
Streaming data?
Big Data Processing Approaches
AWS Kinesis Family
Amazon Kinesis Streams in detail
Amazon Kinesis Firehose
Amazon Kinesis Analytics
Streaming Data: Life As It Happens
After the event occurs -> at rest (batch)
As the event occurs -> in motion (streaming)
Big Data Processing Approaches
• Common Big Data Processing Approaches
• Query Engine Approach (Data Warehouse, SQL, NoSQL Databases)
• Repeated queries over the same well-structured data
• Pre-computations like indices and dimensional views improve query performance
• Batch Engines (Map-Reduce)
• The “query” is run on the data. There are no pre-computations
• Streaming Big Data Processing Approach
• Real-time response to content in semi-structured data streams
• Relatively simple computations on data (aggregates, filters, sliding window, etc.)
• Enables data lifecycle by moving data to different stores / open source systems
Kinesis Family
Amazon Kinesis Streams
• A fully managed service for real-time processing of high-volume streaming data.
• Kinesis can store and process terabytes of data an hour from hundreds of thousands of sources.
• Data is replicated across multiple Availability Zones to ensure high durability and availability.
Amazon Kinesis Streams Concepts
Shard
• Streams are made of Shards. A shard is the base
throughput unit of an Amazon Kinesis stream.
• One shard provides a capacity of 1MB/sec data input
and 2MB/sec data output.
• One shard can support up to 1000 PUT records per
second.
• You can monitor shard-level metrics in Amazon Kinesis
Streams
• Add or remove shards from your stream dynamically
as your data throughput changes by resharding the
stream.
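To make the capacity math concrete, here is a minimal sketch using the AWS SDK for Java (the stream name "my-example-stream" is made up, and default credentials and region are assumed): creating a stream with two shards gives 2MB/sec input, 4MB/sec output, and 2,000 PUT records/sec in total.

import java.util.List;
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.CreateStreamRequest;
import com.amazonaws.services.kinesis.model.DescribeStreamRequest;
import com.amazonaws.services.kinesis.model.Shard;

public class CreateStreamSketch {
    public static void main(String[] args) {
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();

        // Two shards: 2MB/sec in, 4MB/sec out, 2,000 PUT records/sec aggregate
        kinesis.createStream(new CreateStreamRequest()
                .withStreamName("my-example-stream")   // hypothetical stream name
                .withShardCount(2));

        // In practice, wait until the stream status is ACTIVE before describing or using it
        List<Shard> shards = kinesis.describeStream(
                new DescribeStreamRequest().withStreamName("my-example-stream"))
                .getStreamDescription().getShards();
        for (Shard shard : shards) {
            System.out.println(shard.getShardId() + ": "
                    + shard.getHashKeyRange().getStartingHashKey() + " - "
                    + shard.getHashKeyRange().getEndingHashKey());
        }
    }
}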
Data Record
• A record is the unit of data stored in an Amazon Kinesis stream.
• A record is composed of:
• a partition key
• a sequence number
• a data blob (the data you want to send)
• The maximum size of a data blob (the data payload after Base64-decoding) is 1 megabyte (MB).
Partition Key
• A partition key is used to segregate and route data records to different shards of a stream.
• A partition key is specified by your data producer while putting data into an Amazon Kinesis stream.
• For example, assume you have an Amazon Kinesis stream with two shards (Shard 1 and Shard 2). You can configure your data producer to use two partition keys (Key A and Key B) so that all data records with Key A are added to Shard 1 and all data records with Key B are added to Shard 2.
Sequence Number
• Each data record has a sequence number that is unique within its
shard.
• The sequence number is assigned by Streams after you write to the
stream with client.putRecords or client.putRecord.
• Sequence numbers for the same partition key generally increase over
time; the longer the time period between write requests, the larger the
sequence numbers become.
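A small sketch of putting a single record, reusing the AWS SDK for Java client and hypothetical stream name from the earlier sketch: the producer supplies the partition key and the data blob, and Streams returns the shard ID and the sequence number it assigned.

import java.nio.ByteBuffer;
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.model.PutRecordRequest;
import com.amazonaws.services.kinesis.model.PutRecordResult;

static void putOneRecord(AmazonKinesis kinesis) {
    PutRecordResult result = kinesis.putRecord(new PutRecordRequest()
            .withStreamName("my-example-stream")                 // hypothetical stream name
            .withPartitionKey("Key A")                           // routes the record to a shard
            .withData(ByteBuffer.wrap("payload".getBytes())));   // the data blob (<= 1 MB)

    // Shard ID and sequence number are assigned by Streams, not by the producer
    System.out.println(result.getShardId() + " / " + result.getSequenceNumber());
}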
Resharding the Stream
• Streams supports resharding, which enables you to adjust the number of
shards in your stream in order to adapt to changes in the rate of data flow
through the stream.
• There are two types of resharding operations: shard split and shard
merge.
• Shard split: divide a single shard into two shards.
• Shard merge: combine two shards into a single shard.
Resharding the Stream
• Resharding is always “pairwise”: splitting into or merging more than two shards in a single operation is NOT allowed
• Resharding is typically performed by an administrative application which
is distinct from the producer (put) applications, and the consumer (get)
applications
• The administrative application would also need a broader set of IAM
permissions for resharding
Splitting a Shard
• Specify how hash key values from the parent shard should be redistributed to the child shards
• The possible hash key values for a given shard constitute a set of ordered, contiguous, non-negative integers. This range of possible hash key values is given by:
shard.getHashKeyRange().getStartingHashKey();
shard.getHashKeyRange().getEndingHashKey();
• When you split the shard, you specify a value in this range.
• That hash key value and all higher hash key values are distributed to one of the child shards.
• All the lower hash key values are distributed to the other child shard.
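A sketch of a split under the same assumptions (AWS SDK for Java, hypothetical stream name, a Shard object obtained from describeStream): choosing the midpoint of the parent shard's hash key range as the new starting hash key gives each child roughly half of the key space.

import java.math.BigInteger;
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.model.Shard;
import com.amazonaws.services.kinesis.model.SplitShardRequest;

static void splitAtMidpoint(AmazonKinesis kinesis, Shard parent) {
    BigInteger start = new BigInteger(parent.getHashKeyRange().getStartingHashKey());
    BigInteger end = new BigInteger(parent.getHashKeyRange().getEndingHashKey());

    // The new starting hash key and everything above it go to one child; lower values go to the other
    String newStartingHashKey = start.add(end).divide(BigInteger.valueOf(2)).toString();

    kinesis.splitShard(new SplitShardRequest()
            .withStreamName("my-example-stream")   // hypothetical stream name
            .withShardToSplit(parent.getShardId())
            .withNewStartingHashKey(newStartingHashKey));
}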
Merging Two Shards
• In order to merge two shards, the shards must be adjacent.
• Two shards are considered adjacent if the union of their hash key ranges forms a contiguous set with no gaps.
• To identify shards that are candidates for merging, you should filter out all
shards that are in a CLOSED state.
• Shards that are OPEN—that is, not CLOSED—have an ending sequence
number of null.
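A sketch of a merge under the same assumptions: both shards should be OPEN (their ending sequence number is null) and adjacent in hash key space.

import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.model.MergeShardsRequest;
import com.amazonaws.services.kinesis.model.Shard;

static void mergeIfOpen(AmazonKinesis kinesis, Shard shard, Shard adjacentShard) {
    // OPEN shards have an ending sequence number of null; CLOSED shards are not merge candidates
    boolean bothOpen = shard.getSequenceNumberRange().getEndingSequenceNumber() == null
            && adjacentShard.getSequenceNumberRange().getEndingSequenceNumber() == null;
    if (bothOpen) {
        kinesis.mergeShards(new MergeShardsRequest()
                .withStreamName("my-example-stream")   // hypothetical stream name
                .withShardToMerge(shard.getShardId())
                .withAdjacentShardToMerge(adjacentShard.getShardId()));
    }
}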
After Resharding
• After you call a resharding operation, either splitShard or mergeShards, you need to wait for the stream to become active again (just as after creating a stream).
• In the process of resharding, a parent shard transitions from an OPEN state to a CLOSED state and then to an EXPIRED state.
• When resharding is complete, the stream returns to the ACTIVE state.
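A sketch of that wait (same AWS SDK for Java client; the same polling loop works after createStream, splitShard, or mergeShards):

import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.model.DescribeStreamRequest;

static void waitForActive(AmazonKinesis kinesis, String streamName) throws InterruptedException {
    String status;
    do {
        Thread.sleep(1000);   // poll roughly once per second
        status = kinesis.describeStream(
                new DescribeStreamRequest().withStreamName(streamName))
                .getStreamDescription().getStreamStatus();
    } while (!"ACTIVE".equals(status));
}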
Retention Period
• Data records are accessible for a default of 24 hours from the
time they are added to a stream
• Configurable in hourly increments
• From 24 to 168 hours (1 to 7 days)
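A one-call sketch of extending retention to the maximum (same hypothetical stream and AWS SDK for Java client; there is a corresponding decreaseStreamRetentionPeriod call to shrink it again):

import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.model.IncreaseStreamRetentionPeriodRequest;

static void extendRetention(AmazonKinesis kinesis) {
    // Raise retention from the 24-hour default to the 168-hour (7-day) maximum
    kinesis.increaseStreamRetentionPeriod(new IncreaseStreamRetentionPeriodRequest()
            .withStreamName("my-example-stream")   // hypothetical stream name
            .withRetentionPeriodHours(168));
}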
Amazon Kinesis Producer Library (KPL)
• The KPL is an easy-to-use, highly configurable library that helps you write to an Amazon Kinesis stream.
• Writes to one or more Amazon Kinesis streams with an automatic and configurable
retry mechanism
• Collects records and uses PutRecords to write multiple records to multiple shards
per request
• Aggregates user records to increase payload size and improve throughput
• Integrates seamlessly with the Amazon Kinesis Client Library (KCL) to de-aggregate
batched records on the consumer
• Submits Amazon CloudWatch metrics on your behalf to provide visibility into
producer performance
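A minimal KPL producer sketch (region and stream name are made up): addUserRecord is asynchronous, and the library batches, aggregates, and retries in the background.

import java.nio.ByteBuffer;
import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.KinesisProducerConfiguration;

static void kplSketch() {
    KinesisProducerConfiguration config = new KinesisProducerConfiguration();
    config.setRegion("us-east-1");   // hypothetical region

    KinesisProducer producer = new KinesisProducer(config);

    // Asynchronous: the KPL buffers the record and writes it with PutRecords behind the scenes
    producer.addUserRecord("my-example-stream", "Key A",
            ByteBuffer.wrap("payload".getBytes()));

    producer.flushSync();   // block until everything buffered has been sent
    producer.destroy();     // shut down the KPL daemon
}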
Amazon Kinesis Client Library (Life Saver)
• Develop a consumer application for Amazon Kinesis Streams
• The KCL acts as an intermediary between your record processing logic and Streams.
• A KCL application instantiates a worker with configuration information, and then uses a record processor to process the data received from an Amazon Kinesis stream.
• You can run a KCL application on any number of instances. Multiple instances of the same application coordinate on failures and load-balance dynamically.
• You can also have multiple KCL applications working on the same stream, subject to throughput limits.
Amazon Kinesis Client Library
• Connects to the stream
• Enumerates the shards
• Coordinates shard associations with other workers (if any)
• Instantiates a record processor for every shard it manages
• Pulls data records from the stream
• Pushes the records to the corresponding record processor
• Checkpoints processed records
• Balances shard-worker associations when the worker instance count changes
• Balances shard-worker associations when shards are split or merged
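A minimal record processor sketch against the KCL 1.x v2 interface (processing logic and error handling are left as placeholders): the KCL instantiates one of these per shard it manages and calls its methods for you.

import com.amazonaws.services.kinesis.clientlibrary.interfaces.v2.IRecordProcessor;
import com.amazonaws.services.kinesis.clientlibrary.types.InitializationInput;
import com.amazonaws.services.kinesis.clientlibrary.types.ProcessRecordsInput;
import com.amazonaws.services.kinesis.clientlibrary.types.ShutdownInput;
import com.amazonaws.services.kinesis.model.Record;

public class ExampleRecordProcessor implements IRecordProcessor {

    @Override
    public void initialize(InitializationInput input) {
        // Called once with the shard this processor is bound to (input.getShardId())
    }

    @Override
    public void processRecords(ProcessRecordsInput input) {
        for (Record record : input.getRecords()) {
            // Application-specific processing of record.getData() goes here
        }
        try {
            // Checkpoint so a replacement worker resumes after the last processed record
            input.getCheckpointer().checkpoint();
        } catch (Exception e) {
            // A real application would handle ThrottlingException, ShutdownException, etc.
        }
    }

    @Override
    public void shutdown(ShutdownInput input) {
        // Checkpoint here when the shutdown reason is TERMINATE so child shards can start processing
    }
}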
Amazon Kinesis Client Library
• KCL uses a unique Amazon DynamoDB table to keep
track of the application's state
• KCL creates the table with a provisioned throughput of
10 reads per second and 10 writes per second
• Each row in the DynamoDB table represents a shard that
is being processed by your application. The hash key for
the table is the shard ID.
Amazon Kinesis Client Library
• In addition to the shard ID, each row also includes the following data:
• checkpoint: The most recent checkpoint sequence number for the shard. This value is unique across
all shards in the stream.
• checkpointSubSequenceNumber: When using the Kinesis Producer Library's aggregation feature,
this is an extension to checkpoint that tracks individual user records within the Amazon Kinesis record.
• leaseCounter: Used for lease versioning so that workers can detect that their lease has been taken by
another worker.
• leaseKey: A unique identifier for a lease. Each lease is particular to a shard in the stream and is held
by one worker at a time.
• leaseOwner: The worker that is holding this lease.
• ownerSwitchesSinceCheckpoint: How many times this lease has changed workers since the last
time a checkpoint was written.
• parentShardId: Used to ensure that the parent shard is fully processed before processing starts on
the child shards. This ensures that records are processed in the same order they were put into the
stream.
Using Shard Iterators
• You retrieve records from the stream on a per-shard basis, using a shard iterator of one of the following types:
• AT_SEQUENCE_NUMBER
• AFTER_SEQUENCE_NUMBER
• AT_TIMESTAMP
• TRIM_HORIZON
• LATEST
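A sketch of reading one shard directly (same hypothetical stream and AWS SDK for Java client; the shard ID is made up), starting from the oldest available record with TRIM_HORIZON. In practice the KCL runs this loop for you.

import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.model.GetRecordsRequest;
import com.amazonaws.services.kinesis.model.GetRecordsResult;
import com.amazonaws.services.kinesis.model.GetShardIteratorRequest;
import com.amazonaws.services.kinesis.model.Record;

static void readShard(AmazonKinesis kinesis) throws InterruptedException {
    String iterator = kinesis.getShardIterator(new GetShardIteratorRequest()
            .withStreamName("my-example-stream")     // hypothetical stream name
            .withShardId("shardId-000000000000")     // hypothetical shard ID
            .withShardIteratorType("TRIM_HORIZON"))
            .getShardIterator();

    while (iterator != null) {   // becomes null once a CLOSED shard has been fully read
        GetRecordsResult result = kinesis.getRecords(new GetRecordsRequest()
                .withShardIterator(iterator)
                .withLimit(100));
        for (Record record : result.getRecords()) {
            // process record.getPartitionKey(), record.getSequenceNumber(), record.getData()
        }
        iterator = result.getNextShardIterator();
        Thread.sleep(1000);   // stay under the per-shard read limits
    }
}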
Recovering from Failures
• Record Processor Failure
• The worker invokes record processor methods using Java ExecutorService tasks.
• If a task fails, the worker retains control of the shard that the record processor was
processing.
• The worker starts a new record processor task to process that shard
• Worker or Application Failure
• If a worker — or an instance of the Amazon Kinesis Streams application — fails,
you should detect and handle the situation.
Handling Duplicate Records (Idempotency)
• There are two primary reasons why records may be
delivered more than one time to your Amazon
Kinesis Streams application:
• producer retries
• consumer retries
• Your application must anticipate and appropriately
handle processing individual records multiple times.
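One common approach, sketched below with a hypothetical class and an in-memory store: the producer embeds its own unique ID in each record, and the consumer keys deduplication on that ID. Producer retries create duplicates with different Kinesis sequence numbers, so the sequence number alone is not enough.

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class DeduplicatingHandler {
    // In a real application this would be a durable store (e.g. a DynamoDB table), not memory
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();

    // recordId is an application-level unique ID the producer embedded in the payload
    public void handle(String recordId, byte[] payload) {
        if (!processedIds.add(recordId)) {
            return;   // duplicate delivery: already processed, skip
        }
        // ... process the payload exactly once ...
    }
}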
Pricing
• Shard Hour (1MB/second ingress, 2MB/second egress): $0.015
• PUT Payload Units, per 1,000,000 units: $0.014
• Extended Data Retention (up to 7 days), per Shard Hour: $0.020
• Plus DynamoDB charges if you use the KCL
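As a rough worked example at the list prices above (2017-era figures that vary by region): a 10-shard stream running for a 30-day month is 10 × 720 shard hours × $0.015 ≈ $108, and an average of 1,000 records/sec of 25 KB or less is about 2,592 million PUT payload units × $0.014 ≈ $36, i.e. roughly $144/month before extended retention or the KCL's DynamoDB table.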
Kafka vs. Kinesis Streams
• In Kafka you can configure, for each topic, the replication factor and how many replicas have to acknowledge a message before a write is considered successful, so you can definitely make it highly available.
• Amazon ensures that you won't lose data, but that comes with a performance cost (messages are written synchronously to three different Availability Zones).
• There are several benchmarks online comparing Kafka and Kinesis, and the result is always the same: you will have a hard time replicating Kafka's performance in Kinesis, at least at a reasonable price.
• This is partly because Kafka is extremely fast, but also because Kinesis writes each message synchronously to three different machines, which is costly in terms of latency and throughput.
• Kafka is one of the preferred options for the Apache stream processing frameworks
• Unsurprisingly, Kinesis is really well integrated with other AWS services
DynamoDB Streams vs. Kinesis Streams
• DynamoDB Streams actions are similar to their counterparts in Amazon Kinesis Streams, but they are not 100% identical.
• You can write applications for Amazon Kinesis Streams using the Amazon Kinesis Client Library (KCL).
• You can leverage the design patterns found within the KCL to process DynamoDB Streams shards and stream records. To do this, you use the DynamoDB Streams Kinesis Adapter.
SQS vs. Kinesis Streams
• Amazon Kinesis Streams enables real-time processing of streaming big data.
• It provides ordering of records, as well as the ability to read and/or replay records in the same order to multiple Amazon Kinesis applications.
• The Amazon Kinesis Client Library (KCL) delivers all records for a given partition key to the same record processor, making it easier to build multiple applications reading from the same Amazon Kinesis stream (for example, to perform counting, aggregation, and filtering).
• Amazon Simple Queue Service (Amazon SQS) offers a reliable, highly scalable hosted queue for storing messages as they travel between computers.
• Amazon SQS lets you easily move data between distributed application components and helps you build applications in which messages are processed independently (with message-level ack/fail semantics), such as automated workflows.
Amazon Kinesis Firehose
• Amazon Kinesis Firehose is the easiest way to load streaming data into AWS.
• It can capture, transform, and load streaming data into Amazon Kinesis
Analytics, Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service
• Fully managed service that automatically scales to match the throughput of
your data and requires no ongoing administration.
• It can also batch, compress, and encrypt the data before loading it,
minimizing the amount of storage used at the destination and increasing
security.
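A minimal Firehose producer sketch (AWS SDK for Java; the delivery stream name is made up): the producer just writes records, and Firehose handles buffering, optional transformation, compression, encryption, and delivery to the configured destination.

import java.nio.ByteBuffer;
import com.amazonaws.services.kinesisfirehose.AmazonKinesisFirehose;
import com.amazonaws.services.kinesisfirehose.AmazonKinesisFirehoseClientBuilder;
import com.amazonaws.services.kinesisfirehose.model.PutRecordRequest;
import com.amazonaws.services.kinesisfirehose.model.Record;

static void firehoseSketch() {
    AmazonKinesisFirehose firehose = AmazonKinesisFirehoseClientBuilder.defaultClient();

    firehose.putRecord(new PutRecordRequest()
            .withDeliveryStreamName("my-example-delivery-stream")   // hypothetical name
            .withRecord(new Record()
                    .withData(ByteBuffer.wrap("payload\n".getBytes()))));
}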
Amazon Kinesis Analytics
• Process streaming data in real time with standard SQL
• Query streaming data or build entire streaming applications using SQL, so
that you can gain actionable insights and respond to your business and
customer needs promptly.
• Scales automatically to match the volume and throughput rate of your
incoming data
• Only pay for the resources your queries consume. There is no minimum fee
or setup cost.
Amazon Kinesis Analytics
Step 1: Configure Input Stream
Step 2: Write your SQL queries
Step 3: Configure Output Stream
Thank you!
Time to show you real-life examples from OpsGenie