SERHAT CAN • @SRHTCN
AWS Kinesis
Table of Contents
Streaming data?
Big Data Processing Approaches
AWS Kinesis Family
Amazon Kinesis Streams in detail
Amazon Kinesis Firehose
Amazon Kinesis Analytics
Streaming Data: Life As It Happens
After the event occurs -> at rest (batch)
As the event occurs -> in motion (streaming)
Big Data Processing Approaches
• Common Big Data Processing Approaches
• Query Engine Approach (Data Warehouse, SQL, NoSQL Databases)
• Repeated queries over the same well-structured data
• Pre-computations like indices and dimensional views improve query performance
• Batch Engines (Map-Reduce)
• The “query” is run on the data. There are no pre-computations
• Streaming Big Data Processing Approach
• Real-time response to content in semi-structured data streams
• Relatively simple computations on data (aggregates, filters, sliding window, etc.)
• Enables data lifecycle by moving data to different stores / open source systems
Kinesis Family
Amazon Kinesis Streams
• A fully managed service for real-time processing of
high-volume streaming data.
• Kinesis can store and process terabytes of data an
hour from hundreds of thousands of sources.
• Data is replicated across multiple Availability Zones
to ensure high durability and availability.
Amazon Kinesis Streams Concepts
Shard
• Streams are made of Shards. A shard is the base
throughput unit of an Amazon Kinesis stream.
• One shard provides a capacity of 1MB/sec data input
and 2MB/sec data output.
• One shard can support up to 1000 PUT records per
second.
• You can monitor shard-level metrics in Amazon Kinesis
Streams
• Add or remove shards from your stream dynamically
as your data throughput changes by resharding the
stream.
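The per-shard limits above translate directly into a sizing rule. A minimal sketch (the workload numbers are assumptions, not values from the slides):

```python
# Sketch: estimate an initial shard count from the per-shard limits above
# (1 MB/s input, 2 MB/s output, 1000 PUT records/s).
import math

def estimate_shard_count(ingress_mb_per_sec: float,
                         egress_mb_per_sec: float,
                         records_per_sec: float) -> int:
    """Smallest shard count that satisfies all three per-shard limits."""
    by_ingress = ingress_mb_per_sec / 1.0   # 1 MB/s input per shard
    by_egress = egress_mb_per_sec / 2.0     # 2 MB/s output per shard
    by_records = records_per_sec / 1000.0   # 1000 PUT records/s per shard
    return max(1, math.ceil(max(by_ingress, by_egress, by_records)))

# Example: 5 MB/s in, 6 MB/s out, 4500 records/s -> ingress dominates
print(estimate_shard_count(5, 6, 4500))  # 5
```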
Data Record
• A record is the unit of data stored in an Amazon Kinesis stream.
• A record is composed of:
• partition key
• sequence number
• data blob (the data you want to send)
• The maximum size of a data blob (the data payload after Base64-decoding) is 1 megabyte (MB).
Partition Key
• A partition key is used to segregate and route data records to different
shards of a stream.
• A partition key is specified by your data producer while putting data
into an Amazon Kinesis stream.
• For example, assume you have an Amazon Kinesis stream with two
shards (Shard 1 and Shard 2). You can configure your data producer
to use two partition keys (Key A and Key B) so that all data records
with Key A are added to Shard 1 and all data records with Key B are
added to Shard 2.
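Under the hood, Streams routes a record by taking the MD5 hash of its partition key as a 128-bit integer and picking the shard whose hash key range contains that value. A sketch of this routing, with two illustrative shard ranges splitting the 128-bit space in half:

```python
# Routing sketch: MD5(partition key) -> 128-bit integer -> owning shard.
# The shard ranges below are illustrative, not read from a real stream.
import hashlib

MAX_HASH = 2**128 - 1
SHARDS = [
    {"id": "shard-1", "start": 0, "end": MAX_HASH // 2},
    {"id": "shard-2", "start": MAX_HASH // 2 + 1, "end": MAX_HASH},
]

def hash_key(partition_key: str) -> int:
    """128-bit integer form of the MD5 digest of the partition key."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")

def route(partition_key: str) -> str:
    h = hash_key(partition_key)
    for shard in SHARDS:
        if shard["start"] <= h <= shard["end"]:
            return shard["id"]
    raise ValueError("no shard covers this hash key")

print(route("Key A"), route("Key B"))
```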
Sequence Number
• Each data record has a sequence number that is unique within its
shard.
• The sequence number is assigned by Streams after you write to the
stream with client.putRecords or client.putRecord.
• Sequence numbers for the same partition key generally increase over
time; the longer the time period between write requests, the larger the
sequence numbers become.
Resharding the Stream
• Streams supports resharding, which enables you to adjust the number of
shards in your stream in order to adapt to changes in the rate of data flow
through the stream.
• There are two types of resharding operations: shard split and shard
merge.
• Shard split: divide a single shard into two shards.
• Shard merge: combine two shards into a single shard.
Resharding the Stream
• Resharding is always “pairwise”: you cannot split into more than two
shards, or merge more than two shards, in a single operation
• Resharding is typically performed by an administrative application that
is distinct from the producer (put) and consumer (get) applications
• The administrative application would also need a broader set of IAM
permissions for resharding
Splitting a Shard
• Specify how hash key values from the parent shard should be redistributed to the child shards
• The possible hash key values for a given shard constitute a set of ordered, contiguous, non-negative integers. This range of possible hash key values is given by
shard.getHashKeyRange().getStartingHashKey();
shard.getHashKeyRange().getEndingHashKey();
• When you split the shard, you specify a value in this range.
• That hash key value and all higher hash key values are distributed to one of the child shards.
• All the lower hash key values are distributed to the other child shard.
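The split rule above can be written out directly: you pick a hash key inside the parent's range, and that key plus everything above it forms one child's range while the lower values form the other's. A sketch (not the actual SplitShard API):

```python
# Sketch of the shard-split rule: the specified hash key and everything
# above it become one child's range; everything below becomes the other's.
def split_ranges(start: int, end: int, new_starting_hash_key: int):
    """Return the two child ranges (inclusive) for a split of [start, end]."""
    if not start < new_starting_hash_key <= end:
        raise ValueError("split point must lie inside the parent range")
    return (start, new_starting_hash_key - 1), (new_starting_hash_key, end)

low_child, high_child = split_ranges(0, 99, 50)
print(low_child, high_child)  # (0, 49) (50, 99)
```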
Merging Two Shards
• In order to merge two shards, the shards must be adjacent.
• Two shards are considered adjacent if the union of the hash key ranges
for the two shards forms a contiguous set with no gaps.
• To identify shards that are candidates for merging, you should filter out all
shards that are in a CLOSED state.
• Shards that are OPEN—that is, not CLOSED—have an ending sequence
number of null.
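The adjacency condition is easy to sketch as a predicate over inclusive hash key ranges:

```python
# Sketch of the adjacency test: two shards are mergeable only if the union
# of their inclusive hash key ranges is contiguous with no gap.
def are_adjacent(range_a: tuple, range_b: tuple) -> bool:
    lo, hi = sorted([range_a, range_b])
    return lo[1] + 1 == hi[0]

print(are_adjacent((0, 49), (50, 99)))   # True: contiguous
print(are_adjacent((0, 49), (60, 99)))   # False: gap from 50 to 59
```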
After Resharding
• After you call a resharding operation, either splitShard or mergeShards,
you need to wait for the stream to become active again, just as you do
after creating it.
• In the process of resharding, a parent shard transitions from an OPEN
state to a CLOSED state, and finally to an EXPIRED state once the
retention period passes.
• When resharding completes, the stream returns to the ACTIVE state.
Retention Period
• Data records are accessible for a default of 24 hours from the
time they are added to a stream
• Configurable in hourly increments
• From 24 to 168 hours (1 to 7 days)
Amazon Kinesis Producer Library (KPL)
• The KPL is an easy-to-use, highly configurable library that helps you
write to an Amazon Kinesis stream.
• Writes to one or more Amazon Kinesis streams with an automatic and configurable
retry mechanism
• Collects records and uses PutRecords to write multiple records to multiple shards
per request
• Aggregates user records to increase payload size and improve throughput
• Integrates seamlessly with the Amazon Kinesis Client Library (KCL) to de-aggregate
batched records on the consumer
• Submits Amazon CloudWatch metrics on your behalf to provide visibility into
producer performance
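The record-collection idea can be sketched as a simple batcher that buffers user records and flushes them as one PutRecords-style call once a count or byte limit is hit. The real KPL adds aggregation, retries, and per-shard batching; the limits and class below are illustrative:

```python
# Sketch of KPL-style record collection: buffer records, flush as a batch.
# Limits are illustrative; self.batches stands in for PutRecords calls.
class RecordBatcher:
    def __init__(self, max_records=500, max_bytes=1024 * 1024):
        self.max_records = max_records
        self.max_bytes = max_bytes
        self._buffer = []
        self._size = 0
        self.batches = []

    def add(self, record: bytes):
        # Flush first if adding this record would exceed either limit
        if (len(self._buffer) >= self.max_records
                or self._size + len(record) > self.max_bytes):
            self.flush()
        self._buffer.append(record)
        self._size += len(record)

    def flush(self):
        if self._buffer:
            self.batches.append(self._buffer)
            self._buffer = []
            self._size = 0

batcher = RecordBatcher(max_records=2)
for payload in [b"a", b"b", b"c"]:
    batcher.add(payload)
batcher.flush()
print(len(batcher.batches))  # 2 batches: [a, b] and [c]
```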
Amazon Kinesis Client Library (KCL)
• The KCL helps you develop a consumer application for Amazon Kinesis Streams
• The KCL acts as an intermediary between your record processing logic and
Streams.
• A KCL application instantiates a worker with configuration information, and then
uses a record processor to process the data received from an Amazon Kinesis
stream.
• You can run a KCL application on any number of instances. Multiple instances
of the same application coordinate on failures and load-balance dynamically.
• You can also have multiple KCL applications working on the same stream,
subject to throughput limits.
Amazon Kinesis Client Library (Life Saver)
Amazon Kinesis Client Library
• Connects to the stream
• Enumerates the shards
• Coordinates shard associations with other workers (if any)
• Instantiates a record processor for every shard it manages
• Pulls data records from the stream
• Pushes the records to the corresponding record processor
• Checkpoints processed records
• Balances shard-worker associations when the worker instance count changes
• Balances shard-worker associations when shards are split or merged
Amazon Kinesis Client Library
• KCL uses a unique Amazon DynamoDB table to keep
track of the application's state
• KCL creates the table with a provisioned throughput of
10 reads per second and 10 writes per second
• Each row in the DynamoDB table represents a shard that
is being processed by your application. The hash key for
the table is the shard ID.
Amazon Kinesis Client Library
• In addition to the shard ID, each row also includes the following data:
• checkpoint: The most recent checkpoint sequence number for the shard. This value is unique across
all shards in the stream.
• checkpointSubSequenceNumber: When using the Kinesis Producer Library's aggregation feature,
this is an extension to checkpoint that tracks individual user records within the Amazon Kinesis record.
• leaseCounter: Used for lease versioning so that workers can detect that their lease has been taken by
another worker.
• leaseKey: A unique identifier for a lease. Each lease is particular to a shard in the stream and is held
by one worker at a time.
• leaseOwner: The worker that is holding this lease.
• ownerSwitchesSinceCheckpoint: How many times this lease has changed workers since the last
time a checkpoint was written.
• parentShardId: Used to ensure that the parent shard is fully processed before processing starts on
the child shards. This ensures that records are processed in the same order they were put into the
stream.
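A sketch of one lease-table row and the leaseCounter-based takeover it enables (field names follow the slide; the values and helper function are illustrative):

```python
# Sketch of a KCL lease-table row. take_lease shows why leaseCounter
# exists: the previous owner sees the counter changed and knows its
# lease was taken by another worker.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Lease:
    leaseKey: str                      # shard ID, the table's hash key
    checkpoint: str                    # last checkpointed sequence number
    checkpointSubSequenceNumber: int   # position within a KPL aggregate
    leaseCounter: int                  # bumped on every ownership change
    leaseOwner: Optional[str]
    ownerSwitchesSinceCheckpoint: int
    parentShardId: Optional[str]

def take_lease(lease: Lease, new_owner: str) -> None:
    lease.leaseCounter += 1
    lease.leaseOwner = new_owner
    lease.ownerSwitchesSinceCheckpoint += 1

row = Lease("shardId-000000000000", "seq-100", 0, 7, "worker-a", 0, None)
take_lease(row, "worker-b")
print(row.leaseOwner, row.leaseCounter)  # worker-b 8
```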
Using Shard Iterators
• You retrieve records from the stream on a per-shard basis, using a shard
iterator that specifies the starting position:
• AT_SEQUENCE_NUMBER
• AFTER_SEQUENCE_NUMBER
• AT_TIMESTAMP
• TRIM_HORIZON
• LATEST
Recovering from Failures
• Record Processor Failure
• The worker invokes record processor methods using Java ExecutorService tasks.
• If a task fails, the worker retains control of the shard that the record processor was
processing.
• The worker starts a new record processor task to process that shard
• Worker or Application Failure
• If a worker — or an instance of the Amazon Kinesis Streams application — fails,
you should detect and handle the situation.
Handling Duplicate Records
(Idempotency)
• There are two primary reasons why records may be
delivered more than once to your Amazon
Kinesis Streams application:
• producer retries
• consumer retries
• Your application must anticipate and appropriately
handle processing individual records multiple times.
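Consumer-side deduplication can be sketched as a processor that skips records whose unique ID it has already seen. Note that a producer retry produces a new sequence number for the same data, so a real application should key on an application-level ID embedded in the record, as done here:

```python
# Sketch of idempotent processing: remember IDs already handled and
# skip redeliveries. The ID is an application-level field, not the
# Kinesis sequence number (which changes on producer retries).
class IdempotentProcessor:
    def __init__(self):
        self.seen_ids = set()
        self.processed = []

    def process(self, record_id: str, data: bytes) -> bool:
        if record_id in self.seen_ids:
            return False          # duplicate delivery; skip
        self.seen_ids.add(record_id)
        self.processed.append(data)
        return True

p = IdempotentProcessor()
p.process("order-42", b"charge card")
p.process("order-42", b"charge card")   # redelivered after a retry
print(len(p.processed))  # 1
```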
Pricing
• Shard hour (1 MB/second ingress, 2 MB/second egress): $0.015
• PUT payload units, per 1,000,000 units: $0.014
• Extended data retention (up to 7 days), per shard hour: $0.020
• Plus DynamoDB charges for the lease table if you use the KCL
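A back-of-the-envelope monthly cost at these rates, given that a PUT payload unit is a 25 KB chunk of one record (the workload parameters are assumptions):

```python
# Rough monthly cost from the rates above: shard-hours plus PUT payload
# units, where one unit covers up to 25 KB of a single record.
import math

def monthly_cost(shards: int, records_per_sec: float, record_kb: float,
                 hours_per_month: int = 730) -> float:
    shard_cost = shards * hours_per_month * 0.015
    units = records_per_sec * 3600 * hours_per_month * math.ceil(record_kb / 25)
    put_cost = units / 1_000_000 * 0.014
    return round(shard_cost + put_cost, 2)

# 2 shards ingesting 1000 one-kilobyte records per second
print(monthly_cost(2, 1000, 1.0))  # 58.69
```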
Kafka vs. Kinesis Streams
• In Kafka you can configure, for each topic, the replication factor and how many replicas
have to acknowledge a message before it is considered successful, so you can definitely
make it highly available.
• Amazon ensures that you won't lose data, but that comes with a performance cost:
messages are written synchronously to three different Availability Zones.
• There are several benchmarks online comparing Kafka and Kinesis, but the result is
always the same: you'll have a hard time replicating Kafka's performance in Kinesis, at
least for a reasonable price.
• This is partly because Kafka is insanely fast, but also because Kinesis writes each
message synchronously to three different machines, which is costly in terms of
latency and throughput.
• Kafka is one of the preferred options for the Apache stream processing frameworks
• Unsurprisingly, Kinesis is really well integrated with other AWS services
• Kafka is one of the preferred options for the Apache stream processing frameworks
• Unsurprisingly, Kinesis is really well integrated with other AWS services
DynamoDB Streams vs. Kinesis Streams
• Although DynamoDB Streams actions are similar to their
counterparts in Amazon Kinesis Streams, they
are not 100% identical.
• You can write applications for Amazon Kinesis
Streams using the Amazon Kinesis Client Library
(KCL).
• You can leverage the design patterns found
within the KCL to process DynamoDB Streams
shards and stream records. To do this, you use
the DynamoDB Streams Kinesis Adapter
SQS vs. Kinesis Streams
• Amazon Kinesis Streams enables real-time
processing of streaming big data.
• It provides ordering of records, as well as the
ability to read and/or replay records in the same
order to multiple Amazon Kinesis Applications.
• The Amazon Kinesis Client Library (KCL)
delivers all records for a given partition key to
the same record processor, making it easier to
build multiple applications reading from the same
Amazon Kinesis stream (for example, to perform
counting, aggregation, and filtering).
• Amazon Simple Queue Service (Amazon SQS)
offers a reliable, highly scalable hosted queue
for storing messages as they travel between
computers.
• Amazon SQS lets you easily move data between
distributed application components and helps
you build applications in which messages are
processed independently (with message-level
ack/fail semantics), such as automated
workflows.
Amazon Kinesis Firehose
Amazon Kinesis Firehose
• Amazon Kinesis Firehose is the easiest way to load streaming data into AWS.
• It can capture, transform, and load streaming data into Amazon Kinesis
Analytics, Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service
• Fully managed service that automatically scales to match the throughput of
your data and requires no ongoing administration.
• It can also batch, compress, and encrypt the data before loading it,
minimizing the amount of storage used at the destination and increasing
security.
Amazon Kinesis Analytics
• Process streaming data in real time with standard SQL
• Query streaming data or build entire streaming applications using SQL, so
that you can gain actionable insights and respond to your business and
customer needs promptly.
• Scales automatically to match the volume and throughput rate of your
incoming data
• Only pay for the resources your queries consume. There is no minimum fee
or setup cost.
Amazon Kinesis Analytics
Step 1: Configure Input Stream -> Step 2: Write your SQL queries -> Step 3: Configure Output Stream
Thank you!
Time to show you real life examples
from OpsGenie

Anthony Dahanne
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
vrstrong314
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Shahin Sheidaei
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
WSO2
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
Ortus Solutions, Corp
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
Tier1 app
 

Recently uploaded (20)

Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 

AWS Kinesis - Streams, Firehose, Analytics

  • 1. SERHAT CAN • @SRHTCN AWS Kinesis
  • 3. Table of Contents Streaming data? Big Data Processing Approaches AWS Kinesis Family Amazon Kinesis Streams in detail Amazon Kinesis Firehose Amazon Kinesis Analytics
  • 4. Streaming Data: Life As It Happens After the event occurs -> at rest (batch) As the event occurs -> in motion (streaming)
  • 5. Big Data Processing Approaches • Common Big Data Processing Approaches • Query Engine Approach (Data Warehouse, SQL, NoSQL Databases) • Repeated queries over the same well-structured data • Pre-computations like indices and dimensional views improve query performance • Batch Engines (Map-Reduce) • The “query” is run on the data. There are no pre-computations • Streaming Big Data Processing Approach • Real-time response to content in semi-structured data streams • Relatively simple computations on data (aggregates, filters, sliding windows, etc.) • Enables the data lifecycle by moving data to different stores and open-source systems
  • 7. Amazon Kinesis Streams • A fully managed service for real-time processing of high-volume streaming data. • Kinesis can store and process terabytes of data an hour from hundreds of thousands of sources. • Data is replicated across multiple Availability Zones to ensure high durability and availability.
  • 9. Shard • Streams are made of shards. A shard is the base throughput unit of an Amazon Kinesis stream. • One shard provides a capacity of 1 MB/sec data input and 2 MB/sec data output. • One shard can support up to 1,000 PUT records per second. • You can monitor shard-level metrics in Amazon Kinesis Streams. • Add or remove shards from your stream dynamically as your data throughput changes by resharding the stream.
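The per-shard limits above translate directly into a sizing rule of thumb: take the largest of the three constraints. A minimal sketch (the function name and shape are mine, not an AWS API):

```python
import math

def shards_needed(ingress_mb_per_sec, records_per_sec, egress_mb_per_sec):
    """Estimate shard count from the published per-shard limits:
    1 MB/s data input, 1,000 PUT records/s, and 2 MB/s data output."""
    return max(
        math.ceil(ingress_mb_per_sec / 1.0),   # input bandwidth constraint
        math.ceil(records_per_sec / 1000.0),   # record rate constraint
        math.ceil(egress_mb_per_sec / 2.0),    # output bandwidth constraint
    )

# e.g. 3 MB/s in, 2,500 records/s, 8 MB/s out -> bounded by egress
print(shards_needed(3, 2500, 8))
```

With fan-out consumers, egress is often the binding constraint, as in the example above.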
  • 10. Data Record • A record is the unit of data stored in an Amazon Kinesis stream. • A record is composed of a: • partition key • sequence number • data blob (the data you want to send) • The maximum size of a data blob (the data payload after Base64-decoding) is 1 megabyte (MB).
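A producer can cheaply check the 1 MB blob limit before calling the API. This is an illustrative helper, not part of any AWS SDK:

```python
MAX_BLOB_BYTES = 1024 * 1024  # 1 MB payload limit (after Base64 decoding)

def validate_record(partition_key: str, data: bytes) -> dict:
    """Check a record against the limits described above and return it in
    the PartitionKey/Data shape a put call expects."""
    if not partition_key:
        raise ValueError("a record needs a partition key")
    if len(data) > MAX_BLOB_BYTES:
        raise ValueError("data blob exceeds the 1 MB limit")
    return {"PartitionKey": partition_key, "Data": data}
```

Rejecting oversized blobs client-side avoids a wasted round trip and a throttling-style error from the service.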
  • 11. Partition Key • A partition key is used to segregate and route data records to different shards of a stream. • A partition key is specified by your data producer while putting data into an Amazon Kinesis stream. • For example, assume you have an Amazon Kinesis stream with two shards (Shard 1 and Shard 2). You can configure your data producer to use two partition keys (Key A and Key B) so that all data records with Key A are added to Shard 1 and all data records with Key B are added to Shard 2.
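Under the hood, Streams takes the MD5 hash of the partition key (a 128-bit integer) and routes the record to the shard whose hash key range contains it. A simplified sketch, assuming the shards split the 128-bit space evenly (as they do by default):

```python
import hashlib

def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Map a partition key to a shard index the way Streams does:
    MD5-hash the key and find the containing hash key range."""
    h = int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")
    range_size = 2 ** 128 // num_shards  # even split of the hash key space
    return min(h // range_size, num_shards - 1)
```

Because the mapping is deterministic, all records with the same partition key always land on the same shard, which is what preserves per-key ordering.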
  • 12. Sequence Number • Each data record has a sequence number that is unique within its shard. • The sequence number is assigned by Streams after you write to the stream with client.putRecords or client.putRecord. • Sequence numbers for the same partition key generally increase over time; the longer the time period between write requests, the larger the sequence numbers become.
  • 13. Resharding the Stream • Streams supports resharding, which enables you to adjust the number of shards in your stream in order to adapt to changes in the rate of data flow through the stream. • There are two types of resharding operations: shard split and shard merge. • Shard split: divide a single shard into two shards. • Shard merge: combine two shards into a single shard.
  • 14. Resharding the Stream • Resharding is always “pairwise”: you cannot split into more than two shards, or merge more than two shards, in a single operation • Resharding is typically performed by an administrative application that is distinct from the producer (put) applications and the consumer (get) applications • The administrative application also needs a broader set of IAM permissions for resharding
  • 15. Splitting a Shard • Specify how hash key values from the parent shard should be redistributed to the child shards • The possible hash key values for a given shard constitute a set of ordered, contiguous, non-negative integers. This range of possible hash key values is given by shard.getHashKeyRange().getStartingHashKey(); shard.getHashKeyRange().getEndingHashKey(); • When you split the shard, you specify a value in this range. • That hash key value and all higher hash key values are distributed to one of the child shards. • All the lower hash key values are distributed to the other child shard.
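To split evenly (the common case when scaling for throughput rather than isolating a hot key), pick the midpoint of the parent's range as the new starting hash key. A sketch of that computation:

```python
def split_hash_key(starting_hash_key: int, ending_hash_key: int) -> int:
    """Midpoint of the parent shard's hash key range, used as the
    NewStartingHashKey when splitting so each child gets half the range."""
    return (starting_hash_key + ending_hash_key) // 2
```

The value and everything above it go to one child shard; everything below goes to the other, as described above.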
  • 16. Merging Two Shards • In order to merge two shards, the shards must be adjacent. • Two shards are considered adjacent if the union of the hash key ranges for the two shards forms a contiguous set with no gaps. • To identify shards that are candidates for merging, you should filter out all shards that are in a CLOSED state. • Shards that are OPEN (that is, not CLOSED) have an ending sequence number of null.
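The adjacency rule is a simple range check. A sketch, representing each shard's hash key range as a (start, end) pair of integers (my own representation, not an SDK type):

```python
def are_adjacent(shard_a: tuple, shard_b: tuple) -> bool:
    """Two shards can be merged only if their hash key ranges form one
    contiguous set with no gaps, i.e. one range ends exactly where the
    other begins."""
    a_start, a_end = shard_a
    b_start, b_end = shard_b
    return a_end + 1 == b_start or b_end + 1 == a_start
```

Two children produced by a single split are always adjacent, so a split can later be undone by merging the same pair.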
  • 17. After Resharding • After you call a resharding operation, either splitShard or mergeShards, you need to wait for the stream to become active again (just as after creating a stream). • In the process of resharding, a parent shard transitions from an OPEN state to a CLOSED state to an EXPIRED state. • When the resharding completes, the stream returns to the ACTIVE state.
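Waiting for the stream to return to ACTIVE is a plain polling loop. This sketch takes the status check as an injected callable so it stays SDK-agnostic; with boto3 it would wrap `client.describe_stream(StreamName=...)['StreamDescription']['StreamStatus']`:

```python
import time

def wait_for_active(get_status, poll_seconds=1.0, timeout=300):
    """Poll a status callable until the stream reports ACTIVE again after a
    resharding (or creation) call. Returns False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_status() == "ACTIVE":
            return True
        time.sleep(poll_seconds)
    return False
```

Issuing another resharding call while the stream is still UPDATING fails, so this wait belongs between every pair of split/merge operations.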
  • 18. Retention Period • Data records are accessible for a default of 24 hours from the time they are added to a stream • Configurable in hourly increments • From 24 to 168 hours (1 to 7 days)
  • 19. Amazon Kinesis Producer Library (KPL) • The KPL is an easy-to-use, highly configurable library that helps you write to an Amazon Kinesis stream. • Writes to one or more Amazon Kinesis streams with an automatic and configurable retry mechanism • Collects records and uses PutRecords to write multiple records to multiple shards per request • Aggregates user records to increase payload size and improve throughput • Integrates seamlessly with the Amazon Kinesis Client Library (KCL) to de-aggregate batched records on the consumer • Submits Amazon CloudWatch metrics on your behalf to provide visibility into producer performance
  • 20. Amazon Kinesis Client Library (Life Saver) • Develop a consumer application for Amazon Kinesis Streams • The KCL acts as an intermediary between your record processing logic and Streams. • A KCL application instantiates a worker with configuration information, and then uses a record processor to process the data received from an Amazon Kinesis stream. • You can run a KCL application on any number of instances. Multiple instances of the same application coordinate on failures and load-balance dynamically. • You can also have multiple KCL applications working on the same stream, subject to throughput limits.
  • 21. Amazon Kinesis Client Library • Connects to the stream • Enumerates the shards • Coordinates shard associations with other workers (if any) • Instantiates a record processor for every shard it manages • Pulls data records from the stream • Pushes the records to the corresponding record processor • Checkpoints processed records • Balances shard-worker associations when the worker instance count changes • Balances shard-worker associations when shards are split or merged
  • 22. Amazon Kinesis Client Library • KCL uses a unique Amazon DynamoDB table to keep track of the application's state • KCL creates the table with a provisioned throughput of 10 reads per second and 10 writes per second • Each row in the DynamoDB table represents a shard that is being processed by your application. The hash key for the table is the shard ID.
  • 23. Amazon Kinesis Client Library • In addition to the shard ID, each row also includes the following data: • checkpoint: The most recent checkpoint sequence number for the shard. This value is unique across all shards in the stream. • checkpointSubSequenceNumber: When using the Kinesis Producer Library's aggregation feature, this is an extension to checkpoint that tracks individual user records within the Amazon Kinesis record. • leaseCounter: Used for lease versioning so that workers can detect that their lease has been taken by another worker. • leaseKey: A unique identifier for a lease. Each lease is particular to a shard in the stream and is held by one worker at a time. • leaseOwner: The worker that is holding this lease. • ownerSwitchesSinceCheckpoint: How many times this lease has changed workers since the last time a checkpoint was written. • parentShardId: Used to ensure that the parent shard is fully processed before processing starts on the child shards. This ensures that records are processed in the same order they were put into the stream.
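The leaseCounter field is what makes lease stealing detectable: a worker remembers the counter it last wrote, and if the row's counter (or owner) has moved on, the lease is gone. A simplified illustration of that check, using the column names from the table above:

```python
def lease_lost(local_lease: dict, table_row: dict) -> bool:
    """Detect that another worker has taken this lease: either the
    leaseOwner changed, or the leaseCounter advanced past what this
    worker last observed (the lease-versioning idea, much simplified)."""
    return (table_row["leaseOwner"] != local_lease["leaseOwner"]
            or table_row["leaseCounter"] != local_lease["leaseCounter"])
```

In the real KCL this comparison happens on every lease renewal, so a worker that lost its lease stops checkpointing that shard instead of clobbering the new owner's progress.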
  • 24. Using Shard Iterators • You retrieve records from the stream on a per-shard basis. • A shard iterator specifies the position in the shard from which to start reading. The supported iterator types are: • AT_SEQUENCE_NUMBER • AFTER_SEQUENCE_NUMBER • AT_TIMESTAMP • TRIM_HORIZON • LATEST
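The per-shard read pattern is GetShardIterator once, then GetRecords in a loop, following NextShardIterator. A sketch that takes any boto3-shaped client (so a stub works for local testing; a real run would pass a boto3 Kinesis client, and a live shard's NextShardIterator only becomes null once the shard is closed):

```python
def read_shard(client, stream_name, shard_id, iterator_type="TRIM_HORIZON"):
    """Drain one shard using the GetShardIterator / GetRecords loop.
    `client` needs only the two methods used below."""
    records = []
    it = client.get_shard_iterator(
        StreamName=stream_name, ShardId=shard_id,
        ShardIteratorType=iterator_type)["ShardIterator"]
    while it:
        resp = client.get_records(ShardIterator=it, Limit=100)
        records.extend(resp["Records"])
        it = resp.get("NextShardIterator")  # None once a closed shard is drained
    return records
```

TRIM_HORIZON starts from the oldest available record, LATEST only sees records written after the iterator was issued; the other types position by sequence number or timestamp.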
  • 25. Recovering from Failures • Record Processor Failure • The worker invokes record processor methods using Java ExecutorService tasks. • If a task fails, the worker retains control of the shard that the record processor was processing. • The worker starts a new record processor task to process that shard • Worker or Application Failure • If a worker, or an instance of the Amazon Kinesis Streams application, fails, you should detect and handle the situation, for example by restarting the instance so that another worker can take over its shard leases.
  • 26. Handling Duplicate Records (Idempotency) • There are two primary reasons why records may be delivered more than once to your Amazon Kinesis Streams application: • producer retries • consumer retries • Your application must anticipate and appropriately handle processing individual records multiple times.
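One common way to tolerate redelivery is to key on the record's sequence number, which is unique per shard, and skip anything already handled. A minimal in-memory sketch (real consumers usually track this in a durable store or rely on checkpoints plus idempotent writes):

```python
def process_once(records, seen, handler):
    """Process each record at most once, using its per-shard sequence
    number as the deduplication key. `seen` is the caller's set of
    already-processed sequence numbers."""
    for record in records:
        seq = record["SequenceNumber"]
        if seq in seen:
            continue  # duplicate from a producer or consumer retry
        handler(record)
        seen.add(seq)
```

Note the ordering matters: mark the record as seen only after the handler succeeds, otherwise a crash between the two steps drops the record instead of duplicating it.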
  • 27. Pricing • Shard Hour (1 MB/second ingress, 2 MB/second egress): $0.015 • PUT Payload Units, per 1,000,000 units: $0.014 • Extended Data Retention (up to 7 days), per Shard Hour: $0.020 • Plus DynamoDB charges if you use the KCL
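A back-of-the-envelope monthly estimate from the figures on this slide, treating extended retention as an additional per-shard-hour charge and assuming every record fits in a single 25 KB PUT payload unit (prices change; check the current AWS pricing page):

```python
def monthly_cost_usd(shards, put_records_per_sec,
                     extended_retention=False, hours_per_month=730):
    """Rough Streams cost estimate, excluding the DynamoDB lease table."""
    shard_hour = 0.015 + (0.020 if extended_retention else 0.0)
    shard_cost = shards * shard_hour * hours_per_month
    put_units = put_records_per_sec * 3600 * hours_per_month
    put_cost = put_units / 1_000_000 * 0.014
    return round(shard_cost + put_cost, 2)
```

For example, two shards ingesting 1,000 small records per second come out around $59/month at these rates, with PUT payload units dominating the bill.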
  • 28. Kafka vs. Kinesis Streams • In Kafka you can configure, for each topic, the replication factor and how many replicas have to acknowledge a message before the write is considered successful. So you can definitely make it highly available. • Amazon ensures that you won't lose data, but that comes with a performance cost (messages are written to 3 different AZs synchronously). • There are several benchmarks online comparing Kafka and Kinesis, but the result is always the same: you'll have a hard time replicating Kafka's performance in Kinesis, at least for a reasonable price. • This is partly because Kafka is insanely fast, but also because Kinesis writes each message synchronously to 3 different machines, which is quite costly in terms of latency and throughput. • Kafka is one of the preferred options for the Apache stream processing frameworks • Unsurprisingly, Kinesis is really well integrated with other AWS services
  • 29. DynamoDB Streams vs. Kinesis Streams • Although DynamoDB Streams actions are similar to their counterparts in Amazon Kinesis Streams, they are not 100% identical. • You can write applications for Amazon Kinesis Streams using the Amazon Kinesis Client Library (KCL). • You can leverage the design patterns found within the KCL to process DynamoDB Streams shards and stream records. To do this, you use the DynamoDB Streams Kinesis Adapter.
  • 30. SQS vs. Kinesis Streams • Amazon Kinesis Streams enables real-time processing of streaming big data. • It provides ordering of records, as well as the ability to read and/or replay records in the same order to multiple Amazon Kinesis Applications. • The Amazon Kinesis Client Library (KCL) delivers all records for a given partition key to the same record processor, making it easier to build multiple applications reading from the same Amazon Kinesis stream (for example, to perform counting, aggregation, and filtering). • Amazon Simple Queue Service (Amazon SQS) offers a reliable, highly scalable hosted queue for storing messages as they travel between computers. • Amazon SQS lets you easily move data between distributed application components and helps you build applications in which messages are processed independently (with message-level ack/fail semantics), such as automated workflows.
  • 32. Amazon Kinesis Firehose • Amazon Kinesis Firehose is the easiest way to load streaming data into AWS. • It can capture, transform, and load streaming data into Amazon Kinesis Analytics, Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service • Fully managed service that automatically scales to match the throughput of your data and requires no ongoing administration. • It can also batch, compress, and encrypt the data before loading it, minimizing the amount of storage used at the destination and increasing security.
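The batching and compression Firehose applies before delivery can be pictured as newline-delimited JSON run through gzip, which is roughly the shape of the objects it writes to S3 when GZIP compression is enabled. An illustrative sketch (this is my own code, not the service's):

```python
import gzip
import json

def compress_batch(records):
    """Serialize a batch of records as newline-delimited JSON and
    gzip-compress it, mimicking a Firehose-to-S3 delivery payload."""
    payload = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    return gzip.compress(payload)
```

Compressing batches rather than individual records is what lets Firehose minimize storage at the destination: the gzip dictionary is shared across the whole batch.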
  • 33. Amazon Kinesis Analytics • Process streaming data in real time with standard SQL • Query streaming data or build entire streaming applications using SQL, so that you can gain actionable insights and respond to your business and customer needs promptly. • Scales automatically to match the volume and throughput rate of your incoming data • Only pay for the resources your queries consume. There is no minimum fee or setup cost.
  • 34. Amazon Kinesis Analytics Step 1: Configure Input Stream Step 2: Write your SQL queries Step 3: Configure Output Stream
  • 35. Thank you! Time to show you real life examples from OpsGenie