Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
Amit Sharma
Amazon Kinesis 
Managed Service for Streaming Data Ingestion & Processing 
o Origins of Kinesis
  • The motivation for continuous, real-time processing
  • Developing the ‘Right tool for the right job’
o What can you do with streaming data today?
  • Customer Scenarios
  • Current approaches
o What is Amazon Kinesis?
  • Kinesis is a building block
  • Putting data into Kinesis
  • Getting data from Kinesis Streams: Building applications with KCL
o Connecting Amazon Kinesis to other systems
  • Moving data into S3, DynamoDB, Redshift
  • Leveraging existing Lambda, EMR, Storm infrastructure
Click Stream 
Demo
The Motivation for Continuous Processing
Some statistics about AWS Data Services
• Metering service
  • Tens of millions of records per second
  • Terabytes per hour
  • Hundreds of thousands of sources
  • Auditors guarantee 100% accuracy at month end
• Data Warehouse
  • Hundreds of extract-transform-load (ETL) jobs every day
  • Hundreds of thousands of files per load cycle
  • Hundreds of daily users
  • Hundreds of queries per hour
Metering Service
Internal AWS Metering Service
Workload
• Tens of millions of records/sec
• Multiple TB per hour
• 100,000s of sources
Pain points
• Doesn’t scale elastically
• Customers want real-time alerts
• Expensive to operate
• Relies on eventually consistent storage
Our Big Data Transition 
Old requirements 
• Capture huge amounts of data and process it in hourly or daily batches 
New requirements 
• Make decisions faster, sometimes in real-time 
• Scale entire system elastically 
• Make it easy to “keep everything” 
• Multiple applications can process data in parallel
A General Purpose Data Flow 
Many different technologies, at different stages of evolution 
Client/Sensor -> Aggregator -> Continuous Processing -> Storage -> Analytics + Reporting
Big data comes from the small 
{ 
"payerId": "Joe", 
"productCode": "AmazonS3", 
"clientProductCode": "AmazonS3", 
"usageType": "Bandwidth", 
"operation": "PUT", 
"value": "22490", 
"timestamp": "1216674828" 
} 
Metering Record 
127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
Common Log Entry
<165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"][examplePriority@32473 class="high"]
Syslog Entry
“SeattlePublicWater/Kinesis/123/Realtime” – 412309129140
MQTT Record
<R,AMZN ,T,G,R1>
NASDAQ OMX Record
Kinesis 
Movement or activity in response to a stimulus. 
A fully managed service for real-time processing of high-volume, 
streaming data. Kinesis can store and process 
terabytes of data an hour from hundreds of thousands of 
sources. Data is replicated across multiple Availability 
Zones to ensure high durability and availability.
Customer View
Customer Scenarios across Industry Segments
Scenarios: (1) Accelerated Ingest-Transform-Load, (2) Continual Metrics/KPI Extraction, (3) Responsive Data Analysis
Data Types: IT infrastructure, application logs, social media, financial market data, web clickstreams, sensors, geo/location data
Software/Technology
  1. IT server and app log ingestion
  2. IT operational metrics dashboards
  3. Device/sensor operational intelligence
Digital Ad Tech./Marketing
  1. Advertising data aggregation
  2. Advertising metrics like coverage, yield, conversion
  3. Analytics on user engagement with ads, optimized bid/buy engines
Financial Services
  1. Market/financial transaction order data collection
  2. Financial market data metrics
  3. Fraud monitoring, Value-at-Risk assessment, auditing of market order data
Consumer Online/E-Commerce
  1. Online customer engagement data aggregation
  2. Consumer engagement metrics like page views, CTR
  3. Customer clickstream analytics, recommendation engines
What Biz. Problem needs to be solved?
Mobile/ Social Gaming
• Deliver game insight data continuously, in real time, from hundreds of game servers
• Custom-built solutions are operationally complex to manage and not scalable
  • Delay in critical business data delivery
  • Developer burden in building a reliable, scalable platform for real-time data ingestion/ processing
  • Slow-down of real-time customer insights
• Accelerate time to market of elastic, real-time applications while minimizing operational overhead
Digital Advertising Tech.
• Generate real-time metrics and KPIs for online ad performance for advertisers/ publishers
• Store + Forward fleet of log servers, and a Hadoop-based processing pipeline
  • Lost data with the Store/ Forward layer
  • Operational burden in managing a reliable, scalable platform for real-time data ingestion/ processing
  • Batch-driven real-time customer insights
• Generate the freshest analytics on advertiser performance to optimize marketing spend and increase responsiveness to clients
Smart Devices 
Internet Of Things
Smart? Devices
Internet Of Things
          Arduino Uno     Raspberry Pi
CPU       16 MHz 8-bit    700 MHz 32-bit
Memory    2 KB            512 MB
Storage   32 KB           SD card
Camera 
Microphone 
Thermometer 
Distance 
GPS 
Gyroscope 
Accelerometer 
Actuator 
Relay 
Motor 
Manipulator 
Pressure Switch 
Wheel 
Propeller 
Rotor
Challenges
Thousands – Millions of Devices / Producers
Thousands – Millions of Users / Consumers
Distributed
At Scale
‘Typical’ Technology Solution Set
Solution Architecture Set
o Streaming Data Ingestion
  • Kafka
  • Flume
  • Kestrel / Scribe
  • RabbitMQ / AMQP
o Streaming Data Processing
  • Storm
o Do-it-yourself (AWS) based solution
  • EC2: Logging / pass-through servers
  • EBS: holds log / other data snapshots
  • SQS: Queue data store
  • S3: Persistence store
  • EMR: workflow to ingest data from S3 and process
o Exploring Continual Data Ingestion & Processing
Solution Architecture Considerations
Flexibility: Select the most appropriate software, and configure underlying infrastructure
Control: Software and hardware can be tuned to meet specific business and scenario needs.
Ongoing Operational Complexity: Deploy and manage an end-to-end system
Infrastructure planning and maintenance: Managing a reliable, scalable infrastructure
Developer/ IT staff expense: Developers, DevOps and IT staff time and energy expended
Software Maintenance: Tech. and professional services support
Foundation for Data Streams Ingestion, Continuous Processing 
Right Toolset for the Right Job 
Real-time Ingest 
• Highly Scalable 
• Durable 
• Elastic 
• Replay-able Reads 
Continuous Processing FX 
• Load-balancing incoming streams 
• Fault-tolerance, Checkpoint / Replay 
• Elastic 
• Enable multiple apps to process in parallel 
Continuous, real-time workloads 
Managed Service 
Low end-to-end latency 
Enable data movement into Stores/ Processing Engines
Kinesis Architecture
• Millions of sources producing 100s of terabytes per hour
• Front End: Authentication, Authorization
• Durable, highly consistent storage replicates data across three data centers (Availability Zones)
• Ordered stream of events supports multiple readers:
  • Aggregate and archive to S3
  • Real-time dashboards and alarms
  • Machine learning algorithms or sliding window analytics
  • Aggregate analysis in Hadoop or a data warehouse
• Inexpensive: $0.028 per million puts
• Run code in response to an event and automatically manage compute.
Amazon Kinesis – An Overview
Kinesis Stream:
Managed ability to capture and store data
• Streams are made of Shards
• Each Shard ingests up to 1 MB/sec and up to 1,000 transactions per second (TPS)
• Each Shard emits up to 2 MB/sec
• All data is stored for 24 hours
• Scale Kinesis streams by adding or removing Shards
• Replay data within the 24-hour window
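The per-shard limits above suggest a simple sizing rule: take the largest of the three shard counts implied by ingest bandwidth, record rate, and egress bandwidth. The following is a minimal sketch of that arithmetic, assuming the limits quoted on this slide; the class name, method name, and the sample workload are hypothetical.

```java
// Sketch: estimate how many shards a stream needs from the per-shard limits quoted above.
final class ShardSizing {
    // Per-shard limits taken from the slide (assumed still valid for this illustration).
    static final double INGEST_MB_PER_SEC = 1.0;
    static final double RECORDS_PER_SEC = 1000.0;
    static final double EGRESS_MB_PER_SEC = 2.0;

    /** Returns the smallest shard count that satisfies all three limits (at least 1). */
    static int shardsNeeded(double ingestMbPerSec, double recordsPerSec, double egressMbPerSec) {
        int byIngest = (int) Math.ceil(ingestMbPerSec / INGEST_MB_PER_SEC);
        int byRecords = (int) Math.ceil(recordsPerSec / RECORDS_PER_SEC);
        int byEgress = (int) Math.ceil(egressMbPerSec / EGRESS_MB_PER_SEC);
        return Math.max(1, Math.max(byIngest, Math.max(byRecords, byEgress)));
    }

    public static void main(String[] args) {
        // Hypothetical workload: 2.5 MB/s in, 4,000 records/s, 5 MB/s read by consumers.
        System.out.println(shardsNeeded(2.5, 4000, 5.0));
    }
}
```

In this hypothetical workload the record rate, not the byte rate, is the binding limit, which is common for streams of small records.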
Putting Data into Kinesis
Simple Put interface to store data in Kinesis
• Producers use a PUT call to store data in a Stream
• PutRecord {Data, PartitionKey, StreamName}
• A Partition Key is supplied by the producer and used to distribute the PUTs across Shards
• Kinesis applies an MD5 hash to the supplied partition key; the resulting hash value falls within the hash key range of exactly one Shard, which receives the record
• A unique Sequence # is returned to the Producer upon a successful PUT call
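To make the partition-key routing concrete, here is a minimal sketch of how a key could map to a shard. It assumes the shards split the 128-bit MD5 hash space into equal contiguous ranges and that the key is hashed as UTF-8 bytes; a real stream's ranges come from its shard descriptions and may be uneven after splits and merges. The class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class PartitionKeyRouter {
    // The MD5 hash space: unsigned 128-bit integers, 0 .. 2^128 - 1.
    static final BigInteger HASH_SPACE = BigInteger.ONE.shiftLeft(128);

    /** Interprets the MD5 digest of the key as an unsigned 128-bit integer. */
    static BigInteger hashKey(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // signum 1 = treat bytes as non-negative
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is unavailable on this JVM", e);
        }
    }

    /** Assumption: shardCount shards own equal, contiguous slices of the hash space. */
    static int shardIndex(String partitionKey, int shardCount) {
        BigInteger rangeSize = HASH_SPACE.divide(BigInteger.valueOf(shardCount));
        int index = hashKey(partitionKey).divide(rangeSize).intValue();
        return Math.min(index, shardCount - 1); // last shard absorbs any remainder
    }

    public static void main(String[] args) {
        for (String key : new String[] {"sensor-1", "sensor-2", "sensor-3"}) {
            System.out.println(key + " -> shard " + shardIndex(key, 4));
        }
    }
}
```

Because the mapping depends only on the key, all records with the same partition key land on the same shard, which is what preserves per-key ordering.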
Creating and Sizing a Kinesis Stream
Getting Started with Kinesis – Writing to a Stream 
POST / HTTP/1.1 
Host: kinesis.<region>.<domain> 
x-amz-Date: <Date> 
Authorization: AWS4-HMAC-SHA256 Credential=<Credential>, SignedHeaders=content-type;date;host;user-agent;x-amz-date;x-amz-target;x-amzn-requestid, Signature=<Signature>
User-Agent: <UserAgentString> 
Content-Type: application/x-amz-json-1.1 
Content-Length: <PayloadSizeBytes> 
Connection: Keep-Alive 
X-Amz-Target: Kinesis_20131202.PutRecord 
{ 
"StreamName": "exampleStreamName", 
"Data": "XzxkYXRhPl8x", 
"PartitionKey": "partitionKey" 
}
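The Data field in the request body above is the Base64 encoding of the raw record bytes. As a rough sketch, the body could be assembled with only the JDK as shown below; the class and method names are hypothetical, request signing is not covered, and the string concatenation assumes the inputs need no JSON escaping.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class PutRecordPayload {
    /** Base64-encodes the raw record bytes for the Data field. */
    static String encodeData(String rawData) {
        return Base64.getEncoder().encodeToString(rawData.getBytes(StandardCharsets.UTF_8));
    }

    /** Builds the JSON body. Assumes the arguments contain no characters that need JSON escaping. */
    static String body(String streamName, String rawData, String partitionKey) {
        return "{\"StreamName\":\"" + streamName + "\","
             + "\"Data\":\"" + encodeData(rawData) + "\","
             + "\"PartitionKey\":\"" + partitionKey + "\"}";
    }

    public static void main(String[] args) {
        // "_<data>_1" encodes to the Data value shown in the request above.
        System.out.println(body("exampleStreamName", "_<data>_1", "partitionKey"));
    }
}
```

In practice the AWS SDK handles both the encoding and the Signature Version 4 headers, so hand-built requests like this are mainly useful for understanding the wire format.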
Sending & Reading Data from Kinesis Streams
Sending
• HTTP POST
• AWS SDK
• Log4j
• Flume
• Fluentd
Reading
• Get* APIs
• Kinesis Client Library + Connector Library
• Apache Storm
• Amazon Elastic MapReduce
http://aws.amazon.com/kinesis/developer-resources/
Building Kinesis Processing Apps: Kinesis Client Library
Client library for fault-tolerant, at-least-once, continuous processing
o Java client library, source available on GitHub
o Build & deploy your app with the KCL on your EC2 instance(s)
o The KCL is an intermediary between your application & the stream
  • Automatically starts a Kinesis Worker for each shard
  • Simplifies reading by abstracting individual shards
  • Increases / decreases Workers as the # of shards changes
  • Checkpoints to keep track of a Worker’s location in the stream; restarts Workers if they fail
o Integrates with Auto Scaling groups to redistribute workers to new instances
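Checkpointing is what makes processing at-least-once: a restarted worker resumes from the last saved checkpoint, so records handled after that checkpoint are seen again. The toy simulation below illustrates the effect with an in-memory list standing in for a shard. It does not use the KCL API; every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CheckpointReplayDemo {
    /**
     * Processes records starting at the given checkpoint, saving a new checkpoint
     * after every checkpointEvery records. Stops early once failAfter records have
     * been handled, to simulate a worker crash.
     * Returns the last saved checkpoint (index of the next record to read).
     */
    static int run(List<String> shard, int checkpoint, int checkpointEvery,
                   int failAfter, List<String> processed) {
        int handled = 0;
        for (int i = checkpoint; i < shard.size(); i++) {
            if (handled == failAfter) {
                return checkpoint; // crash: work since the last checkpoint is not recorded
            }
            processed.add(shard.get(i));
            handled++;
            if (handled % checkpointEvery == 0) {
                checkpoint = i + 1; // persist progress
            }
        }
        return shard.size(); // clean finish: checkpoint at the end of the data
    }

    public static void main(String[] args) {
        List<String> shard = Arrays.asList("r0", "r1", "r2", "r3", "r4");
        List<String> processed = new ArrayList<>();
        int checkpoint = run(shard, 0, 2, 3, processed);            // crashes after 3 records
        checkpoint = run(shard, checkpoint, 2, Integer.MAX_VALUE, processed); // restart
        System.out.println(processed + " checkpoint=" + checkpoint);
    }
}
```

In this run the third record is processed twice, once before the crash and once after the restart, so downstream logic should be idempotent or deduplicate.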
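Processing Data with Kinesis : Sample RecordProcessor
import java.util.List;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import com.amazonaws.services.kinesis.clientlibrary.interfaces.IRecordProcessor;
import com.amazonaws.services.kinesis.clientlibrary.interfaces.IRecordProcessorCheckpointer;
import com.amazonaws.services.kinesis.model.Record;

// Excerpt: shutdown() and the helpers processRecordsWithRetries() and checkpoint() are not shown.
public class SampleRecordProcessor implements IRecordProcessor {
    // Assumption: the logger type and the interval value are not on the slide.
    private static final Log LOG = LogFactory.getLog(SampleRecordProcessor.class);
    private static final long CHECKPOINT_INTERVAL_MILLIS = 60000L;
    private String kinesisShardId;
    private long nextCheckpointTimeInMillis;

    @Override
    public void initialize(String shardId) {
        LOG.info("Initializing record processor for shard: " + shardId);
        this.kinesisShardId = shardId;
    }

    @Override
    public void processRecords(List<Record> records, IRecordProcessorCheckpointer checkpointer) {
        LOG.info("Processing " + records.size() + " records for kinesisShardId " + kinesisShardId);
        // Process records and perform all exception handling.
        processRecordsWithRetries(records);
        // Checkpoint once every checkpoint interval.
        if (System.currentTimeMillis() > nextCheckpointTimeInMillis) {
            checkpoint(checkpointer);
            nextCheckpointTimeInMillis = System.currentTimeMillis() + CHECKPOINT_INTERVAL_MILLIS;
        }
    }
}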
Processing Data with Kinesis : Sample Worker
IRecordProcessorFactory recordProcessorFactory = new SampleRecordProcessorFactory();
Worker worker = new Worker(recordProcessorFactory, kinesisClientLibConfiguration);
int exitCode = 0;
try {
    // Blocks while the Worker processes the stream's shards.
    worker.run();
} catch (Throwable t) {
    LOG.error("Caught throwable while processing data.", t);
    exitCode = 1;
}
Amazon Kinesis Connector Library
Customizable, open-source code to connect Kinesis with S3, Redshift, DynamoDB
Kinesis -> ITransformer -> IFilter -> IBuffer -> IEmitter -> S3 / DynamoDB / Redshift
ITransformer
• Defines the transformation of records from the Amazon Kinesis stream in order to suit the user-defined data model
IFilter
• Excludes irrelevant records from the processing.
IBuffer
• Buffers the set of records to be processed by specifying size limit (# of records) & total byte count
IEmitter
• Makes client calls to other AWS services and persists the records stored in the buffer.
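The four stages compose into a small pipeline: transform each record, drop the ones the filter rejects, buffer the rest, and emit a batch when the buffer fills. The sketch below mirrors those roles with standard java.util.function types. It is not the Connector Library's actual interfaces or signatures; the class is hypothetical, and it buffers by record count only (the byte-count limit is omitted).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

final class ConnectorPipelineSketch<R, T> {
    private final Function<R, T> transformer;  // plays the role of ITransformer
    private final Predicate<T> filter;         // plays the role of IFilter
    private final int bufferRecordLimit;       // plays the role of IBuffer (record count only)
    private final Consumer<List<T>> emitter;   // plays the role of IEmitter
    private final List<T> buffer = new ArrayList<>();

    ConnectorPipelineSketch(Function<R, T> transformer, Predicate<T> filter,
                            int bufferRecordLimit, Consumer<List<T>> emitter) {
        this.transformer = transformer;
        this.filter = filter;
        this.bufferRecordLimit = bufferRecordLimit;
        this.emitter = emitter;
    }

    /** Transforms and filters one record, then buffers it; emits when the buffer is full. */
    void accept(R record) {
        T item = transformer.apply(record);
        if (!filter.test(item)) {
            return; // irrelevant record, excluded from processing
        }
        buffer.add(item);
        if (buffer.size() >= bufferRecordLimit) {
            flush();
        }
    }

    /** Emits whatever is buffered as one batch and clears the buffer. */
    void flush() {
        if (!buffer.isEmpty()) {
            emitter.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> emitted = new ArrayList<>();
        ConnectorPipelineSketch<String, Integer> pipeline =
            new ConnectorPipelineSketch<>(Integer::parseInt, v -> v >= 0, 2, emitted::add);
        for (String raw : new String[] {"5", "-1", "7", "9"}) {
            pipeline.accept(raw);
        }
        pipeline.flush();
        System.out.println(emitted);
    }
}
```

Batching before emitting matters because stores such as S3 and Redshift are far more efficient with fewer, larger writes than with one call per record.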
More Options to read from Kinesis Streams
Leveraging Get APIs, existing Storm topologies
o Use the Get APIs for raw reads of Kinesis data streams
  • GetRecords {Limit, ShardIterator}
  • GetShardIterator {ShardId, ShardIteratorType, StartingSequenceNumber, StreamName}
o Integrate Kinesis Streams with Storm topologies
  • Bootstraps via ZooKeeper to map Shards to Spout tasks
  • Fetches data from the Kinesis stream
  • Emits tuples and checkpoints (in ZooKeeper)
Using EMR to read and process data from Kinesis Streams
Input (Dev)
• My Website: Log4J Appender pushes to Kinesis
Processing (User)
• EMR (AMI 3.0.5): Hive, Pig, Cascading, and MapReduce pull from Kinesis
Hadoop ecosystem Implementation & Features
Implementation
• Hadoop Input format
• Hive Storage Handler
• Pig Load Function
• Cascading Scheme and Tap
Features
• Logical names
  • Labels that define units of work (Job A vs Job B)
• Checkpoints
  • Creating input start and end points to allow batch processing
• Error Handling
  • Service errors
  • Retries
• Iterations
  • Provide idempotency (pessimistic locking of the Logical name)
Intended use
• Unlock the power of Hadoop on fresh data
• Join multiple data sources for analysis
• Filter and preprocess streams
• Export and archive streaming data
Customers using Amazon Kinesis 
Gaming Analytics with Amazon Kinesis
Digital Ad. Tech Metering with Kinesis
• Continuous Ad Metrics Extraction
• Incremental Ad. Statistics Computation
• Metering Record Archive
• Ad Analytics Dashboard
Kinesis Pricing
Simple, pay-as-you-go, & no up-front costs
Pricing Dimension                  Value
Hourly Shard Rate                  $0.015
Per 1,000,000 PUT transactions     $0.028
• Customers specify throughput requirements in shards, which they control
• Each Shard delivers 1 MB/s on ingest and 2 MB/s on egress
• Inbound data transfer is free
• EC2 instance charges apply for Kinesis processing applications
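With only two pricing dimensions, a stream's cost can be estimated as shard-hours plus PUT transactions. The sketch below works through that arithmetic using the prices listed above; the class name and the sample workload are hypothetical, and EC2 charges for processing applications are not included.

```java
final class KinesisCostEstimate {
    // Prices as listed on the slide (assumed unchanged for this illustration).
    static final double SHARD_HOUR_USD = 0.015;
    static final double PER_MILLION_PUTS_USD = 0.028;

    /** Stream cost in USD for a given shard count, sustained PUT rate, and number of days. */
    static double costUsd(int shards, double putsPerSecond, int days) {
        double shardHours = shards * 24.0 * days;
        double puts = putsPerSecond * 86_400.0 * days;
        return shardHours * SHARD_HOUR_USD + (puts / 1_000_000.0) * PER_MILLION_PUTS_USD;
    }

    public static void main(String[] args) {
        // Hypothetical workload: 2 shards receiving 100 PUTs per second for 30 days.
        System.out.printf("%.2f USD%n", costUsd(2, 100, 30));
    }
}
```

For this workload the shard-hours come to 21.60 USD and the 259.2 million PUTs to about 7.26 USD, so the shard rate dominates at low record rates.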
Demo Time.
Amazon Kinesis: Key Developer Benefits
Easy Administration
Managed service for real-time streaming data collection, processing and analysis. Simply create a new stream, set the desired level of capacity, and let the service handle the rest.
Real-time Performance
Perform continual processing on streaming big data. Processing latencies fall to a few seconds, compared with the minutes or hours associated with batch processing.
High Throughput. Elastic
Seamlessly scale to match your data throughput rate and volume. You can easily scale up to gigabytes per second. The service will scale up or down based on your operational or business needs.
S3, Redshift, & DynamoDB Integration
Reliably collect, process, and transform all of your data in real-time & deliver to AWS data stores of choice, with Connectors for S3, Redshift, and DynamoDB.
Build Real-time Applications
Client libraries that enable developers to design and operate real-time streaming data processing applications.
Low Cost
Cost-efficient for workloads of any scale. You can get started by provisioning a small stream, and pay low hourly rates only for what you use.
Try out Amazon Kinesis 
• Try out Amazon Kinesis 
• http://aws.amazon.com/kinesis/ 
• Thumb through the Developer Guide 
• http://aws.amazon.com/documentation/kinesis/ 
• Visit, and Post on Kinesis Forum 
• https://forums.aws.amazon.com/forum.jspa?forumID=169#
Online Labs | Training
Gain confidence and hands-on experience with AWS. Watch free Instructional Videos and explore Self-Paced Labs
Instructor Led Classes
Learn how to design, deploy and operate highly available, cost-effective and secure applications on AWS in courses led by qualified AWS instructors
AWS Certification
Validate your technical expertise with AWS and use practice exams to help you prepare for AWS Certification
http://aws.amazon.com/training
Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Recently uploaded (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

Day 5 - Real-time Data Processing/Internet of Things (IoT) with Amazon Kinesis

  • 1. Real-time Data Processing/ Internet of Things (IoT) with Amazon Kinesis Amit Sharma
  • 2. Amazon Kinesis Managed Service for Streaming Data Ingestion & Processing v o Origins of Kinesis  The motivation for continuous, real-time processing  Developing the ‘Right tool for the right job’ o What can you do with streaming data today?  Customer Scenarios  Current approaches o What is Amazon Kinesis?  Kinesis is a building block  Putting data into Kinesis  Getting data from Kinesis Streams: Building applications with KCL o Connecting Amazon Kinesis to other systems  Moving data into S3, DynamoDB, Redshift  Leveraging existing Lambda, EMR, Storm infrastructure Click Stream Demo
  • 3. The Motivation for Continuous Processing
  • 4. Some statistics about what AWS Data Services v • Metering service • 10s of millions records per second • Terabytes per hour • Hundreds of thousands of sources • Auditors guarantee 100% accuracy at month end • Data Warehouse • 100s extract-transform-load (ETL) jobs every day • Hundreds of thousands of files per load cycle • Hundreds of daily users • Hundreds of queries per hour
  • 6. Internal AWS Metering Service v Workload • 10s of millions records/sec • Multiple TB per hour • 100,000s of sources Pain points • Doesn’t scale elastically • Customers want real-time alerts • Expensive to operate • Relies on eventually consistent storage
  • 7. v Our Big Data Transition Old requirements • Capture huge amounts of data and process it in hourly or daily batches New requirements • Make decisions faster, sometimes in real-time • Scale entire system elastically • Make it easy to “keep everything” • Multiple applications can process data in parallel
  • 8. A General Purpose Data Flow Many different technologies, at different stages of evolution Client/Sensor Aggregator Continuous Processing Storage Analytics + Reporting ?
  • 9. Big data comes from the small v { "payerId": "Joe", "productCode": "AmazonS3", "clientProductCode": "AmazonS3", "usageType": "Bandwidth", "operation": "PUT", "value": "22490", "timestamp": "1216674828" } Metering Record 127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 - 0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 Common Log Entry <165>1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"][examplePriority@32473 class="high"] Syslog Entry “SeattlePublicWater/Kinesis/123/Realtime” – 412309129140 MQTT Record <R,AMZN ,T,G,R1> NASDAQ OMX Record
  • 10. v Kinesis Movement or activity in response to a stimulus. A fully managed service for real-time processing of high-volume, streaming data. Kinesis can store and process terabytes of data an hour from hundreds of thousands of sources. Data is replicated across multiple Availability Zones to ensure high durability and availability.
  • 12. Customer Scenarios across Industry Segments Scenarios Accelerated Ingest-Transform-Load Continual Metrics/ KPI Extraction Responsive Data Analysis Data Types IT infrastructure, Applications logs, Social media, Fin. Market data, Web Clickstreams, Sensors, Geo/Location data Software/ Technology IT server , App logs ingestion IT operational metrics dashboards Devices / Sensor Operational Intelligence Digital Ad Tech./ Marketing Advertising Data aggregation Advertising metrics like coverage, yield, conversion Analytics on User engagement with Ads, Optimized bid/ buy engines Financial Services Market/ Financial Transaction order data collection Financial market data metrics Fraud monitoring, and Value-at-Risk assessment, Auditing of market order data Consumer Online/ E-Commerce Online customer engagement data aggregation Consumer engagement metrics like page views, CTR Customer clickstream analytics, Recommendation engines 1 2 3
  • 13. What Biz. Problem needs to be solved? Mobile/ Social Gaming Digital Advertising Tech. Deliver continuous/ real-time delivery of game insight data by 100’s of game servers Generate real-time metrics, KPIs for online ad performance for advertisers/ publishers Custom-built solutions operationally complex to manage, & not scalable Store + Forward fleet of log servers, and Hadoop based processing pipeline • Delay with critical business data delivery • Developer burden in building reliable, scalable platform for real-time data ingestion/ processing • Slow-down of real-time customer insights • Lost data with Store/ Forward layer • Operational burden in managing reliable, scalable platform for real-time data ingestion/ processing • Batch-driven real-time customer insights Accelerate time to market of elastic, real-time applications – while minimizing operational overhead Generate freshest analytics on advertiser performance to optimize marketing spend, and increase responsiveness to clients
  • 14. v Smart Devices Internet Of Things
  • 15. v Smart Devices Internet Of Things
  • 16. v Smart Devices Internet Of Things
  • 17. Smart? Devices Internet Of Things
                  Arduino Uno     Raspberry Pi
        CPU       20 MHz 8-bit    700 MHz 32-bit
        Memory    2 KB            512 MB
        Storage   32 KB           SD card
  • 18. Camera v Microphone Thermometer Distance GPS Gyroscope Accelerometer Actuator Relay Motor Manipulator Pressure Switch Wheel Propeller Rotor
  • 20. v Challenges Thousands – Millions of Devices / Producers Thousands – Millions of Users / Consumers
  • 21. v Distributed Thousands – Millions of Devices / Producers Thousands – Millions of Users / Consumers
  • 22. v At Scale Thousands – Millions of Devices / Producers Thousands – Millions of Users / Consumers
  • 23. v
  • 24. ‘Typical’ Technology Solution Set Solution Architecture Set o Streaming Data Ingestion  Kafka  Flume  Kestrel / Scribe  RabbitMQ / AMQP o Streaming Data Processing  Storm o Do-It-yourself (AWS) based solution  EC2: Logging/ pass through servers  EBS: holds log/ other data snapshots  SQS: Queue data store  S3: Persistence store  EMR: workflow to ingest data from S3 and process o Exploring Continual data Ingestion & Processing Solution Architecture Considerations Flexibility: Select the most appropriate software, and configure underlying infrastructure Control: Software and hardware can be tuned to meet specific business and scenario needs. Ongoing Operational Complexity: Deploy, and manage an end-to-end system Infrastructure planning and maintenance: Managing a reliable, scalable infrastructure Developer/ IT staff expense: Developers, Devops and IT staff time and energy expended Software Maintenance : Tech. and professional services support
  • 25. Foundation for Data Streams Ingestion, Continuous Processing Right Toolset for the Right Job Real-time Ingest • Highly Scalable • Durable • Elastic • Replay-able Reads Continuous Processing FX • Load-balancing incoming streams • Fault-tolerance, Checkpoint / Replay • Elastic • Enable multiple apps to process in parallel Continuous, real-time workloads Managed Service Low end-to-end latency Enable data movement into Stores/ Processing Engines
  • 26. Kinesis Architecture AZ AZ AZ Durable, highly consistent storage replicates data across three data centers (availability zones) Amazon Web Services Aggregate and archive to S3 Millions of sources producing 100s of terabytes per hour Front End Authentication Authorization Ordered stream of events supports multiple readers Real-time dashboards and alarms Machine learning algorithms or sliding window analytics Aggregate analysis in Hadoop or a data warehouse Inexpensive: $0.028 per million puts Run code in response to an event and automatically manage compute.
  • 27. Amazon Kinesis – An Overview
  • 28. Kinesis Stream: Managed ability to capture and store data • Streams are made of Shards • Each Shard ingests data up to 1MB/sec, and up to 1000 TPS • Each Shard emits up to 2 MB/sec • All data is stored for 24 hours • Scale Kinesis streams by adding or removing Shards • Replay data inside of 24Hr. Window
  • 29. Putting Data into Kinesis Simple Put interface to store data in Kinesis • Producers use a PUT call to store data in a Stream • PutRecord {Data, PartitionKey, StreamName} • A Partition Key is supplied by producer and used to distribute the PUTs across Shards • Kinesis MD5 hashes supplied partition key over the hash key range of a Shard • A unique Sequence # is returned to the Producer upon a successful PUT call
  • 30. Creating and Sizing a Kinesis Stream
  • 31. Getting Started with Kinesis – Writing to a Stream
        POST / HTTP/1.1
        Host: kinesis.<region>.<domain>
        x-amz-Date: <Date>
        Authorization: AWS4-HMAC-SHA256 Credential=<Credential>, SignedHeaders=content-type;date;host;user-agent;x-amz-date;x-amz-target;x-amzn-requestid, Signature=<Signature>
        User-Agent: <UserAgentString>
        Content-Type: application/x-amz-json-1.1
        Content-Length: <PayloadSizeBytes>
        Connection: Keep-Alive
        X-Amz-Target: Kinesis_20131202.PutRecord

        {
          "StreamName": "exampleStreamName",
          "Data": "XzxkYXRhPl8x",
          "PartitionKey": "partitionKey"
        }
  • 32. Sending & Reading Data from Kinesis Streams Sending Reading HTTP Post AWS SDK LOG4J Flume Fluentd Get* APIs Kinesis Client Library + Connector Library Apache Storm Amazon Elastic MapReduce http://aws.amazon.com/kinesis/developer-resources/
  • 33. Building Kinesis Processing Apps: Kinesis Client Library Client library for fault-tolerant, at least-once, Continuous Processing o Java client library, source available on Github o Build & Deploy app with KCL on your EC2 instance(s) o KCL is intermediary b/w your application & stream  Automatically starts a Kinesis Worker for each shard  Simplifies reading by abstracting individual shards  Increase / Decrease Workers as # of shards changes  Checkpoints to keep track of a Worker’s location in the stream, Restarts Workers if they fail o Integrates with AutoScaling groups to redistribute workers to new instances
  • 34. Processing Data with Kinesis : Sample RecordProcessor
        public class SampleRecordProcessor implements IRecordProcessor {

            @Override
            public void initialize(String shardId) {
                LOG.info("Initializing record processor for shard: " + shardId);
                this.kinesisShardId = shardId;
            }

            @Override
            public void processRecords(List<Record> records, IRecordProcessorCheckpointer checkpointer) {
                LOG.info("Processing " + records.size() + " records for kinesisShardId " + kinesisShardId);

                // Process records and perform all exception handling.
                processRecordsWithRetries(records);

                // Checkpoint once every checkpoint interval.
                if (System.currentTimeMillis() > nextCheckpointTimeInMillis) {
                    checkpoint(checkpointer);
                    nextCheckpointTimeInMillis = System.currentTimeMillis() + CHECKPOINT_INTERVAL_MILLIS;
                }
            }
        }
  • 35. Processing Data with Kinesis : Sample Worker
        IRecordProcessorFactory recordProcessorFactory = new SampleRecordProcessorFactory();
        Worker worker = new Worker(recordProcessorFactory, kinesisClientLibConfiguration);

        int exitCode = 0;
        try {
            worker.run();
        } catch (Throwable t) {
            LOG.error("Caught throwable while processing data.", t);
            exitCode = 1;
        }
  • 36. Amazon Kinesis Connector Library Customizable, Open Source code to Connect Kinesis with S3, Redshift, DynamoDB S3 DynamoDB Redshift Kinesis ITransformer • Defines the transformation of records from the Amazon Kinesis stream in order to suit the user-defined data model IFilter • Excludes irrelevant records from the processing. IBuffer • Buffers the set of records to be processed by specifying size limit (# of records)& total byte count IEmitter • Makes client calls to other AWS services and persists the records stored in the buffer.
  • 37. More Options to read from Kinesis Streams Leveraging Get APIs, existing Storm topologies o Use the Get APIs for raw reads of Kinesis data streams • GetRecords {Limit, ShardIterator} • GetShardIterator {ShardId, ShardIteratorType, StartingSequenceNumber, StreamName} o Integrate Kinesis Streams with Storm Topologies • Bootstraps, via Zookeeper to map Shards to Spout tasks • Fetches data from Kinesis stream • Emits tuples and Checkpoints (in Zookeeper)
  • 38. Using EMR to read, and process data from Kinesis Streams v Input • Dev Processing • User My Website Kinesis Log4J Appender push to Kinesis EMR – AMI 3.0.5 Hive Pig Cascading MapReduce pull from
  • 39. Hadoop ecosystem Implementation & Features v • Logical names • Labels that define units of work (Job A vs Job B) • Checkpoints • Creating an input start and end points to allow batch processing • Error Handling • Service errors • Retries • Iterations • Provide idempotency (pessimistic locking of the Logical name) Hadoop Input format Hive Storage Handler Pig Load Function Cascading Scheme and Tap
  • 40. v Intended use • Unlock the power of Hadoop on fresh data • Join multiple data sources for analysis • Filter and preprocess streams • Export and archive streaming data
  • 41. Customers using Amazon Kinesis Mobile/ Social Gaming Digital Advertising Tech. Deliver continuous/ real-time delivery of game insight data by 100’s of game servers Generate real-time metrics, KPIs for online ad performance for advertisers/ publishers Custom-built solutions operationally complex to manage, & not scalable Store + Forward fleet of log servers, and Hadoop based processing pipeline • Delay with critical business data delivery • Developer burden in building reliable, scalable platform for real-time data ingestion/ processing • Slow-down of real-time customer insights • Lost data with Store/ Forward layer • Operational burden in managing reliable, scalable platform for real-time data ingestion/ processing • Batch-driven real-time customer insights Accelerate time to market of elastic, real-time applications – while minimizing operational overhead Generate freshest analytics on advertiser performance to optimize marketing spend, and increase responsiveness to clients
  • 42. Gaming Analytics with Amazon Kinesis
  • 43. Digital Ad. Tech Metering with Kinesis Incremental Ad. Statistics Computation Continuous Ad Metrics Extraction Metering Record Archive Ad Analytics Dashboard
  • 44. Kinesis Pricing Simple, Pay-as-you-go, & no up-front costs
        Pricing Dimension                  Value
        Hourly Shard Rate                  $0.015
        Per 1,000,000 PUT transactions     $0.028
        • Customers specify throughput requirements in shards, that they control
        • Each Shard delivers 1 MB/s on ingest, and 2 MB/s on egress
        • Inbound data transfer is free
        • EC2 instance charges apply for Kinesis processing applications
  • 46. 46 Amazon Kinesis: Key Developer Benefits Easy Administration Managed service for real-time streaming data collection, processing and analysis. Simply create a new stream, set the desired level of capacity, and let the service handle the rest. Real-time Performance Perform continual processing on streaming big data. Processing latencies fall to a few seconds, compared with the minutes or hours associated with batch processing. High Throughput. Elastic Seamlessly scale to match your data throughput rate and volume. You can easily scale up to gigabytes per second. The service will scale up or down based on your operational or business needs. S3, Redshift, & DynamoDB Integration Reliably collect, process, and transform all of your data in real-time & deliver to AWS data stores of choice, with Connectors for S3, Redshift, and DynamoDB. Build Real-time Applications Client libraries that enable developers to design and operate real-time streaming data processing applications. Low Cost Cost-efficient for workloads of any scale. You can get started by provisioning a small stream, and pay low hourly rates only for what you use.
  • 47. v Try out Amazon Kinesis • Try out Amazon Kinesis • http://aws.amazon.com/kinesis/ • Thumb through the Developer Guide • http://aws.amazon.com/documentation/kinesis/ • Visit, and Post on Kinesis Forum • https://forums.aws.amazon.com/forum.jspa?forumID=169#
  • 48. Online Labs | Training Gain confidence and hands-on experience with AWS. Watch free Instructional Videos and explore Self- Paced Labs Instructor Led Classes Learn how to design, deploy and operate highly available, cost-effective and secure applications on AWS in courses led by qualified AWS instructors AWS Certification Validate your technical expertise with AWS and use practice exams to help you prepare for AWS Certification http://aws.amazon.com/training
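
Slide 29 states that Kinesis MD5-hashes the supplied partition key over the hash key range of a shard. The following is a minimal Java sketch of that routing idea, not the service's actual implementation. It assumes (this is not stated in the slides) that the 128-bit hash key space is divided into equal contiguous ranges, one per shard. The names `ShardRouter`, `hashKey`, and `shardFor` are hypothetical and are not part of any AWS SDK.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration only: maps a partition key to a shard index.
class ShardRouter {
    // Size of the 128-bit MD5 hash key space: 2^128.
    private static final BigInteger HASH_SPACE = BigInteger.ONE.shiftLeft(128);

    /** Interprets the MD5 digest of the partition key as an unsigned 128-bit integer. */
    static BigInteger hashKey(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // signum 1 = treat bytes as unsigned
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available on this JVM", e);
        }
    }

    /**
     * Returns the index of the shard whose hash key range contains the key's hash.
     * Assumption: shardCount equal, contiguous ranges; the last shard absorbs any remainder.
     */
    static int shardFor(String partitionKey, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        BigInteger rangeSize = HASH_SPACE.divide(BigInteger.valueOf(shardCount));
        int index = hashKey(partitionKey).divide(rangeSize).intValue();
        return Math.min(index, shardCount - 1);
    }

    public static void main(String[] args) {
        // The same partition key always lands on the same shard.
        for (String key : new String[] {"customer-1", "customer-2", "server-42"}) {
            System.out.println(key + " -> shard " + shardFor(key, 4));
        }
    }
}
```

Because the mapping depends only on the key, all records for one customer or one server go to the same shard, which is what makes per-key ordering and simple per-key aggregation possible. It also means a single very busy key cannot be spread across shards.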

Editor's Notes

  1. Good morning……/ Good Afternoon…….. My name is………. Today’s session is about …… Fire your Questions
  2. This is the agenda for the next 45 to 50 minutes or so. We are going to talk about customers moving from batch to continuous processing, how Kinesis helps, and how it integrates with other AWS services.
  3. Like many other Amazon Web Services, let's talk about the genesis. On the AWS side of the business: hourly server logs for alarms, and billing alerts. Other applications come from the amazon.com business: what is the popular item, and fraud detection, right away instead of hours later.
  4. The metering service is related to any action that you take on the AWS platform, for example reading from DynamoDB, delivering content from CloudFront, and so on. Like many of you, we are dealing with rapid increases in data in our endeavor to increase operational efficiency and reduce costs. Our problem was not just how to scale to handle the data, but how to increase the speed at which we analyze and take action on that data. Just to give some numbers: read the numbers on the slide.
  5. As will be obvious to you, metering is one of the most critical functions of any business, because loss of data could mean loss of revenue, and in certain cases delayed analysis could also mean a loss. Requirements no longer remained simple: billing alerts are required in near real time, and so is fraud detection.
  6. ----- Meeting Notes (12/4/14 21:32) ----- Lets visualize this ... massive volume We realized that there was this lifecycle that evolved – INGEST (many sources) – PROCESS (Simple queries like aggregates) – STORE – ANALYZE (Complex queries such as trends) Pain points cropped up Apart from S3.....
  7. ----- Meeting Notes (12/4/14 21:32) ----- That problem area and raised customer expectation formed a new requirement set .... ----- Meeting Notes (12/4/14 22:15) ----- Formed ----> FRAMED new requirements
  8. ----- Meeting Notes (12/4/14 21:32) ----- Many different technologies are available and have been for a long time now; however, managing them on your own is a challenge, and not many of these pieces have been tested at that scale or are easy to manage at that scale. In some cases the use case itself might not be an apt fit.
  9. We also realized that there were lots of different formats, but almost all records were small in size, say < 1 KB. MQTT records are used by IoT devices.
  10. Datasets are growing Need to make fast decisions Potentially millions of sources Data records are really small Needs to be auditable Need to be durable Needs to be less expensive
  11. The benefits of real-time processing apply to pretty much every scenario where new data is being generated on a continual basis. We see it pervasively across all the big data areas we’ve looked at, and in pretty much every industry segment whose customers we’ve spoken to. What do customers tell us – Log processing – grab – ingest and do analysis – classic example – Continual extraction model – start getting metrics on a sliding window basis – say 5mins – stock variations – feeds from NASDAQ or other stock providers to avoid exposure Real time analytics – take some type of action based on data coming in say last few mins logical extension to the last two options One feed into other feed Customers tend to start with mundane, unglamorous applications, such as system log ingestion and processing. They then progress over time to ever-more sophisticated forms of real-time processing. Initially they may simply process their data streams to produce simple metrics and reports, and perform simple actions in response, such as emitting alarms over metrics that have breached their warning thresholds. Eventually they start to do more sophisticated forms of data analysis in order to obtain deeper understanding of their data, such as applying machine learning algorithms. In the long run they can be expected to apply a multiplicity of complex stream and event processing algorithms to their data. 
We then looked outside the company for specific use cases. Alert when public sentiment changes: customers want to respond quickly to social media feeds (Twitter Firehose: ~450 million events per day, or 10.1 MB/sec). Respond to changes in the stock market: customers want to compute value-at-risk or rebalance portfolios based on stock prices (NASDAQ Ultra Feed: 230 GB/day, or 8.1 MB/sec). Aggregate data before loading it into an external data store: customers have billions of clickstream records arriving continuously from servers, and large, aggregated PUTs to S3 are cost effective (Internet marketing company: 20 MB/sec). Recommendation and personalization engines.
  12. Nasdaq sending in all market order data across exchanges in real-time into Kinesis , so that their real-time processing workers (built with Kinesis libraries) would maintain a consolidated audit trail that collects and accurately identifies every order, cancellation, modification and trade execution for all exchange-listed equities and options across all U.S. markets. Finland-based Supercell is one of the fastest-growing social game developers and their two games Hay Day and Clash of Clans attract more than 8.5 million players on iOS and Android devices every day. “Our player base has scaled at an incredible pace. Using AWS means we can rely on AWS tools in managing our infrastructure to match our growth,” said Sami Yliharju, Services Lead at Supercell. “We are using Amazon Kinesis for real-time delivery of game insight data sent by hundreds of our game engine servers. Amazon Kinesis enables our business-critical analytics and dashboard applications to reliably get the data streams they need, without delays. Amazon Kinesis also offloads a lot of developer burden in building a real-time, streaming data ingestion platform, and enables Supercell to focus on delivering games that delight players worldwide.” Bizo, an AdTech company – ingesting customers online advertising performance data in real-time to generating real-time metrics, KPIs about Ad performance, conversion, yield, and more. They would previously do this end of day with a Hadoop job.
  13. Pros/ Cons (+) Flexibility: Customers can evaluate and select the most appropriate software and underlying infrastructure for their use case. Additional flexibility to those using cloud based infrastructure for running streaming data systems. (+) Control: Software and hardware can be tuned to meet specific business and scenario needs. (+) No software licensing costs: Mostly open source software so no direct licensing costs or cloud based software. (-) Operational complexity: Deploy, manage, and end to end systems administration of underlying hardware/ service and all software components involved. (-) Hardware/ Infrastructure maintenance: Direct and indirect costs associated with hardware/ service infrastructure evaluation, capacity planning, and provisioning for steady state, peak and durability. (-) Unsupported Software: Some component software are not maintained anymore, e.g. Scribe, in other cases the software is a pre-version 1 release such as Kafka/ Storm without vendor support. (-) Developer/ IT staff overhead: Developers, Devops and IT staff spend time and energy in managing the streaming data infrastructure and lesser time on creating streaming applications for their business.
  14. [2 minutes] KINESIS is a new service that scales elastically for near realtime processing of streaming big data. The service will store large streams of data in durable, consistent storage, reliably, for near realtime processing of data by an elastically scalable fleet of data processing servers. Large streams means millions of records per second, GBs of data per second and near real-time means order of a few seconds Streaming data processing has two layers: a storage layer and a processing layer. The storage layer needs to support specialized ordering and consistency semantics that enable fast, inexpensive, and replayable reads and writes of large streams of data. Kinesis is the storage layer in Kinesis / Kinesis. The processing layer is responsible for reading data from the storage layer, processing that data, and notifying the storage layer to delete data that is no longer needed. Kinesis supports the processing layer. Customers compile the Kinesis library into their data processing application. Kinesis notifies the application (the Kinesis Worker) when there is new data to process. The Kinesis / Kinesis control plane works with Kinesis Workers to solve scalability and fault tolerance problems in the processing layer.
  15. Kinesis provides high availability, elasticity, and scale. The fixed unit of scale in a stream is called a shard. Shards can be split to scale up or merged to scale down.
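Splitting and merging can be pictured as operations on hash key ranges. The following is a minimal sketch, assuming each shard owns a contiguous range of a 128-bit hash key space (the size of an MD5 digest). The `Shard` class and both helper functions are hypothetical names for illustration, not part of any AWS SDK.

```python
from dataclasses import dataclass

MAX_HASH_KEY = 2**128 - 1  # assumed size of the hash key space (128 bits)


@dataclass(frozen=True)
class Shard:
    start: int  # first hash key owned by this shard (inclusive)
    end: int    # last hash key owned by this shard (inclusive)


def split_shard(shard):
    """Split one shard into two children at the midpoint of its range."""
    mid = (shard.start + shard.end) // 2
    return Shard(shard.start, mid), Shard(mid + 1, shard.end)


def merge_shards(left, right):
    """Merge two adjacent shards back into a single shard."""
    if left.end + 1 != right.start:
        raise ValueError("only adjacent shards can be merged")
    return Shard(left.start, right.end)


whole = Shard(0, MAX_HASH_KEY)
low, high = split_shard(whole)       # scale up: one shard becomes two
restored = merge_shards(low, high)   # scale down: two shards become one
```

A split does not have to happen at the midpoint; the midpoint is used here only to keep the sketch short.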
  16. MQTT, financial record, log entry. Partition key: the idea is to be able to do a basic aggregation of the incoming data and identify - Best practices for a partition key would be customer ID / server ID. Partition keys: When records are added to a Kinesis stream, a partition key must be provided so that the record can be directed to the appropriate shard within the stream. Kinesis currently generates an MD5 hash of the provided partition key to determine the shard in which to place the record. It is important that no single partition key exceeds the per-shard data rate. Kinesis guarantees that records are placed in order within a stream for each unique partition key.
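The routing described above can be sketched in a few lines. This is an illustration only: it assumes the MD5 digest is read as a 128-bit integer and that shards own equal, contiguous ranges of that key space. The function names and the even layout are hypothetical.

```python
import hashlib

KEY_SPACE = 2**128  # an MD5 digest is 128 bits


def hash_key(partition_key):
    """Map a partition key to a 128-bit integer via its MD5 digest."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_ranges(shard_count):
    """Divide the key space into equal contiguous ranges (assumed layout)."""
    size = KEY_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last range absorbs any remainder so the whole space is covered.
        end = KEY_SPACE - 1 if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key, ranges):
    """Return the index of the shard whose range contains the key's hash."""
    key = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= key <= end:
            return index
    raise ValueError("hash key outside all shard ranges")


ranges = shard_ranges(4)
first = shard_for("customer-42", ranges)
second = shard_for("customer-42", ranges)  # same key, same shard
```

Because the same key always hashes to the same value, all records for one customer or one server land on one shard, which is what makes per-key ordering and basic aggregation possible.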
  17. Three clicks and you are done, with a fully managed service to store and manage data. Assistance with sizing.
  18. Refer to S3. Records with the same partition key always route to the same shard. Example partition keys: customer ID, the server that is sending the logs, or sections of the website. Because of sharding you can do basic aggregation of the
  19. There are lots of ways to process streaming data, just like there are lots of ways to process batch data with Hadoop. Open source from AWS.
  20. Challenges of a real-time application on the consumer side: - The application has to keep pace with the stream - If it falls behind, you lose the value of real-time streaming - So the app has to be distributed, fault tolerant, and scalable, and handle multiple shards - It must recover from failure and restart from the point of failure - Low-level interfaces are available, but doing all this with them is complicated - So the KCL simplifies this: it provides checkpointing, keeps state in DynamoDB, launches workers and scales and fails them over, and simplifies reading from shards; you just write your business logic. The Kinesis Client Library simplifies parallel processing of streaming big data by allowing customers to write simple applications for processing records. The KCL is a Java library that customers compile into their data processing application. The application with the Kinesis library is called a Kinesis Worker. The library notifies the customer’s processing code when there is new data to process. Kinesis Workers will often process data and write output to an external data store such as DynamoDB, EMR, Redshift, or S3. Kinesis guarantees that every record within a Kinesis data stream is processed at least once.
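To see why checkpointing gives "at least once" rather than "exactly once", here is a small in-memory simulation. It is a sketch under stated assumptions, not the KCL: the `run_worker` function, the list-index checkpoint, and the `crash_at` parameter are all hypothetical stand-ins.

```python
def run_worker(records, checkpoint, processed, checkpoint_every=3, crash_at=None):
    """Process records after the last checkpoint and return the new checkpoint.

    checkpoint: index of the last record known to be processed (-1 if none).
    crash_at:   index at which the simulated worker dies before processing.
    """
    position = checkpoint
    for index in range(checkpoint + 1, len(records)):
        if index == crash_at:
            # Worker died: progress made since the last checkpoint is lost.
            return checkpoint
        processed.append(records[index])  # the "business logic"
        position = index
        if (index + 1) % checkpoint_every == 0:
            # Persist progress (the KCL keeps this state in DynamoDB).
            checkpoint = position
    return position  # clean finish: checkpoint at the last processed record


records = ["r0", "r1", "r2", "r3", "r4", "r5", "r6"]
processed = []

# First worker checkpoints after r2, processes r3 and r4, then dies at r5.
saved = run_worker(records, -1, processed, crash_at=5)

# A replacement worker resumes from the saved checkpoint and finishes.
saved = run_worker(records, saved, processed)
```

After both runs every record has been processed, but r3 and r4 were processed twice, so the business logic should tolerate duplicates.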
  21. This is an example of how the KCL simplifies record processing. All you need to do is start the process and shut down the process, and in between use the record processor to write your own business logic. There is just one method where you write your business logic, and the KCL takes care of the heavy lifting.
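The start / process / shutdown shape described above might look like the following. This is a plain Python sketch of the lifecycle, not the KCL interface itself: `WordCountProcessor`, `drive`, and the method signatures are hypothetical, and the slide's actual sample code is not reproduced here.

```python
class WordCountProcessor:
    """Hypothetical record processor; one instance handles one shard."""

    def initialize(self, shard_id):
        # Called once before any records are delivered.
        self.shard_id = shard_id
        self.counts = {}

    def process_records(self, records):
        # The one method that holds business logic; this one counts words.
        for record in records:
            for word in record.split():
                self.counts[word] = self.counts.get(word, 0) + 1

    def shutdown(self, reason):
        # Called once when the shard is closed or the worker stops.
        self.shutdown_reason = reason


def drive(processor, shard_id, batches):
    """Stand-in for the library: runs the lifecycle around the business logic."""
    processor.initialize(shard_id)
    for batch in batches:
        processor.process_records(batch)
    processor.shutdown("TERMINATE")
    return processor


p = drive(WordCountProcessor(), "shard-0", [["a b"], ["b c", "b"]])
```

Everything outside `process_records` is plumbing that a library can own, which is the point the slide makes.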
  22. Another important part of the KCL is worker management; this is sample code for the sample
  23. The Kinesis Client Library gets you up and running quickly. In addition to that, we also have a library for connecting to other AWS services, called the Connector Library.
  24. The Kinesis Client Library simplifies parallel processing of streaming big data by allowing customers to write simple applications for processing records. The KCL is a Java library that customers compile into their data processing application. The application with the Kinesis library is called a Kinesis Worker. The library notifies the customer’s processing code when there is new data to process. Kinesis Workers will often process data and write output to an external data store such as DynamoDB, EMR, Redshift, or S3. Kinesis guarantees that every record within a Kinesis data stream is processed at least once. The grey box on the top right will be EC2 instances in your space. Within it is the Kinesis worker process, in green. That is your specific application and business logic. That logic leverages our libraries, which connect back to Kinesis (in pink on the left). There is also another VIP on the GET side.
  25. EMR connector for Kinesis: convert from the streaming model to micro-batching on EMR, and use existing jobs/tasks in Hive, Pig, Cascading, etc.
  26. The EMR-Kinesis connector takes care of the change, or adaptation, from the batch to the streaming paradigm. For example: recast the stream or a shard to logical names; do checkpointing; Hadoop handles retries of failed map tasks; iterations allow retries; fixed input boundaries on a stream (idempotency for reruns); enable multiple queries on the same input boundaries.
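The idea of fixed input boundaries per iteration can be sketched as below. This is an illustration under assumptions, not the connector's implementation: list indices stand in for stream positions, and the `IterationCheckpoints` class and its `read` method are hypothetical.

```python
class IterationCheckpoints:
    """Hypothetical store mapping (logical name, iteration) to a fixed range."""

    def __init__(self):
        self._ranges = {}

    def read(self, stream, logical_name, iteration, batch_size):
        key = (logical_name, iteration)
        if key not in self._ranges:
            # First run of this iteration: start where the previous one ended
            # and freeze the boundaries so that reruns see the same input.
            previous = self._ranges.get((logical_name, iteration - 1))
            start = previous[1] if previous else 0
            end = min(start + batch_size, len(stream))
            self._ranges[key] = (start, end)
        start, end = self._ranges[key]
        return stream[start:end]


stream = list(range(10))
checkpoints = IterationCheckpoints()

first_run = checkpoints.read(stream, "query-a", 0, 4)
stream.extend([10, 11])                              # new data keeps arriving
rerun = checkpoints.read(stream, "query-a", 0, 4)    # retry of iteration 0
next_batch = checkpoints.read(stream, "query-a", 1, 4)
other_query = checkpoints.read(stream, "query-b", 0, 4)
```

A retried iteration reads exactly the records it read the first time, and a second logical name tracks its own position over the same stream.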
  27. Fresh streaming data joined with stored data. Filter, preprocess, and do complex ETL functions such as routing data to S3 and HBase. Opens a whole lot of new use cases.
  28. Nasdaq is sending all market order data across exchanges in real time into Kinesis, so that their real-time processing workers (built with Kinesis libraries) can maintain a consolidated audit trail that collects and accurately identifies every order, cancellation, modification and trade execution for all exchange-listed equities and options across all U.S. markets. Finland-based Supercell is one of the fastest-growing social game developers and their two games Hay Day and Clash of Clans attract more than 8.5 million players on iOS and Android devices every day. “Our player base has scaled at an incredible pace. Using AWS means we can rely on AWS tools in managing our infrastructure to match our growth,” said Sami Yliharju, Services Lead at Supercell. “We are using Amazon Kinesis for real-time delivery of game insight data sent by hundreds of our game engine servers. Amazon Kinesis enables our business-critical analytics and dashboard applications to reliably get the data streams they need, without delays. Amazon Kinesis also offloads a lot of developer burden in building a real-time, streaming data ingestion platform, and enables Supercell to focus on delivering games that delight players worldwide.” Bizo, an AdTech company, ingests customers' online advertising performance data in real time to generate real-time metrics and KPIs about ad performance, conversion, yield, and more. They would previously do this at end of day with a Hadoop job.
  29. Stream of metering records. Incremental bill computation, uploaded every few minutes to Redshift so that customers can query their estimated bill. Archive of metering records, aggregated to 1-hour buckets, in S3. Billing alerts in real time.
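Aggregating metering records into 1-hour buckets could look roughly like this. It is a minimal sketch: the record layout `(payer_id, epoch_seconds, units)` and the function name are assumptions for illustration, not the actual metering schema.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_usage(records):
    """Sum usage per (payer, hour) from (payer_id, epoch_seconds, units) tuples."""
    buckets = defaultdict(float)
    for payer_id, epoch_seconds, units in records:
        ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
        # Truncate the timestamp to the start of its hour to get the bucket.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[(payer_id, hour.isoformat())] += units
    return dict(buckets)


records = [
    ("Joe", 0, 1.0),      # 00:00:00, first hour
    ("Joe", 3599, 2.0),   # 00:59:59, still the first hour
    ("Joe", 3600, 4.0),   # 01:00:00, second hour
]
usage = hourly_usage(records)
```

The same running totals could feed both the periodic upload and a real-time alert once a bucket crosses a threshold.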
  30. SDKs are available in many languages, or you can call the REST APIs directly.
  31. We have multiple trainings that can help you get up to speed with architecting best practices and even deep dive into specific services such as big data. Self-paced labs are available; please take advantage of them. We also have a number of certifications available at both Associate and Professional level. For more details please check out this URL /training