© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:INVENT
SRV304 Building High-Throughput Serverless Data Processing Pipelines
Cecilia Deng, Software Developer on AWS Lambda
November 28, 2017
Me
• Canadian
• UBC
• EA Canada
• AWS Lambda
Goal
Data processing that is
• High-throughput ( > 1 GB/s)
• Serverless (no servers to manage)
• Real-time (pipeline)
What to Expect
• Why streams?
• What’s AWS Lambda?
• What’s Amazon Kinesis?
• What does serverless stream processing look like?
• How does Lambda process streams?
• Example use cases
WHY STREAMS?
Stream Processing
Goal
• High-throughput ( > 1 GB/s)
• Serverless (managed compute)
• Real-time (pipeline)
Streams
• Data size constraint
• Data time constraint
• Have access to recent data
• Processing time constraint
Batch
• No size constraint
• No time constraint (not real-time)
• Have access to all data
• Long running processing (reports)
Stream Processing
Because you have data that is:
• Generated continuously and simultaneously by thousands of data sources
• Typically small in size (KBs)
And needs to be processed either:
• Sequentially and incrementally
• Or over sliding windows
within some real-time constraint
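The sliding-window style of processing mentioned above can be sketched in a few lines. This is an illustration only and not part of the talk: the `SlidingWindowSum` class, the 60-second window, and the `(timestamp, value)` event shape are assumptions, and the sketch expects events to arrive in timestamp order.

```python
from collections import deque


class SlidingWindowSum:
    """Running sum of values seen in the last `window_s` seconds (hypothetical helper)."""

    def __init__(self, window_s):
        self.window_s = window_s
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0

    def add(self, timestamp, value):
        """Add one event, evict expired events, and return the current window sum."""
        self.events.append((timestamp, value))
        self.total += value
        # Drop events that fell out of the window ending at `timestamp`.
        while self.events and self.events[0][0] <= timestamp - self.window_s:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total


if __name__ == "__main__":
    window = SlidingWindowSum(window_s=60)
    for ts, value in [(0, 5), (30, 7), (61, 1), (95, 2)]:
        print(ts, window.add(ts, value))
```

Each call does a small amount of incremental work, which is what lets this kind of aggregation keep up with a continuous stream instead of rescanning all data.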
WHAT’S LAMBDA?
Lambda: What Is It?
• It’s your function: your libraries, your code, your executable
• With a programming model: easy to start, with blueprints and tutorials, monitoring, and logging
• That runs stateless: infrastructure abstracted; persist data using Amazon DynamoDB, Amazon S3, or ElastiCache
• And an integrated security model: IAM resource policies and roles, VPC support
• And a flexible resource model: choose your memory and we allocate proportional CPU, network bandwidth, and disk I/O
Lambda: How Do I Trigger It?
HOW IT WORKS
• ASYNCHRONOUS PUSH MODEL (Amazon S3, Amazon SNS): Event invocation
• SYNCHRONOUS PUSH MODEL (Amazon Alexa, AWS IoT): RequestResponse invocation
• Mapping owned by the event source
• Event source triggers Lambda via the Invoke APIs
• Resource-based policy permissions
Lambda: How Do I Trigger It?
• STREAM PULL MODEL (Amazon DynamoDB, Amazon Kinesis): Lambda polls the streams
• Mapping owned by Lambda
• Lambda function invokes when new records are found on the stream
• Lambda execution role policy permissions
• Polled batch, RequestResponse invocation
Lambda
EVENT SOURCE: AWS CloudFormation, Amazon API Gateway, Amazon SNS, Amazon Kinesis
FUNCTION: Node.js, Python, Java, C#
ENDPOINT: Database, Cloud Service, Anything
Kinesis
EVENT SOURCE: Amazon Kinesis (IoT Data, Financial Data, Log Data)
FUNCTION: Node.js, Python, Java, C#
ENDPOINT: Database, Cloud Service, Anything
WHAT’S KINESIS?
Kinesis: What Is It?
• It’s storage: for real-time data that’s only stored for a limited time
• Where new data is made available quickly: typically less than 1 second put-to-get delay
• That uses a checkpoint model: supports multiple concurrent, in-order processing
• As a managed service: with APIs that let you easily create and configure the stream and put and retrieve data
Kinesis: How Do I Process It?
Source → PutRecords → Shards → GetRecords
• Poll for work
• Checkpoint for progress
• Separate checkpoints for multiple consumers
• Use the Kinesis Client Library (KCL)
Scale Amazon Kinesis by splitting or merging shards
Streams: How Do I Process It?
DDB events → Shards → GetRecords
• Poll for work
• Checkpoint for progress
• Separate checkpoints for multiple consumers
• Use the Kinesis Client Library (KCL)
Scale Amazon Kinesis by splitting or merging shards
SEEMS HARD. CAN I NOT?
Processing Streams: Kinesis Firehose
• Manages stream:
  • No shard configuration
  • No partition key or order
• Manages stream processing:
  • Polls for records
  • Dumps to one of:
    • Amazon S3
    • Amazon Redshift
    • Amazon Elasticsearch Service
• Compute power default 8 * (1 vCPU + 4 GB) KPU
• Choose a Lambda transform function:
  • JSON/CSV to whatever
  • Apache Log to JSON/CSV
  • Syslog to JSON/CSV
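A Lambda transform function for Firehose receives a batch of base64-encoded records and returns each one with the same `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and the transformed data re-encoded as base64. The sketch below shows a JSON-to-CSV transform under that contract. The `user` and `action` fields are a hypothetical record schema chosen for illustration, not something from the talk.

```python
import base64
import csv
import io
import json


def handler(event, context):
    """Firehose transformation sketch: one JSON object in, one CSV line out."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical schema: each JSON record has "user" and "action" fields.
            buffer = io.StringIO()
            csv.writer(buffer, lineterminator="\n").writerow(
                [payload["user"], payload["action"]]
            )
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(buffer.getvalue().encode("utf-8")).decode("ascii"),
            })
        except (ValueError, KeyError, TypeError):
            # Hand the record back unchanged and flag it as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Every input record must appear exactly once in the response, which is why failures are reported per record instead of raising an exception for the whole batch.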
Processing Streams: Kinesis Analytics
• Does not manage stream:
  • Need to configure Kinesis Stream
• Manages stream processing:
  • From Amazon Kinesis or Kinesis Firehose
  • Polls for records
• Uses a SQL model to continuously:
  • Map record data to internal “stream tables” (aggregation)
  • Query the internal “stream tables” for desired results (filter)
  • Output the desired results to:
    • Additional internal “stream tables” (further aggregation) or
    • External Kinesis Stream or Kinesis Firehose (destination store)
• Compute power default 8 * (1 vCPU + 4 GB) KPU
Processing Streams: Lambda
• Does not manage stream:
  • Need to configure Kinesis Stream
• Manages stream processing:
  • From Amazon Kinesis or DynamoDB streams
  • Polls for records
  • Sends for invocation to a Lambda function
• Compute power default 1000 * (configured memory and associated sized CPU)
• Set up with Lambda createEventSourceMapping
• Lambda:
  • Preserves order
  • Soft concurrent limit of 1000 invocations * (max 3 GB memory and associated sized CPU)
  • Completely customized model and functionality
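Setting up the mapping between a stream and a function could look roughly like the sketch below, which uses the boto3 `create_event_source_mapping` call. The stream ARN, function name, and batch size are placeholders, and the helper names `build_mapping_request` and `create_mapping` are hypothetical. The request builder is kept separate so it can be inspected without AWS credentials.

```python
def build_mapping_request(stream_arn, function_name, batch_size=100,
                          starting_position="TRIM_HORIZON"):
    """Build keyword arguments for Lambda's CreateEventSourceMapping call."""
    # This sketch only handles the two positions that need no extra parameters.
    if starting_position not in ("TRIM_HORIZON", "LATEST"):
        raise ValueError("unsupported starting position: " + starting_position)
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "BatchSize": batch_size,
        "StartingPosition": starting_position,
        "Enabled": True,
    }


def create_mapping(request):
    """Send the request with boto3 (needs AWS credentials and the boto3 package)."""
    import boto3  # imported lazily so the builder above works without boto3 installed

    return boto3.client("lambda").create_event_source_mapping(**request)


if __name__ == "__main__":
    request = build_mapping_request(
        # Placeholder ARN and function name, for illustration only.
        "arn:aws:kinesis:us-west-2:123456789012:stream/examplestream",
        "my-stream-processor",
    )
    print(request)
```

Once the mapping exists, Lambda owns the polling, so the function's execution role needs permission to read from the stream.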
Processing Streams: Lambda
Source → Amazon Kinesis (shards) → Lambda → Destination 1, Destination 2
• Scale Amazon Kinesis by splitting or merging shards
• Lambda polls a batch and waits for the response
• Lambda will scale automatically
STREAM PROCESSING BY LAMBDA
Processing Streams: Lambda
Source → Shards
Positions in a shard: Trim horizon, Checkpoint, Checkpoint, Latest Checkpoint
Processing Streams: Lambda
The event received by the Lambda function is a collection of records from the stream:
{
  "Records": [
    {
      "kinesis": {
        "partitionKey": "partitionKey-3",
        "kinesisSchemaVersion": "1.0",
        "data": "SGVsbG8sIHRoaXMgaXMgYSB0ZXN0IDEyMy4=",
        "sequenceNumber": "49545115243490985018280067714973144582180062593244200961"
      },
      "eventSource": "aws:kinesis",
      "eventID": "shardId-000000000000:49545115243490985018280067714973144582180062593244200961",
      "invokeIdentityArn": "arn:aws:iam::account-id:role/testLEBRole",
      "eventVersion": "1.0",
      "eventName": "aws:kinesis:record",
      "eventSourceARN": "arn:aws:kinesis:us-west-2:35667example:stream/examplestream",
      "awsRegion": "us-west-2"
    }
  ]
}
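The `data` field of each record in the event is base64-encoded, so a function has to decode it before use. A minimal handler sketch follows. Treating the payload as UTF-8 text is an assumption that fits the sample record, and real payloads may be binary or JSON.

```python
import base64


def handler(event, context):
    """Decode every Kinesis record in the batch and return the payloads as text."""
    payloads = []
    for record in event["Records"]:
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append(raw.decode("utf-8"))
    return payloads


if __name__ == "__main__":
    sample_event = {
        "Records": [
            {
                "kinesis": {
                    "partitionKey": "partitionKey-3",
                    "data": "SGVsbG8sIHRoaXMgaXMgYSB0ZXN0IDEyMy4=",
                },
                "eventSource": "aws:kinesis",
            }
        ]
    }
    print(handler(sample_event, None))  # ['Hello, this is a test 123.']
```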
Processing Streams: Lambda
Per shard:
▪ Lambda calls GetRecords with the maximum limit from Kinesis (10,000 records or 10 MB)
▪ If there are no records, waits some time (1 s)
▪ Sub-batches in memory and formats records into the Lambda payload
▪ Invokes the Lambda function synchronously
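The per-shard steps above can be sketched as a small simulation. This is not Lambda's actual implementation. Here `get_records` and `invoke` are hypothetical callables standing in for the Kinesis GetRecords call and the synchronous function invocation, and `max_polls` exists only so the example terminates.

```python
import time


def sub_batches(records, batch_size):
    """Split one GetRecords result into invoke-sized chunks, preserving order."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def poll_shard(get_records, invoke, batch_size, max_polls,
               idle_wait_s=1.0, sleep=time.sleep):
    """Poll one shard `max_polls` times, invoking synchronously per sub-batch."""
    invocations = 0
    for _ in range(max_polls):
        records = get_records()
        if not records:
            sleep(idle_wait_s)  # nothing new on the shard: wait before polling again
            continue
        for batch in sub_batches(records, batch_size):
            invoke({"Records": batch})  # blocks until the function returns
            invocations += 1
    return invocations


if __name__ == "__main__":
    polls = iter([["r1", "r2", "r3"], [], ["r4"]])
    seen = []
    count = poll_shard(lambda: next(polls), seen.append, batch_size=2,
                       max_polls=3, sleep=lambda s: None)
    print(count, seen)
```

Because each invocation blocks before the next sub-batch is sent, records from one shard are processed strictly in order.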
Processing Streams: Lambda
▪ Lambda blocks on ordered processing for each individual shard
▪ Increasing # of shards with even distribution allows increased concurrency
▪ Batch size may impact duration if the Lambda function takes longer to process more records
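The point about even distribution can be illustrated with a short sketch. Kinesis maps each partition key to a 128-bit integer with MD5 and routes the record to the shard whose hash key range contains that value. The code below assumes the shards split the hash range evenly, which is not guaranteed after arbitrary splits and merges. `shard_for_key` and `distribution` are hypothetical helper names.

```python
import hashlib


def shard_for_key(partition_key, shard_count):
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")  # 128-bit integer
    return hash_key * shard_count // (1 << 128)


def distribution(partition_keys, shard_count):
    """Count how many keys land on each shard."""
    counts = [0] * shard_count
    for key in partition_keys:
        counts[shard_for_key(key, shard_count)] += 1
    return counts


if __name__ == "__main__":
    varied = ["device-{}".format(i) for i in range(1000)]
    print(distribution(varied, 4))           # spread across shards
    print(distribution(["hot"] * 1000, 4))   # one shard takes everything
```

A single hot partition key sends all records to one shard, so adding shards only increases concurrency when the keys are varied enough to spread the load.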
Processing Streams: Lambda
Lambda polls and blocks on a synchronous invocation per shard
If the put/ingestion rate is greater than the theoretical throughput, your processing is at risk of falling behind
Maximum theoretical throughput
# shards * 2 MB / Lambda function duration (s)
Effective theoretical throughput
# shards * batch size (MB) / Lambda function duration (s)
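As a worked example of the two formulas, the sketch below plugs in some numbers. The shard count, batch size, and duration are made-up inputs, `shards_needed` is a hypothetical helper obtained by rearranging the effective-throughput formula, and 1 GB/s is taken as 1000 MB/s.

```python
import math

READ_LIMIT_MB = 2  # per-shard figure used in the maximum-throughput formula


def max_throughput_mb_s(shards, duration_s):
    """Maximum theoretical throughput: # shards * 2 MB / function duration."""
    return shards * READ_LIMIT_MB / duration_s


def effective_throughput_mb_s(shards, batch_mb, duration_s):
    """Effective theoretical throughput: # shards * batch size / function duration."""
    return shards * batch_mb / duration_s


def shards_needed(target_mb_s, batch_mb, duration_s):
    """Smallest shard count whose effective throughput reaches the target."""
    return math.ceil(target_mb_s * duration_s / batch_mb)


if __name__ == "__main__":
    print(max_throughput_mb_s(100, 0.5))             # 400.0
    print(effective_throughput_mb_s(100, 0.5, 0.5))  # 100.0
    print(shards_needed(1000, 0.5, 0.5))             # 1000
```

With 0.5 MB batches and a 0.5 s function duration, each shard contributes 1 MB/s, so the > 1 GB/s goal would take on the order of 1000 shards unless batches get larger or the function gets faster.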
Processing Streams: Lambda
Retries
Will retry on execution failures until the record is expired
Throttles and errors impact duration and directly impact throughput
Best practice
Retry with exponential backoff
Effective theoretical throughput with retries
( # shards * batch size (MB) ) / ( function duration (s) * retries until expiry)
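The effect of retries on throughput, and the backoff best practice, can be sketched as follows. The input numbers are illustrative, `backoff_delays` is a hypothetical helper, and its base delay and cap are assumptions. Where the backoff is applied (for example, around calls to a downstream destination inside the function) is also an assumption, and production code would usually add random jitter.

```python
def throughput_with_retries_mb_s(shards, batch_mb, duration_s, retries):
    """Slide formula: (# shards * batch size) / (function duration * retries)."""
    return (shards * batch_mb) / (duration_s * retries)


def backoff_delays(attempts, base_s=0.5, cap_s=30.0):
    """Exponentially growing wait times in seconds, capped at `cap_s`."""
    return [min(cap_s, base_s * 2 ** attempt) for attempt in range(attempts)]


if __name__ == "__main__":
    print(throughput_with_retries_mb_s(10, 1, 0.5, 1))  # 20.0
    print(throughput_with_retries_mb_s(10, 1, 0.5, 4))  # 5.0
    print(backoff_delays(4))                            # [0.5, 1.0, 2.0, 4.0]
```

Four attempts per batch cut the effective throughput to a quarter, which is why a batch that keeps failing can stall a whole shard until its records expire.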
Source → Amazon Kinesis (shards) → Lambda → Destination 1, Destination 2
Lambda polls a batch and receives either success or an error
MORE EXAMPLES
Real-time Ad Serving
The Assembly Line
Anomaly Detection
Game Analytics
• Store real-time player scores and stats
• Send to Lambda for further aggregation like top scores or longest runs
• Surface leaderboards
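An aggregation such as top scores could be computed per batch along these lines. The record schema (`player` and `score` fields in a JSON payload) and the `top_scores` helper are hypothetical. A real leaderboard would also merge each batch's result into a persistent store, which this sketch leaves out.

```python
import base64
import heapq
import json


def top_scores(event, n=3):
    """Return the n highest (score, player) pairs found in one batch of records."""
    scores = []
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # Hypothetical schema: {"player": "<name>", "score": <number>}
        scores.append((payload["score"], payload["player"]))
    return heapq.nlargest(n, scores)


def make_record(player, score):
    """Build a minimal Kinesis-style record for local testing."""
    data = json.dumps({"player": player, "score": score}).encode("utf-8")
    return {"kinesis": {"data": base64.b64encode(data).decode("ascii")}}


if __name__ == "__main__":
    event = {"Records": [make_record("al", 70), make_record("bo", 90),
                         make_record("cy", 40)]}
    print(top_scores(event, n=2))  # [(90, 'bo'), (70, 'al')]
```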
QUESTIONS?
Thank you!
@cicikendiggit (mostly me complaining to airlines)

More Related Content

What's hot

STG311_Deep Dive on Amazon S3 & Amazon Glacier Storage Management
STG311_Deep Dive on Amazon S3 & Amazon Glacier Storage ManagementSTG311_Deep Dive on Amazon S3 & Amazon Glacier Storage Management
STG311_Deep Dive on Amazon S3 & Amazon Glacier Storage ManagementAmazon Web Services
 
ABD202_Best Practices for Building Serverless Big Data Applications
ABD202_Best Practices for Building Serverless Big Data ApplicationsABD202_Best Practices for Building Serverless Big Data Applications
ABD202_Best Practices for Building Serverless Big Data ApplicationsAmazon Web Services
 
How Nextdoor Built a Scalable, Serverless Data Pipeline for Billions of Event...
How Nextdoor Built a Scalable, Serverless Data Pipeline for Billions of Event...How Nextdoor Built a Scalable, Serverless Data Pipeline for Billions of Event...
How Nextdoor Built a Scalable, Serverless Data Pipeline for Billions of Event...Amazon Web Services
 
GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...
GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...
GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...Amazon Web Services
 
SRV301-Optimizing Serverless Application Data Tiers with Amazon DynamoDB
SRV301-Optimizing Serverless Application Data Tiers with Amazon DynamoDBSRV301-Optimizing Serverless Application Data Tiers with Amazon DynamoDB
SRV301-Optimizing Serverless Application Data Tiers with Amazon DynamoDBAmazon Web Services
 
DVC303-Technological Accelerants for Organizational Transformation
DVC303-Technological Accelerants for Organizational TransformationDVC303-Technological Accelerants for Organizational Transformation
DVC303-Technological Accelerants for Organizational TransformationAmazon Web Services
 
How to Build Scalable Serverless Applications
How to Build Scalable Serverless ApplicationsHow to Build Scalable Serverless Applications
How to Build Scalable Serverless ApplicationsAmazon Web Services
 
DAT307_Modern Cloud Data Warehousing
DAT307_Modern Cloud Data WarehousingDAT307_Modern Cloud Data Warehousing
DAT307_Modern Cloud Data WarehousingAmazon Web Services
 
Create a Serverless Image Processing Platform - ARC326 - re:Invent 2017
Create a Serverless Image Processing Platform - ARC326 - re:Invent 2017Create a Serverless Image Processing Platform - ARC326 - re:Invent 2017
Create a Serverless Image Processing Platform - ARC326 - re:Invent 2017Amazon Web Services
 
NEW LAUNCH! Building Alexa Skills for Businesses (ALX204)
NEW LAUNCH! Building Alexa Skills for Businesses (ALX204) NEW LAUNCH! Building Alexa Skills for Businesses (ALX204)
NEW LAUNCH! Building Alexa Skills for Businesses (ALX204) Amazon Web Services
 
GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...
GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...
GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...Amazon Web Services
 
From Batch to Streaming - How Amazon Flex Uses Real-time Analytics
From Batch to Streaming - How Amazon Flex Uses Real-time AnalyticsFrom Batch to Streaming - How Amazon Flex Uses Real-time Analytics
From Batch to Streaming - How Amazon Flex Uses Real-time AnalyticsAmazon Web Services
 
Scaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million UsersScaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million UsersAmazon Web Services
 
ALX202_Integrate Alexa voice technology into your product with the Alexa Voic...
ALX202_Integrate Alexa voice technology into your product with the Alexa Voic...ALX202_Integrate Alexa voice technology into your product with the Alexa Voic...
ALX202_Integrate Alexa voice technology into your product with the Alexa Voic...Amazon Web Services
 
AMF305_Autonomous Driving Algorithm Development on Amazon AI
AMF305_Autonomous Driving Algorithm Development on Amazon AIAMF305_Autonomous Driving Algorithm Development on Amazon AI
AMF305_Autonomous Driving Algorithm Development on Amazon AIAmazon Web Services
 
Analyzing Streaming Data in Real-time with Amazon Kinesis
Analyzing Streaming Data in Real-time with Amazon KinesisAnalyzing Streaming Data in Real-time with Amazon Kinesis
Analyzing Streaming Data in Real-time with Amazon KinesisAmazon Web Services
 
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...Amazon Web Services
 

What's hot (20)

STG311_Deep Dive on Amazon S3 & Amazon Glacier Storage Management
STG311_Deep Dive on Amazon S3 & Amazon Glacier Storage ManagementSTG311_Deep Dive on Amazon S3 & Amazon Glacier Storage Management
STG311_Deep Dive on Amazon S3 & Amazon Glacier Storage Management
 
ABD202_Best Practices for Building Serverless Big Data Applications
ABD202_Best Practices for Building Serverless Big Data ApplicationsABD202_Best Practices for Building Serverless Big Data Applications
ABD202_Best Practices for Building Serverless Big Data Applications
 
GPSTEC307_Too Many Tools
GPSTEC307_Too Many ToolsGPSTEC307_Too Many Tools
GPSTEC307_Too Many Tools
 
ABD217_From Batch to Streaming
ABD217_From Batch to StreamingABD217_From Batch to Streaming
ABD217_From Batch to Streaming
 
How Nextdoor Built a Scalable, Serverless Data Pipeline for Billions of Event...
How Nextdoor Built a Scalable, Serverless Data Pipeline for Billions of Event...How Nextdoor Built a Scalable, Serverless Data Pipeline for Billions of Event...
How Nextdoor Built a Scalable, Serverless Data Pipeline for Billions of Event...
 
GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...
GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...
GPSTEC313_GPS Real-Time Data Processing with AWS Lambda Quickly, at Scale, an...
 
SRV301-Optimizing Serverless Application Data Tiers with Amazon DynamoDB
SRV301-Optimizing Serverless Application Data Tiers with Amazon DynamoDBSRV301-Optimizing Serverless Application Data Tiers with Amazon DynamoDB
SRV301-Optimizing Serverless Application Data Tiers with Amazon DynamoDB
 
SID402_An AWS Security Odyssey
SID402_An AWS Security OdysseySID402_An AWS Security Odyssey
SID402_An AWS Security Odyssey
 
DVC303-Technological Accelerants for Organizational Transformation
DVC303-Technological Accelerants for Organizational TransformationDVC303-Technological Accelerants for Organizational Transformation
DVC303-Technological Accelerants for Organizational Transformation
 
How to Build Scalable Serverless Applications
How to Build Scalable Serverless ApplicationsHow to Build Scalable Serverless Applications
How to Build Scalable Serverless Applications
 
DAT307_Modern Cloud Data Warehousing
DAT307_Modern Cloud Data WarehousingDAT307_Modern Cloud Data Warehousing
DAT307_Modern Cloud Data Warehousing
 
Create a Serverless Image Processing Platform - ARC326 - re:Invent 2017
Create a Serverless Image Processing Platform - ARC326 - re:Invent 2017Create a Serverless Image Processing Platform - ARC326 - re:Invent 2017
Create a Serverless Image Processing Platform - ARC326 - re:Invent 2017
 
NEW LAUNCH! Building Alexa Skills for Businesses (ALX204)
NEW LAUNCH! Building Alexa Skills for Businesses (ALX204) NEW LAUNCH! Building Alexa Skills for Businesses (ALX204)
NEW LAUNCH! Building Alexa Skills for Businesses (ALX204)
 
GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...
GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...
GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...
 
From Batch to Streaming - How Amazon Flex Uses Real-time Analytics
From Batch to Streaming - How Amazon Flex Uses Real-time AnalyticsFrom Batch to Streaming - How Amazon Flex Uses Real-time Analytics
From Batch to Streaming - How Amazon Flex Uses Real-time Analytics
 
Scaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million UsersScaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million Users
 
ALX202_Integrate Alexa voice technology into your product with the Alexa Voic...
ALX202_Integrate Alexa voice technology into your product with the Alexa Voic...ALX202_Integrate Alexa voice technology into your product with the Alexa Voic...
ALX202_Integrate Alexa voice technology into your product with the Alexa Voic...
 
AMF305_Autonomous Driving Algorithm Development on Amazon AI
AMF305_Autonomous Driving Algorithm Development on Amazon AIAMF305_Autonomous Driving Algorithm Development on Amazon AI
AMF305_Autonomous Driving Algorithm Development on Amazon AI
 
Analyzing Streaming Data in Real-time with Amazon Kinesis
Analyzing Streaming Data in Real-time with Amazon KinesisAnalyzing Streaming Data in Real-time with Amazon Kinesis
Analyzing Streaming Data in Real-time with Amazon Kinesis
 
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
 

Similar to SRV304_Building High-Throughput Serverless Data Processing Pipelines

Going Serverless at AWS Startup Day Bangalore
Going Serverless at AWS Startup Day Bangalore Going Serverless at AWS Startup Day Bangalore
Going Serverless at AWS Startup Day Bangalore Madhusudan Shekar
 
Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Amazon Web Services
 
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...Amazon Web Services
 
Deep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming ApplicationsDeep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming ApplicationsAmazon Web Services
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaAmazon Web Services
 
Getting started with Serverless on AWS
Getting started with Serverless on AWSGetting started with Serverless on AWS
Getting started with Serverless on AWSAdrian Hornsby
 
Serverless Architectural Patterns
Serverless Architectural PatternsServerless Architectural Patterns
Serverless Architectural PatternsAmazon Web Services
 
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018Amazon Web Services
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaAmazon Web Services
 
Serverless Architectural Patterns 
and Best Practices - Madhu Shekar - AWS
Serverless Architectural Patterns 
and Best Practices - Madhu Shekar - AWSServerless Architectural Patterns 
and Best Practices - Madhu Shekar - AWS
Serverless Architectural Patterns 
and Best Practices - Madhu Shekar - AWSCodeOps Technologies LLP
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaAmazon Web Services
 
AWS Lambda Supports Parallelization Factor for Kinesis and DynamoDB Event Sou...
AWS Lambda Supports Parallelization Factor for Kinesis and DynamoDB Event Sou...AWS Lambda Supports Parallelization Factor for Kinesis and DynamoDB Event Sou...
AWS Lambda Supports Parallelization Factor for Kinesis and DynamoDB Event Sou...Swapnil Pawar
 
How HSBC Uses Serverless to Process Millions of Transactions in Real Time (FS...
How HSBC Uses Serverless to Process Millions of Transactions in Real Time (FS...How HSBC Uses Serverless to Process Millions of Transactions in Real Time (FS...
How HSBC Uses Serverless to Process Millions of Transactions in Real Time (FS...Amazon Web Services
 
Real Time Data Processing Using AWS Lambda
Real Time Data Processing Using AWS LambdaReal Time Data Processing Using AWS Lambda
Real Time Data Processing Using AWS LambdaAmazon Web Services
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time AnalyticsAmazon Web Services
 
Building Big Data Applications with Serverless Architectures - June 2017 AWS...
Building Big Data Applications with Serverless Architectures -  June 2017 AWS...Building Big Data Applications with Serverless Architectures -  June 2017 AWS...
Building Big Data Applications with Serverless Architectures - June 2017 AWS...Amazon Web Services
 
Serverless Architecture and Best Practices
Serverless Architecture and Best PracticesServerless Architecture and Best Practices
Serverless Architecture and Best PracticesAmazon Web Services
 
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...Amazon Web Services
 
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015Amazon Web Services Korea
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaAmazon Web Services
 

Similar to SRV304_Building High-Throughput Serverless Data Processing Pipelines (20)

Going Serverless at AWS Startup Day Bangalore
Going Serverless at AWS Startup Day Bangalore Going Serverless at AWS Startup Day Bangalore
Going Serverless at AWS Startup Day Bangalore
 
Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...
 
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
AWS April 2016 Webinar Series - Getting Started with Real-Time Data Analytics...
 
Deep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming ApplicationsDeep Dive and Best Practices for Real Time Streaming Applications
Deep Dive and Best Practices for Real Time Streaming Applications
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS Lambda
 
Getting started with Serverless on AWS
Getting started with Serverless on AWSGetting started with Serverless on AWS
Getting started with Serverless on AWS
 
Serverless Architectural Patterns
Serverless Architectural PatternsServerless Architectural Patterns
Serverless Architectural Patterns
 
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS Lambda
 
Serverless Architectural Patterns 
and Best Practices - Madhu Shekar - AWS
Serverless Architectural Patterns 
and Best Practices - Madhu Shekar - AWSServerless Architectural Patterns 
and Best Practices - Madhu Shekar - AWS
Serverless Architectural Patterns 
and Best Practices - Madhu Shekar - AWS
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS Lambda
 
AWS Lambda Supports Parallelization Factor for Kinesis and DynamoDB Event Sou...
AWS Lambda Supports Parallelization Factor for Kinesis and DynamoDB Event Sou...AWS Lambda Supports Parallelization Factor for Kinesis and DynamoDB Event Sou...
AWS Lambda Supports Parallelization Factor for Kinesis and DynamoDB Event Sou...
 
How HSBC Uses Serverless to Process Millions of Transactions in Real Time (FS...
How HSBC Uses Serverless to Process Millions of Transactions in Real Time (FS...How HSBC Uses Serverless to Process Millions of Transactions in Real Time (FS...
How HSBC Uses Serverless to Process Millions of Transactions in Real Time (FS...
 
Real Time Data Processing Using AWS Lambda
Real Time Data Processing Using AWS LambdaReal Time Data Processing Using AWS Lambda
Real Time Data Processing Using AWS Lambda
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time Analytics
 
Building Big Data Applications with Serverless Architectures - June 2017 AWS...
Building Big Data Applications with Serverless Architectures -  June 2017 AWS...Building Big Data Applications with Serverless Architectures -  June 2017 AWS...
Building Big Data Applications with Serverless Architectures - June 2017 AWS...
 
Serverless Architecture and Best Practices
Serverless Architecture and Best PracticesServerless Architecture and Best Practices
Serverless Architecture and Best Practices
 
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
Building a Real Time Dashboard with Amazon Kinesis, Amazon Lambda and Amazon ...
 
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
AWS를 활용한 첫 빅데이터 프로젝트 시작하기(김일호)- AWS 웨비나 시리즈 2015
 
Real-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS LambdaReal-time Data Processing Using AWS Lambda
Real-time Data Processing Using AWS Lambda
 

SRV304_Building High-Throughput Serverless Data Processing Pipelines

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS re:INVENT SRV304: Building High-Throughput Serverless Data Processing Pipelines. Cecilia Deng, Software Developer on AWS Lambda. November 28, 2017.
  • 2. Me • Canadian • UBC • EA Canada • AWS Lambda
  • 3. Goal: Data processing that is • High-throughput (> 1 GB/s) • Serverless (no servers to manage) • Real-time (pipeline)
  • 4. What to Expect • Why streams? • What’s AWS Lambda? • What’s Amazon Kinesis? • What does serverless stream processing look like? • How does Lambda process streams? • Example use cases
  • 5. WHY STREAMS?
  • 6. Stream Processing. Goal: • High-throughput (> 1 GB/s) • Serverless (managed compute) • Real-time (pipeline). Streams: • Data size constraint • Data time constraint • Have access to recent data • Processing time constraint. Batch: • No size constraint • No time constraint (not real-time) • Have access to all data • Long-running processing (reports)
  • 7. Stream Processing. Because you have data that is: • Generated continuously and simultaneously by thousands of data sources • Typically small in size (KBs). And needs to be processed either: • Sequentially and incrementally • Or over sliding windows, within some real-time constraint
  • 8. WHAT’S LAMBDA?
  • 9. Lambda: What Is It? It’s your function: your libraries, your code, your executable. With a programming model: easy-to-start blueprints and tutorials, monitoring, and logging. That runs stateless: infrastructure abstracted; persist data using Amazon DynamoDB, Amazon S3, or ElastiCache. And an integrated security model: IAM resource policies and roles, VPC support. And a flexible resource model: choose your memory and we allocate proportional CPU, network bandwidth, and disk I/O.
  • 10. Lambda: How Do I Trigger It? ASYNCHRONOUS PUSH MODEL (Amazon S3, Amazon SNS): Event invocation. SYNCHRONOUS PUSH MODEL (Amazon Alexa, AWS IoT): RequestResponse invocation. HOW IT WORKS: mapping owned by the event source; the event source triggers Lambda via the Invoke APIs; resource-based policy permissions.
  • 11. Lambda: How Do I Trigger It? STREAM PULL MODEL (Amazon DynamoDB, Amazon Kinesis): mapping owned by Lambda; Lambda polls the streams; the Lambda function is invoked when new records are found on the stream; Lambda execution role policy permissions; the polled batch is sent with a RequestResponse invocation.
  • 12. Lambda. EVENT SOURCE: AWS CloudFormation, Amazon API Gateway, Amazon SNS, Amazon Kinesis. FUNCTION: Node.js, Python, Java, C#. ENDPOINT: Database, Cloud Service, Anything.
  • 13. Kinesis. EVENT SOURCE: Amazon Kinesis, fed by IoT data, financial data, and log data. FUNCTION: Node.js, Python, Java, C#. ENDPOINT: Database, Cloud Service, Anything.
  • 14. WHAT’S KINESIS?
  • 15. Kinesis: What Is It? It’s storage: for real-time data that’s only stored for a limited time. Where new data is made available quickly: typically less than 1 second put-to-get delay. That uses a checkpoint model: supports multiple concurrent, in-order consumers. As a managed service: with APIs that let you easily create and configure the stream and put and retrieve data.
  • 16. Kinesis: How Do I Process It? Diagram labels: Source, PutRecords, Shards, GetRecords. • Poll for work • Checkpoint for progress • Separate checkpoints for multiple consumers • Use the KCL (Kinesis Client Library). Scale Amazon Kinesis by splitting or merging shards.
  • 17. Streams: How Do I Process It? Diagram labels: DDB events, Shards, GetRecords. • Poll for work • Checkpoint for progress • Separate checkpoints for multiple consumers • Use the KCL (Kinesis Client Library). Scale Amazon Kinesis by splitting or merging shards.
  • 18. SEEMS HARD. CAN I NOT?
  • 19. Processing Streams: Kinesis Firehose • Manages stream: no shard configuration; no partition key or order • Manages stream processing: polls for records; dumps to one of Amazon S3, Amazon Redshift, or Amazon Elasticsearch Service • Compute power default 8 * (1 vCPU + 4 GB) KPU • Choose a Lambda transform function: JSON/CSV to whatever; Apache Log to JSON/CSV; Syslog to JSON/CSV
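The "Lambda transform function" bullet on the Firehose slide can be illustrated with a minimal sketch. It assumes the Firehose data-transformation contract (each input record carries a `recordId` and base64 `data`, and each output record echoes the `recordId` with a `result` and re-encoded `data`). The CSV column names in `FIELDS` are hypothetical and are not taken from the talk.

```python
import base64
import json

# Hypothetical CSV layout for incoming log lines (an assumption for illustration).
FIELDS = ("timestamp", "level", "message")


def lambda_handler(event, context):
    """Turn CSV lines delivered by Firehose into newline-terminated JSON."""
    output = []
    for record in event["records"]:
        line = base64.b64decode(record["data"]).decode("utf-8").strip()
        # Split into at most len(FIELDS) parts so commas in the message survive.
        parts = line.split(",", len(FIELDS) - 1)
        if len(parts) == len(FIELDS):
            payload = json.dumps(dict(zip(FIELDS, parts))) + "\n"
            data = base64.b64encode(payload.encode("utf-8")).decode("ascii")
            result = "Ok"
        else:
            # Hand the untouched record back and flag it as failed.
            data = record["data"]
            result = "ProcessingFailed"
        output.append({"recordId": record["recordId"], "result": result, "data": data})
    return {"records": output}
```

Every input `recordId` is returned exactly once, which lets the delivery stream match transformed records to the originals.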
  • 20. Processing Streams: Kinesis Firehose
  • 21. Processing Streams: Kinesis Firehose
  • 22. Processing Streams: Kinesis Analytics • Does not manage stream: need to configure Kinesis Stream • Manages stream processing: from Amazon Kinesis or Kinesis Firehose; polls for records • Uses a SQL model to continuously: map record data to internal “stream tables” (aggregation); query the internal “stream tables” for desired results (filter); output the desired results to additional internal “stream tables” (further aggregation) or to an external Kinesis Stream or Kinesis Firehose (destination store) • Compute power default 8 * (1 vCPU + 4 GB) KPU
  • 23. Processing Streams: Kinesis Analytics
  • 24. Processing Streams: Kinesis Analytics
  • 25. Processing Streams: Kinesis
  • 26. Processing Streams: Lambda • Does not manage stream: need to configure Kinesis Stream • Manages stream processing: from Amazon Kinesis or DynamoDB streams; polls for records; sends them for invocation to a Lambda function; compute power default 1000 * (configured memory and associated sized CPU); set up with Lambda createEventSourceMapping • Lambda: preserves order; soft concurrent limit of 1000 invocations * (max 3 GB memory and associated sized CPU); completely customized model and functionality
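The "set up with Lambda createEventSourceMapping" step might look like the following sketch using boto3's `create_event_source_mapping`. The function name `my-stream-processor` and the batch size of 100 are assumptions for illustration. The stream ARN is the example ARN from the sample event on slide 32. Only the two starting positions shown in the deck (trim horizon and latest) are accepted here.

```python
def build_mapping_request(stream_arn, function_name, batch_size=100,
                          starting_position="TRIM_HORIZON"):
    """Build keyword arguments for the Lambda CreateEventSourceMapping call."""
    if starting_position not in ("TRIM_HORIZON", "LATEST"):
        raise ValueError("starting_position must be TRIM_HORIZON or LATEST")
    return {
        "EventSourceArn": stream_arn,          # Kinesis or DynamoDB stream to poll
        "FunctionName": function_name,         # function invoked with each batch
        "BatchSize": batch_size,               # max records per invocation
        "StartingPosition": starting_position, # where the first checkpoint starts
    }


def create_mapping(stream_arn, function_name, **options):
    """Create the mapping; needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the request builder works without the SDK

    client = boto3.client("lambda")
    request = build_mapping_request(stream_arn, function_name, **options)
    return client.create_event_source_mapping(**request)


if __name__ == "__main__":
    # "my-stream-processor" is a hypothetical function name.
    print(build_mapping_request(
        "arn:aws:kinesis:us-west-2:35667example:stream/examplestream",
        "my-stream-processor",
    ))
```

Because the mapping is owned by Lambda in the stream pull model, the function's execution role (not a resource policy on the function) needs permission to read the stream.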
  • 27. Processing Streams: Lambda
  • 28. Processing Streams: Lambda. Diagram: Source, Amazon Kinesis (shards), Lambda, Destination 1, Destination 2. Lambda polls a batch and waits for the response. Scale Amazon Kinesis by splitting or merging shards; Lambda will scale automatically.
  • 29. STREAM PROCESSING BY LAMBDA
  • 30. Processing Streams: Lambda
  • 31. Processing Streams: Lambda. Diagram labels: Source, Shards, Trim horizon, Checkpoint, Checkpoint, Latest, Checkpoint.
  • 32. Processing Streams: Lambda. The event received by the Lambda function is a collection of records from the stream: { "Records": [ { "kinesis": { "partitionKey": "partitionKey-3", "kinesisSchemaVersion": "1.0", "data": "SGVsbG8sIHRoaXMgaXMgYSB0ZXN0IDEyMy4=", "sequenceNumber": "49545115243490985018280067714973144582180062593244200961" }, "eventSource": "aws:kinesis", "eventID": "shardId-000000000000:49545115243490985018280067714973144582180062593244200961", "invokeIdentityArn": "arn:aws:iam::account-id:role/testLEBRole", "eventVersion": "1.0", "eventName": "aws:kinesis:record", "eventSourceARN": "arn:aws:kinesis:us-west-2:35667example:stream/examplestream", "awsRegion": "us-west-2" } ] }
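A handler consuming the event above has to base64-decode each record's `data` field before using it. The sketch below does only that, using a trimmed copy of the sample event from the slide. The returned tuple shape is an assumption chosen for illustration, not something the talk prescribes.

```python
import base64

# Trimmed copy of the sample event shown on the slide.
SAMPLE_EVENT = {
    "Records": [
        {
            "kinesis": {
                "partitionKey": "partitionKey-3",
                "kinesisSchemaVersion": "1.0",
                "data": "SGVsbG8sIHRoaXMgaXMgYSB0ZXN0IDEyMy4=",
                "sequenceNumber": "49545115243490985018280067714973144582180062593244200961",
            },
            "eventSource": "aws:kinesis",
            "eventName": "aws:kinesis:record",
        }
    ]
}


def lambda_handler(event, context):
    """Decode every Kinesis record in the batch, preserving batch order."""
    decoded = []
    for record in event["Records"]:
        kinesis = record["kinesis"]
        payload = base64.b64decode(kinesis["data"]).decode("utf-8")
        decoded.append((kinesis["partitionKey"], kinesis["sequenceNumber"], payload))
    return decoded


if __name__ == "__main__":
    for partition_key, sequence_number, payload in lambda_handler(SAMPLE_EVENT, None):
        print(partition_key, payload)  # partitionKey-3 Hello, this is a test 123.
```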
  • 33. Processing Streams: Lambda
  • 34. Processing Streams: Lambda. Per shard: ▪ Lambda calls GetRecords with the max limit from Kinesis (10 k records or 10 MB) ▪ If there are no records, wait some time (1 s) ▪ Sub-batch in memory and format records into the Lambda payload ▪ Invoke Lambda with a synchronous invoke. Diagram: Source, Amazon Kinesis (shards), Lambda, Destination 1, Destination 2; polls a batch, waits for response. Scale Amazon Kinesis by splitting or merging shards; Lambda will scale automatically.
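The per-shard steps above can be modeled with a small simulation. This is a simplified sketch of the described behavior, not Lambda's actual poller: `get_records` and `invoke` are hypothetical callables standing in for the Kinesis GetRecords call and the synchronous function invoke.

```python
import time


def sub_batches(records, batch_size):
    """Yield consecutive slices of at most batch_size records, in order."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def poll_shard_once(get_records, invoke, batch_size, idle_wait_s=1.0, sleep=time.sleep):
    """Run one polling cycle for a single shard and return the record count."""
    records = get_records()
    if not records:
        sleep(idle_wait_s)  # nothing on the shard yet: wait before polling again
        return 0
    for batch in sub_batches(records, batch_size):
        # Synchronous invoke: the next sub-batch is not sent until this returns,
        # which is what preserves ordering within the shard.
        invoke({"Records": batch})
    return len(records)


if __name__ == "__main__":
    sent = []
    count = poll_shard_once(lambda: list(range(5)), sent.append, batch_size=2)
    print(count, sent)
```

Because each shard is drained by its own blocking loop, adding shards is the way to add concurrency.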
  • 35. Processing Streams: Lambda ▪ Lambda blocks on ordered processing for each individual shard ▪ Increasing the number of shards with even distribution allows increased concurrency ▪ Batch size may impact duration if the Lambda function takes longer to process more records. Diagram: Source, Amazon Kinesis (shards), Lambda, Destination 1, Destination 2; polls a batch, waits for response. Scale Amazon Kinesis by splitting or merging shards; Lambda will scale automatically.
  • 36. Processing Streams: Lambda. Polls and blocks on synchronous invocation per shard. If the put/ingestion rate is greater than the theoretical throughput, your processing is at risk of falling behind. Maximum theoretical throughput: # shards * 2 MB / Lambda function duration (s). Effective theoretical throughput: # shards * batch size (MB) / Lambda function duration (s).
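The two throughput formulas on this slide can be checked with a few lines of arithmetic. The shard count, batch size, duration, and ingestion rate in the usage example are made-up numbers, not figures from the talk.

```python
def max_theoretical_throughput(shards, duration_s, mb_per_invoke=2.0):
    """# shards * 2 MB / Lambda function duration (s), in MB/s."""
    return shards * mb_per_invoke / duration_s


def effective_throughput(shards, batch_mb, duration_s):
    """# shards * batch size (MB) / Lambda function duration (s), in MB/s."""
    return shards * batch_mb / duration_s


def at_risk_of_falling_behind(ingest_mb_per_s, shards, batch_mb, duration_s):
    """True when data arrives faster than the function can drain it."""
    return ingest_mb_per_s > effective_throughput(shards, batch_mb, duration_s)


if __name__ == "__main__":
    # Assumed workload: 100 shards, 0.5 MB batches, 0.5 s per invocation.
    print(max_theoretical_throughput(100, 0.5))         # 400.0
    print(effective_throughput(100, 0.5, 0.5))          # 100.0
    print(at_risk_of_falling_behind(150, 100, 0.5, 0.5))  # True
```

In this made-up case the stream could sustain 400 MB/s, but the chosen batch size caps processing at 100 MB/s, so a 150 MB/s ingestion rate would build up a backlog.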
  • 37. Processing Streams: Lambda. Retries: Lambda will retry on execution failures until the record expires; throttles and errors impact duration and directly impact throughput. Best practice: retry with exponential backoff. Effective theoretical throughput with retries: (# shards * batch size (MB)) / (function duration (s) * retries until expiry). Diagram: Source, Amazon Kinesis (shards), Lambda, Destination 1, Destination 2; polls a batch; receives success, receives error, receives error.
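One way to apply the "retry with exponential backoff" advice is inside the function, around calls to a downstream destination. The sketch below is an assumption about where the retry sits, and `TransientError`, the attempt count, and the delay values are all hypothetical. The second helper evaluates the slide's throughput-with-retries formula.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (throttles, timeouts)."""


def call_with_backoff(operation, max_attempts=5, base_delay_s=0.1, max_delay_s=5.0,
                      sleep=time.sleep, jitter=random.random):
    """Call operation(), doubling the delay ceiling after each transient failure."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the invocation fail
            ceiling = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(ceiling * jitter())  # random fraction of the ceiling spreads retries


def effective_throughput_with_retries(shards, batch_mb, duration_s, retries):
    """(# shards * batch size (MB)) / (function duration (s) * retries), in MB/s."""
    return (shards * batch_mb) / (duration_s * retries)


if __name__ == "__main__":
    # Assumed numbers: 10 shards, 1 MB batches, 0.5 s duration, 4 tries per batch.
    print(effective_throughput_with_retries(10, 1.0, 0.5, 4))  # 5.0
```

Bounding the attempts matters here because a failed invocation blocks the shard: the same batch is retried until it succeeds or the records expire.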
  • 38. MORE EXAMPLES
  • 39. Real-time Ad Serving
  • 40. The Assembly Line
  • 41. Anomaly Detection
  • 42. Anomaly Detection
  • 43. Game Analytics: store real-time player scores and stats; send to Lambda for further aggregation, such as top scores or longest runs; surface leaderboards.
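The "further aggregation like top scores" step could be sketched as a function over one Kinesis batch. The JSON payload shape (`player` and `score` fields) is hypothetical, since the talk does not show the game's record format, and a real leaderboard would also need to merge each batch's result into a persistent store.

```python
import base64
import heapq
import json


def top_scores(event, n=3):
    """Return the n best (player, score) pairs found in one Kinesis batch."""
    best = {}
    for record in event["Records"]:
        # Assumed payload: base64-encoded JSON such as {"player": "a", "score": 10}.
        item = json.loads(base64.b64decode(record["kinesis"]["data"]))
        player, score = item["player"], item["score"]
        if score > best.get(player, float("-inf")):
            best[player] = score  # keep only each player's best score
    return heapq.nlargest(n, best.items(), key=lambda pair: pair[1])


def lambda_handler(event, context):
    """Entry point: compute this batch's leaderboard slice."""
    return top_scores(event)
```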
  • 44. Game Analytics
  • 45. QUESTIONS?
  • 46. Thank you! @cicikendiggit (mostly me complaining to airlines)