
Real-time Data Processing Using AWS Lambda

Serverless architecture can eliminate the need to provision and manage servers required to process files or streaming data in real time.

In this session, we will cover the fundamentals of using AWS Lambda to process data from sources such as Amazon DynamoDB Streams, Amazon Kinesis, and Amazon S3. We will walk through sample use cases for real-time data processing and discuss best practices on using these services together. We will then demonstrate how to set up a real-time stream processing solution using just Amazon Kinesis and AWS Lambda, all without the need to run or manage servers.

AWS DevDay San Francisco, June 21, 2016.
Presenter: Tara Walker, Technical Evangelist

Published in: Technology

Real-time Data Processing Using AWS Lambda

  1. Real-time Data Processing Using AWS Lambda
     Tara E. Walker, AWS Technical Evangelist, 06.21.2016
     © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. Agenda
     ▪ AWS Services for Data Processing
     ▪ AWS Lambda
     ▪ Amazon Kinesis
     ▪ Architecture & Workflow for Streaming Data Processing
     ▪ Streaming Data Processing Demo
     ▪ Best Practices in Building Data Processing Solutions
  3. AWS Services for Data Processing
     ▪ Amazon Kinesis
     ▪ AWS Lambda
  4. AWS Lambda: Overview
     Serverless compute service that runs code in response to events, without the need to provision and manage servers.
     ▪ No Servers to Manage: Automatically runs code without provisioning or managing servers. Just write the code and upload it to Lambda.
     ▪ Continuous Scaling: Automatically scales the application to match the size of the workload. Code runs in parallel and processes each trigger individually.
     ▪ Subsecond Metering: Starts code within milliseconds of an event. Executes only when an event is triggered.
     Triggers come from AWS services or directly from web or mobile applications.
  5. AWS Lambda: Serverless Compute in the Cloud
     Compute service that runs your code in response to events
     ▪ Easy to author, deploy, maintain, secure and manage
     ▪ Allows for focus on business logic, not infrastructure
     ▪ Stateless, event-driven code with native support for Node.js, Java, and Python
     ▪ Compute & code without managing infrastructure like EC2 instances and auto scaling groups
     ▪ Makes it easy to build back-end services that perform at scale
  6. Benefits of AWS Lambda for building a serverless data processing engine
     “Productivity focused compute platform to build powerful, dynamic, modular applications in the cloud”
     ▪ No Infrastructure to manage: Focus on business logic, not infrastructure. You upload code; AWS Lambda handles everything else.
     ▪ High performance at any scale; Cost-effective and efficient: Pay only for what you use. Lambda automatically matches capacity to your request rate. Purchase compute in 100ms increments.
     ▪ Bring Your Own Code: Run code in a choice of standard languages. Use threads, processes, files, and shell scripts normally.
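The 100ms purchase increment on this slide can be illustrated with a little arithmetic. This is a minimal sketch, assuming durations are measured in whole milliseconds and rounded up to the next increment; the function name `billed_ms` is hypothetical and no prices are modeled.

```python
def billed_ms(duration_ms, increment_ms=100):
    """Round a run time up to the next billing increment (100ms per the slide)."""
    if duration_ms < 0:
        raise ValueError("duration_ms must be non-negative")
    # Ceiling division on integers: -(-a // b) rounds a / b upward.
    return -(-duration_ms // increment_ms) * increment_ms


# A 101ms invocation is billed as two increments, a 100ms one as a single increment.
example_durations = [1, 100, 101, 250]
example_billed = [billed_ms(d) for d in example_durations]
```

Under this model, trimming a function from 101ms to 100ms halves its billed duration, which is why short functions benefit most from small optimizations.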
  7. Amazon Kinesis: Overview
     A managed service for streaming data ingestion and processing
     ▪ Amazon Kinesis Streams: Build your own custom applications that process or analyze streaming data
     ▪ Amazon Kinesis Firehose: Easily load massive volumes of streaming data into Amazon S3 and Redshift
     ▪ Amazon Kinesis Analytics: Easily analyze data streams using standard SQL queries
  8. Amazon Kinesis: Streaming data done the AWS way
     Makes it easy to capture, deliver, and process real-time data streams
     ▪ Pay as you go, no up-front costs
     ▪ Elastically scalable
     ▪ Right services for your specific use cases
     ▪ Real-time latencies
     ▪ Easy to provision, deploy, and manage
  9. Benefits of Amazon Kinesis for stream data ingestion and continuous processing
     Real-time Ingest
     ▪ Highly Scalable
     ▪ Durable
     ▪ Elastic
     ▪ Replay-able Reads
     Continuous Processing FX
     ▪ Elastic
     ▪ Load-balancing incoming streams
     ▪ Fault-tolerance, Checkpoint / Replay
     ▪ Enable multiple processing apps in parallel
     Enable data movement into Stores / Processing Engines
     Managed Service
     Low end-to-end latency
  10. Data Processing/Streaming Architecture & Workflow
      Diagram data sources: Smart Devices, Click Stream, Log Data
  11. AWS Lambda and Amazon Kinesis integration: How it Works
      Stream-based model:
      ▪ Lambda polls the stream; when new records are detected, the Lambda function is invoked.
      ▪ New records are passed by Kinesis as a parameter.
      ▪ Kinesis is mapped as an event source in Lambda.
      Synchronous invocation:
      ▪ Lambda is invoked with the RequestResponse invocation type, i.e. by polling the Kinesis stream.
      ▪ Lambda function is executed once.
      Event structure:
      ▪ The event received by the Lambda function is a collection of records from the Kinesis stream.
      ▪ Lambda Kinesis event source: batch size / max records to be received per invocation is configurable.
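The event structure described on this slide can be sketched as a Python handler. This is a minimal illustration, not the code from the talk's demo: it assumes producers send JSON payloads, and the sample event is hand-built for local use.

```python
import base64
import json


def handler(event, context):
    """Decode every Kinesis record in the batch passed to the function."""
    payloads = []
    for record in event["Records"]:
        # Kinesis record data arrives base64-encoded inside the event.
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append({
            "partition_key": record["kinesis"]["partitionKey"],
            "sequence_number": record["kinesis"]["sequenceNumber"],
            "body": json.loads(raw),  # assumption: producers send JSON
        })
    return payloads


# Hand-built event with the same shape, for trying the handler locally.
sample_event = {
    "Records": [
        {
            "kinesis": {
                "partitionKey": "device-1",
                "sequenceNumber": "1",
                "data": base64.b64encode(b'{"temp": 21}').decode("ascii"),
            }
        }
    ]
}
decoded = handler(sample_event, None)
```

One invocation receives up to the configured batch size of records, so the loop over `event["Records"]` is the normal shape of a stream-processing function.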
  12. Streaming Architecture Workflow: Lambda + Kinesis
      Kinesis captures the stream; Lambda processes the stream.

      Data input                Action          Data output
      IT application activity   Audit           SNS
      Metering records          Condense        Redshift
      Change logs               Backup          S3
      Financial data            Store           RDS
      Transaction orders        Process         SQS
      Server health metrics     Monitor         EC2
      User clickstream          Analyze         EMR
      IoT device data           Respond         Backend endpoint
      Custom data               Custom action   Custom application
  13. Common Architecture: Lambda + Kinesis
      Real Time Data Processing
      Diagram components: Amazon Kinesis, AWS Lambda 1, Amazon CloudWatch, Amazon DynamoDB, AWS Lambda 2, Amazon S3
      1. Real-time event data is sent to Amazon Kinesis, which allows multiple AWS Lambda functions to process the same events.
      2. In AWS Lambda, Function 1 processes the incoming events and stores event data in Amazon DynamoDB.
      3. Lambda Function 1 also sends values to Amazon CloudWatch for simple monitoring of metrics.
      4. In AWS Lambda, Function 2 stores incoming data events in Amazon S3.
      https://s3.amazonaws.com/awslambda-reference-architectures/stream-processing/lambda-refarch-streamprocessing.pdf
  14. Common Architecture: Lambda + Kinesis
      Data Processing for Data Storage/Analysis
      ▪ Use Lambda to “fan out” to other AWS services, i.e. Storage, Database, and BI/analytics
      ▪ An Amazon Kinesis stream can continuously capture and store terabytes of data per hour from hundreds of thousands of sources
      ▪ Grant AWS Lambda permissions for the relevant stream actions via IAM (Execution Role) during function creation
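The execution role permissions mentioned on this slide can be sketched as an IAM policy document built in Python. The slide does not list the actions; the ones below are an assumption based on what a function reading from a stream and logging to CloudWatch Logs typically needs, and the stream ARN is a made-up example.

```python
import json


def kinesis_execution_policy(stream_arn):
    """Build a policy document for a Lambda execution role that reads one stream."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Read access to the specific stream Lambda polls.
                "Effect": "Allow",
                "Action": [
                    "kinesis:DescribeStream",
                    "kinesis:GetShardIterator",
                    "kinesis:GetRecords",
                ],
                "Resource": stream_arn,
            },
            {   # ListStreams is not scoped to a single stream.
                "Effect": "Allow",
                "Action": ["kinesis:ListStreams"],
                "Resource": "*",
            },
            {   # Lets the function write its logs to CloudWatch Logs.
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "*",
            },
        ],
    }


# Hypothetical ARN, for illustration only.
policy = kinesis_execution_policy(
    "arn:aws:kinesis:us-east-1:123456789012:stream/example-stream"
)
policy_json = json.dumps(policy, indent=2)
```

Scoping the read actions to one stream ARN keeps the role narrow; a function that fans out to other services would need additional statements for each destination.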
  15. Demo: Real-time processing of Amazon Kinesis data streams with AWS Lambda
  16. Data Processing: Best Practices & Tips
  17. Best Practices: Kinesis
      Creating a Kinesis stream
      Streams
      ▪ Made up of Shards
      ▪ Each Shard ingests data up to 1MB/sec
      ▪ Each Shard emits data up to 2MB/sec
      Data
      ▪ All data is stored for 24 hours; data can be replayed within the 24-hour window
      ▪ A Partition Key is supplied by the producer and used to distribute the PUTs across Shards
      ▪ A unique Sequence # is returned to the Producer upon a successful PUT call
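The per-shard limits on this slide translate into a simple sizing estimate. This is a rough sketch that uses only the two figures given here (1MB/sec in, 2MB/sec out per shard); the function name is hypothetical and any other per-shard limits are ignored.

```python
import math

INGEST_MB_PER_SHARD = 1.0  # from the slide: each shard ingests up to 1MB/sec
EMIT_MB_PER_SHARD = 2.0    # from the slide: each shard emits up to 2MB/sec


def shards_needed(ingest_mb_per_sec, emit_mb_per_sec):
    """Estimate the shard count that covers both write and read throughput."""
    by_ingest = math.ceil(ingest_mb_per_sec / INGEST_MB_PER_SHARD)
    by_emit = math.ceil(emit_mb_per_sec / EMIT_MB_PER_SHARD)
    # A stream always has at least one shard.
    return max(1, by_ingest, by_emit)


# 3MB/sec written, read by consumers totalling 10MB/sec: reads are the bottleneck.
estimate = shards_needed(3, 10)
```

The read side matters when several consumers share a stream, because their combined read rate counts against the same 2MB/sec per shard.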
  18. Best Practices: Lambda
      Create a different Lambda function for each task and associate them all with the same Kinesis stream
      ▪ Log to CloudWatch Logs
      ▪ Push to SNS
  19. Best Practices: Lambda
      Attaching a Lambda function to a Kinesis Stream
      ▪ Shards: One Lambda function concurrently invoked per Kinesis shard
      ▪ Increasing shards will cause more Lambda functions to be invoked concurrently
      ▪ Each individual shard follows ordered processing
      Diagram: Source, Kinesis (Shards), Lambda (Pollers, Functions), Destination 1, Destination 2
      Scale Kinesis by adding shards; Lambda will scale automatically
  20. Best Practices: Kinesis
      Performance tuning Kinesis as an event source
      Batch size:
      ▪ Maximum number of records that AWS Lambda will retrieve from Kinesis for each invocation of your function
      ▪ Increasing batch size results in fewer Lambda function invocations, with more data processed per invocation
      Starting Position:
      ▪ The position in the stream where Lambda starts reading
      ▪ Set to “Trim Horizon” to read from the start of the stream (all retained data)
      ▪ Set to “Latest” to read only the most recent data, i.e. records added after the event source is attached
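Batch size and starting position are both set when the stream is attached to the function. The sketch below builds the arguments for boto3's `create_event_source_mapping` call; the helper names, the default batch size of 100, and the ARN in the usage lines are assumptions for illustration, and the client is passed in so the code can be tried without AWS credentials.

```python
VALID_STARTING_POSITIONS = {"TRIM_HORIZON", "LATEST"}


def mapping_params(stream_arn, function_name, batch_size=100,
                   starting_position="TRIM_HORIZON"):
    """Build keyword arguments for an event source mapping."""
    if starting_position not in VALID_STARTING_POSITIONS:
        raise ValueError("starting_position must be TRIM_HORIZON or LATEST")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "BatchSize": batch_size,                # max records per invocation
        "StartingPosition": starting_position,  # where Lambda starts reading
        "Enabled": True,
    }


def attach_stream(lambda_client, **kwargs):
    """Create the mapping using a boto3 Lambda client supplied by the caller."""
    return lambda_client.create_event_source_mapping(**mapping_params(**kwargs))


# Hypothetical names; with real credentials this would be passed to
# attach_stream(boto3.client("lambda"), ...).
params = mapping_params(
    "arn:aws:kinesis:us-east-1:123456789012:stream/example-stream",
    "example-function",
    batch_size=500,
    starting_position="LATEST",
)
```

A larger batch size trades latency for fewer invocations, so it is usually tuned together with the function timeout.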
  21. Best Practices: Lambda
      Creating Lambda functions
      Memory:
      ▪ CPU and disk are proportional to the memory configured
      ▪ Increasing memory makes your code execute faster (if CPU bound)
      ▪ Increasing memory allows larger record sizes to be processed
      Timeout:
      ▪ Increasing timeout allows for longer-running functions, but a longer wait in case of errors
      Retries:
      ▪ For Kinesis, Lambda has unlimited retries (until data expires)
      Permission model:
      ▪ Lambda pulls data from Kinesis, so no invocation role is needed, only an execution role
  22. Best Practices: Lambda
      Monitoring and Debugging Lambda functions
      • Monitoring: available in Amazon CloudWatch Metrics
        • Invocation count
        • Duration
        • Error count
        • Throttle count
      • Debugging: available in Amazon CloudWatch Logs
        • All Metrics
        • Custom logs
        • RAM consumed
        • Search for log events
  23. Get Started: Data Processing with AWS
      Next Steps
      1. Create your first Kinesis stream. Configure hundreds of thousands of data producers to put data into an Amazon Kinesis stream, e.g. data from social media feeds.
      2. Create and test your first Lambda function. Use any third-party library, even native ones. First 1M requests each month are on us!
      3. Read the Developer Guide, AWS Lambda and Kinesis Tutorial, and resources on GitHub at AWS Labs
         • http://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html
         • https://github.com/awslabs/lambda-streams-to-firehose
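For the first step, a data producer can be sketched with boto3's Kinesis `put_record` call. This is a minimal example under stated assumptions: the stream name, the event fields, and the choice of `device_id` as partition key are hypothetical, and the client is passed in so the function can be exercised with a stand-in object.

```python
import json


def put_event(kinesis_client, stream_name, event, key_field="device_id"):
    """Send one JSON event to a stream and return where it landed."""
    response = kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        # Events with the same partition key go to the same shard.
        PartitionKey=str(event[key_field]),
    )
    # A successful PUT returns the shard and a unique sequence number.
    return response["ShardId"], response["SequenceNumber"]


# With real credentials, usage would look like:
#   import boto3
#   put_event(boto3.client("kinesis"), "example-stream",
#             {"device_id": "sensor-7", "temp": 21})
```

Choosing a partition key with many distinct values spreads PUTs across shards, while reusing one key keeps that key's events in order on a single shard.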
  24. Thank You!
      Tara E. Walker, AWS Technical Evangelist, @taraw
