AWS re:Invent 2016: ↑↑↓↓←→←→ BA Lambda Start (SVR305)

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Vyom Nagrani - Manager, Product Management, Amazon Web Services
Richard McFarland - VP Data Services and Chief Data Scientist, Hearst Corp
November 2016

What to Expect from the Session
• Working with AWS Lambda
• Customer example
  • Hearst clickstream and data pipeline
• Best practices and hacks across the lifecycle
  • Development and testing
  • Deployment and ALM
  • Security and scaling
  • Debugging and operations
• Questions & answers

Working with AWS Lambda
EVENT SOURCE → FUNCTION → SERVICES (ANYTHING)
Event sources: changes in data state, requests to endpoints, changes in resource state
Function languages: Node, Python, Java, C#

Benefits of AWS Lambda
Productivity-focused compute platform to build powerful, dynamic, modular applications in the cloud
1. No infrastructure to manage: focus on business logic
2. Cost-effective and efficient: pay only for what you use
3. Bring your own code: run code in standard languages

Event sources that trigger AWS Lambda
Categories: DATA STORES, ENDPOINTS, CONFIGURATION REPOSITORIES, EVENT/MESSAGE SERVICES
Sources shown: Amazon S3, Amazon DynamoDB, Amazon Kinesis, AWS CloudFormation, AWS CloudTrail, Amazon CloudWatch, Amazon SNS, Amazon SES, Amazon API Gateway, Amazon Cognito, Amazon Alexa, Cron events, AWS CodeCommit, AWS IoT
… and the list will continue to grow!
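As a sketch of what an event source hands to a function, the handler below pulls the bucket and object key out of an Amazon S3 notification. The handler name and the trimmed-down sample event are illustrative assumptions; the `Records[].s3.bucket.name` and `Records[].s3.object.key` fields follow the S3 notification format, where object keys arrive URL-encoded.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Return the (bucket, key) pairs named in an S3 notification event."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in notifications (spaces arrive as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


# Hypothetical, trimmed-down event for local experimentation.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "clicks/2016/11/30/part+1.gz"}}}
    ]
}
```

Calling `lambda_handler(sample_event, None)` locally is enough to exercise the parsing; other event sources deliver differently shaped events.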

Key scenarios and use cases for AWS Lambda
Data processing
• Stateless processing of discrete or streaming updates to your data-store or message bus
Control systems
• Customize responses and response workflows to state and data changes within AWS
App backend development
• Execute server side backend logic for web, mobile, device, or voice user interactions

Customer example: Hearst clickstream and data pipeline

What I will be talking about
• Clickstream
• Cron-ified
• Lambda-fy!
• Lessons Learned

What business is Hearst in?
Magazines
20 U.S. titles & nearly 300 international titles
Newspapers
15 daily & 34 weekly titles
Broadcasting
30 television & 2 radio stations
Business Media
Operates more than 20 business-to-business companies with significant holdings in the auto, electronic, medical, and financial industries
Hearst has over 300 websites worldwide, which results in 1 TB of data per day and over 20 billion pageviews per year.
“Hearst is in the Data Creation Business”

“Managing our clickstream is necessary for Hearst to extract business value from our big data”
[Diagram: Clickstream, characterized along four dimensions]
VARIETY: Structured Data, Unstructured Data
VELOCITY: Batches, Streaming
VALUE EXTRACTION: DBA and Analysts, Cloud Engineering and Machine Learning
VOLUME: Single Source, Many Sources, Normal Data, Big Data
Hearst’s Cron-based Clickstream

Hearst’s data pipeline: cron-based
[Diagram: pipeline stages linked by four 5 min cron jobs]
Components: Users to Hearst Properties, Clickstream, Node.JS App-Proxy, Amazon Kinesis, Amazon S3, ETL on EMR, Amazon Redshift, Data Science Application, Buzzing API, DynamoDB, API Gateway
Data labels: Agg Data, Models, API Ready Data
LATENCY / THROUGHPUT pairs shown: Milliseconds / 100GB/Day; 30 Seconds / 5GB/Day; 100 Seconds / 1GB/Day; 5 Seconds / 1GB/Day

Lambda-fy it!
Convert existing cron-driven process into trigger-based process
Lambda Limit: Code must execute in 5 minutes or less
Lambda Tip: For every Lambda process, create a “watchdog” that checks for failures and fills in the gaps
Lambda Tip: Add “triggers” in S3 that are 0 byte files with the name of the Lambda function
Lambda functions: etl_main / etl_watchdog, ds_main / ds_watchdog, translate, push_to_DynamoDB, api_integration, Kinesis Firehose_to_S3
[Diagram: the cron jobs between stages are replaced by triggers. Components: Amazon Kinesis, ETL on EMR, Amazon Redshift, Data Science Application, Buzzing API (API Ready Data), DynamoDB, API Gateway]
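The two tips above can be sketched roughly as follows. This is not Hearst's code: the `triggers/` prefix, the function names, and the idea of comparing expected against completed batches are assumptions made for illustration. `s3_client` is any object with a boto3-style `put_object` method, such as `boto3.client("s3")`.

```python
from urllib.parse import unquote_plus

TRIGGER_PREFIX = "triggers/"  # hypothetical key prefix for trigger files


def write_trigger(s3_client, bucket, function_name):
    """Drop a 0-byte object whose name is the Lambda function to run next."""
    key = TRIGGER_PREFIX + function_name
    s3_client.put_object(Bucket=bucket, Key=key, Body=b"")
    return key


def function_name_from_event(event):
    """Recover the target function name from the S3 event for a trigger file."""
    key = unquote_plus(event["Records"][0]["s3"]["object"]["key"])
    return key.rsplit("/", 1)[-1]


def find_gaps(expected_batches, completed_batches):
    """Watchdog check: batches that should have been processed but were not."""
    done = set(completed_batches)
    return [batch for batch in expected_batches if batch not in done]


def watchdog(s3_client, bucket, function_name, expected_batches, completed_batches):
    """Re-trigger the main function once if any expected batch is missing."""
    gaps = find_gaps(expected_batches, completed_batches)
    if gaps:
        write_trigger(s3_client, bucket, function_name)
    return gaps
```

A watchdog such as `etl_watchdog` would run on its own schedule, so a main function that hits the 5-minute limit or fails silently still gets its missing work picked up.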

Deep dive: Python frameworks
What really “exploded” the use of Lambda functions at Hearst was the introduction of frameworks
Problem: Using Lambda functions to access multiple AWS tools and perform data science requires access credentials and database frameworks
Modules shown: psycopg2, boto3, gzip, pgpasslib, pandas, pytz, numpy, httplib2
Programmers have to configure Python modules not in the standard Python 2.7 library set
So Hearst created a standard set of Python frameworks that make this easy: hearst_frameworks.zip

Deep dive: Redshift framework
Lambda Tip: Redshift Framework is our core framework that makes it easy to create Lambda functions that communicate with Amazon Redshift
from redshift_framework.redshift_session import RedshiftSession
# initiate Redshift session (the password is looked up from the pgpass key)
rs = RedshiftSession(pgpass_key='HOSTNAME:PORT:DB:USERNAME')
# read table into pandas dataframe; {tbl} is a "macro" variable
df = rs.get_df(query='select url,title from {tbl} limit 10', tbl='tmp_fbinst')
# execute sql stored in S3, replace {dt} values in file with 2016/02/21
rs.execute_file(file_name='s3://hearstdataservices/code/FBINST22.sql', dt='2016/02/21')
# execute query and save to tsv in S3
rs.save_query_to_csv(query='select * from tmp_fbinst where url is not null order by 12 desc;',
                     file_name='s3://hearstdataservices/report/test.csv', sep='\t')
# execute sql and save table to json file in S3
rs.save_query_to_json(query='select * from tmp_fbinst where url is not null order by 12 desc;',
                      file_name='s3://hearstdataservices/report/test.json')
Callouts: load framework; no password needed; “macro” variables!; easily write query results to S3
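How the “macro” variables and the password-free session could work is not shown on the slide. The sketch below is one plausible reading, not the framework's actual implementation: `render_sql` and `lookup_password` are hypothetical names, macros are assumed to be plain `str.format` placeholders, and the lookup assumes standard `.pgpass` lines (`hostname:port:database:username:password`) while ignoring wildcards and escaped colons.

```python
def render_sql(template, **macros):
    """Substitute {name} placeholders in a SQL template with keyword values."""
    return template.format(**macros)


def lookup_password(pgpass_lines, pgpass_key):
    """Find the password for 'HOSTNAME:PORT:DB:USERNAME' in .pgpass-style lines."""
    for line in pgpass_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Each .pgpass entry is hostname:port:database:username:password.
        prefix, _, password = line.rpartition(":")
        if prefix == pgpass_key:
            return password
    return None
```

With this reading, `render_sql('select url,title from {tbl} limit 10', tbl='tmp_fbinst')` yields the query that runs against Redshift. SQL containing literal braces would need them doubled.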

Helpers framework
Lambda Tip: Create Helpers Framework to make it easier to perform frequently executed actions as well as reading and writing to S3
import redshift_framework.helpers as helpers
# write a data frame to a csv/json
helpers.df_to_csv(df1, 's3://hearst/df1.csv')
helpers.df_to_json(df1, 's3://hearst/df1.json')
# download/upload files to S3
helpers.download_s3_file('s3://my-bucket/prefix/sub-prefix/file-name', '/path/to/file-name')
helpers.upload_s3_file('/path/to/file-name', 's3://my-bucket/prefix/sub-prefix/file-name')
# file exists in S3
file_exists = helpers.file_exists_in_s3('my-bucket', 'prefix/sub-prefix/my-file')
# get file from S3 and read into data frame
df = helpers.get_df_from_csv('s3://prefix/sub-prefix/my-file.csv', sep='\t')
# get gzip file from S3 and read into string
content = helpers.get_file_content('s3://prefix/sub-prefix/my-file.csv.gz', compression='gzip')
Callouts: load framework; simpler packaging of the pandas function with direct connection to S3; common task; quickly get data in S3 into a data frame
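The helpers above take `s3://` URLs directly. A minimal sketch of the plumbing such a helper needs is shown here; `split_s3_url` and `rows_to_s3_csv` are hypothetical names, not part of the Hearst framework, and `s3_client` is any object with a boto3-style `put_object` method.

```python
import csv
import io
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/prefix/file' into ('bucket', 'prefix/file')."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an s3:// URL: %r" % url)
    return parsed.netloc, parsed.path.lstrip("/")


def rows_to_s3_csv(s3_client, rows, url, sep=","):
    """Serialize rows (lists of values) as delimited text and upload to S3."""
    bucket, key = split_s3_url(url)
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=sep, lineterminator="\n").writerows(rows)
    s3_client.put_object(Bucket=bucket, Key=key,
                         Body=buffer.getvalue().encode("utf-8"))
    return bucket, key
```

Building the file in memory avoids touching `/tmp`, which suits small result sets; larger ones would need a streamed upload.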

Hearst’s serverless data pipeline
[Diagram: three layers, DATA API, DATA STORAGE, and DATA PROCESSING]
Services: Amazon S3, Amazon DynamoDB, Amazon Kinesis, Amazon API Gateway, Amazon Redshift
Lambda functions: etl_main / etl_watchdog, ds_main / ds_watchdog, translate, push_to_DynamoDB, Kinesis Firehose_to_S3

A look at our lessons learned
[Diagram: two pipelines compared]
Spark-Scala pipeline: Amazon Kinesis, Spark-Scala, Amazon Redshift, S3, DynamoDB & API Gateway; < 5min; $$$$ $$$
Lambda pipeline: Amazon Kinesis, Lambda, Amazon Redshift, S3, DynamoDB & API Gateway; < 2min; $$$ $

AWS Lambda allows you to manage your clickstream with less
You can actually “Do More With Less”
You don’t need a big team: With the right frameworks in place, this can all be done with a team of 2-3 FTEs
…Or one very rare individual

Best practices and hacks
across the lifecycle

Getting started on AWS Lambda
Development
and Testing
Deployment
and ALM
Security
and Scaling
Debugging
and Operations
Bring your own code
• Node.js 4.3, Java 8,
Python 2.7, C#
Simple resource model
• Select power rating from
128 MB to 1.5 GB
• CPU and network
allocated proportionately
Stateless
• Persist data using
external storage
• No affinity or access to
underlying infrastructure
Flexible use
• Synchronous or
asynchronous
• Integrated with other
AWS services
NEW !

Anatomy of a Lambda function
Handler() function
• The method in your
code where AWS
Lambda begins
execution
Event object
• Pre-defined object
format for AWS
integrations & events
• Java & C# support
simple data types,
POJOs/POCOs, and
Stream input/output
Context object
• Use methods and properties like getRemainingTimeInMillis(), identity, awsRequestId, invokedFunctionArn, clientContext, logStreamName
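A minimal Python handler touching each part of the anatomy might look like the sketch below. The 10-second safety margin and the `FakeContext` class are assumptions for local testing; the context attributes used are the Python-runtime spellings of the properties listed above.

```python
def lambda_handler(event, context):
    """Entry point: Lambda calls this with the event payload and a context object."""
    remaining_ms = context.get_remaining_time_in_millis()
    return {
        "request_id": context.aws_request_id,
        "function_arn": context.invoked_function_arn,
        "log_stream": context.log_stream_name,
        # Leave a safety margin before starting work that may not finish in time.
        "enough_time_left": remaining_ms > 10000,
        "echo": event,
    }


class FakeContext:
    """Hypothetical stand-in for the Lambda context, for local testing only."""
    aws_request_id = "00000000-0000-0000-0000-000000000000"
    invoked_function_arn = "arn:aws:lambda:us-east-1:123456789012:function:demo"
    log_stream_name = "2016/11/30/[$LATEST]example"

    def get_remaining_time_in_millis(self):
        return 30000
```

Locally, `lambda_handler({"ping": 1}, FakeContext())` exercises the handler without deploying it.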

FunctionConfiguration metadata
VpcConfig
• Enables private
communication with
other resources
within your VPC
• Provide EC2 security
group and subnets,
auto-creates ENIs
• Internet access can
be added through
NAT Gateway
DeadLetterConfig
• Failed events sent to
your SQS queue /
SNS topic
• Redrive messages
that Lambda could
not process
• Currently available
for asynchronous
invocations only
Environment
• Add custom
key/value pairs as
part of configuration
• Reuse code across
different setups or
passwords
• Encrypted with
specified KMS key
on server, decrypted
at container init
NEW ! NEW !
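Environment key/value pairs appear to the function as ordinary process environment variables. The sketch below reads two of them; `TABLE_NAME` and `STAGE` are hypothetical keys chosen for this example.

```python
import os


def load_settings(environ=os.environ):
    """Read per-deployment settings from environment variables."""
    return {
        # TABLE_NAME and STAGE are hypothetical keys chosen for this example.
        "table_name": environ["TABLE_NAME"],
        "stage": environ.get("STAGE", "dev"),
    }
```

The same code package can then be deployed to several setups, with only the configuration differing between them.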

AWS Lambda limits
Resource Limits (Default Limit)
• Ephemeral disk capacity ("/tmp" space): 512 MB
• Number of file descriptors: 1024
• Number of processes and threads (combined total): 1024
• Maximum execution duration per request: 300 seconds
• Invoke request body payload size (RequestResponse): 6 MB
• Invoke request body payload size (Event): 128 KB
• Invoke response body payload size (RequestResponse): 6 MB
• Dead-letter payload size (Event): 128 KB
Deployment Limits (Default Limit)
• Lambda function deployment package size (.zip/.jar file): 50 MB
• Size of code/dependencies that you can zip into a deployment package (uncompressed zip/jar size): 250 MB
• Total size of all the deployment packages that can be uploaded per region: 75 GB
• Total size of environment variables set: 4 KB
Throttling Limits (can request service limit increase) (Default Limit)
• Concurrent executions: 100
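A client can check a payload against the invoke limits before calling Lambda. This sketch assumes the payload is JSON-encoded and that MB and KB are binary units (1 KB = 1024 bytes); `payload_fits` is a hypothetical helper name.

```python
import json

# Request payload limits from the list above, in bytes (binary units assumed).
INVOKE_PAYLOAD_LIMITS = {
    "RequestResponse": 6 * 1024 * 1024,  # synchronous
    "Event": 128 * 1024,                 # asynchronous
}


def payload_fits(payload, invocation_type):
    """Return True if the JSON-encoded payload is within the invoke limit."""
    size = len(json.dumps(payload).encode("utf-8"))
    return size <= INVOKE_PAYLOAD_LIMITS[invocation_type]
```

Payloads too large for an asynchronous invoke are commonly written to S3 first, with only the object location passed in the event.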

The container model
Container reuse
• Declarations in your Lambda function
code outside handler()
• Disk content in /tmp
• Background processes or callbacks
• Make use of container reuse
opportunistically, e.g.
• Load additional libraries
• Cache static data
• Database connections
Cold starts
• Time to set up a new container and do
necessary bootstrapping when a
Lambda function is invoked for the first
time or after it has been updated
• Ways to reduce cold start latency
• More memory = faster
performance, lower start up time
• Smaller function ZIP loads faster
• Node.js and Python start execution
faster than Java and C#
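Container reuse can be seen in a small sketch: state declared outside the handler survives between invocations that land on the same container. The cache layout and `_load_static_data` are placeholders for whatever is expensive to set up in a real function.

```python
import time

# Module-level code runs once per container (cold start), not per invocation.
_STATIC_CACHE = {}
_INVOCATIONS = 0


def _load_static_data():
    """Stand-in for an expensive load (library import, lookup table, DB connection)."""
    return {"loaded_at": time.time()}


def lambda_handler(event, context):
    global _INVOCATIONS
    _INVOCATIONS += 1
    cold_start = "data" not in _STATIC_CACHE
    if cold_start:
        # Only pay the loading cost when the container is new.
        _STATIC_CACHE["data"] = _load_static_data()
    return {"cold_start": cold_start, "invocations_in_container": _INVOCATIONS}
```

Reuse is opportunistic, so the function still has to work when the cache is empty.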

The execution environment
Underlying OS
• Public Amazon Linux AMI version
(amzn-ami-hvm-2016.03.3.x86_64-gp2)
• Linux kernel version
(4.4.23-31.54.amzn1.x86_64)
• Compile native binaries against this
environment – can be used to bring
your own runtime!
• Changes over time, always check the
latest supported versions in the AWS Lambda documentation
Available libraries
• ImageMagick (nodejs wrapper and
native binary)
• OpenJDK 1.8, .NET Core 1.0.1
• AWS SDK for JavaScript version 2.6.9
• AWS SDK for Python (Boto 3) version
1.4.1, Botocore version 1.4.61
• Embed your own SDK/libraries if you
depend on a specific version

Building a deployment package
Node.js & Python
• .zip file consisting of
your code and any
dependencies
• Use npm/pip to
install libraries
• All dependencies
must be at root level
Java
• Either .zip file with all
code/dependencies,
or standalone .jar
• Use Maven / Eclipse
IDE plugins
• Compiled class &
resource files at root
level, required jars in
/lib directory
C# (.NET Core) (NEW!)
• Either .zip file with all
code/dependencies,
or a standalone .dll
• Use NuGet /
Visual Studio plugins
• All assemblies (.dll)
at root level, platform
specific libraries
managed by VS
tooling

Managing continuous delivery
Source Build Test Deploy
Amazon S3 AWS Lambda (DIY)
AWS CodeCommit
GitHub
AWS CodePipeline
Codeship
Jenkins
AWS CodeBuild (NEW!)
… OR …

Deployment tools and frameworks available
CloudFormation
• AWS Serverless
Application Model -
extension optimized
for Serverless
• New Serverless
resources – APIs,
Functions, Tables
• Open specification
(Apache 2.0)
Chalice
• Python serverless
micro-framework
• Quickly create and
deploy applications
• Set up AWS Lambda
and Amazon API
Gateway endpoint
• https://github.com/awslabs/chalice
Third-party tools
• Serverless
Framework
(https://serverless.com/)
• Apex Serverless
Architecture
(http://apex.run/)
• DEEP Framework by
Mitoc Group
(https://github.com/MitocGroup/deep-framework)
NEW !

Function versioning and aliases
• Versions = immutable copies of code +
configuration
• Aliases = mutable pointers to versions
• Development against $LATEST version
• Each version/alias gets its own ARN
• Enables rollbacks, staged promotions,
“locked” behavior for client
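Since each version and alias gets its own ARN, a client can pin itself to one by appending a qualifier to the function ARN. The helper below is a sketch with a hypothetical name; the account ID and alias in the usage note are placeholders.

```python
def qualified_arn(region, account_id, function_name, qualifier=None):
    """Build a function ARN, optionally qualified by a version number or alias."""
    arn = "arn:aws:lambda:%s:%s:function:%s" % (region, account_id, function_name)
    return arn if qualifier is None else "%s:%s" % (arn, qualifier)
```

For example, `qualified_arn("us-east-1", "123456789012", "etl_main", "PROD")` names the PROD alias, which can later be repointed to a different version without changing the client.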

The push model and resource policies
Function (resource) policy
• Permissions you grant to your Lambda
function determine which service or
event source can invoke your function
• Resource policies make it easy to
grant cross-account permissions to
invoke your Lambda function

The pull model and IAM roles
IAM (execution) role
• Permissions you grant to this role
determine what your AWS Lambda
function can do
• If event source is Amazon DynamoDB
or Amazon Kinesis, then add read
permissions in IAM role

Concurrent executions and throttling
Determining concurrency
• For stream-based event sources:
Number of shards per stream is the
unit of concurrency
• For all other event sources: Request
rate and duration drives concurrency
(concurrency = requests per second *
function duration)
Throttle behavior
• For stream-based event sources:
Automatically retried until data expires
• For Asynchronous invocations:
Automatically retried for up to six
hours, with delays between retries
• For Synchronous invocations: Invoking
application receives a 429 error and is
responsible for retries
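The concurrency formula above can be worked through in a few lines. The function names are illustrative, and rounding up to a whole number of executions is an assumption added here.

```python
import math


def estimated_concurrency(requests_per_second, avg_duration_seconds):
    """concurrency = requests per second * function duration, rounded up."""
    return math.ceil(requests_per_second * avg_duration_seconds)


def stream_concurrency(shard_count):
    """For stream-based sources, one concurrent invocation per shard."""
    return shard_count
```

At 100 requests per second and 0.5 s per invocation the estimate is 50, which is inside the default limit of 100. At 300 requests per second it is 150, so a limit increase would be needed.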

Other scaling considerations
For Lambda
• Remember, a throttle is NOT an error!
• If you expect sudden large spikes in
demand, consider Asynchronous
invocations to Lambda
• Proactively engage AWS Support to
increase your throttling limits
For upstream/downstream services
• Build retries/backoff in client
applications and upstream setup
• Make sure your downstream setup
“keeps up” with Lambda scaling
• Limit concurrency when connecting to
relational databases
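One way to build retries and backoff into a client is exponential backoff with jitter, sketched below. `ThrottledError` is a hypothetical exception the caller raises when it sees a throttle response, and the delay parameters are arbitrary defaults.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical error a caller raises when it sees a 429 throttle response."""


def call_with_backoff(invoke, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep):
    """Call invoke(), retrying throttles with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return invoke()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise
            # Cap the exponential delay, then pick a random point below it.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
```

The random jitter spreads retries out so that many throttled clients do not all retry at the same moment.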

Errors and retries
Types of errors
• 4xx Client Error: Can be fixed by
developer, e.g. InvalidParameterValue
(400), ResourceNotFound (404),
RequestTooLarge (413), etc.
• 5xx Server Error: Most can be fixed by
admin, e.g. EC2 ENI management
errors (502)
Retry policy
• For stream-based event sources:
Automatically retried until data expires
• For Asynchronous invocations:
Automatically retried 2 extra times,
then published to dead-letter queue
• For Synchronous invocations: Invoking
application receives an error code and
is responsible for retries

Tracing and tracking
Integration with AWS X-Ray
• Collects data about requests that your
application serves
• Visibility into the AWS Lambda service
(dwell time, number of retries, latency
and errors)
• Detailed breakdown of your function’s
performance, including calls made to
downstream services and endpoints
Integration with AWS CloudTrail
• Captures calls made to AWS Lambda
API; delivers log files to Amazon S3
• Tracks the request made to AWS
Lambda, the source IP address from
which the request was made, who
made the request, when it was made
• All control plane APIs can be tracked
(no versioning/aliasing and invoke API)
COMING
SOON!

Troubleshooting and monitoring
Logs
• Every invocation generates START, END,
and REPORT entries in CloudWatch Logs
• User logs included
• Node.js – console.log(), console.error(),
console.warn(), console.info()
• Java – log4j.*, LambdaLogger.log(),
System.out, System.err
• Python – print, logging.*
• C# – LambdaLogger.Log(),
ILambdaContext.Logger.Log(),
Console.Write(), Console.WriteLine()
Metrics
• Default (Free) Metrics: Invocations,
Duration, Throttles, Errors – available as
CloudWatch Metrics
• Additional Metrics: Create custom
metrics for tracking health/status
• Function code vs log-filters
• Ops-centric vs. business-centric
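For the log-filter route to custom metrics, a function can write one structured line per invocation and let a CloudWatch Logs metric filter turn it into a metric. The logger name, field names, and `RecordsProcessed` metric below are assumptions made for this sketch.

```python
import json
import logging

logger = logging.getLogger("clickstream")
logger.setLevel(logging.INFO)


def metric_line(name, value, unit="Count"):
    """Format a structured log line that a CloudWatch Logs metric filter can match."""
    return json.dumps({"metric": name, "value": value, "unit": unit}, sort_keys=True)


def lambda_handler(event, context):
    records = event.get("Records", [])
    # Emitted through the logging module, one line per invocation.
    logger.info(metric_line("RecordsProcessed", len(records)))
    return len(records)
```

Compared with publishing metrics from function code, this keeps the handler free of extra API calls; the trade-off is that a filter has to be maintained for each metric.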

Conclusion and next steps
Key takeaway
AWS Lambda is one of the core components of the
platform AWS provides to develop serverless applications
Next steps
1. Stay up to date with AWS Lambda on the Compute blog
and check out our detail page for more scenarios.
2. Send us your questions, comments, and feedback on
the AWS Lambda Forums.

Thank you!
Follow us on Twitter
@vyomnagrani
@statsrick

Remember to complete
your evaluations!

Related Sessions
• SVR202 – What’s New with AWS Lambda
• SVR301 – Real-time Data Processing Using AWS Lambda
• SVR302 – Optimizing the Data Tier in Serverless Web Applications
• SVR304 – bots + serverless = ❤
• SVR307 – Application Lifecycle Management in a Serverless World
• SVR311 – The State of Serverless Computing
• SVR401 – Using AWS Lambda to Build Control Systems for Your AWS Infrastructure
• SVR402 – Operating Your Production API
• CMP211 – Getting Started with Serverless Architectures
• DEV205 – Monitoring, Hold the Infrastructure: Getting the Most from AWS Lambda
• DEV301 – Amazon CloudWatch Logs and AWS Lambda: A Match Made in Heaven
• DEV308 – Chalice: A Serverless Microframework for Python
