Serverless in production, an experience report (codemotion milan)

Yan Cui
from the TRENCHES
what you should know before you go to production
AWS LAMBDA
hi, I’m Yan Cui
AWS user since 2009
apr, 2016
hidden complexities and dependencies
low utilisation to leave room for traffic spikes
EC2 scaling is slow, so scale earlier
lots of cost for unused resources
up to 30 mins for deployment
deployment required downtime
“lead time to someone saying
thank you is the only reputation
metric that matters.”
- Dan North
“what would good
look like for us?”
DEPLOYMENTS SHOULD...
be small
be fast
have zero downtime
have no lock-step
FEATURES SHOULD...
be deployable independently
be loosely-coupled
WE WANT TO...
minimise cost for unused resources
minimise ops effort
reduce tech mess
deliver visible improvements faster
nov, 2016
170 Lambda functions in prod
1.2 GB deployment packages in prod
95% cost saving vs EC2
15x no. of prod releases per month
time: "is a good fit?", then "1st function in prod!"
ALERTING
CI / CD
TESTING
LOGGING
MONITORING
Principles: what is good?
Practices: how to make it good?
Tools: with what?
Principles outlast Tools
170 functions
WOOF!
SECURITY
DISTRIBUTED TRACING
CONFIG MANAGEMENT
evolving the PLATFORM
rebuilt search
Legacy Monolith, Amazon Kinesis, AWS Lambda, Amazon CloudSearch
Amazon API Gateway, AWS Lambda
new analytics pipeline
Legacy Monolith, Amazon Kinesis, AWS Lambda, Google BigQuery
1 developer, 2 days from design to production
(his 1st serverless project)
“nothing ever got done
this fast at Skype!”
- Chris Twamley
“lead time to someone saying
thank you is the only reputation
metric that matters.”
- Dan North
Rebuilt
with Lambda
BigQuery
grapheneDB
getting PRODUCTION READY
CHOOSE A DEPLOYMENT FRAMEWORK
http://serverless.com
https://github.com/awslabs/serverless-application-model
http://apex.run
https://apex.github.io/up
https://github.com/claudiajs/claudia
https://github.com/Miserlou/Zappa
http://gosparta.io/
TESTING
amzn.to/29Lxuzu
Level of Testing
1. Unit
do our objects do the right thing?
are they easy to work with?
2. Integration
does our code work against code we can’t change?
test by invoking the handler
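Invoking the handler directly could look roughly like the sketch below. It is only an illustration, not code from the talk: the `handler`, the event shape and `findYubls` are hypothetical, and `findYubls` is an in-memory stand-in so the snippet runs on its own. In a real integration test the handler would call the real downstream service.

```javascript
// Hypothetical dependency: in a real project this would query a real
// downstream service (e.g. DynamoDB) rather than return canned data.
const findYubls = async (userId) => [{ id: 'yubl-1', author: userId }];

// Hypothetical handler under test.
const handler = async (event) => {
  const userId = event.pathParameters.userId;
  const yubls = await findYubls(userId);
  return { statusCode: 200, body: JSON.stringify(yubls) };
};

// Integration test step: build the event by hand and call the handler
// function in-process, with no API Gateway in between.
const invokeHandler = (userId) => handler({ pathParameters: { userId } });
```

The test then asserts on the returned `statusCode` and `body`, exactly as it would on an HTTP response.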
3. Acceptance
does the whole system work?
Level of Testing
unit
integration
acceptance
feedback
confidence
“…We find that tests that mock external
libraries often need to be complex to
get the code into the right state for the
functionality we need to exercise.
The mess in such tests is telling us that
the design isn’t right but, instead of
fixing the problem by improving the
code, we have to carry the extra
complexity in both code and test…”
Don’t Mock Types You Can’t Change
“…The second risk is that we have to be
sure that the behaviour we stub or mock
matches what the external library will
actually do…
Even if we get it right once, we have to
make sure that the tests remain valid
when we upgrade the libraries…”
Don’t Mock Types You Can’t Change
Don’t Mock Services You Can’t Change
“The serverless approach to
testing is different and may
actually be easier.”
Paul Johnston
http://bit.ly/2t5viwK
API Gateway, Lambda, DynamoDB
Unit Tests (Mock/Stub)
what unit tests will not tell you…
is our request correct?
is the request mapping set up correctly?
are the API resources configured correctly?
are we assuming the correct schema?
is Lambda proxy configured correctly?
is IAM policy set up correctly?
is the table created?
observation
most Lambda functions are simple and
have a single purpose; the risk of
shipping broken software has largely
shifted to how they integrate with
external services
But it slows down
my feedback loop…
IT’S NOT
ABOUT YOU!
…if a service can’t provide
you with a relatively easy
way to test the interface in
reality, then you should
consider using another one.
Paul Johnston
“…Wherever possible, an acceptance
test should exercise the system end-to-
end without directly calling its internal
code.
An end-to-end test interacts with the
system only from the outside: through
its interface…”
Testing End-to-End
Legacy Monolith, Amazon Kinesis, AWS Lambda, Amazon CloudSearch, Amazon API Gateway, AWS Lambda
Test Input
Validate
integration tests exercise
system’s Integration with its
external dependencies
acceptance tests exercise
system End-to-End from
the outside
integration tests differ from
acceptance tests only in HOW the
Lambda functions are invoked
observation
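One way to picture that observation: the same test step can call the handler in-process (integration) or go through HTTP from the outside (acceptance), and everything else in the test stays the same. The sketch below is an assumption-laden illustration, not the talk's code: `handler`, `whenWeSearch`, the `/search` path and the `TEST_MODE` variable name are all hypothetical.

```javascript
const http = require('http');

// Hypothetical handler under test (stand-in for a deployed Lambda function).
const handler = async (event) => ({
  statusCode: 200,
  body: JSON.stringify({ query: event.queryStringParameters.q }),
});

// Integration mode: call the handler function in-process.
const viaHandler = (q) => handler({ queryStringParameters: { q } });

// Acceptance mode: go through the HTTP interface from the outside.
// baseUrl would be the deployed API Gateway endpoint in a real setup.
const viaHttp = (baseUrl, q) =>
  new Promise((resolve, reject) => {
    http
      .get(`${baseUrl}/search?q=${encodeURIComponent(q)}`, (res) => {
        let body = '';
        res.on('data', (chunk) => (body += chunk));
        res.on('end', () => resolve({ statusCode: res.statusCode, body }));
      })
      .on('error', reject);
  });

// The test step only picks the invocation mode; assertions on the response
// are shared between integration and acceptance tests.
const whenWeSearch = (q, { mode = process.env.TEST_MODE, baseUrl } = {}) =>
  mode === 'http' ? viaHttp(baseUrl, q) : viaHandler(q);
```

With this shape, switching a suite from integration to acceptance is a matter of setting the mode and pointing `baseUrl` at the deployed stage.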
CI + CD PIPELINE
“the earlier you consider CI + CD, the
more time you save in the long run”
- me
“…We prefer to have the end-to-end
tests exercise both the system and the
process by which it’s built and
deployed…
This sounds like a lot of effort (it is), but
has to be done anyway repeatedly
during the software’s lifetime…”
Testing End-to-End
“deployment scripts
that only live on the CI
box are a disaster
waiting to happen”
- me
Jenkins build config deploys and tests
unit + integration tests
deploy
acceptance tests
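#!/bin/bash
# build.sh <deploy|int-test|acceptance-test> <stage> <region> <profile>
set -e

# usage text is illustrative; the original excerpt only calls `usage`
usage() {
  echo "usage: $0 <deploy|int-test|acceptance-test> <stage> <region> <profile>"
}

if [ "$1" = "deploy" ] && [ $# -eq 4 ]; then
  STAGE=$2
  REGION=$3
  PROFILE=$4
  npm install
  # use the project-local Serverless framework so local and CI builds match
  AWS_PROFILE="$PROFILE" node_modules/.bin/sls deploy -s "$STAGE" -r "$REGION"
elif [ "$1" = "int-test" ] && [ $# -eq 4 ]; then
  STAGE=$2
  REGION=$3
  PROFILE=$4
  npm install
  # runs the npm script named int-<stage>
  AWS_PROFILE="$PROFILE" npm run "int-$STAGE"
elif [ "$1" = "acceptance-test" ] && [ $# -eq 4 ]; then
  STAGE=$2
  REGION=$3
  PROFILE=$4
  npm install
  # runs the npm script named acceptance-<stage>
  AWS_PROFILE="$PROFILE" npm run "acceptance-$STAGE"
else
  usage
  exit 1
fi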
build.sh allows repeatable builds on both local & CI
Auto Auto Manual
LOGGING
2016-07-12T12:24:37.571Z 994f18f9-482b-11e6-8668-53e4eab441ae GOT is off air, what do I do now?
UTC Timestamp, API Gateway Request Id, your log message
function name
date
function version
me
Logs are not easily searchable
in CloudWatch Logs.
LOG OVERLOAD
CENTRALISE LOGS
MAKE THEM EASILY SEARCHABLE
the ELK stack (Elasticsearch + Logstash + Kibana)
CloudWatch Logs, AWS Lambda, ELK stack
CloudWatch Events
http://bit.ly/2f3zxQG
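The Lambda function in the middle of that pipeline mostly has to unpack what CloudWatch Logs sends it. CloudWatch Logs subscriptions deliver events as base64-encoded, gzipped JSON under `event.awslogs.data`. The sketch below shows that decoding step; `toDocuments`, `makeHandler`, `ship` and the output field names are hypothetical placeholders for whatever actually forwards the documents to Logstash or Elasticsearch.

```javascript
const zlib = require('zlib');

// CloudWatch Logs delivers subscription events to Lambda as
// base64-encoded, gzipped JSON under event.awslogs.data.
const decodeLogEvent = (event) => {
  const compressed = Buffer.from(event.awslogs.data, 'base64');
  return JSON.parse(zlib.gunzipSync(compressed).toString('utf8'));
};

// Flatten the payload into one document per log line, ready to be shipped.
// The output field names are illustrative, not a fixed schema.
const toDocuments = ({ logGroup, logStream, logEvents }) =>
  logEvents.map(({ timestamp, message }) => ({
    logGroup,
    logStream,
    '@timestamp': new Date(timestamp).toISOString(),
    message: message.trim(),
  }));

// Hypothetical handler: `ship` stands in for the code that sends the
// documents to the ELK stack. Returns the number of documents shipped.
const makeHandler = (ship) => async (event) => {
  const docs = toDocuments(decodeLogEvent(event));
  await ship(docs);
  return docs.length;
};
```

Keeping the log group and log stream on each document preserves the function name and version, which is what makes the centralised logs searchable per function.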
DISTRIBUTED TRACING
“my followers didn’t
receive my new post!”
- a user
where could the
problem be?
correlation IDs*
* eg. request-id, user-id, yubl-id, etc.
ROLL YOUR OWN CLIENTS
kinesis client
http client
sns client
http://bit.ly/2k93hAj
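A rough sketch of what such a hand-rolled client could do: capture correlation IDs at the start of an invocation, then attach them to every outgoing request and log line. The `x-correlation-` header prefix and the names `captureCorrelationIds`, `withCorrelationIds` and `send` are assumptions made for this illustration, not the talk's actual library.

```javascript
// Correlation IDs captured for the current invocation. Module-level state is
// enough here because a Lambda container handles one invocation at a time.
let correlationIds = {};

// Capture IDs from incoming HTTP headers; start a new trace if none came in.
// The "x-correlation-" prefix is an assumed naming convention.
const captureCorrelationIds = (headers = {}, awsRequestId) => {
  correlationIds = {};
  for (const [name, value] of Object.entries(headers)) {
    if (name.toLowerCase().startsWith('x-correlation-')) {
      correlationIds[name.toLowerCase()] = value;
    }
  }
  if (!correlationIds['x-correlation-id']) {
    correlationIds['x-correlation-id'] = awsRequestId;
  }
  return correlationIds;
};

// Wrap any HTTP-style client so every outgoing request forwards the IDs.
// `send` is a hypothetical function taking { url, headers }.
const withCorrelationIds = (send) => (request) =>
  send({ ...request, headers: { ...correlationIds, ...request.headers } });

// Include the IDs in every log line so logs can be searched by them.
const log = (message) =>
  console.log(JSON.stringify({ message, ...correlationIds }));
```

The same wrapping idea would apply to Kinesis and SNS clients, with the IDs carried in the record payload or message attributes instead of HTTP headers.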
ROLL YOUR OWN CLIENTS
X-RAY
AWS X-Ray
traces do not span over
API Gateway
http://bit.ly/2s9yxmA
MONITORING + ALERTING
“where do I install
monitoring agents?”
you can’t
CloudWatch:
• invocation Count
• error Count
• latency
• throttling
• granular to the minute
• support custom metrics
Datadog:
• same metrics as CW
• better dashboard
• support custom metrics
https://www.datadoghq.com/blog/monitoring-lambda-functions-datadog/
my code
internet
press button something happens
“how do I batch up
and send logs in the
background?”
you can’t
(kinda)
console.log("hydrating yubls from db…");
console.log("fetching user info from user-api");
console.log("MONITORING|1489795335|27.4|latency|user-api-latency");
console.log("MONITORING|1489795335|8|count|yubls-served");
logs: the first two lines
metrics: the last two lines, formatted as MONITORING|timestamp|metric value|metric type|metric name
CloudWatch Logs, AWS Lambda
logs: ELK stack
metrics: CloudWatch
http://bit.ly/2gGredx
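The two halves of this metrics-as-logs approach can be sketched as a formatter used inside the function and a parser used by the log-processing Lambda. This is a minimal illustration under assumptions: the function names are hypothetical, and the timestamp is treated as Unix seconds, which matches the sample values but is not stated in the slides.

```javascript
// Write a metric as a specially formatted log line, so recording it costs no
// extra network call inside the function.
// Assumption: the timestamp field is Unix time in seconds.
const formatMetric = (name, type, value, now = Date.now()) =>
  `MONITORING|${Math.floor(now / 1000)}|${value}|${type}|${name}`;

// On the log-processing side: tell metrics apart from ordinary log messages.
// Returns null for anything that is not a MONITORING line.
const parseMetric = (message) => {
  const parts = message.trim().split('|');
  if (parts.length !== 5 || parts[0] !== 'MONITORING') return null;
  const [, timestamp, value, type, name] = parts;
  return {
    timestamp: new Date(Number(timestamp) * 1000),
    value: Number(value),
    type,
    name,
  };
};
```

Lines that parse to `null` go down the logs path; everything else can be published as a custom metric.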
DASHBOARDS
SET ALARMS
TRACK APP-LEVEL METRICS
Not Only CloudWatch
“you really don't want
your monitoring
system to fail at the
same time as the
system it monitors”
- me
CONFIG MANAGEMENT
easily and quickly propagate
config changes
me
Environment variables make it
hard to share configurations
across functions.
me
Environment variables make it
hard to implement fine-grained
access to sensitive info.
CENTRALISED
CONFIG SERVICE
config service
goes here
SSM Parameter Store
sensitive data should be encrypted
in-flight, and at rest
(credentials, connection string, etc.)
SSM Parameter Store
role-based access
encrypted in-flight (HTTPS)
encrypted at-rest
CENTRALISED
CONFIG SERVICE
CLIENT LIBRARY
fetch & cache at Cold Start
invalidate at interval + signal
http://bit.ly/2yLUjwd
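A client library along these lines might look like the sketch below. It is an illustration only: `createConfigClient` and `loadConfig` are hypothetical names, `loadConfig` stands in for the call to the config service (for example SSM Parameter Store), the three-minute default interval is arbitrary, and the fetch here happens lazily on the first `get()` after a cold start.

```javascript
// `loadConfig` is a hypothetical async function that fetches the latest
// values from the config service (e.g. SSM Parameter Store).
const createConfigClient = (
  loadConfig,
  { ttlMs = 3 * 60 * 1000, now = Date.now } = {}
) => {
  let cached = null;
  let expiresAt = 0;

  const get = async () => {
    // First call after a cold start, or the cache interval has elapsed.
    if (cached === null || now() >= expiresAt) {
      cached = await loadConfig();
      expiresAt = now() + ttlMs;
    }
    return cached;
  };

  // Signal-based invalidation: force a reload on the next get().
  const invalidate = () => {
    expiresAt = 0;
  };

  return { get, invalidate };
};
```

Creating the client outside the handler means the cache lives as long as the container does, so most invocations never touch the config service.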
PRO TIPS
max 75 GB total deployment package size*
* limit is per AWS region
Janitor Monkey
Janitor Lambda
http://bit.ly/2xzVu4a
disable versionFunctions in
install Serverless framework as dev
dependency at project level
dev dependencies are excluded since 1.16.0
http://bit.ly/2vzBqhC
http://amzn.to/2vtUkDU
UNDERSTAND COLD STARTS
Amazon X-Ray
1st invocation
2nd invocation
cold start
source: http://bit.ly/2oBEbw2
C#
Java
NodeJs, Python
http://bit.ly/2rtCCBz
AVOID COLD STARTS
CloudWatch Event, AWS Lambda
ping
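On the function side, the ping can be handled by returning early before any real work is done. Scheduled CloudWatch Events arrive with `source` set to `"aws.events"`, which the sketch below uses as the ping marker; the `withPing` wrapper and the `'pong'` return value are hypothetical choices for this illustration.

```javascript
// Scheduled CloudWatch Events arrive with source "aws.events"; treating them
// as pings lets the function return before doing any real work.
const isPing = (event) => event != null && event.source === 'aws.events';

// Hypothetical wrapper around the real handler.
const withPing = (handler) => async (event, context) => {
  if (isPing(event)) {
    return 'pong';
  }
  return handler(event, context);
};
```

A function that is also triggered by real scheduled events would need a more specific marker, such as a custom field in the event input.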
HEALTH CHECKS?
AVOID HARD ASSUMPTIONS ABOUT FUNCTION LIFETIME
USE STATE FOR OPTIMISATION
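As a small sketch of these two points together: state kept outside the handler is reused while the container lives, and the handler rebuilds it whenever it is missing. The names `cachedUsers`, `makeHandler` and `fetchUsers` are hypothetical, and `fetchUsers` stands in for a slow downstream call.

```javascript
// Anything declared outside the handler survives between invocations for as
// long as the container is reused, but it can disappear at any time.
let cachedUsers = null;

// `fetchUsers` is a hypothetical (slow) call to a downstream service.
const makeHandler = (fetchUsers) => async () => {
  if (cachedUsers === null) {
    // Cold start or recycled container: rebuild the state.
    cachedUsers = await fetchUsers();
  }
  // Correctness never depends on the cache being there, only speed does.
  return cachedUsers.length;
};
```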
max 5 mins execution time
USE RECURSION FOR LONG RUNNING TASKS
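The recursion pattern can be sketched as follows: process work until time is nearly up, then hand the remaining work to a fresh invocation of the same function. `context.getRemainingTimeInMillis()` is provided by the Lambda context object; `processItem`, `invokeSelf`, the event shape and the ten-second safety margin are assumptions for this illustration. In practice `invokeSelf` would invoke the same function asynchronously with the given payload.

```javascript
// Process as many items as time allows, then hand the rest to a fresh
// invocation. `processItem` and `invokeSelf` are hypothetical dependencies.
const makeHandler = ({ processItem, invokeSelf, safetyMarginMs = 10000 }) =>
  async (event, context) => {
    const items = event.items;
    let position = event.position || 0;

    while (position < items.length) {
      // getRemainingTimeInMillis() is provided by the Lambda context object.
      if (context.getRemainingTimeInMillis() < safetyMarginMs) {
        // Not enough time left: pass the current position on and stop.
        await invokeSelf({ items, position });
        return { done: false, position };
      }
      await processItem(items[position]);
      position += 1;
    }
    return { done: true, position };
  };
```

Carrying the position in the payload keeps each invocation stateless, so it does not matter whether the next one lands on the same container.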
@theburningmonk
theburningmonk.com
github.com/theburningmonk
http://bit.ly/2yQZj1H
all my blog posts on Lambda