AWS Lambda is a serverless, event-driven compute service that lets you run code for virtually any type of application or backend service without provisioning or managing servers. Teams choose it for its flexibility, its easy integration with other AWS services, and because it reduces the amount of infrastructure you and your team own. But as the number of clients and requests grows and latency starts to matter, you may discover that there is no free lunch: clients complain about latency, assumptions that held when running your software on EC2 or Fargate no longer apply, and costs start to ramp up. In this talk, I describe lessons learned from working on multiple services backed by AWS Lambda: what cold starts are and how to reduce them, how the JVM makes them even more problematic, when AWS Lambda is more expensive than less abstract platforms, how to use provisioned concurrency, and why one of the biggest problems in computer science (caching) is even bigger on Lambda.
3. About me
• A decade of experience with JVM languages, mostly Java and Scala
• Working with server-based environments since the beginning
• Practical experience with AWS Lambda since February 2021
• Maintenance and development of two existing services
• Development of two additional services from scratch
Andrzej Dębski
4. Agenda
1. Cold starts (generic and those specific to JVM)
2. AWS Lambda Pricing
3. In-memory caching on Lambda
10. Cold start type 1 (container recycle)
1. 18:16:32 Duration: 85.24 ms  Max Memory Used: 109 MB  Init Duration: 1821.08 ms
2. 18:18:50 Duration: 1.95 ms  Max Memory Used: 109 MB
3. 18:24:05 Duration: 1.74 ms  Max Memory Used: 109 MB
4. 18:30:40 Duration: 72.91 ms  Max Memory Used: 109 MB  Init Duration: 1870.40 ms
12. Cold start type 2 (JVM JIT and lazy load)
1. Code path 1: Duration: 731.32 ms
2. Code path 1: Duration: 19.86 ms
3. Code path 1: Duration: 4.31 ms
4. Code path 2: Duration: 1261.40 ms (new code path executed for the first time)
5. Code path 2: Duration: 2.12 ms
13. Reducing cold starts
1. Provisioned concurrency (best but costly)
   1. Keeps a set of instances ready to respond to requests
2. "Background" traffic (best effort)
3. Use a different language (e.g. Go) or use AOT compilation through GraalVM
   1. https://shinesolutions.com/2021/08/30/improving-cold-start-times-of-java-aws-lambda-functions-using-graalvm-and-native-images/
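The "background" traffic option is typically implemented by pointing a scheduled rule (e.g. EventBridge every few minutes) at the function and short-circuiting in the handler. A minimal sketch, where the `warmup` marker field is an assumption of this example, not a Lambda convention:

```java
import java.util.Map;

public class WarmableHandler {
    public String handleRequest(Map<String, Object> event) {
        // The scheduled "background" invocation carries a marker payload;
        // return immediately so the warm container stays alive without
        // doing any real work.
        if (Boolean.TRUE.equals(event.get("warmup"))) {
            return "warmed";
        }
        return doRealWork(event);
    }

    private String doRealWork(Map<String, Object> event) {
        return "handled:" + event.get("payload");
    }
}
```

Note this is best effort only: it keeps some containers warm but does not prevent cold starts when concurrency spikes above the warmed pool.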
15. Cold start for provisioned concurrency
1. Code path 1: Duration: 33.12 ms
2. Code path 1: Duration: 2.09 ms
3. Code path 1: Duration: 2.34 ms
4. Code path 2: Duration: 2.03 ms (new code path executed for the first time)
5. Code path 2: Duration: 2.40 ms
16. Provisioned concurrency best practices
1. Make sure the function is invoked using the alias.
2. Monitor (and alarm on) metrics
   1. ProvisionedConcurrencySpilloverInvocations
   2. ProvisionedConcurrencyUtilization
3. Pre-warm the code paths in the constructor
4. Use CodeDeploy policies to gradually deploy new function revisions.
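Pre-warming code paths in the constructor might look like the following sketch; `OrderService` and its two "code paths" are hypothetical stand-ins for the real latency-sensitive code:

```java
public class PrewarmedHandler {
    private final OrderService service = new OrderService();

    public PrewarmedHandler() {
        // The constructor runs during the init phase (free for on-demand
        // functions, and ahead of traffic for provisioned concurrency),
        // so exercise every latency-sensitive code path here to trigger
        // class loading and JIT compilation before the first real request.
        service.handleQuery("warmup");   // code path 1
        service.handleCommand("warmup"); // code path 2
    }

    public String handleRequest(String input) {
        return input.startsWith("q:") ? service.handleQuery(input)
                                      : service.handleCommand(input);
    }
}

// Hypothetical service with two distinct code paths.
class OrderService {
    String handleQuery(String q)   { return "query:" + q; }
    String handleCommand(String c) { return "command:" + c; }
}
```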
17. Recap
1. Cold starts are real, and they affect JVM Lambdas even more
2. Either accept them or pay for provisioned concurrency (and use it well)
3. Monitor and instrument your functions to understand the bottlenecks in your code
19. Lambda pricing
1. Lambda pricing (https://aws.amazon.com/lambda/pricing/):
   1. On demand: billed per GB-second (sum of invocation durations × allocated memory) plus a per-request charge
   2. Provisioned concurrency is more complicated: we additionally pay for keeping the containers "warm"
2. Free tier: 1M requests and 400,000 GB-seconds per month
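The on-demand formula can be sketched in code. The per-GB-second and per-request prices are assumptions (roughly the us-east-1 x86 rates at the time of the talk), and the free tier is ignored; always check the pricing page for current numbers:

```java
public class LambdaCost {
    // Assumed us-east-1 x86 rates; verify against the Lambda pricing page.
    static final double PRICE_PER_GB_SECOND = 0.0000166667;
    static final double PRICE_PER_REQUEST   = 0.20 / 1_000_000;

    // On-demand cost = GB-seconds * rate + request-count charge,
    // where GB-seconds = sum(duration) * allocated memory in GB.
    static double onDemandMonthlyUsd(long requests, double avgDurationSec, int memoryMb) {
        double gbSeconds = requests * avgDurationSec * (memoryMb / 1024.0);
        return gbSeconds * PRICE_PER_GB_SECOND + requests * PRICE_PER_REQUEST;
    }

    public static void main(String[] args) {
        // 1 request per second over a 30-day month, 200 ms, 256 MB
        long requests = 30L * 24 * 3600;
        System.out.printf("%.2f USD (before free tier)%n",
                onDemandMonthlyUsd(requests, 0.2, 256));
    }
}
```

Note the raw formula gives a higher number than the "Hello world" slide because the monthly free tier absorbs most of this small workload.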
20. How to calculate AWS costs
1. AWS pricing calculator
1. https://calculator.aws/#/
2. Cost estimates for the following examples:
1. https://tinyurl.com/yk7v3wek
22. Assumptions
1. "Average" request time: 200 ms
2. Monthly costs
3. Region: us-east-1
4. Focus on compute cost, ignore everything else
23. Scenarios considered
1. "Hello world": on-demand vs provisioned capacity
2. Smallest Fargate vs Lambda
3. StackExchange on Lambda vs Fargate
24. "Hello world", 1 request per second (256 MB RAM)

                On demand    Provisioned capacity of 1
Monthly cost    $0.33        $4.55
25. Fargate, 1 request per second

                Lambda on demand    Lambda provisioned capacity of 1    Fargate
                (1769 MB RAM)       (1769 MB RAM)                       (1 vCPU, 2 GB RAM)
Monthly cost    $8.80               $28.28                              $36.04
27. Simulating Stack Exchange
1. https://stackexchange.com/performance
   1. 1.3 × 10^9 monthly page views means ~495 requests per second
2. For Fargate we assume 9 tasks to protect from an AZ outage, 82.5 RPS per task.
3. For Lambda provisioned concurrency, try to handle every request with provisioned capacity.
   1. 495 (RPS) / 5 (requests per second per container) = 99 instances
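The sizing numbers follow from simple arithmetic; a quick sanity check (the 5 requests per second per container figure is the talk's assumption, matching roughly one concurrent request per instance at ~200 ms per request):

```java
public class StackExchangeSizing {
    // Monthly page views spread evenly over a 30-day month.
    static double requestsPerSecond(double monthlyViews) {
        return monthlyViews / (30.0 * 24 * 3600);
    }

    // Provisioned instances needed to absorb the load at a given
    // per-container throughput.
    static int provisionedInstances(double rps, double rpsPerContainer) {
        return (int) Math.ceil(rps / rpsPerContainer);
    }

    public static void main(String[] args) {
        System.out.printf("%.0f RPS -> %d provisioned instances%n",
                requestsPerSecond(1.3e9),       // ~501, rounded to ~495 on the slide
                provisionedInstances(495, 5));  // 99
    }
}
```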
28. Stack Exchange, 495 requests per second

                Lambda on demand    Lambda provisioned capacity of 99    Fargate, 9 tasks
                (1769 MB RAM)       (1769 MB RAM)                        (1 vCPU, 2 GB RAM each)
Monthly cost    $7,744.27           $6,502.63                            $324.36
30. Recap
1. Remember the free tier
2. Use the AWS Pricing Calculator
3. With more traffic, Lambda gets really costly
4. Take advantage of the "free" compute during the init phase for on-demand Lambdas
5. For consistent workloads, provisioned concurrency may be cheaper
6. Always consider factors other than $$$ when evaluating technologies
36. Caching in Lambda – challenges
1. No hard guarantees on when the container will be recycled
2. Best-effort async updates
3. No control over the routing algorithm (no sticky sessions)
37. Caching in Lambda – approaches
1. (Mostly) rely on an out-of-process cache
2. Use a different compute platform if an L1 cache is critical
3. Use Lambda extensions
   1. That's how AWS AppConfig does it
38. Caching in Lambda – extensions
1. Code that can execute alongside your Lambda function
2. An extension continues to execute AFTER the Lambda handler has returned a response.
3. Implementing a cache in an extension
   1. Save the data on the Lambda filesystem
   2. Expose a local HTTP server and serve the data
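The "local HTTP server" variant can be sketched as follows. This is only the serving half: a real extension must also register with the Lambda Extensions API and refresh the cached value between invocations; the port, endpoint path, and `fetchLatestValue()` are illustrative assumptions:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

public class ExtensionCacheSketch {
    // The cached value the extension keeps fresh in the background.
    private static final AtomicReference<String> cached =
            new AtomicReference<>(fetchLatestValue());

    // Stand-in for the real origin call (e.g. AppConfig, DynamoDB).
    static String fetchLatestValue() { return "config-v1"; }

    public static HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", port), 0);
        // The function handler reads the value with a cheap localhost call
        // instead of going to the origin on every invocation.
        server.createContext("/cache", exchange -> {
            byte[] body = cached.get().getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) { os.write(body); }
        });
        // A background thread would periodically call
        // cached.set(fetchLatestValue()) between invocations.
        server.start();
        return server;
    }
}
```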
40. Extensions for caching
1. (+) "Asynchronous" cache updates
2. (-) Overhead (in ms):
   1. Avg (no ext / ext): 1.88 ms / 14.1 ms
   2. Tm99.9 (no ext / ext): 1.88 ms / 14.2 ms
3. (-) A Lambda instance can't handle new requests until the extension is done
4. (-) Extension instances are not shared between containers
41. Recap
1. In-memory caching in Lambda is not straightforward
   1. Small window of time where the cache is "warm"
   2. Asynchronous updates are best effort only
2. Extensions improve the situation but have their own downsides
   1. Runtime overhead
   2. They complicate the flow
3. Use an L2 cache or a different compute platform