AWS Lambda is very popular among startups because it enables fast development and easy scaling, but bringing existing business applications to the serverless world remains challenging. At re:Invent 2022, AWS released a new feature called SnapStart to make a function's first response much faster. It is an almost perfect fit for enterprise heavyweights like Java and, soon, other platforms such as .NET. During the session we will dive into the technical details of Lambda and how AWS managed to speed up the already fast execution of Lambda functions using technologies such as Firecracker VM, CRaC, and similar.
2. Lambda SnapStart
• Initialize your function during the deployment process
• Make a snapshot of the entire function space/VM
• Restore on first execution
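The deployment-time steps above can be sketched with the AWS CLI. This is a minimal sketch, assuming SnapStart is switched on per function and takes effect for published versions; `enable_snapstart` and the function name passed to it are hypothetical placeholders, not something from the slides.

```shell
#!/bin/sh
# Sketch: enable SnapStart and publish a version with the AWS CLI.
# enable_snapstart is a hypothetical helper; the function name is a placeholder.
enable_snapstart() {
  FUNCTION_NAME="$1"
  # Ask Lambda to snapshot every version published from now on
  aws lambda update-function-configuration \
    --function-name "$FUNCTION_NAME" \
    --snap-start ApplyOn=PublishedVersions >/dev/null || return 1
  # Wait until the configuration update has been applied
  aws lambda wait function-updated --function-name "$FUNCTION_NAME" || return 1
  # Publishing a version is what triggers initialization and the snapshot
  VERSION=$(aws lambda publish-version \
    --function-name "$FUNCTION_NAME" \
    --query Version --output text) || return 1
  echo "Published version $VERSION with SnapStart enabled"
}

# Usage (needs AWS credentials): enable_snapstart my-java-function
```

Because the snapshot belongs to a published version, callers need to invoke that version (or an alias pointing to it) to benefit from the restore path.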
4. Bash Runtime
#!/bin/sh
set -euo pipefail
# Initialization - load function handler
source $LAMBDA_TASK_ROOT/"$(echo $_HANDLER | cut -d. -f1).sh"
# Processing
while true
do
  HEADERS="$(mktemp)"
  # Get an event. The HTTP request will block until one is received
  EVENT_DATA=$(curl -sS -LD "$HEADERS" -X GET "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next")
  # Extract request ID by scraping response headers received above
  REQUEST_ID=$(grep -Fi Lambda-Runtime-Aws-Request-Id "$HEADERS" | tr -d '[:space:]' | cut -d: -f2)
  # Run the handler function from the script
  RESPONSE=$($(echo "$_HANDLER" | cut -d. -f2) "$EVENT_DATA")
  # Send the response
  curl -X POST "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/$REQUEST_ID/response" -d "$RESPONSE"
done
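The bootstrap loop splits `_HANDLER` on the dot: the first part names the script to source, the second the function to call. A minimal sketch of a matching handler file follows, assuming a hypothetical `function.sh` and `_HANDLER=function.handler` (both names are illustrative and not taken from the slides). The last lines are a local dry run that mimics the name resolution without the Runtime API.

```shell
#!/bin/sh
# function.sh - hypothetical handler file. With _HANDLER="function.handler"
# the bootstrap sources function.sh and calls: handler "$EVENT_DATA"
handler() {
  EVENT_DATA="$1"
  # Diagnostics go to stderr so they do not end up in the response
  echo "received: $EVENT_DATA" 1>&2
  # Whatever is printed on stdout becomes the invocation response
  echo "Echoing request: '$EVENT_DATA'"
}

# Local dry run: resolve file and function name the same way the bootstrap does
_HANDLER="function.handler"
HANDLER_FILE="$(echo "$_HANDLER" | cut -d. -f1).sh"
HANDLER_FUNC="$(echo "$_HANDLER" | cut -d. -f2)"
RESPONSE=$("$HANDLER_FUNC" '{"ping": 1}')
echo "$HANDLER_FILE -> $RESPONSE"
```

Everything the runtime does before entering the `while` loop (here, sourcing the handler file) is initialization work, which is the part a snapshot-based approach aims to do once at deployment instead of on every cold start.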
5. ColdStart
• Rust – Python – Node.js – Java – .NET
• 100 ms – 10 sec – 15 min
• Most affected
• Front-facing Lambda functions
• Lambda functions in pipelines
• Spiky workloads
Code Migration / Business Value
SpringBoot,…
6. Java Solutions For ColdStart Problem
• CDS - Class Data Sharing
• Ahead Of Time Compilation - AOT
• Oracle GraalVM
• RedHat Quarkus & Micronaut frameworks
• Coordinated Restore at Checkpoint - CRaC
7. FIRECRACKER VM SNAPSHOT
Firecracker VM
Snapshot of a full OS VM
Linux CRIU
Checkpoint/Restore In Userspace
https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/snapshot-support.md
11. 2-part series by Mike Roberts
• No SnapStart - 4 seconds cold start
• SnapStart without hook-based optimization - 2 seconds cold start
• SnapStart with first pass of hook-based optimization - 1.2 seconds cold start
https://blog.symphonia.io/posts/2023-01-11_snapstart-what-why
• Nuances of hooks
• SnapStart adds about 2 minutes to deployment time at time of writing, so you may not want to use it in development environments.
12. 6-part series by Vadym Kazulkin
https://dev.to/vkazulkin/measuring-java-11-lambda-cold-starts-with-snapstart-part-1-first-impressions-30a4
Measuring Java 11 Lambda cold starts with SnapStart
Pure Java, Quarkus, Micronaut, SpringBoot
Priming
* Direct Lambda function Invoke with PRIMING
13. New Stuff / Snapshot hooks
Before Snapshot
• Load Data
• Load Params and Secrets (instead of using ENV)
• Priming / + framework support
• Compile Java ? jbang.dev
• 15 min
After Restore
• Reset: cache expiration
* Coordinated Restore at Checkpoint - CRaC
14. Side effects
• Slower deployment – ~2 min per function
• Stack deployment fails if you have bugs in Init phase
• Snapshot is generated once for every AZ
• Extra LOG keywords and numbers / not comparable
15. Lambda SnapStart / LOG
INIT_REPORT
This record shows duration details for the Init phase, including the duration of any beforeCheckpoint runtime hooks.
REPORT
Restore Duration: The time it takes for Lambda to restore a snapshot, load the runtime (JVM), and run any afterRestore runtime hooks.
Billed Restore Duration: The time it takes for Lambda to load the runtime (JVM) and run any afterRestore hooks. You are not charged for the time it takes to restore a snapshot.
Cold Start = Restore Duration + Duration
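The formula Cold Start = Restore Duration + Duration can be applied to a REPORT record with a few lines of shell. This is a sketch only: the REPORT line below is made up for illustration, and it assumes the record consists of tab-separated "Label: value ms" fields. `field_ms` is a hypothetical helper.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL

# Made-up REPORT line with illustrative values (assumption: tab-separated fields)
REPORT_LINE="$(printf 'REPORT RequestId: example-id\tDuration: 120.50 ms\tBilled Duration: 250 ms\tRestore Duration: 310.25 ms\tBilled Restore Duration: 129 ms')"

# Print the numeric value of the field that starts with exactly "<label>: "
# so that "Duration" does not match "Billed Duration" or "Restore Duration"
field_ms() {
  printf '%s\n' "$2" | awk -F'\t' -v label="$1: " '{
    for (i = 1; i <= NF; i++)
      if (index($i, label) == 1) { print substr($i, length(label) + 1) + 0 }
  }'
}

RESTORE_MS=$(field_ms "Restore Duration" "$REPORT_LINE")
DURATION_MS=$(field_ms "Duration" "$REPORT_LINE")
# Cold start as defined on the slide: Restore Duration + Duration
COLD_START_MS=$(awk -v r="$RESTORE_MS" -v d="$DURATION_MS" 'BEGIN { printf "%.2f", r + d }')
echo "Cold start: ${COLD_START_MS} ms"
```

For the sample values this prints `Cold start: 430.75 ms`. Matching on the full label is the important design choice, since three of the four fields end in the word "Duration".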
16. Not Supported
• Provisioned concurrency
• arm64 architecture
• The Lambda Extensions API
• EFS
• X-Ray
• Ephemeral storage larger than 512 MB
17. Problems
• Lambda Versions
• CloudFormation - Two step migration from ARM
• Random number generators
• Network Connections
• LOG numbers don’t match
19. 14-Day Snapshot Expiration
• The snapshot of an unused Lambda function is deleted
• On the first invoke afterwards, snapshot creation is initiated
• It takes ~2 minutes to create a snapshot
• Error 500 with info
• S3 Events / connect via SQS
• Different Developer eXperience
20. Community discussion
Why not cold start after 14 days?
Why not Snapshot after first call?
Why don’t you let me pay for my snapshot storage instead of deleting it?
Why don’t you set SnapStart as default?
Why can’t I decide when the snapshot will be created?
This is better than GraalVM
Self refreshing lambda?
https://youtu.be/nhwgm9J4F9A
21. Takeaways
• This is just the beginning for SnapStart
• SnapStart eliminates cold starts; just make sure your workload doesn’t get too cold
• The era of VM snapshots is coming