Who am I?
Laurent Bernaille @d2si
• OPS background
• Cloud enthusiast
• Open-source advocate
• Love discovering, building (and breaking…) new things
• Passionate about the ongoing IT transformations
Monitoring: how do I monitor my functions?
• Are my functions behaving well?
• Where is my New Relic?
• Where is my Datadog?
Monitoring: for Lambda, we can use CloudWatch!
• Simple application: <20 lambdas
• Is this normal? What about trends? What about scale?
• What about user experience?
Monitoring: What about errors?
Are these errors "normal"?
What kind of errors?
• Code errors?
• Execution errors (out of memory? out of time?)
• Lambda runtime errors (can they happen?)
Are they related to retries?
Logging: what is the cause of errors / latency?
• Lambda logs console/logger outputs
• Logs are in CloudWatch Logs
One Log group per function, nice!
One Log stream per?
Crazy amount of logs (only from lambda engine here)
> Requires careful configuration
> AND appropriate tools
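One way to make these logs easier to search is to emit one JSON object per line, keyed by the request id. The sketch below is only an illustration: the logger name `app` and the helper `format_log` are hypothetical, and it assumes the Python runtime, where the context object exposes `aws_request_id`.

```python
import json
import logging

logger = logging.getLogger("app")  # hypothetical logger name
logger.setLevel(logging.INFO)


def format_log(request_id, event_name, **fields):
    """Build a single-line JSON record so log searches can filter on keys."""
    record = {"request_id": request_id, "event": event_name}
    record.update(fields)
    return json.dumps(record, sort_keys=True)


def handler(event, context):
    # aws_request_id is set by the Lambda Python runtime; fall back when
    # the handler is called outside Lambda (for instance in a test).
    request_id = getattr(context, "aws_request_id", "unknown")
    record_count = len(event.get("Records", []))
    logger.info(format_log(request_id, "start", records=record_count))
    return {"status": "ok"}
```

With every line carrying the same `request_id`, all the output of one invocation can be pulled out of a log stream with a single filter.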
Logging: needle in a haystack
Tracing: where is my function taking time?
• No off-the-shelf APM solution (yet)
• Current State-of-the-art: manual tracing
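Manual tracing can be as simple as timing each step and printing the result to the logs. This is a minimal sketch, not a tracing library: the `timed` helper, the step names and the `TRACE` line format are all made up for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(step, sink):
    """Append (step, duration in ms) to sink once the block finishes."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink.append((step, (time.perf_counter() - start) * 1000.0))


def handler(event, context):
    timings = []
    with timed("transform", timings):
        result = [str(item).upper() for item in event.get("items", [])]
    with timed("store", timings):
        pass  # placeholder for a call to a backend service
    # One line per step, so durations can be extracted from the logs.
    for step, duration_ms in timings:
        print("TRACE step=%s duration_ms=%.3f" % (step, duration_ms))
    return {"result": result, "steps": [step for step, _ in timings]}
```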
Let's write a function that reacts to writes on S3
• does a transformation
• writes the result to S3
Guess what happens?
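A likely answer: if the result is written to the bucket that triggers the function, every write triggers a new invocation, and the function loops on its own output. One possible guard, sketched below, is to write results under a dedicated prefix and skip keys that already carry it. The prefix `processed/` and both helper names are hypothetical; writing to a separate bucket or restricting the trigger to an input prefix are other options.

```python
OUTPUT_PREFIX = "processed/"  # hypothetical prefix reserved for results


def keys_to_process(event):
    """Return the object keys from an S3 event that are not our own output."""
    keys = []
    for record in event.get("Records", []):
        key = record["s3"]["object"]["key"]
        if key.startswith(OUTPUT_PREFIX):
            continue  # written by this function: skip it to avoid a loop
        keys.append(key)
    return keys


def output_key(key):
    """Key under which the transformed object would be stored."""
    return OUTPUT_PREFIX + key
```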
Kinesis stream / DynamoDB
Kinesis guarantees in-order delivery
What will happen now?
Lambdas can be very fast
• < 10ms for simple processing
• What happens when we call many lambdas? Latencies add up
• Is this fast enough?
- Paris-London, one-way 4-5ms
- redis local latency? < 100us
- simple operation on CPU? < 10ns
• Being fast is important, but on the other side, billing is per 100ms
• First run of a lambda is *much* slower (hundreds of ms)
> Even slower in some cases (lambda in a VPC which requires an ENI)
• Lambdas are rescheduled regularly (every few hours) => new cold-start
• What about new version of the code?
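Cold starts can be spotted from inside the function, because module-level code runs only once per container. The sketch below assumes the Python runtime; the variable names and the log line format are illustrative.

```python
import time

# Module-level code runs once, when a new container loads the function.
_cold_start = True
_loaded_at = time.time()


def handler(event, context):
    global _cold_start
    was_cold = _cold_start
    _cold_start = False  # later invocations in this container are warm
    age_s = time.time() - _loaded_at
    print("cold_start=%s container_age_s=%.1f" % (was_cold, age_s))
    return {"cold_start": was_cold}
```

Counting the `cold_start=True` lines over time gives a rough idea of how often the function is rescheduled.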
Warm-up and rescheduling
Limits and throttling
• By default Lambda is limited to 100 concurrent executions
• For a 100ms function, it means 1000 invocations/s
• No metric for concurrent executions
- Look at throttling
- Estimate concurrency based on function duration / number of calls
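The estimate above is simple arithmetic: concurrency is roughly the invocation rate multiplied by the average duration. A small sketch, with hypothetical function names:

```python
def estimated_concurrency(invocations_per_second, avg_duration_ms):
    """Average number of executions in flight at the same time."""
    return invocations_per_second * avg_duration_ms / 1000.0


def max_invocation_rate(concurrency_limit, avg_duration_ms):
    """Invocations per second sustainable under a concurrency limit."""
    return concurrency_limit * 1000.0 / avg_duration_ms


# The example from the slide: a limit of 100 and a 100ms function.
print(max_invocation_rate(100, 100))     # 1000.0
print(estimated_concurrency(1000, 100))  # 100.0
```

This is an average only: bursts of calls or a few slow invocations can hit the limit well before the average does, which is why throttling is worth watching too.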
Event source behavior / configuration
• One event at a time or batching
• Dead-Letter queues
Other managed services
• Serverless applications (usually) don't use RDBMS
• Serverless applications (usually) don't use classic messaging technologies
• Scaling up / down needs to be automated
• Not always simple
New services => New expertise
- table and index design
- read / write capacity estimation
- optimize performance *and* costs
- sharding for multiplexing and scalability
- when to reshard / merge shards?
Testing is not easy
• How do I replicate Lambda in my CI environment?
• Will I use AWS services for unit testing?
• What about mocking?
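One approach to mocking is to keep the business logic in a function that receives its backend as an argument, so a unit test can pass a fake instead of an AWS resource. The sketch below uses `unittest.mock` from the standard library. The function `store_event` is hypothetical, and `table` is assumed to behave like a boto3 DynamoDB Table resource, of which only `put_item(Item=...)` is used.

```python
from unittest import mock


def store_event(event, table):
    """Transform the event and store it in the given table."""
    item = {"id": event["id"], "value": event["value"].upper()}
    table.put_item(Item=item)
    return {"stored": event["id"]}


def test_store_event_writes_transformed_item():
    fake_table = mock.Mock()  # stands in for the real table, no AWS call
    result = store_event({"id": "42", "value": "abc"}, fake_table)
    fake_table.put_item.assert_called_once_with(
        Item={"id": "42", "value": "ABC"}
    )
    assert result == {"stored": "42"}
```

The real Lambda handler then shrinks to a thin wrapper that creates the table resource and calls `store_event`. That wrapper still needs an integration test against the real service or a local emulation.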
Local deployment is helpful to iterate fast
• How do I replicate Lambda locally?
• How can I simulate AWS services?
- "Easy" for some (many DynamoDB implementations)
- Much harder for complex integrations (DynamoDB Streams, for instance)
- Several projects working on this (localstack)
Packaging and versioning
• Easy for the code
• Lambda can be versioned in AWS
Most frameworks are designed to push from local machine
• Build the code, get dependencies, push
• Can be duplicated in CI
• But no real artifact that can be shared
Deploying the same version across environments?
Is there a deployment "artifact" I can share
- across environments
- across AWS accounts (Prod / Staging)
- with all the dependencies built-in
What is an application?
Is it a single function?
• Deployed independently
• Versioned independently
> What about shared libraries between functions?
The answer is probably somewhere in the middle
• No clear best practice yet
• Trial and error
Is it all my functions?
• Versioned as a whole
• With bundled shared libraries
• Same artifact with different handlers
• Deployed together or independently?
> Functions and dependencies can add up to a big artifact (megabytes)
Serverless is the future (or a big part of it)
• Focus on business logic that matters
• Much simpler applications
• Really pay for what you use
Serverless creates many new challenges
• How can we adapt standard code best practices?
• How do we operate these new applications?
From NoOPS to NewOPS
• No longer sysadmins or netadmins
• Supervision remains similar but requires new tools
• A big focus on new architectures and new backends
• Optimize for performance and costs