Operational challenges behind
Serverless architectures
16 May 2017 - AWS User group
Who am I?
Laurent Bernaille @d2si
• OPS background
• Cloud enthusiast
• Opensource advocate
• Love discovering, building (and breaking…) new things
• Passionate about the ongoing IT transformations
@lbernail
About this talk
Agenda
• Observability
• Challenges with event based architecture
• Understanding new services
• Security
• Continuous Delivery
Observability
Monitoring: how do I monitor my functions?
• Are my functions behaving well?
• Where is my New Relic?
• Where is my Datadog?
Monitoring: for Lambda, we can use CloudWatch!
Invocations/min
Average duration
• Simple application: <20 lambdas
• Is this normal? What about trends? What about scale?
• What about user experience?
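A minimal sketch of pulling those two metrics with boto3 ("my-function" is a placeholder name):

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

def lambda_metric(metric_name, function_name, stat, hours=1):
    # One datapoint per minute over the last hour
    now = datetime.utcnow()
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName=metric_name,
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        StartTime=now - timedelta(hours=hours),
        EndTime=now,
        Period=60,
        Statistics=[stat],
    )
    return sorted(resp["Datapoints"], key=lambda d: d["Timestamp"])

invocations = lambda_metric("Invocations", "my-function", "Sum")
durations = lambda_metric("Duration", "my-function", "Average")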
Monitoring: What about errors?
Errors
Are these errors "normal"?
What kind of errors?
• Code errors?
• Execution errors (out of memory? out of time?)
• Lambda runtime errors (can they happen?)
Are they related to retries?
Logging: what are the causes of errors / latency?
• Lambda logs console/logger outputs
• Logs are in CloudWatch Logs
One Log group per function, nice!
One Log stream per… what? (one per container instance, in practice)
Crazy amount of logs (only from the Lambda runtime here)
> Requires careful configuration
> AND appropriate tools
Logging: needle in a haystack
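One way to make the haystack searchable: emit structured (JSON) log lines from the handler so CloudWatch Logs filters can match on fields. A minimal sketch:

import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def handler(event, context):
    # One JSON object per log line: easy to filter on request_id or message
    logger.info(json.dumps({
        "request_id": context.aws_request_id,
        "event_keys": sorted(event.keys()),
        "message": "processing started",
    }))
    return {"status": "ok"}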
Tracing: where is my function taking time?
• No off-the-shelf APM solution (yet)
• Current State-of-the-art: manual tracing
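Manual tracing usually means timing each step yourself and logging the durations; a sketch (the transform step is a placeholder):

import time
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def timed(name, func, *args, **kwargs):
    # Wrap any call and log how long it took, in milliseconds
    start = time.time()
    try:
        return func(*args, **kwargs)
    finally:
        logger.info("step=%s duration_ms=%.1f", name, (time.time() - start) * 1000)

def transform(record):
    # placeholder business step
    return {k: str(v).upper() for k, v in record.items()}

def handler(event, context):
    return timed("transform", transform, event)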
Challenges with event based architecture
Snowball effects
Let's write a function that reacts to writes on S3
• does a transformation
• writes the result back to S3
Guess what happens?
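The output write triggers the function again: an infinite loop (and an infinite bill). A common guard, sketched here with a placeholder output prefix, is to separate input and output keys and skip our own writes:

import boto3

s3 = boto3.client("s3")
OUTPUT_PREFIX = "processed/"   # placeholder

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        if key.startswith(OUTPUT_PREFIX):
            continue   # our own output: skip it to break the loop
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        transformed = body.upper()   # placeholder transformation
        s3.put_object(Bucket=bucket, Key=OUTPUT_PREFIX + key, Body=transformed)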
Poison messages
(diagram: Kinesis stream → DynamoDB)
Kinesis guarantees in-order delivery
What will happen now?
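If the handler raises on a malformed record, Lambda retries the whole batch and the shard stops making progress. A sketch of parking bad records instead of raising (the SQS queue URL and the validation are placeholders):

import base64
import json
import boto3

sqs = boto3.client("sqs")
FAILED_QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/poison-records"  # placeholder

def process(item):
    # placeholder business logic
    if "required_field" not in item:
        raise ValueError("malformed record")

def handler(event, context):
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        try:
            process(json.loads(payload))
        except Exception:
            # Park the record and keep the shard moving
            sqs.send_message(QueueUrl=FAILED_QUEUE_URL,
                             MessageBody=payload.decode("utf-8", "replace"))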
Latency
Lambdas can be very fast
• < 10ms for simple processing
• What happens when we call many lambdas? Latency sums up
• Is this fast enough?
- Paris-London, one-way 4-5ms
- Redis local latency? < 100µs
- simple operation on CPU? < 10ns
• Being fast is important, but on the other hand, billing is per 100ms
Warm-up times
• First run of a Lambda is *much* slower (hundreds of ms)
> Even slower in some cases (a Lambda in a VPC, which requires an ENI)
• Lambdas are rescheduled regularly (every few hours) => new cold-start
• What about new version of the code?
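The usual mitigation is to keep expensive initialisation outside the handler so it only runs on a cold start; a sketch (the table name is a placeholder):

import boto3

# Runs once per container (cold start); reused across warm invocations
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("my-table")   # placeholder

def handler(event, context):
    # Only cheap per-request work here
    item = table.get_item(Key={"id": event.get("id", "unknown")}).get("Item")
    return item or {}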
Asynchronicity
Event processing is asynchronous, which can have side-effects
• Race conditions
• Inconsistent states
> Applications must take this into account
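One pattern for tolerating out-of-order or concurrent events is a conditional write; a sketch with DynamoDB (table and attribute names are placeholders):

import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("orders")   # placeholder

def apply_update(order_id, version, payload):
    try:
        table.put_item(
            Item={"order_id": order_id, "version": version, "payload": payload},
            # Only accept the write if it is newer than what is stored
            ConditionExpression="attribute_not_exists(version) OR version < :v",
            ExpressionAttributeValues={":v": version},
        )
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return   # a stale event arrived late: safe to ignore
        raise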
Understanding new services
Lambda
Warm-up and rescheduling
Limits and throttling
• By default Lambda is limited to 100 concurrent executions (now 1000!)
• For a 100ms function, it means 1000 invocations/s (now 10000/s)
• No metric for concurrent executions
- Look at throttling
- Estimate concurrency based on function duration / number of calls
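The estimate is just Little's law: concurrency ≈ invocation rate × average duration. For example:

invocations_per_second = 250       # e.g. from the CloudWatch Invocations metric
average_duration_seconds = 0.120   # e.g. from the CloudWatch Duration metric

estimated_concurrency = invocations_per_second * average_duration_seconds
print(f"~{estimated_concurrency:.0f} concurrent executions")   # ~30 here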
Event source behavior / configuration
• One event at a time or batching
• Retries
• Dead-Letter queues
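For asynchronous sources, a dead-letter queue keeps failed events from being dropped after the retries; a sketch of attaching one with boto3 (function name and queue ARN are placeholders):

import boto3

lambda_client = boto3.client("lambda")
lambda_client.update_function_configuration(
    FunctionName="my-function",
    DeadLetterConfig={"TargetArn": "arn:aws:sqs:eu-west-1:123456789012:my-function-dlq"},
)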
Other managed services
New services
• Serverless applications (usually) don't use RDBMS
• Serverless applications (usually) don't use classic messaging technologies
Scalability
• Scaling up / down needs to be automated
• Not always simple
New services => New expertise
• DynamoDB
- table and index design
- read / write capacity estimation
- optimize performance *and* costs
• Kinesis
- sharding for multiplexing and scalability
- when to reshard / merge shards?
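For DynamoDB, capacity estimation is a back-of-the-envelope exercise: one read unit covers a strongly consistent read of up to 4 KB per second, one write unit a write of up to 1 KB per second. A sketch with placeholder workload figures:

import math

reads_per_second = 500
avg_read_size_kb = 6       # placeholder workload figures
writes_per_second = 120
avg_write_size_kb = 2.5

read_units = reads_per_second * math.ceil(avg_read_size_kb / 4)
write_units = writes_per_second * math.ceil(avg_write_size_kb / 1)
print(f"provision ~{read_units} RCU and ~{write_units} WCU")   # ~1000 RCU, ~360 WCU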
Security
Security
Serverless helps with security
• No Operating System to manage
• No application runtime to manage
• Limited attack surface (short function)
• Short lifespan (< 5 min for a function, up to 6 h for a container)
And other aspects are sometimes trickier
• Many external services to secure (SaaS, managed services)
• AWS permissions
But some things don't change
• Code security
• Frameworks
• 3rd-party dependencies
Continuous Delivery
Continuous integration
Testing is not easy
• How do I replicate Lambda in my CI environment?
• Will I use AWS services for unit testing?
• What about mocking?
Local deployment is helpful to iterate fast
• How do I replicate Lambda locally?
• How can I simulate AWS services?
- "Easy" for some (many dynamoDB implementations)
- Much harder for some complex integration (DynamoDB streams for instance?)
- Several projects working on this (localstack)
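A common trick is to make the AWS endpoint injectable so tests can target a local DynamoDB (dynamodb-local, localstack) instead of real AWS; a sketch (port and table name are placeholders):

import boto3

def get_table(name, endpoint_url=None):
    # endpoint_url=None -> real AWS; "http://localhost:8000" -> local DynamoDB
    kwargs = {"endpoint_url": endpoint_url} if endpoint_url else {}
    return boto3.resource("dynamodb", region_name="eu-west-1", **kwargs).Table(name)

# In tests:        table = get_table("orders", "http://localhost:8000")
# In the handler:  table = get_table("orders")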
Packaging and versioning
Managing versioning
• Easy for the code
• Lambda can be versioned in AWS
Most frameworks are designed to push from the local machine
• Build the code, get dependencies, push
• Can be duplicated in CI
• But no real artifact that can be shared
Deploying the same version across environments?
Is there a deployment "artifact" I can share?
- across environments
- across AWS accounts (Prod / Staging)
- with all the dependencies built in
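One approach is to have CI build the zip (code + dependencies) once, store it on S3, and publish that same artifact to every account and environment; a sketch (bucket, key and function names are placeholders):

import boto3

lambda_client = boto3.client("lambda")

response = lambda_client.update_function_code(
    FunctionName="my-function",
    S3Bucket="my-artifacts-bucket",
    S3Key="my-function/1.4.2/package.zip",
    Publish=True,   # creates an immutable, numbered Lambda version
)
print("published version", response["Version"])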
What is an application?
Is it a single function?
• Deployed independently
• Versioned independently
> What about shared libraries between functions?
The answer is probably somewhere in the middle
• No clear best practice yet
• Trial and error
Is it all my functions?
• Versioned as a whole
• With bundled shared libraries
• Same artifact with different handlers
• Deployed together or independently?
> Functions and dependencies can add up to a big artifact (megabytes)
Conclusion
Conclusion
Serverless is the future (or a big part of it)
• Focus on business logic that matters
• Much simpler applications
• Really pay for what you use
Serverless creates many new challenges
• How can we adapt standard code best practices?
• How do we operate these new applications?
From NoOPS to NewOPS
• No longer sysadmins or netadmins
• Monitoring remains similar but requires new tools
• A big focus on new architectures and new backends
• Optimize for performance and costs
Questions?
Thank you
@lbernail
