Sloppy Little Serverless Stories


Everything fails all the time! It's a quote many of us repeat every day. How does it feel when things fail in production? How do you recover from such situations? How can you make sure they don’t happen again? All of this is discussed through real production incidents and the measures taken to mitigate such failures. We will also look at a few of the most common failure possibilities in a serverless ecosystem. Remember, when everything fails all the time, you must learn something every day to stay operational all the time!




1. From Oops… to Ops: Sloppy Little Serverless Stories. Sheen Brisals, The LEGO Group (sheenbrisals)

2. "Oops!" is a word used to acknowledge a mistake. "Ops" means Operations.

4. LEGO.com switched to Serverless on the AWS Cloud on July 10, 2019.

5. PROD - Services Stats
• 200+ Lambda functions
• 40+ microservices
• 40+ API endpoints
• 25+ DynamoDB tables
• 30+ S3 buckets
• 20+ SNS topics
• 60+ SQS queues + DLQs
• 200+ SSM parameters

6. Oops! Moments…

7. No one starts out perfect with Serverless. That’s fine; what matters is striving to get better with every iteration.

8. Oops! 1: When friendly Firehose turned foe…

9. [Diagram: click-stream event ingestion with an event producer, API Gateway, Kinesis Data Firehose (configured by buffer size and buffer interval), an S3 bucket, and a fan-out function.]

10. Oops!: With a buffer size of 3 MB and a buffer interval of 1 min, the Lambda function failed during peak season. Ops: With a buffer size of 1 MB and a buffer interval of 1 min, plus a tuned Lambda function, it worked perfectly.
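Kinesis Data Firehose delivers a buffered batch to S3 when either the buffer size or the buffer interval is reached, whichever comes first, so the buffer size roughly bounds how large an object the fan-out function has to process. The sketch below is a simplified model of that rule, not Firehose itself: `simulate_flushes` and the 512 KB burst are hypothetical, real Firehose treats both settings as hints, and this model only checks the interval when a new record arrives.

```python
from typing import List, Optional, Tuple

MB = 1024 * 1024


def simulate_flushes(
    records: List[Tuple[float, int]],  # (arrival time in seconds, size in bytes), sorted by time
    buffer_size_bytes: int,
    buffer_interval_s: float,
) -> List[int]:
    """Return the size in bytes of each object a simplified Firehose model delivers to S3."""
    objects: List[int] = []
    buffered = 0
    window_start: Optional[float] = None  # arrival time of the first record in the buffer

    for arrived_at, size in records:
        # Interval rule (simplified): flush if the buffer has been open long enough.
        if window_start is not None and arrived_at - window_start >= buffer_interval_s:
            objects.append(buffered)
            buffered, window_start = 0, None

        if window_start is None:
            window_start = arrived_at
        buffered += size

        # Size rule: flush as soon as the buffer reaches the configured size.
        if buffered >= buffer_size_bytes:
            objects.append(buffered)
            buffered, window_start = 0, None

    if buffered:  # deliver whatever is left at the end of the stream
        objects.append(buffered)
    return objects


if __name__ == "__main__":
    # Hypothetical burst: ten 512 KB records within one second, well inside a 1 min interval.
    burst = [(i * 0.1, 512 * 1024) for i in range(10)]
    for size_mb in (3, 1):
        sizes = simulate_flushes(burst, size_mb * MB, 60)
        print(f"{size_mb} MB buffer -> objects (MB):", [s / MB for s in sizes])
```

Running it prints objects of 3.0 and 2.0 MB for the 3 MB buffer and five 1.0 MB objects for the 1 MB buffer: smaller objects per fan-out invocation, at the cost of more invocations.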

11. Oops! 2: When SSM Parameter Store packed a punch…


14. [Diagram: PROD - Services Stats, showing 200+ Lambda functions performing reads and writes against 200+ SSM parameters.]

16. Oops!: Parameter Store's default throughput of 40 TPS led to rate limiting. Ops: Switching to advanced throughput raised the limit to 1000 TPS, a self-service change made in the console.
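One common way to stay under a Parameter Store throughput limit, alongside raising it, is to cache parameter values in the Lambda execution environment so that warm invocations do not call SSM again. The slides do not say whether caching was part of the fix; this is a hedged sketch in which `CachedParameters`, the 5-minute TTL, and the injected `fetch` and `clock` callables are illustrative assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class CachedParameters:
    """Hypothetical TTL cache for SSM parameter values.

    Created once at module level, it survives across warm invocations that reuse
    the same Lambda execution environment, so repeated reads skip SSM entirely.
    """

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch      # performs the real SSM read on a cache miss
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (value, expires_at)

    def get(self, name: str) -> str:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]     # served from memory, no SSM request
        value = self._fetch(name)  # miss or expired entry: exactly one SSM read
        self._entries[name] = (value, now + self._ttl)
        return value
```

In a real function, `fetch` could wrap boto3, for example `lambda name: ssm.get_parameter(Name=name, WithDecryption=True)["Parameter"]["Value"]` with `ssm = boto3.client("ssm")`. The trade-off is staleness: a value changed in Parameter Store is picked up only after the TTL expires.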

17. Oops! 3: When too much became too little…

18. [Diagram: balancing concurrency. CloudWatch Logs, a Log Splitter function with a concurrency of 25, and Elasticsearch for monitoring.]
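A function subscribed to CloudWatch Logs, such as the log splitter here, receives log events as a base64-encoded, gzip-compressed JSON document under `awslogs.data`. The sketch below shows only that decoding step; the slides do not describe how the splitter routed events onward, so `decode_log_batch`, the flattened record shape, and the Elasticsearch step are hypothetical.

```python
import base64
import gzip
import json
from typing import Any, Dict, List


def decode_log_batch(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Decode a CloudWatch Logs subscription event into individual log records."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(compressed))

    # CONTROL_MESSAGE batches only check that the destination is reachable.
    if payload.get("messageType") != "DATA_MESSAGE":
        return []

    return [
        {
            "logGroup": payload["logGroup"],
            "logStream": payload["logStream"],
            "timestamp": log_event["timestamp"],
            "message": log_event["message"],
        }
        for log_event in payload["logEvents"]
    ]


def handler(event: Dict[str, Any], context: Any) -> int:
    records = decode_log_batch(event)
    # Hypothetical next step: bulk-index `records` into Elasticsearch.
    return len(records)
```

Because every invocation decodes and forwards one batch, the function's concurrency limit directly caps how fast log batches can be drained during a spike.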

19. Oops!: Failing at the wrong time, missing crucial logs, inadequate testing, incorrect dashboards. Ops: A better dev process and better monitoring.

20. Oops! 4: When going higher went through the roof…

21. [Diagram: "Art of coding – Copy & Paste". A CloudWatch Event trigger rule starts a Step Function that runs a heavy lifting function (2 GB RAM, 5 min run, 2× daily). A frontend status check API calls a request handler backed by a status store (also 2 GB RAM, 100 ms run, 1000s of times daily).]

22. Oops!: Memory 2 GB, duration 100 ms, 1 invocation per second (~2.5 million per month), cost ~$9.00 per month. Ops: Memory 256 MB, with the same duration and invocations, cost ~$1.50 per month.
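The monthly figures on slide 22 follow from Lambda's pay-per-use model: compute is billed in GB-seconds (memory × duration × invocations) plus a per-request charge. The sketch below reproduces them, assuming x86 on-demand prices of $0.0000166667 per GB-second and $0.20 per million requests and ignoring the free tier; check current regional pricing before relying on the numbers.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # assumed x86 on-demand price, USD
PRICE_PER_MILLION_REQUESTS = 0.20   # assumed request price, USD


def monthly_lambda_cost(memory_mb: int, duration_ms: float, invocations: int) -> float:
    """Estimate a function's monthly compute + request cost (free tier ignored)."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000) * invocations
    compute = gb_seconds * PRICE_PER_GB_SECOND
    requests = invocations / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return compute + requests


if __name__ == "__main__":
    invocations = 2_500_000  # roughly one call per second for a month
    for memory_mb in (2048, 256):
        cost = monthly_lambda_cost(memory_mb, 100, invocations)
        print(f"{memory_mb:>5} MB -> ${cost:.2f} / month")
```

With these assumed prices it prints about $8.83 for 2048 MB and $1.54 for 256 MB, in line with the slide's ~$9.00 and ~$1.50. At a fixed 100 ms duration, cutting memory by 8× cuts the compute part of the bill by 8×; only the request charge stays the same.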

23. Oops! 5: When a key moment turned chaotic…

24. [Diagram: SaaS API feeds pass through a data pipeline (with a DLQ) into a feeds store; an internal feeds API, protected by an {API Key}, serves the feeds to an internal app.] The API redeployment somehow changed the API key! No one knew. No one noticed!

25. Oops!: Silent breakage, bad customer experience, an unhappy business, a chaotic dev process. Ops: A better dev process, better prompts, better monitoring.
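Slide 25 calls for better monitoring after the API key changed silently. One hedged way to make that kind of break loud is a scheduled synthetic check that calls the internal feeds API with the key its consumers actually use and alerts on authentication failures (an API Gateway REST API, for instance, answers 403 when a required API key is missing or invalid). `classify_probe` and its messages are hypothetical; making the HTTP call is left to the caller, such as a scheduled Lambda function.

```python
from typing import Optional


def classify_probe(status_code: int) -> Optional[str]:
    """Map the HTTP status of a synthetic call to the feeds API to an alert message.

    Returns None when the API answered normally.
    """
    if status_code in (401, 403):
        # The stored key was rejected, e.g. because a redeployment issued a new key.
        return f"CRITICAL: feeds API rejected the API key (HTTP {status_code})"
    if status_code == 429:
        return "WARNING: feeds API is throttling the probe (HTTP 429)"
    if 500 <= status_code <= 599:
        return f"CRITICAL: feeds API server error (HTTP {status_code})"
    if 200 <= status_code <= 299:
        return None
    return f"WARNING: unexpected status from feeds API (HTTP {status_code})"
```

Keeping the classification separate from the HTTP call makes the alerting rules easy to unit-test, while the probe itself can run every few minutes so a rotated key is caught before customers notice.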

26. Oops! Oopsibilities…

27. Serverless requires a new way of thinking, a new way of working, and a new way of running applications. That means we need to change how we think, how we work, and how we run applications.

28. Oopsibility! 1: High load hiccup

30. Oopsibility! 2: Dev to Prod debacle
• Keys & secrets
• Domains
• Resource configurations
• Leaked privacy
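The "Dev to Prod debacle" items (keys & secrets, domains, resource configurations) tend to surface when a value meant for one stage is promoted unchanged to another. A minimal, hypothetical pre-deployment check is sketched below: it flags stage-specific settings that are missing in prod, identical to dev, or still look like a dev value. The setting names, the `dev_markers` heuristics, and `find_stage_leaks` itself are assumptions for illustration.

```python
from typing import Dict, Iterable, List


def find_stage_leaks(
    dev: Dict[str, str],
    prod: Dict[str, str],
    stage_specific: Iterable[str],
    dev_markers: Iterable[str] = ("dev.", "-dev", "localhost"),
) -> List[str]:
    """Return problems found in the prod config, without echoing secret values."""
    markers = tuple(dev_markers)
    problems: List[str] = []
    for key in stage_specific:
        if key not in prod:
            problems.append(f"{key}: missing in prod")
            continue
        value = prod[key]
        if key in dev and dev[key] == value:
            problems.append(f"{key}: prod reuses the dev value")
        elif any(marker in value for marker in markers):
            problems.append(f"{key}: prod value looks like a dev setting")
    return problems


if __name__ == "__main__":
    dev = {"api_key": "k-dev", "domain": "https://dev.example.com", "table": "orders-dev"}
    prod = {"api_key": "k-dev", "domain": "https://example.com", "table": "orders-dev"}
    for problem in find_stage_leaks(dev, prod, ["api_key", "domain", "table", "bucket"]):
        print(problem)
```

Failing the pipeline when this list is non-empty turns a silent promotion mistake into a blocked deployment.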

31. Oopsibility! 3: Admin access abuse

33. Oopsibility! 4: Ignorance is blip

34. Oopsibility! 5: Third-party trauma (SaaS)

36. From Oops of Sorrows to Operational Success…
• Know the service limits
• Be Well-Architected
• See through the Serverless Lens
• Alarm – Alert – Act
• Monitor, monitor, monitor
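"Alarm – Alert – Act" depends on alarms that fire on sustained problems rather than single blips. CloudWatch alarms express this as "M out of N": an alarm needs `DatapointsToAlarm` breaching datapoints within the last `EvaluationPeriods`. The sketch below mimics that rule for a `GreaterThanThreshold` comparison; it is a simplified model that ignores missing-data handling, and the DLQ-depth samples are made up.

```python
from typing import Sequence


def is_alarming(
    datapoints: Sequence[float],
    threshold: float,
    evaluation_periods: int,
    datapoints_to_alarm: int,
) -> bool:
    """Return True if at least `datapoints_to_alarm` of the most recent
    `evaluation_periods` datapoints exceed `threshold` (GreaterThanThreshold)."""
    if not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError("need 1 <= datapoints_to_alarm <= evaluation_periods")
    window = list(datapoints)[-evaluation_periods:]
    breaching = sum(1 for value in window if value > threshold)
    return breaching >= datapoints_to_alarm


if __name__ == "__main__":
    # Hypothetical DLQ depth samples, one per minute: any message in a DLQ is a failure.
    dlq_depth = [0, 0, 4, 0, 0, 7, 9, 12]
    print(is_alarming(dlq_depth, threshold=0, evaluation_periods=5, datapoints_to_alarm=3))
```

For the sample series it prints True: three of the last five samples are above zero, so the earlier single spike of 4 alone would not have paged anyone.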

37. Thank you! Go build Serverless! (sheenbrisals)
