- Jeremy Daly is a CTO who consults on building resilient serverless systems and has experience using AWS Lambda since 2015.
- Serverless systems provide automatic scaling but have limits like function timeouts and memory constraints that developers must account for.
- Non-serverless components like databases cannot automatically scale in the same way as serverless functions and require techniques like caching, asynchronous processing, and throttling to integrate resiliently.
- Event-driven architectures using services like SNS, SQS, and EventBridge can help decouple components and distribute loads to improve resiliency when integrating serverless and non-serverless systems.
2. Jeremy Daly
• CTO at AlertMe.news
• Consult with companies building in the cloud
• 20+ year veteran of technology startups
• Started working with AWS in 2009 and started using
Lambda in 2015
• Blogger (jeremydaly.com), OSS contributor, speaker
• Publish the Off-by-none serverless newsletter
• Host of the Serverless Chats podcast
@jeremy_daly
3. Agenda
• What is resiliency and what is serverless?
• Working with “less-than-scalable” RDBMS
• Using unreliable APIs
• Managing API quotas
• Decoupling our services
• Other non-serverless components
@jeremy_daly
4. What is resiliency?
@jeremy_daly
“The ability of a software solution to absorb the impact
of a problem in one or more parts of a system, while
continuing to provide an acceptable service level to the
business.” ~ IBM
IT’S NOT ABOUT PREVENTING FAILURE
IT’S UNDERSTANDING HOWTO GRACEFULLY DEAL WITH IT
5. What does it mean to be Serverless?
• No server management
• Flexible scaling
• Pay for value
• Automated high availability
@jeremy_daly
Flexible scaling 👈
6. What does it mean to be Serverless?
@jeremy_daly
ElastiCache
RDS
EMR Amazon ES
Redshift
Fargate
Anything “on EC2”Lambda Cognito Kinesis
S3 DynamoDB SQS
SNS API Gateway CloudWatch
AppSync IoT Comprehend
Serverless Managed Not Serverless
DocumentDB
(MongoDB)
Managed Streaming
for Kaca
Definitely
7. Everything has limits!
• Reserved Concurrency 🚦
• FunctionTimeouts ⏳
• Memory Limits 🧠
• NetworkThroughput 🚰
Some components are better than others
@jeremy_daly
Know
Your
Limits
9. “I want my, I want my, I want my SQL”
~ Dire Straits
10. Simple ServerlessWeb Service
Client
API Gateway Lambda
@jeremy_daly
Highly Scalable Highly Scalable NotThat Scalable 😳
RDS
^
not so
RDBMS and FaaS don’t play nicely together:
• Concurrency model doesn’t allow connection pooling
• Limited number of DB connections available
• Recycled containers create zombies
11. Ways to Manage DB Connections
• Increase max_connections setting
• Limit concurrent executions
• Lower your connection timeouts
• Limit connections per username
• Close connection before function ends
@jeremy_daly
🤞
😡
⚠
🎲
😱
👎
12. BetterWays to Manage DB Connections
• Implement a good caching strategy 💾
• Buffer events for throttling and durability 🏋
• Utilize a proxy service 🛰
• Manage connections ourselves 🤔
@jeremy_daly
👎
13. miss
Implement a good caching strategy
Client API Gateway RDSLambda
Elasticache
Key Points:
• Create new RDS connections ONLY on misses
• Make sureTTLs are set appropriately
• Include the ability to invalidate cache
@jeremy_daly
YOU STILL NEEDTO
SIZEYOUR DATABASE
CLUSTERS APPROPRIATELY
14. Do you really need immediate feedback?
Synchronous Communication
Services can be invoked by other services and must wait for a reply.
This is considered a blocking request, because the invoking service
cannot finish executing until a response is received.
Asynchronous Communication 🚀
This is a non-blocking request. A service can invoke (or trigger)
another service directly or it can use another type of communication
channel to queue information.The service typically only needs to wait
for confirmation (ack) that the request was received.
@jeremy_daly
15. RDS
Buffer events for throttling and durability
Client API Gateway
SQS
Queue
SQS
(DLQ)
Lambda Lambda
(throttled)
ack
“Asynchronous”
Request
Synchronous
Request
@jeremy_daly
Key Points:
• SQS adds durability
• Throttled Lambdas reduce downstream pressure
• Failed events are stored for further inspection/replay
Limit the
concurrency to match
RDS throughput
x
Utilize Service
Integrations
16. Utilize a Proxy Service
• PgBouncer 🏀
• SQL Relay 🏃
@jeremy_daly
Client
API
Gateway
Lambda RDSEC2x
Fargate
🙀
• Amazon RDS Proxy (Preview)
In a “serverless” application?
FOR SHAME! 😿
17. Manage connections ourselves
1. Count open connections
2. Close connection if connection ratio threshold exceeded
3. Close sleeping connections with high time values
4. Retry connections with exponential back off
@jeremy_daly
20. Close connection if over ratio threshold
@jeremy_daly
If we exceed the
connection ratio
Calculate our timeout
Try to kill zombies
If no zombies,
terminate connection
Else, just try to kill
zombies
21. Close sleeping connections with high time values
@jeremy_daly
Query processlist for zombies
Kill zombies
22. Retry connections with exponential back off
@jeremy_daly
If error trying to connect
Retry with Jitter
23. Does this really work?
@jeremy_daly
• Aurora Serverless (2 ACUs)
• 90 connections available
• 1,024 MB of memory
• 500 users/sec for one minute
• Avg. response time was 41 ms
• ZERO ERRORS
24. We shouldn’t have to do this!
@jeremy_daly
Amazon
Aurora Serverless
Aurora Serverless
DATA API
Doesn’t solve the
max_connections issue
Slower throughput, not quite
ready for synchronous workloads
Amazon
RDS Proxy
Added cost, still doesn’t
address scalability issues
*PREVIEW*
🥰
26. Manage calls to third-party APIs
• Implement a good caching strategy 💾
• Buffer events for throttling and durability 🏋
• Implement circuit breakers 🚦
@jeremy_daly
27. DynamoDB
Stripe API
The Circuit Breaker
Client API Gateway Lambda
Key Points:
• Cache your cache with warm functions
• Use a reasonable failure count
• Understand idempotency
Status
Check CLOSED
OPEN
Increment Failure Count
HALF OPEN
“Everything fails all the time.”
~WernerVogels
@jeremy_daly
🔥
🔥
🔥
🔥
🔥
Elasticache
or
28. What about quotas?
• Concurrency has no effect on frequency ⏰
• Stateless functions are not coordinated 😿
• Step Functions StandardWorkflows would be very expensive 💰
• Adding state wouldn’t prevent needless invocations 🗑
@jeremy_daly
29. Can we build a better system?
• 100% serverless
• Cost effective
• Scalable
• Resilient
• Efficient
• Coordinated
@jeremy_daly
30. Lambda Orchestrator
(concurrency 1)
The Lambda Orchestrator
DynamoDB
LambdaWorker
LambdaWorker
LambdaWorker
Concurrent Executions
of the SAME function
SQS (DLQ)
@jeremy_daly
CloudWatch Rule
(trigger every minute)
SQS QueueSQS (DLQ)
Status?
Gmail API
250 Quota Units
per minute
32. Multicasting with SNS
Key Points:
• SNS has a “well-defined API”
• Decouples downstream processes
• Allows multiple subscribers with message filters
Client
SNS
“Asynchronous”
Request
ack
Event Service
@jeremy_daly
HTTP
SMS
Lambda
SQS
Email
SQS (DLQ)
FUN FACT:
SNS to SQS is
“guaranteed”
(100,010 retries)
33. @jeremy_daly
Multicasting with EventBridge
Key Points:
• Allows multiple subscribers with RULES, PATTERNS and FILTERS
• Forward events to other accounts
• 24 hours of automated retries
Asynchronous
“PutEvents” Request
ack
w/ event id
Amazon
EventBridge
Lambda
SQS
Client
Step Function
Event Bus
+16 others
34. Key Points:
• Filter events to selectively trigger services
• Manage throttling/quotas per service
• Use Lambda Destinations with asynchronous events
Stripe API
@jeremy_daly
Distribute &Throttle
ack
SQS
Queue Lambda
(concurrency 25)
Client API
Gateway
Lambda
Order Service
"total": [{ "numeric": [ ”>", 0 ]}]
RDS
SQS
Queue Lambda
(concurrency 10)
SMS Alerting Service
Twilio API
SQS
Queue Lambda
(concurrency 5)
Billing Service
"detail-type": [ "ORDER COMPLETE" ]
EventBridge
35. Other non-serverless components
• Managed Services
• Other cloud services (MongoDB Atlas, ElasticSearch, etc.)
• Legacy Systems
• Our own serverless APIs 🤔
@jeremy_daly
36. Non-serverless components are inevitable
• Know the limits of your components
• Use a good caching strategy
• Embrace asynchronous processes
• Buffer and throttle events to distributed systems
• Utilize eventual consistency
@jeremy_daly
👈