There is a common misconception that everything is possible with the Serverless services in AWS, for example that your Lambda function can scale without limitation.
In reality, every AWS service (not only the Serverless ones) has a long list of quotas that you need to know, understand, and take into account during development.
In this talk I'll explain the most important quotas of Serverless services such as API Gateway, Lambda, DynamoDB, SQS, and Aurora Serverless, and how to architect your solution with these quotas in mind.
12. API Gateway Important Service Quotas
Quota | Description | Value | Adjustable
Default throughput / Throttle rate | The maximum number of requests per second that your APIs can receive | 10,000 |
Throttle burst rate | The maximum number of additional requests per second that you can send in one burst | 5,000 |
14. API Gateway Important Service Quotas
Quota | Description | Value | Adjustable | Mitigation
Max timeout | The maximum integration timeout in milliseconds | 29 sec | |
API payload size | Maximum payload size for non-WebSocket APIs | 10 MB | | 1) The client makes an HTTP GET request to API Gateway, and the Lambda function generates and returns a presigned S3 URL. 2) The client uploads the image to S3 directly, using the presigned S3 URL.
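The two-step presigned-URL mitigation can be sketched as a Lambda handler behind API Gateway that hands the client an upload URL instead of accepting the payload itself. This is a minimal illustration: `presignPutUrl` is a hypothetical stand-in for a real presigner (such as `getSignedUrl` from the AWS SDK), and the bucket name is an assumption.

```javascript
// Step 1: the client GETs this endpoint; the handler returns a presigned
// S3 PUT URL instead of receiving the file (avoiding the 10 MB limit).
// `presignPutUrl(bucket, key, expirySeconds)` is a stand-in for a real
// presigner such as getSignedUrl from @aws-sdk/s3-request-presigner.
const makeUploadHandler = (presignPutUrl) => async (event) => {
  const key = `uploads/${event.queryStringParameters.filename}`;
  const url = await presignPutUrl("my-upload-bucket", key, 300); // 300 s expiry
  // Step 2 happens client-side: the client PUTs the file to `url` directly,
  // so the large payload never passes through API Gateway or Lambda.
  return { statusCode: 200, body: JSON.stringify({ uploadUrl: url }) };
};
```

The handler's response stays tiny (just a URL), so neither the API Gateway 10 MB limit nor the Lambda 6 MB payload limit is ever approached.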
16. Lambda Important Service Quotas
Quota | Description | Value | Adjustable | Mitigation
Concurrent executions / Concurrency limit | The maximum number of events that functions can process simultaneously in the current Region | 1,000 | | Rearchitect
Burst concurrency limit | The maximum immediate increase in function concurrency that can occur when your functions scale in response to a burst of traffic. After the initial burst, concurrency scales by 500 executions per minute up to your concurrency limit | US West (Oregon), US East (N. Virginia), Europe (Ireland): 3,000; Asia Pacific (Tokyo), Europe (Frankfurt), US East (Ohio): 1,000; all other Regions: 500 | | Use provisioned concurrency
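The burst quota implies a simple model of how much concurrency is available at any point during a spike: the regional burst immediately, then 500 more per minute, capped by the account limit. A small sketch, using the numbers from the table above:

```javascript
// Concurrency available t minutes into a traffic burst: the initial
// regional burst, then +500 executions per minute, capped at the
// account-level concurrency limit.
const availableConcurrency = (minutes, regionalBurst, accountLimit) =>
  Math.min(accountLimit, regionalBurst + 500 * minutes);
```

For example, in Europe (Frankfurt) (burst 1,000) with a raised account limit of 3,000, a function gets 1,000 concurrent executions immediately and reaches the full 3,000 only after four minutes of sustained traffic.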
20. Image: burst.shopify.com/photos/a-look-across-the-landscape-with-view-of-the-sea
Lambda Burst Limit and Cold Start
https://aws.amazon.com/de/blogs/compute/understanding-aws-lambdas-invoke-throttle-limits/
• Sudden, steep spikes in the number of cold starts can put pressure on the invoke services that handle these cold start operations, and can cause undesirable side effects for your application, such as increased latencies, reduced cache efficiency, and increased fan-out on downstream dependencies
• The burst limit exists to protect against such surges of cold starts, especially for accounts that have a high concurrency limit
21. Lambda Burst Limit
https://aws.amazon.com/de/blogs/compute/understanding-aws-lambdas-invoke-throttle-limits/ https://docs.aws.amazon.com/lambda/latest/dg/burst-concurrency.html
The chart above shows the burst limit in action with a maximum concurrency limit of 3,000, a maximum burst (B) of 1,000, and a refill rate (r) of 500 tokens per minute. The token bucket starts full with 1,000 tokens, which is also the available burst headroom.
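The token-bucket behaviour described above can be simulated directly. This is a sketch of the mechanism, not AWS's implementation; the numbers (B = 1,000, r = 500/minute) are the ones from the slide.

```javascript
// Token bucket governing burst headroom: capacity B tokens, refilled at
// r tokens per minute, one token consumed per unit of new concurrency.
function makeBurstBucket(capacity, refillPerMinute) {
  let tokens = capacity; // the bucket starts full
  return {
    // Try to create `n` units of new concurrency; returns how many succeed.
    take(n) {
      const granted = Math.min(n, Math.floor(tokens));
      tokens -= granted;
      return granted;
    },
    // Advance time by `minutes`, refilling up to capacity.
    refill(minutes) {
      tokens = Math.min(capacity, tokens + refillPerMinute * minutes);
    },
    available() { return Math.floor(tokens); },
  };
}
```

With B = 1,000 and r = 500, a spike that consumes all 1,000 tokens at once must wait two full minutes before the bucket (and therefore the burst headroom) is completely restored.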
22. Lambda Important Service Quotas
Quota | Description | Value | Adjustable | Mitigation
TPS (Transactions per Second) | The maximum number of TPS | TPS = min(10 × concurrency, concurrency / function duration in seconds) | |

• If the function duration is exactly 100 ms (1/10th of a second), both terms in the min function are equal
• If the function duration is over 100 ms, the second term is lower and TPS is limited by concurrency / function duration
• If the function duration is under 100 ms, the first term is lower and TPS is limited by 10 × concurrency
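The TPS formula above is a one-liner, which makes the three cases easy to verify numerically:

```javascript
// TPS = min(10 × concurrency, concurrency / function duration in seconds)
const maxTps = (concurrency, durationSeconds) =>
  Math.min(10 * concurrency, concurrency / durationSeconds);
```

With a concurrency of 1,000: a 100 ms function gives both terms equal (10,000 TPS), a 1 s function is concurrency-limited (1,000 TPS), and a 50 ms function is capped at 10 × concurrency (10,000 TPS) even though the raw arithmetic would allow 20,000.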
23. Lambda TPS Limit
https://aws.amazon.com/de/blogs/compute/understanding-aws-lambdas-invoke-throttle-limits/ https://www.linkedin.com/pulse/how-aws-lambda-works-underneath-shwetabh-shekhar/
The burst limit isn’t a rate limit on the invoke itself, but a rate limit on how quickly
concurrency can rise. However, since invoke TPS is a function of concurrency, it also
clamps how quickly TPS can rise.
24. Lambda TPS Limit
https://aws.amazon.com/de/blogs/compute/understanding-aws-lambdas-invoke-throttle-limits/ https://www.linkedin.com/pulse/how-aws-lambda-works-underneath-shwetabh-shekhar/
The TPS limit exists to protect the Invoke Data Plane from the high churn of short-lived invocations. For short invocations of under 100 ms, throughput is capped as though the function duration were 100 ms (at 10 × concurrency). This implies that short-lived invocations may be TPS-limited rather than concurrency-limited.
25. Lambda Important Service Quotas
Quota | Description | Value | Adjustable | Mitigation
Concurrent executions / Concurrency limit | The maximum number of events that functions can process simultaneously in the current Region | 1,000 | | Rearchitect
Burst concurrency limit | The maximum immediate increase in function concurrency that can occur when your functions scale in response to a burst of traffic. After the initial burst, concurrency scales by 500 executions per minute up to your concurrency limit | US West (Oregon), US East (N. Virginia), Europe (Ireland): 3,000; Asia Pacific (Tokyo), Europe (Frankfurt), US East (Ohio): 1,000; all other Regions: 500 | | Use provisioned concurrency
27. General Best Practices for using Lambda
• Optimize for cost-performance
• Use AWS Lambda Power Tuning
• Reuse AWS Service clients/connections outside of the Lambda
handler
• Use the newest version of the AWS SDK for the programming language of
your choice
• Minimize dependencies and package size
• Import only dependencies that you need (especially from AWS SDK)
• Use a keep-alive directive to maintain persistent connections
• Implement (other) best practices to reduce cold starts
https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html
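The client-reuse advice above boils down to initializing expensive resources at module scope, outside the handler, so warm invocations skip the setup cost. A minimal sketch with a hypothetical `createClient` standing in for a real AWS SDK client:

```javascript
// Module scope runs once per cold start; every warm invocation reuses
// the same `client` instead of paying the initialization cost again.
let initCount = 0; // counts expensive initializations (for illustration only)
function createClient() {
  initCount += 1; // stands in for TCP/TLS setup, credential resolution, etc.
  return { query: async (x) => x };
}
const client = createClient(); // GOOD: created once, outside the handler

const handler = async (event) => {
  // BAD alternative: calling createClient() here would re-initialize on
  // every single invocation, adding latency to each request.
  return client.query(event);
};
```

The same pattern applies to database connections, HTTP agents with keep-alive, and loaded configuration: anything costly belongs above the handler.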
28. Lambda Important Service Quotas
Quota | Description | Value | Adjustable | Mitigation
Function timeout | The maximum timeout that you can configure for a function | 15 min | |
Synchronous payload | The maximum size of an incoming synchronous invocation request or outgoing response | 6 MB | | For the request: use an API Gateway service proxy to S3, or use a presigned S3 URL and upload directly to S3. For the response: use response streaming (with the AWS Lambda Web Adapter)
https://theburningmonk.com/2020/04/hit-the-6mb-lambda-payload-limit-heres-what-you-can-do/
29. Lambda Response Streaming
https://aws.amazon.com/de/blogs/compute/introducing-aws-lambda-response-streaming/
You can use response streaming to send responses larger than Lambda’s 6 MB
response payload limit up to a soft limit of 20 MB.
• Response streaming currently supports the Node.js 14.x and subsequent managed
runtimes
• To indicate to the runtime that Lambda should stream your function’s responses,
you must wrap your function handler with the streamifyResponse() decorator. This
tells the runtime to use the correct stream logic path, allowing the function to
stream responses
exports.handler = awslambda.streamifyResponse(
  async (event, responseStream, context) => {
    responseStream.setContentType("text/plain");
    responseStream.write("Hello, world!");
    responseStream.end();
  }
);
30. Lambda Response Streaming (with AWS Lambda Web Adapter)
https://aws.amazon.com/de/blogs/compute/using-response-streaming-with-aws-lambda-web-adapter-to-optimize-performance/
• The Lambda Web Adapter, written in Rust, serves as a universal
adapter for Lambda Runtime API and HTTP API
• It allows developers to package familiar HTTP 1.1/1.0 web
applications, such as Express.js, Next.js, Flask, SpringBoot, or
Laravel, and deploy them on AWS Lambda
• This replaces the need to modify the web application to
accommodate Lambda’s input and output formats, reducing the
complexity of adapting code to meet Lambda’s requirements
33. SQS (Standard) Important Service Quotas
Quota | Description | Value | Adjustable
Throughput per standard queue | Standard queues support a nearly unlimited number of transactions per second (TPS) per API action | Nearly unlimited |
34. Lambda scaling with SQS standard queues
https://aws.amazon.com/de/blogs/compute/understanding-how-aws-lambda-scales-when-subscribed-to-amazon-sqs-queues/
• When a Lambda function subscribes to an SQS
queue, Lambda polls the queue as it waits for
messages to arrive. It consumes messages in
batches, starting with 5 functions at a time
• If there are more messages in the queue, Lambda
adds up to 60 functions per minute, up to 1,000
functions, to consume those messages from the
SQS queue
• This scaling behavior is managed by AWS and
cannot be modified
• To process more messages, you can optimize your
Lambda configuration for higher throughput
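The scaling behaviour above follows the same shape as the burst ramp: a fixed starting point, a linear increase, and a hard cap. A sketch using the numbers from the slide (5 initial pollers, up to 60 added per minute, capped at 1,000):

```javascript
// Approximate number of concurrent function instances Lambda uses to
// drain an SQS standard queue t minutes after messages start arriving:
// starts with 5, adds up to 60 per minute, capped at 1,000.
const sqsConsumers = (minutes) => Math.min(1000, 5 + 60 * minutes);
```

This is why a sudden backlog on a standard queue is not drained instantly: even with a deep queue, reaching the 1,000-function ceiling takes roughly 17 minutes, and the ramp itself cannot be configured.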
35. Lambda scaling with SQS standard queues
https://aws.amazon.com/de/blogs/compute/understanding-how-aws-lambda-scales-when-subscribed-to-amazon-sqs-queues/
https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#services-sqs-batchfailurereporting
• Increase the allocated memory for your Lambda
function
• Optimize batching behavior:
• by default, Lambda batches up to
10 messages in a queue to process them
during a single Lambda execution. You can
increase this number up to 10,000 messages,
or up to 6 MB of messages in a single batch
for standard SQS queues
• If each payload is 256 KB (the maximum message size for SQS), Lambda can take only 23 messages per batch, regardless of the batch size setting
• Implement partial batch responses
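Partial batch responses let the handler report only the failed messages back to Lambda, so successfully processed messages are not redelivered with the rest of the batch. The `batchItemFailures` response shape is the documented Lambda/SQS contract (it requires `ReportBatchItemFailures` to be enabled on the event source mapping); `processMessage` here is a hypothetical per-message worker:

```javascript
// Report partial batch failures: return the IDs of the messages that
// failed, so only those become visible on the queue again for retry.
const makeBatchHandler = (processMessage) => async (event) => {
  const batchItemFailures = [];
  for (const record of event.Records) {
    try {
      await processMessage(record);
    } catch (err) {
      // Collect the failure instead of throwing, which would fail the
      // whole batch and cause every message to be retried.
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
};
```

Without this, one poison message in a batch of 10 forces all 10 back onto the queue.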
36. SQS (Standard) Important Service Quotas
Quota | Description | Value | Adjustable
Throughput per standard queue | Standard queues support a nearly unlimited number of transactions per second (TPS) per API action | Nearly unlimited |
In-flight messages per standard queue | The number of in-flight messages (received from a queue by a consumer, but not yet deleted from the queue) in a standard queue | 120,000 |
Message size | The size of a message | 256 KB |
37. Use BatchWriteItem for storing to DynamoDB
The BatchWriteItem operation puts or deletes multiple items in one or more tables. A single call to BatchWriteItem can transmit up to 16 MB of data over the network, consisting of up to 25 item put or delete operations.
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchWriteItem.html
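Because of the 25-operation ceiling, writing a larger item set means splitting it into batches first. A sketch of the chunking step (each resulting chunk would then be sent as one BatchWriteItem call; the call itself is omitted here):

```javascript
// BatchWriteItem accepts at most 25 put/delete requests per call, so a
// larger item set must be chunked; each chunk maps to one API call.
const chunkForBatchWrite = (items, size = 25) => {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
};
```

Note that BatchWriteItem can also return `UnprocessedItems` under throttling, so production code should retry those leftovers (ideally with backoff) rather than assume every chunk succeeded.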
39. SQS (FIFO) Important Service Quotas
Quota | Description | Value | Adjustable
Batched message throughput for FIFO queues | The number of batched transactions per second (TPS) for FIFO queues | 3,000 |
In-flight messages per FIFO queue | The number of in-flight messages in a FIFO queue | 20,000 |
41. SQS FIFO Message Groups and Multiple Consumers
https://aws.amazon.com/blogs/compute/solving-complex-ordering-challenges-with-amazon-sqs-fifo-queues/
https://jayendrapatil.com/aws-sqs-standard-vs-fifo-queue/
• The combination of increased messages and extra processing time for the new
features means that a single consumer is too slow. The solution is to scale to
have more consumers and process messages in parallel
• To work in parallel, only the messages related to a single Auction must be kept
in order. FIFO can handle that case with a feature called message groups. Each
transaction related to Auction A is placed by your producer into message group
A, and so on
43. SQS FIFO High Throughput Mode
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/high-throughput-fifo.html
• High throughput for FIFO queues supports a higher number of
requests per API, per second
• To increase the number of requests in high throughput for FIFO
queues, you can increase the number of message groups you use.
• Each message group supports 300 requests per second
Quota | Description | Value | Adjustable
Throughput for FIFO high throughput mode | Number of transactions per second (TPS) per API in the high throughput mode of FIFO queues | 2,400-9,000 |
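Since each message group supports roughly 300 requests per second, sizing a high-throughput FIFO queue reduces to deciding how many message groups to spread the traffic over:

```javascript
// Each message group in a high-throughput FIFO queue supports ~300
// requests per second, so the group count bounds the total throughput.
const messageGroupsNeeded = (targetTps, perGroupTps = 300) =>
  Math.ceil(targetTps / perGroupTps);
```

For example, a 9,000 TPS target needs at least 30 message groups, and the traffic must actually be spread evenly across them, so the message group ID should be a well-distributed key, not a constant.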
45. DynamoDB Important Service Quotas
Quota | Description | Value | Adjustable | Mitigation
Table-level read/write throughput limit | The maximum read/write throughput allocated for a table or global secondary index | 40,000 RCU / 40,000 WCU | | Ask for a quota increase
Table-level burst capacity for provisioned capacity mode | During an occasional burst of read or write activity, these extra capacity units can be consumed quickly | up to 300 seconds of unused RCUs and WCUs | |
Partition-level read/write throughput | The maximum read/write throughput allocated for a partition | 3,000 RCU / 1,000 WCU | | Use best practices to avoid hot partitions
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html#default-limits-throughput-capacity-modes
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-design.html#bp-partition-key-throughput-bursting
46. Recommendations for partition keys
• Use high-cardinality attributes
• These are attributes that have distinct values for each item, like email_id, employee_no, customer_id, session_id, order_id, and so on
• Use composite attributes
• Try to combine more than one attribute to form a unique key, if that meets your access pattern. For example, consider an orders table with customerid#productid#countrycode as the partition key and order_date as the sort key, where the symbol # is used to split the different fields
• Add random numbers or digits from a predetermined range for write-heavy use cases
• Suppose that you expect a large volume of writes for a partition key (for example, greater than 1,000 writes of 1 KB items per second). In this case, add an additional prefix or suffix (a number from a predetermined range, say 0-9) to the partition key, like InvoiceNumber#Random(0-N)
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-uniform-load.html
https://aws.amazon.com/de/blogs/database/choosing-the-right-dynamodb-partition-key/
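The write-sharding recommendation above can be sketched in a few lines. The shard picker is injectable here only so the example stays deterministic; in practice it would simply be a random draw. Note the trade-off in the comment: reads must fan out across all suffixes.

```javascript
// Spread a hot partition key across N shards by appending a suffix from a
// predetermined range, e.g. "INV-42#3". This keeps any single partition
// under the 1,000 WCU partition-level limit, at the cost of having to
// query and merge all N suffixed keys on read.
const shardedKey = (key, n, pickShard = () => Math.floor(Math.random() * n)) =>
  `${key}#${pickShard()}`;
```

Choosing N is a balance: enough shards to dilute the hottest key below the partition throughput limit, but few enough that scatter-gather reads stay cheap.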
47. DynamoDB Important Service Quotas
Quota | Description | Value | Adjustable
Initial throughput for on-demand capacity mode | Initial throughput for on-demand capacity mode | See further details |
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html#default-limits-throughput-capacity-modes
49. Initial Throughput For DynamoDB On-Demand Capacity Mode
Newly created table with on-demand capacity mode:
• enables newly created on-demand tables to serve up to 4,000 WCUs or 12,000
RCUs
• If you exceed double your previous traffic's peak within 30 minutes, then you might
experience throttling
• One solution is to pre-warm the table to the anticipated peak capacity of the spike by:
• Performing a load test
• Creating the table in provisioned mode with high WCUs/RCUs and then switching to on-demand mode
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html#HowItWorks.InitialThroughput
50. Initial Throughput For DynamoDB On-Demand Capacity Mode
Existing table switched from provisioned to on-demand capacity mode:
• The previous peak is half the maximum write capacity units and read capacity
units provisioned since the table was created
• or the settings for a newly created table with on-demand capacity mode,
whichever is higher
• In other words, your table will deliver at least as much throughput as it did prior to
switching to on-demand capacity mode
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html#HowItWorks.InitialThroughput
52. Aurora (Serverless) Important Service Quotas
Quota | Description | Value | Adjustable
Data API requests per second | The maximum number of requests to the Data API per second allowed in this account in the current AWS Region | 1,000 |
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/CHAP_Limits.html
54. DynamoDB vs Aurora (Serverless)
| Aurora Serverless v1 | Aurora Serverless v1 + Data API | Aurora Serverless v2
Investment in knowledge | Relational databases are familiar to many | Same | Same
Engine support | Takes time to support the newest engines of MySQL and PostgreSQL | Takes time to support the newest engines of MySQL and PostgreSQL | The newest engines of MySQL and PostgreSQL are supported
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Concepts.Aurora_Fea_Regions_DB-eng.Feature.ServerlessV1.html
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Concepts.Aurora_Fea_Regions_DB-eng.Feature.ServerlessV2.html
55. DynamoDB vs Aurora (Serverless)
| Aurora Serverless v1 | Aurora Serverless v1 + Data API | Aurora Serverless v2
Misc | Not too much new feature development happening | Not too much new feature development happening; Service Quota of the Data API to consider | No Data API support yet
| Requires putting Lambda into a VPC to access | | May require Amazon RDS Proxy for connection pooling
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/data-api.html
56. Other Optimizations: Caching
• Put DynamoDB Accelerator (DAX) in front of DynamoDB
• Requires putting Lambda into a VPC
• Put ElastiCache in front of Aurora Serverless
• Requires putting Lambda into a VPC
• No “pay as you go” pricing
• Enable API Gateway Caching
• Uses ElastiCache behind the scenes
• No “pay as you go” pricing for ElastiCache
• Use CloudFront (and its caching capabilities) in front of API
Gateway
58. Other Optimizations: Error Handling and Retries
• Set meaningful timeouts
• For API Gateway, Lambda
• Retry with exponential backoff and jitter
• AWS SDK supports them out of the box
• Implement idempotency
• AWS Lambda Powertools (Java, Python) provides an idempotency module
https://docs.aws.amazon.com/sdkref/latest/guide/feature-retry-behavior.html
https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/
https://aws.amazon.com/de/blogs/architecture/exponential-backoff-and-jitter/
https://aws.amazon.com/blogs/compute/handling-lambda-functions-idempotency-with-aws-lambda-powertools/
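The backoff-with-jitter strategy referenced above is small enough to write out. This sketch uses the "full jitter" variant from the AWS builders' library article: the delay is drawn uniformly from zero up to the capped exponential value. The base and cap values are illustrative defaults, not AWS-prescribed numbers.

```javascript
// "Full jitter" backoff: sleep a random duration in
// [0, min(cap, base * 2^attempt)] milliseconds before the next retry.
// Randomizing the delay de-correlates retries from many clients, so a
// throttled dependency is not hit by synchronized retry waves.
const backoffWithJitter = (attempt, baseMs = 100, capMs = 20000, rand = Math.random) =>
  rand() * Math.min(capMs, baseMs * 2 ** attempt);
```

With these defaults, attempt 3 sleeps at most 800 ms, and from roughly attempt 8 onward every retry is capped at 20 s regardless of how many attempts have been made.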
60. AWS “Virtual Waiting Room” Solution
• Open-source project written in
Python that can be integrated into
existing applications
• Source code available on GitHub
• Different CloudFormation
Templates to choose from (from
minimal to extended solutions)
• Estimated costs are provided for a 50,000-user and a 100,000-user waiting room with an event duration ranging from 2 to 4 hours
• Virtual Waiting Room on AWS has
been load tested with a tool called
Locust. The simulated event sizes
ranged from 10,000 to 100,000 clients
https://docs.aws.amazon.com/solutions/latest/virtual-waiting-room-on-aws/architecture-overview.html
https://docs.aws.amazon.com/pdfs/solutions/latest/virtual-waiting-room-on-aws/virtual-waiting-room-on-aws.pdf
62. General Best Practices for Service Quotas
• Know, understand and observe the service quotas
• Architect with service quotas in mind
• AWS adjusts them from time to time
• When requesting a quota increase, provide a valid justification for the new desired value
• Service quotas are valid per AWS account (per region)
• Use different AWS accounts for development and testing
• Use different AWS accounts for independent (micro-)services
• Separate AWS accounts on the team level
• Use AWS Organizations