There is a common misconception that everything is possible with the Serverless services in AWS, for example that your Lambda function can scale without limitation.
In reality, every AWS service (not only the Serverless ones) has a long list of quotas that you need to know, understand, and take into account during development.
In this talk I'll explain the most important quotas of Serverless services such as API Gateway, Lambda, DynamoDB, SQS, and Aurora Serverless, and how to architect your solution with these quotas in mind.
12. API Gateway Important Service Quotas
Quota | Description | Value | Adjustable
Default throughput / Throttle rate | The maximum number of requests per second that your APIs can receive | 10,000 |
Throttle burst rate | The maximum number of additional requests per second that you can send in one burst | 5,000 |
14. API Gateway Important Service Quotas
Quota | Description | Value | Adjustable | Mitigation
Max timeout | The maximum integration timeout in milliseconds | 29 sec | |
API payload size | Maximum payload size for non-WebSocket APIs | 10 MB | | 1) The client makes an HTTP GET request to API Gateway, and the Lambda function generates and returns a presigned S3 URL. 2) The client uploads the image to S3 directly, using the presigned S3 URL.
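The two-step presigned-URL mitigation can be sketched as a Lambda handler behind API Gateway that hands the client an upload URL instead of accepting the payload itself. This is a minimal illustration: `presignPutUrl` is a hypothetical stand-in for a real presigner (such as `getSignedUrl` from the AWS SDK), and the bucket name is an assumption.

```javascript
// Step 1: the client GETs this endpoint; the handler returns a presigned
// S3 PUT URL instead of receiving the file (avoiding the 10 MB limit).
// `presignPutUrl(bucket, key, expirySeconds)` is a stand-in for a real
// presigner such as getSignedUrl from @aws-sdk/s3-request-presigner.
const makeUploadHandler = (presignPutUrl) => async (event) => {
  const key = `uploads/${event.queryStringParameters.filename}`;
  const url = await presignPutUrl("my-upload-bucket", key, 300); // 300 s expiry
  // Step 2 happens client-side: the client PUTs the file to `url` directly,
  // so the large payload never passes through API Gateway or Lambda.
  return { statusCode: 200, body: JSON.stringify({ uploadUrl: url }) };
};
```

The handler's response stays tiny (just a URL), so neither the API Gateway 10 MB limit nor the Lambda 6 MB payload limit is ever approached.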
16. Lambda Important Service Quotas
Quota | Description | Value | Adjustable | Mitigation
Concurrent executions / Concurrency limit | The maximum number of events that functions can process simultaneously in the current Region | 1,000 | | Rearchitect
Burst concurrency limit | The maximum immediate increase in function concurrency that can occur when your functions scale in response to a burst of traffic. After the initial burst, concurrency scales by 500 executions per minute up to your concurrency limit | US West (Oregon), US East (N. Virginia), Europe (Ireland): 3,000; Asia Pacific (Tokyo), Europe (Frankfurt), US East (Ohio): 1,000; all other Regions: 500 | | Use provisioned concurrency
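The burst quota implies a simple model of how much concurrency is available at any point during a spike: the regional burst immediately, then 500 more per minute, capped by the account limit. A small sketch, using the numbers from the table above:

```javascript
// Concurrency available t minutes into a traffic burst: the initial
// regional burst, then +500 executions per minute, capped at the
// account-level concurrency limit.
const availableConcurrency = (minutes, regionalBurst, accountLimit) =>
  Math.min(accountLimit, regionalBurst + 500 * minutes);
```

For example, in Europe (Frankfurt) (burst 1,000) with a raised account limit of 3,000, a function gets 1,000 concurrent executions immediately and reaches the full 3,000 only after four minutes of sustained traffic.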
20. Image: burst.shopify.com/photos/a-look-across-the-landscape-with-view-of-the-sea
Lambda Burst Limit and Cold Start
https://aws.amazon.com/de/blogs/compute/understanding-aws-lambdas-invoke-throttle-limits/
• Sudden, steep spikes in the number of cold starts can put pressure on the invoke services that handle these cold start operations, and can cause undesirable side effects for your application, such as increased latencies, reduced cache efficiency, and increased fan-out on downstream dependencies
• The burst limit exists to protect against such surges of cold starts, especially for accounts that have a high concurrency limit
21. Lambda Burst Limit
https://aws.amazon.com/de/blogs/compute/understanding-aws-lambdas-invoke-throttle-limits/ https://docs.aws.amazon.com/lambda/latest/dg/burst-concurrency.html
The chart above shows the burst limit in action with a maximum concurrency limit of 3,000, a maximum burst (B) of 1,000, and a refill rate (r) of 500 tokens per minute. The token bucket starts full with 1,000 tokens, which is also the available burst headroom.
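The token-bucket behaviour described above can be simulated directly. This is a sketch of the mechanism, not AWS's implementation; the numbers (B = 1,000, r = 500/minute) are the ones from the slide.

```javascript
// Token bucket governing burst headroom: capacity B tokens, refilled at
// r tokens per minute, one token consumed per unit of new concurrency.
function makeBurstBucket(capacity, refillPerMinute) {
  let tokens = capacity; // the bucket starts full
  return {
    // Try to create `n` units of new concurrency; returns how many succeed.
    take(n) {
      const granted = Math.min(n, Math.floor(tokens));
      tokens -= granted;
      return granted;
    },
    // Advance time by `minutes`, refilling up to capacity.
    refill(minutes) {
      tokens = Math.min(capacity, tokens + refillPerMinute * minutes);
    },
    available() { return Math.floor(tokens); },
  };
}
```

With B = 1,000 and r = 500, a spike that consumes all 1,000 tokens at once must wait two full minutes before the bucket (and therefore the burst headroom) is completely restored.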
22. Lambda Important Service Quotas
Quota | Description | Value | Adjustable | Mitigation
TPS (Transactions per Second) | The maximum number of TPS | TPS = min(10 × concurrency, concurrency / function duration in seconds) | |

• If the function duration is exactly 100 ms (1/10th of a second), both terms in the min function are equal
• If the function duration is over 100 ms, the second term is lower and TPS is limited by concurrency / function duration
• If the function duration is under 100 ms, the first term is lower and TPS is limited by 10 × concurrency
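The TPS formula above is a one-liner, which makes the three cases easy to verify numerically:

```javascript
// TPS = min(10 × concurrency, concurrency / function duration in seconds)
const maxTps = (concurrency, durationSeconds) =>
  Math.min(10 * concurrency, concurrency / durationSeconds);
```

With a concurrency of 1,000: a 100 ms function gives both terms equal (10,000 TPS), a 1 s function is concurrency-limited (1,000 TPS), and a 50 ms function is capped at 10 × concurrency (10,000 TPS) even though the raw arithmetic would allow 20,000.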
23. Lambda TPS Limit
https://aws.amazon.com/de/blogs/compute/understanding-aws-lambdas-invoke-throttle-limits/ https://www.linkedin.com/pulse/how-aws-lambda-works-underneath-shwetabh-shekhar/
The burst limit isn’t a rate limit on the invoke itself, but a rate limit on how quickly
concurrency can rise. However, since invoke TPS is a function of concurrency, it also
clamps how quickly TPS can rise.
24. Lambda TPS Limit
https://aws.amazon.com/de/blogs/compute/understanding-aws-lambdas-invoke-throttle-limits/ https://www.linkedin.com/pulse/how-aws-lambda-works-underneath-shwetabh-shekhar/
The TPS limit exists to protect the Invoke Data Plane from the high churn of short-lived invocations. For short invocations of under 100 ms, throughput is capped as though the function duration were 100 ms (at 10 × concurrency). This implies that short-lived invocations may be TPS-limited rather than concurrency-limited.
25. Lambda Important Service Quotas
Quota | Description | Value | Adjustable | Mitigation
Concurrent executions / Concurrency limit | The maximum number of events that functions can process simultaneously in the current Region | 1,000 | | Rearchitect
Burst concurrency limit | The maximum immediate increase in function concurrency that can occur when your functions scale in response to a burst of traffic. After the initial burst, concurrency scales by 500 executions per minute up to your concurrency limit | US West (Oregon), US East (N. Virginia), Europe (Ireland): 3,000; Asia Pacific (Tokyo), Europe (Frankfurt), US East (Ohio): 1,000; all other Regions: 500 | | Use provisioned concurrency
27. General Best Practices for using Lambda
• Optimize for cost-performance
• Use AWS Lambda Power Tuning
• Reuse AWS Service clients/connections outside of the Lambda
handler
• Use the newest version of the AWS SDK for the programming language of
your choice
• Minimize dependencies and package size
• Import only dependencies that you need (especially from AWS SDK)
• Use a keep-alive directive to maintain persistent connections
• Implement (other) best practices to reduce cold starts
https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html
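The client-reuse advice above boils down to initializing expensive resources at module scope, outside the handler, so warm invocations skip the setup cost. A minimal sketch with a hypothetical `createClient` standing in for a real AWS SDK client:

```javascript
// Module scope runs once per cold start; every warm invocation reuses
// the same `client` instead of paying the initialization cost again.
let initCount = 0; // counts expensive initializations (for illustration only)
function createClient() {
  initCount += 1; // stands in for TCP/TLS setup, credential resolution, etc.
  return { query: async (x) => x };
}
const client = createClient(); // GOOD: created once, outside the handler

const handler = async (event) => {
  // BAD alternative: calling createClient() here would re-initialize on
  // every single invocation, adding latency to each request.
  return client.query(event);
};
```

The same pattern applies to database connections, HTTP agents with keep-alive, and loaded configuration: anything costly belongs above the handler.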
28. Lambda Important Service Quotas
Quota | Description | Value | Adjustable | Mitigation
Function timeout | The maximum timeout that you can configure for a function | 15 min | |
Synchronous payload | The maximum size of an incoming synchronous invocation request or outgoing response | 6 MB | | For the request: use an API Gateway service proxy to S3, or use a presigned S3 URL and upload directly to S3. For the response: use response streaming (with the AWS Lambda Web Adapter)
https://theburningmonk.com/2020/04/hit-the-6mb-lambda-payload-limit-heres-what-you-can-do/
29. Lambda Response Streaming
https://aws.amazon.com/de/blogs/compute/introducing-aws-lambda-response-streaming/
You can use response streaming to send responses larger than Lambda’s 6 MB
response payload limit up to a soft limit of 20 MB.
• Response streaming currently supports the Node.js 14.x and subsequent managed
runtimes
• To indicate to the runtime that Lambda should stream your function’s responses,
you must wrap your function handler with the streamifyResponse() decorator. This
tells the runtime to use the correct stream logic path, allowing the function to
stream responses
exports.handler = awslambda.streamifyResponse(
  async (event, responseStream, context) => {
    responseStream.setContentType("text/plain");
    responseStream.write("Hello, world!");
    responseStream.end();
  }
);
30. Lambda Response Streaming (with AWS Lambda Web Adapter)
https://aws.amazon.com/de/blogs/compute/using-response-streaming-with-aws-lambda-web-adapter-to-optimize-performance/
• The Lambda Web Adapter, written in Rust, serves as a universal
adapter for Lambda Runtime API and HTTP API
• It allows developers to package familiar HTTP 1.1/1.0 web
applications, such as Express.js, Next.js, Flask, SpringBoot, or
Laravel, and deploy them on AWS Lambda
• This replaces the need to modify the web application to
accommodate Lambda’s input and output formats, reducing the
complexity of adapting code to meet Lambda’s requirements
33. SQS (Standard) Important Service Quotas
Quota | Description | Value | Adjustable
Throughput per standard queue | Standard queues support a nearly unlimited number of transactions per second (TPS) per API action | Nearly unlimited |
34. Lambda scaling with SQS standard queues
https://aws.amazon.com/de/blogs/compute/understanding-how-aws-lambda-scales-when-subscribed-to-amazon-sqs-queues/
• When a Lambda function subscribes to an SQS
queue, Lambda polls the queue as it waits for
messages to arrive. It consumes messages in
batches, starting with 5 functions at a time
• If there are more messages in the queue, Lambda
adds up to 60 functions per minute, up to 1,000
functions, to consume those messages from the
SQS queue
• This scaling behavior is managed by AWS and
cannot be modified
• To process more messages, you can optimize your
Lambda configuration for higher throughput
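The scaling behaviour above follows the same shape as the burst ramp: a fixed starting point, a linear increase, and a hard cap. A sketch using the numbers from the slide (5 initial pollers, up to 60 added per minute, capped at 1,000):

```javascript
// Approximate number of concurrent function instances Lambda uses to
// drain an SQS standard queue t minutes after messages start arriving:
// starts with 5, adds up to 60 per minute, capped at 1,000.
const sqsConsumers = (minutes) => Math.min(1000, 5 + 60 * minutes);
```

This is why a sudden backlog on a standard queue is not drained instantly: even with a deep queue, reaching the 1,000-function ceiling takes roughly 17 minutes, and the ramp itself cannot be configured.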
35. Lambda scaling with SQS standard queues
https://aws.amazon.com/de/blogs/compute/understanding-how-aws-lambda-scales-when-subscribed-to-amazon-sqs-queues/
https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#services-sqs-batchfailurereporting
• Increase the allocated memory for your Lambda
function
• Optimize batching behavior:
• by default, Lambda batches up to
10 messages in a queue to process them
during a single Lambda execution. You can
increase this number up to 10,000 messages,
or up to 6 MB of messages in a single batch
for standard SQS queues
• If each payload is 256 KB (the maximum message size for SQS), Lambda can take only 23 messages per batch, regardless of the batch size setting
• Implement partial batch responses
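Partial batch responses let the handler report only the failed messages back to Lambda, so successfully processed messages are not redelivered with the rest of the batch. The `batchItemFailures` response shape is the documented Lambda/SQS contract (it requires `ReportBatchItemFailures` to be enabled on the event source mapping); `processMessage` here is a hypothetical per-message worker:

```javascript
// Report partial batch failures: return the IDs of the messages that
// failed, so only those become visible on the queue again for retry.
const makeBatchHandler = (processMessage) => async (event) => {
  const batchItemFailures = [];
  for (const record of event.Records) {
    try {
      await processMessage(record);
    } catch (err) {
      // Collect the failure instead of throwing, which would fail the
      // whole batch and cause every message to be retried.
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
};
```

Without this, one poison message in a batch of 10 forces all 10 back onto the queue.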
36. SQS (Standard) Important Service Quotas
Quota | Description | Value | Adjustable
Throughput per standard queue | Standard queues support a nearly unlimited number of transactions per second (TPS) per API action | Nearly unlimited |
In-flight messages per standard queue | The number of in-flight messages (received from a queue by a consumer, but not yet deleted from the queue) in a standard queue | 120,000 |
Message size | The size of a message | 256 KB |
37. Use BatchWriteItem for storing to DynamoDB
The BatchWriteItem operation puts or deletes multiple items in one or more tables. A single call to BatchWriteItem can transmit up to 16 MB of data over the network, consisting of up to 25 item put or delete operations.
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchWriteItem.html
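Because of the 25-operation ceiling, writing a larger item set means splitting it into batches first. A sketch of the chunking step (each resulting chunk would then be sent as one BatchWriteItem call; the call itself is omitted here):

```javascript
// BatchWriteItem accepts at most 25 put/delete requests per call, so a
// larger item set must be chunked; each chunk maps to one API call.
const chunkForBatchWrite = (items, size = 25) => {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
};
```

Note that BatchWriteItem can also return `UnprocessedItems` under throttling, so production code should retry those leftovers (ideally with backoff) rather than assume every chunk succeeded.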
39. SQS (FIFO) Important Service Quotas
Quota | Description | Value | Adjustable
Batched message throughput for FIFO queues | The number of batched transactions per second (TPS) for FIFO queues | 3,000 |
In-flight messages per FIFO queue | The number of in-flight messages in a FIFO queue | 20,000 |
41. SQS FIFO Message Groups and Multiple Consumers
https://aws.amazon.com/blogs/compute/solving-complex-ordering-challenges-with-amazon-sqs-fifo-queues/
https://jayendrapatil.com/aws-sqs-standard-vs-fifo-queue/
• The combination of increased messages and extra processing time for the new
features means that a single consumer is too slow. The solution is to scale to
have more consumers and process messages in parallel
• To work in parallel, only the messages related to a single Auction must be kept
in order. FIFO can handle that case with a feature called message groups. Each
transaction related to Auction A is placed by your producer into message group
A, and so on
43. SQS FIFO High Throughput Mode
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/high-throughput-fifo.html
• High throughput for FIFO queues supports a higher number of
requests per API, per second
• To increase the number of requests in high throughput for FIFO
queues, you can increase the number of message groups you use.
• Each message group supports 300 requests per second
Quota | Description | Value | Adjustable
Throughput for FIFO high throughput mode | Number of transactions per second (TPS) per API in the high throughput mode of FIFO queues | 2,400-9,000 |
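Since each message group supports roughly 300 requests per second, sizing a high-throughput FIFO queue reduces to deciding how many message groups to spread the traffic over:

```javascript
// Each message group in a high-throughput FIFO queue supports ~300
// requests per second, so the group count bounds the total throughput.
const messageGroupsNeeded = (targetTps, perGroupTps = 300) =>
  Math.ceil(targetTps / perGroupTps);
```

For example, a 9,000 TPS target needs at least 30 message groups, and the traffic must actually be spread evenly across them, so the message group ID should be a well-distributed key, not a constant.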
45. DynamoDB Important Service Quotas
Quota | Description | Value | Adjustable | Mitigation
Table-level read/write throughput limit | The maximum read/write throughput allocated for a table or global secondary index | 40,000 RCU / 40,000 WCU | | Ask for a quota increase
Table-level burst capacity for provisioned capacity mode | During an occasional burst of read or write activity, these extra capacity units can be consumed quickly | up to 300 seconds of unused RCUs and WCUs | |
Partition-level read/write throughput | The maximum read/write throughput allocated for a partition | 3,000 RCU / 1,000 WCU | | Use best practices to avoid hot partitions
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html#default-limits-throughput-capacity-modes
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-design.html#bp-partition-key-throughput-bursting
46. Recommendations for partition keys
• Use high-cardinality attributes
• These are attributes that have distinct values for each item, like email_id, employee_no, customer_id, session_id, order_id, and so on
• Use composite attributes
• Try to combine more than one attribute to form a unique key, if that meets your access pattern. For example, consider an orders table with customerid#productid#countrycode as the partition key and order_date as the sort key, where the symbol # is used to split the different fields
• Add random numbers or digits from a predetermined range for write-heavy use cases
• Suppose that you expect a large volume of writes for a partition key (for example, greater than 1,000 writes of 1 KB items per second). In this case, add an additional prefix or suffix (a number from a predetermined range, say 0-9) to the partition key, like InvoiceNumber#Random(0-N)
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-uniform-load.html
https://aws.amazon.com/de/blogs/database/choosing-the-right-dynamodb-partition-key/
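The write-sharding recommendation above can be sketched in a few lines. The shard picker is injectable here only so the example stays deterministic; in practice it would simply be a random draw. Note the trade-off in the comment: reads must fan out across all suffixes.

```javascript
// Spread a hot partition key across N shards by appending a suffix from a
// predetermined range, e.g. "INV-42#3". This keeps any single partition
// under the 1,000 WCU partition-level limit, at the cost of having to
// query and merge all N suffixed keys on read.
const shardedKey = (key, n, pickShard = () => Math.floor(Math.random() * n)) =>
  `${key}#${pickShard()}`;
```

Choosing N is a balance: enough shards to dilute the hottest key below the partition throughput limit, but few enough that scatter-gather reads stay cheap.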
47. DynamoDB Important Service Quotas
Quota | Description | Value | Adjustable
Initial throughput for on-demand capacity mode | Initial throughput for on-demand capacity mode | See further details |
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html#default-limits-throughput-capacity-modes
49. Initial Throughput For DynamoDB On-Demand Capacity Mode
Newly created table with on-demand capacity mode:
• enables newly created on-demand tables to serve up to 4,000 WCUs or 12,000
RCUs
• If you exceed double your previous traffic's peak within 30 minutes, then you might
experience throttling
• One solution is to pre-warm the table to the anticipated peak capacity of the spike by:
• Performing a load test
• Creating the table in provisioned mode with high WCUs/RCUs and then switching to on-demand mode
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html#HowItWorks.InitialThroughput
50. Initial Throughput For DynamoDB On-Demand Capacity Mode
Existing table switched from provisioned to on-demand capacity mode:
• The previous peak is half the maximum write capacity units and read capacity
units provisioned since the table was created
• or the settings for a newly created table with on-demand capacity mode,
whichever is higher
• In other words, your table will deliver at least as much throughput as it did prior to
switching to on-demand capacity mode
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html#HowItWorks.InitialThroughput
52. Aurora (Serverless) Important Service Quotas
Quota | Description | Value | Adjustable
Data API requests per second | The maximum number of requests to the Data API per second allowed in this account in the current AWS Region | 1,000 |
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/CHAP_Limits.html
54. DynamoDB vs Aurora (Serverless)
| Aurora Serverless v1 | Aurora Serverless v1 + Data API | Aurora Serverless v2
Investment in knowledge | Relational databases are familiar to many | Same | Same
Engine support | Takes time to support the newest engines of MySQL and PostgreSQL | Takes time to support the newest engines of MySQL and PostgreSQL | The newest engines of MySQL and PostgreSQL are supported
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Concepts.Aurora_Fea_Regions_DB-eng.Feature.ServerlessV1.html
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Concepts.Aurora_Fea_Regions_DB-eng.Feature.ServerlessV2.html
55. DynamoDB vs Aurora (Serverless)
| Aurora Serverless v1 | Aurora Serverless v1 + Data API | Aurora Serverless v2
Misc | Not too much new feature development happening | Not too much new feature development happening; Service Quota of the Data API to consider | No Data API support yet
| Requires putting Lambda into a VPC to access | | May require Amazon RDS Proxy for connection pooling
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/data-api.html
56. Other Optimizations: Caching
• Put DynamoDB Accelerator (DAX) in front of DynamoDB
• Requires putting Lambda into a VPC
• Put ElastiCache in front of Aurora Serverless
• Requires putting Lambda into a VPC
• No “pay as you go” pricing
• Enable API Gateway Caching
• Uses ElastiCache behind the scenes
• No “pay as you go” pricing for ElastiCache
• Use CloudFront (and its caching capabilities) in front of API
Gateway
58. Other Optimizations: Error Handling and Retries
• Set meaningful timeouts
• For API Gateway, Lambda
• Retry with exponential backoff and jitter
• AWS SDK supports them out of the box
• Implement idempotency
• AWS Lambda Powertools (Java, Python) provides an idempotency module
https://docs.aws.amazon.com/sdkref/latest/guide/feature-retry-behavior.html
https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/
https://aws.amazon.com/de/blogs/architecture/exponential-backoff-and-jitter/
https://aws.amazon.com/blogs/compute/handling-lambda-functions-idempotency-with-aws-lambda-powertools/
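The backoff-with-jitter strategy referenced above is small enough to write out. This sketch uses the "full jitter" variant from the AWS builders' library article: the delay is drawn uniformly from zero up to the capped exponential value. The base and cap values are illustrative defaults, not AWS-prescribed numbers.

```javascript
// "Full jitter" backoff: sleep a random duration in
// [0, min(cap, base * 2^attempt)] milliseconds before the next retry.
// Randomizing the delay de-correlates retries from many clients, so a
// throttled dependency is not hit by synchronized retry waves.
const backoffWithJitter = (attempt, baseMs = 100, capMs = 20000, rand = Math.random) =>
  rand() * Math.min(capMs, baseMs * 2 ** attempt);
```

With these defaults, attempt 3 sleeps at most 800 ms, and from roughly attempt 8 onward every retry is capped at 20 s regardless of how many attempts have been made.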
60. AWS “Virtual Waiting Room” Solution
• Open-source project written in
Python that can be integrated into
existing applications
• Source code available on GitHub
• Different CloudFormation
Templates to choose from (from
minimal to extended solutions)
• Estimated costs are provided for a 50,000-user and a 100,000-user waiting room with an event duration ranging from 2 to 4 hours
• Virtual Waiting Room on AWS has
been load tested with a tool called
Locust. The simulated event sizes
ranged from 10,000 to 100,000 clients
https://docs.aws.amazon.com/solutions/latest/virtual-waiting-room-on-aws/architecture-overview.html
https://docs.aws.amazon.com/pdfs/solutions/latest/virtual-waiting-room-on-aws/virtual-waiting-room-on-aws.pdf
62. General Best Practices for Service Quotas
• Know, understand and observe the service quotas
• Architect with service quotas in mind
• AWS adjusts them from time to time
• When requesting a quota increase, provide a valid justification for the new desired value
• Service quotas are valid per AWS account (per region)
• Use different AWS accounts for development and testing
• Use different AWS accounts for independent (micro-)services
• Separate AWS accounts on the team level
• Use AWS Organizations