RATE LIMITING
THROTTLING
WHAT IS RATE LIMITING
 Suppose you are exposing a set of public RESTful APIs and want to limit the number of requests served over a period of time, both to save resources and to protect the service from abuse.
 For example, you might want to allow only 60 calls in a 1-minute window. There are several algorithms for enforcing such a limit; we will discuss each of them in depth.
ALGORITHMS
LEAKY BUCKET
 The leaky bucket is a queue that accepts requests in first-in, first-out (FIFO) order.
 Once the queue is full, the server drops incoming requests until the queue has space again.
HOW IT WORKS
 For example, suppose the queue size is 4 and the server receives requests 1, 2, 3 and 4. All four fit in the queue and are accepted.
 When request 5 then arrives, the server drops it.
 In the image below, the queue size is 2 and 2 requests are accepted; once the queue is full, the additional 3rd, 4th, 5th and 6th requests are discarded (or "leaked").
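The drop-at-capacity behaviour above can be sketched with a plain FIFO queue (a minimal in-memory sketch with illustrative names; a real server would also drain the queue at a fixed outflow rate):

```python
from collections import deque

class LeakyBucket:
    """Fixed-size FIFO queue: requests beyond capacity are dropped."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def try_enqueue(self, request):
        if len(self.queue) >= self.capacity:
            return False  # bucket full: drop ("leak") the request
        self.queue.append(request)
        return True

    def process_next(self):
        # called by the server at a fixed outflow rate
        return self.queue.popleft() if self.queue else None

# With capacity 2, requests 1 and 2 are accepted; 3, 4 and 5 are dropped.
bucket = LeakyBucket(capacity=2)
results = [bucket.try_enqueue(r) for r in range(1, 6)]
```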
CONS
 A burst of traffic can fill the queue with old requests, starving more recent requests of processing. The algorithm also gives no guarantee that a request is processed within a fixed amount of time.
TOKEN BUCKET ALGORITHM
 For each unique user, we record the Unix timestamp of their last request and their available token count in a hash in Redis.
 Whenever a new request arrives from a user, the rate limiter does two things to track usage:
 First, it fetches the hash from Redis and refills the available tokens based on a chosen refill rate and the time of the user's last request.
 Then it updates the hash with the current request's timestamp and the new available token count.
User 1 has two tokens left in their token bucket and made their last request on Thursday, March 30,
2017 at 10:00 GMT
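The refill-then-consume steps above can be sketched in memory (a minimal sketch with hypothetical names; a production version would keep the `(tokens, last_ts)` pair in a per-user Redis hash as described):

```python
import time

class TokenBucket:
    """Per-user token bucket; in production the (tokens, last_ts) pair
    would live in a Redis hash keyed by user id."""
    def __init__(self, capacity, refill_rate, now=time.time):
        self.capacity = capacity        # maximum tokens in the bucket
        self.refill_rate = refill_rate  # tokens added per second
        self.now = now                  # injectable clock, handy for testing
        self.buckets = {}               # user -> (tokens, last_ts)

    def allow(self, user):
        tokens, last_ts = self.buckets.get(user, (self.capacity, self.now()))
        t = self.now()
        # step 1: refill based on time elapsed since the last request
        tokens = min(self.capacity, tokens + (t - last_ts) * self.refill_rate)
        # step 2: store the new timestamp and token count
        if tokens < 1:
            self.buckets[user] = (tokens, t)
            return False
        self.buckets[user] = (tokens - 1, t)
        return True

limiter = TokenBucket(capacity=60, refill_rate=1.0)  # roughly 60 requests/minute
```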
CONS
 Despite the token bucket algorithm's elegance and tiny memory footprint, its Redis operations aren't atomic. In a distributed environment, this "read-and-then-write" behaviour creates a race condition.
 Imagine a user with only one available token issuing multiple requests. If two separate processes served these requests and both read the available token count before either updated it, each process would think the user had a single token left and had not hit the rate limit.
 Our token bucket implementation could achieve atomicity if each process fetched a Redis lock for the duration of its Redis operations. That, however, would come at the expense of slowing down concurrent requests from the same user and introducing another layer of complexity.
FIXED WINDOW COUNTER
 It increments a per-user request counter for a particular time window; if the counter crosses a threshold, the server drops the request. Redis is used to store the request information.
 Unlike the token bucket algorithm, this approach's Redis operations are atomic. Each request increments a Redis key that includes the request's timestamp.
 When incrementing the request count for a given timestamp, we compare its value to our rate limit to decide whether to reject the request.
 We also tell Redis to expire the key once the current minute has passed, so stale counters don't stick around forever.
RATE LIMIT: 2 REQUESTS PER MINUTE
 A request arriving at 00:00:24 belongs to window 1 and increments the window's counter to 1.
 The next request, at 00:00:36, also belongs to window 1, and the counter becomes 2.
 The next request, at 00:00:49, is rejected because it would push the counter past the limit.
 The request at 00:01:12 can be served because it belongs to window 2.
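The walkthrough above can be sketched with a counter keyed by (user, window) — a minimal in-memory stand-in for the Redis INCR-plus-EXPIRE pattern, with illustrative names:

```python
class FixedWindowCounter:
    """Counter per (user, minute window); in production each counter would
    be a Redis key INCRemented atomically and EXPIREd after the window."""
    def __init__(self, limit, window=60):
        self.limit = limit      # max requests per window
        self.window = window    # window length in seconds
        self.counters = {}      # (user, window_index) -> count

    def allow(self, user, ts):
        key = (user, ts // self.window)  # which window the timestamp falls in
        count = self.counters.get(key, 0) + 1
        self.counters[key] = count
        # compare the incremented value to the limit to accept or reject
        return count <= self.limit
```

Running the slide's scenario (limit 2/minute): requests at 00:00:24 and 00:00:36 are served, 00:00:49 is rejected, and 00:01:12 is served in the new window.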
CONS
 Suppose the server receives a burst of requests in the 55th second of a minute; the limit won't behave as expected.
 For example, with a rate limit of 5 requests per minute, a user who makes 5 requests at 11:00:59 can make 5 more at 11:01:00, because a new counter begins at the start of each minute. Despite a limit of 5 requests per minute, we have now allowed 10 requests in less than one minute!
SLIDING LOGS
 It stores a log of each request, with its timestamp, in Redis or in memory.
 We can efficiently track all of a user's requests in a single sorted set by inserting a new member whose sort value is the Unix microsecond timestamp.
SCENARIO
 When a request comes in, we first remove all outdated timestamps before appending the new request time to the log.
 Then we decide whether the request should be processed based on whether the log size has exceeded the limit. For example, suppose the rate limit is 2 requests per minute:
 Each request looks back over the minute ending at its own timestamp: the request at 00:00:12 covers 11:59:12 – 00:00:12 and the request at 00:00:24 covers 11:59:24 – 00:00:24; both are logged and served. The third request, at 00:00:36, is rejected because its minute window already contains 2 logged requests.
 For a later request at 00:01:25, any log entries older than 00:00:25 fall outside the window and are deleted.
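The prune-then-count logic can be sketched with a per-user deque (a minimal in-memory sketch; in production the log would be the Redis sorted set described above, pruned with a range delete by score):

```python
from collections import deque

class SlidingLog:
    """Keeps a timestamp log per user; a production version would use a
    Redis sorted set scored by the Unix microsecond timestamp."""
    def __init__(self, limit, window=60):
        self.limit = limit
        self.window = window   # window length in seconds
        self.logs = {}         # user -> deque of request timestamps

    def allow(self, user, ts):
        log = self.logs.setdefault(user, deque())
        # drop timestamps that have slid out of the window
        while log and log[0] <= ts - self.window:
            log.popleft()
        # reject if the window already holds `limit` requests
        if len(log) >= self.limit:
            return False
        log.append(ts)
        return True
```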
CONS
 While the precision of the sliding window log approach may be useful for a developer API, it leaves a considerably large memory footprint because it stores a value for every request.
 If the application receives millions of requests, maintaining a log entry for each one in memory is expensive.
SLIDING WINDOW COUNTER
 This approach is similar to sliding logs; the only difference is that instead of storing every log entry, we group a user's request data by timestamp.
 For example, with an hourly rate limit, we increment a counter specific to the current Unix minute and, when a new request arrives, calculate the sum of all counters in the past hour.
 When a request increments a counter in the hash, it also sets the hash to expire an hour later.
RATE LIMITING IN DISTRIBUTED SYSTEMS
SYNCHRONIZATION POLICIES
 We have two load balancers in two different regions, each coupled with a rate limiter (RL). Our application is deployed in a two-node cluster, with Redis as a centralised datastore.
 [Diagram: Request → Load Balancer → RL → Node 1 / Node 2, both backed by the shared Redis]
CONSISTENCY ISSUE
 The problem here is that a user may send requests simultaneously through two different load balancers. If each node tracked its own rate limit, a consumer could exceed the global rate limit by sending requests to different nodes via different load balancers.
 In fact, the greater the number of nodes, the more likely the user is to exceed the global limit.
 Assume the rate limit is 5 requests per minute. For a given user we have already served 4 requests, so the counter in Redis is set to 4.
 Only one more request can be served for that minute.
 Now 2 new requests are fired simultaneously via different nodes. Both rate limiters read the latest counter value from Redis as 4 and presume they can allow the new request.
 Once that happens, we break the rule and serve 6 requests per minute instead of 5.
SOLUTION 1: STICKY SESSIONS
 The simplest way to enforce the limit is to set up sticky sessions in your load balancer so that each consumer is always sent to exactly one node.
 The disadvantages include a lack of fault tolerance and scaling problems when individual nodes get overloaded.
 [Diagram: Request → Load Balancer → RL → Node 1 / Node 2, with each consumer pinned to one node]
RACE CONDITION ISSUE AND SOLUTION
 One of the largest problems with a centralized data store is the potential for race conditions under highly concurrent request patterns. These occur with a naïve "get-then-set" approach: you retrieve the current rate limit counter, increment it, and push it back to the datastore. In the time it takes to perform this full read-increment-store cycle, additional requests can come through, each attempting to store an increment based on a stale (lower) counter value. This lets a consumer sending a very high rate of requests bypass rate limiting controls.
 One way to avoid this is to put a "lock" around the key in Redis, preventing any other process from reading or writing the counter.
 However, the lock quickly becomes a major performance bottleneck and adds latency, since each process must wait until another process releases it.
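The standard fix is to make the read-modify-write a single indivisible operation (Redis INCR rather than get-then-set). A minimal sketch, using a lock-protected dict as a stand-in for Redis and hypothetical names:

```python
import threading

class AtomicCounterStore:
    """Stand-in for Redis: an INCR-style atomic increment, so concurrent
    requests can never act on a stale counter value."""
    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def incr(self, key):
        # one indivisible read-modify-write, like Redis INCR
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1
            return self._counts[key]

def allow(store, user, minute, limit):
    # increment first, then compare: there is no gap between reading the
    # counter and writing it back for another request to slip into
    return store.incr((user, minute)) <= limit
```

Because the increment and the read happen as one operation, ten simultaneous requests against a limit of 5 admit exactly 5, whereas get-then-set could admit more.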
WHAT ELSE IS THE SOLUTION
 Eventual consistency model
 Instead of fetching the counter from the global Redis store on every request, each node (or rate limiter) keeps a local in-memory cache and reads the request count from it.
 The counter value stored in local memory is asynchronously synced to the global store, and the merged global value is synced back, so the local memory of each node is eventually brought up to date and in sync.
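The local-count-plus-background-sync idea can be sketched as follows (a minimal sketch with hypothetical names; a shared dict stands in for the global Redis store, and `sync()` stands in for the asynchronous background flush):

```python
class LocalCountingNode:
    """Each node counts locally and periodically flushes its delta to the
    shared store; reads use the last synced global value plus the local
    delta, trading strict accuracy for lower per-request latency."""
    def __init__(self, global_store, user):
        self.global_store = global_store  # shared dict standing in for Redis
        self.user = user
        self.local_delta = 0              # requests seen since the last sync
        self.synced_total = global_store.get(user, 0)

    def record_request(self):
        # no round trip to the global store on the request path
        self.local_delta += 1

    def estimated_total(self):
        return self.synced_total + self.local_delta

    def sync(self):
        # push the local delta to the global store, pull the merged total back
        self.global_store[self.user] = (
            self.global_store.get(self.user, 0) + self.local_delta)
        self.local_delta = 0
        self.synced_total = self.global_store[self.user]
```

Between syncs a node can undercount requests served by its peers, which is exactly the accuracy-for-latency trade-off this model accepts.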
REFERENCE
 https://dev.to/ganeshmani/designing-a-scalable-api-rate-limiter-in-node-js-application-5hg3
 https://hechao.li/2018/06/25/Rate-Limiter-Part1/
 https://konghq.com/blog/how-to-design-a-scalable-rate-limiting-algorithm/
 https://www.youtube.com/watch?v=mhUQe4BKZXs&t=183s

Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 

Rate limiting

  • 2. WHAT IS RATE LIMITING  Suppose you expose a set of public RESTful APIs and want to limit the number of requests served over a period of time, both to save resources and to protect the APIs from abuse.  For example, you might allow only 60 calls in a 1-minute window. There are several algorithms for doing this; we will discuss each of them in depth.
  • 4. LEAKY BUCKET  It is a queue that takes requests in first-in, first-out (FIFO) order.  Once the queue is full, the server drops incoming requests until the queue has space to take more.
  • 5. HOW IT WORKS  For example, suppose the queue size is 4 and the server receives requests 1, 2, 3 and 4: all four are enqueued.  When request 5 then arrives, the server drops it.  In the image below, the queue size is 2 and 2 requests are accepted; once the queue is full, the additional 3rd, 4th, 5th and 6th requests are discarded (or leaked).
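The behaviour above can be sketched with a minimal in-memory queue. This is an illustrative sketch, not the deck's implementation; the `LeakyBucket` class and method names are assumptions.

```python
from collections import deque

class LeakyBucket:
    """Fixed-capacity FIFO queue; requests that overflow it are dropped."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def try_enqueue(self, request):
        if len(self.queue) >= self.capacity:
            return False                  # queue full: drop (leak) the request
        self.queue.append(request)
        return True

    def process_one(self):
        """The server drains the queue at a fixed rate, oldest first."""
        return self.queue.popleft() if self.queue else None

# Mirror the slide's example: queue size 2, six incoming requests.
bucket = LeakyBucket(capacity=2)
results = [bucket.try_enqueue(r) for r in range(1, 7)]
print(results)  # requests 1 and 2 accepted, requests 3-6 dropped
```

Once the server processes a request (draining the queue), the freed slot can accept a new one.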
  • 6. CONS  A burst of traffic can fill the queue with old requests, starving more recent requests of processing. It also provides no guarantee that requests are processed within a fixed amount of time.
  • 7. TOKEN BUCKET ALGORITHM  For each unique user, we record their last request’s Unix timestamp and available token count within a hash in Redis.  Whenever a new request arrives from a user, the rate limiter has to do two things to track usage.  First, it fetches the hash from Redis and refills the available tokens based on a chosen refill rate and the time of the user’s last request.  Then it updates the hash with the current request’s timestamp and the new available token count.
  • 8. User 1 has two tokens left in their token bucket and made their last request on Thursday, March 30, 2017 at 10:00 GMT
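The two steps (refill, then spend and write back) can be sketched in memory. A plain dict stands in for the Redis hash here; the constants and the `allow_request` name are illustrative assumptions, and in production the fetch/update would be Redis hash reads and writes.

```python
import time

# Stand-in for the Redis hash: user_id -> {"tokens": n, "ts": last_request_time}
buckets = {}
CAPACITY = 60        # maximum tokens in the bucket (max burst)
REFILL_RATE = 1.0    # tokens added per second (60 per minute)

def allow_request(user_id, now=None):
    now = time.time() if now is None else now
    entry = buckets.get(user_id, {"tokens": CAPACITY, "ts": now})
    # Step 1: refill tokens based on the time since the user's last request.
    elapsed = now - entry["ts"]
    tokens = min(CAPACITY, entry["tokens"] + elapsed * REFILL_RATE)
    # Step 2: spend a token if one is available, then write the hash back.
    allowed = tokens >= 1
    if allowed:
        tokens -= 1
    buckets[user_id] = {"tokens": tokens, "ts": now}
    return allowed
```

With a fixed clock, 60 back-to-back requests drain the bucket, the 61st is refused, and one second later a single refilled token lets the next request through.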
  • 9.
  • 10. CONS  Despite the token bucket algorithm’s elegance and tiny memory footprint, its Redis operations aren’t atomic. In a distributed environment, the “read-and-then-write” behaviour creates a race condition.  Imagine there is only one available token for a user and that user issues multiple requests. If two separate processes serve these requests and concurrently read the available token count before either updates it, each process thinks the user has a single token left and has not hit the rate limit.  Our token bucket implementation could achieve atomicity if each process fetched a Redis lock for the duration of its Redis operations. This, however, would come at the expense of slowing down concurrent requests from the same user and introducing another layer of complexity.
  • 11. FIXED WINDOW COUNTER  It increments a request counter for a user within a particular time window; if the counter crosses a threshold, the server drops the request. It uses Redis to store the request information.  Unlike the token bucket algorithm, this approach’s Redis operations are atomic. Each request increments a Redis key that includes the request’s timestamp. A given Redis key might look like this:  When incrementing the request count for a given timestamp, we compare its value to our rate limit to decide whether to reject the request.  We also tell Redis to expire the key when the current minute has passed, so that stale counters don’t stick around forever.
  • 12.
  • 13. RATE LIMIT: 2 REQUESTS PER MINUTE  A request that comes at 00:00:24 belongs to window 1 and increases the window’s counter to 1.  The next request, at 00:00:36, also belongs to window 1, and the window’s counter becomes 2.  The next request, at 00:00:49, is rejected because the counter has exceeded the limit.  The request that comes at 00:01:12 can be served because it belongs to window 2.
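The worked example above can be sketched with a per-window counter. A dict stands in for Redis here; in Redis the increment would be an atomic `INCR` with an `EXPIRE` on the key, and the key name shown in the comment is an assumed format.

```python
from collections import defaultdict

LIMIT = 2       # requests allowed per window (the slide's example)
WINDOW = 60     # window length in seconds

counters = defaultdict(int)   # stand-in for Redis keys such as "user1:60"

def allow_request(user_id, now):
    # All timestamps in the same minute share one counter key.
    window_start = int(now // WINDOW) * WINDOW
    key = f"{user_id}:{window_start}"
    counters[key] += 1        # in Redis: atomic INCR, plus EXPIRE so
                              # stale window counters vanish on their own
    return counters[key] <= LIMIT
```

Replaying the slide's timestamps (24s, 36s, 49s, 72s) accepts the first two, rejects the third, and accepts the fourth in the new window.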
  • 14.
  • 15. CONS  Suppose the server receives a burst of requests at the 55th second of a minute; the limit won’t work as expected.  For example, if our rate limit were 5 requests per minute and a user made 5 requests at 11:00:59, they could make 5 more requests at 11:01:00 because a new counter begins at the start of each minute. Despite a rate limit of 5 requests per minute, we’ve now allowed 10 requests in less than one minute!
  • 16. SLIDING LOGS  It stores a log of each request with its timestamp, in Redis or in memory.  We can efficiently track all of a user’s requests in a single sorted set by inserting a new member for each request, with the Unix microsecond timestamp as its sort value.
  • 17. SCENARIO  When a request comes in, we first pop all outdated timestamps before appending the new request time to the log.  Then we decide whether the request should be processed depending on whether the log size has exceeded the limit. For example, suppose the rate limit is 2 requests per minute: requests at 00:00:12 and 00:00:24 are accepted; a third request at 00:00:36 falls within the same one-minute frame and is rejected; when a request arrives at 00:01:25, the window becomes 00:00:25 – 00:01:25 and any logged request older than 00:00:25 is deleted.
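The pop-then-append-then-decide steps above can be sketched with a plain sorted list standing in for the Redis sorted set. The names and the choice to log rejected requests as well (which the slide's ordering implies) are assumptions of this sketch.

```python
LIMIT = 2         # requests per minute
WINDOW = 60.0     # window length in seconds

logs = {}         # user_id -> sorted request timestamps (sorted-set stand-in)

def allow_request(user_id, now):
    log = logs.setdefault(user_id, [])
    # 1) Pop every timestamp that has fallen out of the sliding window.
    cutoff = now - WINDOW
    while log and log[0] <= cutoff:
        log.pop(0)
    # 2) Append the new request time, then decide from the log size.
    log.append(now)
    return len(log) <= LIMIT
```

Replaying the scenario: requests at 12s and 24s pass, the one at 36s is rejected, and by 85s the 12s and 24s entries have aged out so the request is served again.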
  • 18. CONS  While the precision of the sliding window log approach may be useful for a developer API, it leaves a considerably large memory footprint because it stores a value for every request.  If an application receives a million requests, maintaining a log entry for each of them in memory is expensive.
  • 19. SLIDING WINDOW COUNTER  This approach is similar to sliding logs. The only difference is that instead of storing every log entry, we group the user’s request data by timestamp.  For example, with an hourly rate limit, we increment a counter specific to the current Unix minute and, when a new request arrives, calculate the sum of all counters in the past hour.  When each request increments a counter in the hash, it also sets the hash to expire an hour later.
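The grouping idea can be sketched with per-minute buckets in a dict standing in for the Redis hash; pruning old buckets by hand here approximates what the hash's one-hour expiry does in Redis. All names and constants are illustrative assumptions.

```python
LIMIT = 60       # requests allowed per hour
WINDOW = 3600    # one hour, in seconds
BUCKET = 60      # group requests into per-minute counters

counters = {}    # user_id -> {minute_start: count}  (Redis hash stand-in)

def allow_request(user_id, now):
    minutes = counters.setdefault(user_id, {})
    # Drop per-minute counters that fell out of the hour-long window
    # (in Redis, the hash's one-hour expiry handles this for us).
    cutoff = now - WINDOW
    for start in [s for s in minutes if s <= cutoff]:
        del minutes[start]
    # Sum all counters in the past hour before admitting the request.
    if sum(minutes.values()) >= LIMIT:
        return False
    minute_start = int(now // BUCKET) * BUCKET
    minutes[minute_start] = minutes.get(minute_start, 0) + 1
    return True
```

This keeps at most 60 counters per user rather than one entry per request, trading the sliding log's exactness for a much smaller footprint.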
  • 20.
  • 21. RATE LIMITING IN DISTRIBUTED SYSTEMS
  • 22. SYNCHRONIZATION POLICIES  We have two load balancers in two different regions, each coupled with a rate limiter (RL), and our application deployed in a two-node cluster (Node 1 and Node 2). We have a centralised Redis database.
  • 23. CONSISTENCY ISSUE  The problem here is that a user can send requests simultaneously through two different load balancers. If each node tracked its own rate limit, a consumer could exceed the global rate limit when requests are sent to different nodes via different load balancers.  In fact, the greater the number of nodes, the more likely the user is to exceed the global limit.  Assume the rate limit is 5 requests per minute. For a given user we have already served 4 requests, so the rate limit counter in Redis is set to 4.  Only one more request may be served for that user in the given minute.  Now 2 new requests are fired simultaneously via different nodes, and both rate limiters read the latest counter value from Redis as 4 and presume they can allow the new request.  Once that happens, we break the rule and serve 6 requests per minute instead of 5.
  • 24. SOLUTION 1: STICKY SESSIONS  The simplest way to enforce the limit is to set up sticky sessions in your load balancer so that each consumer is sent to exactly one node.  The disadvantages include a lack of fault tolerance and scaling problems when nodes get overloaded.
  • 25. RACE CONDITION ISSUE AND SOLUTION  One of the largest problems with a centralized data store is the potential for race conditions under high-concurrency request patterns. These occur with a naïve “get-then-set” approach, in which you retrieve the current rate limit counter, increment it, and then push it back to the datastore. The problem with this model is that in the time it takes to perform a full read-increment-store cycle, additional requests can come through, each attempting to store the incremented counter with an invalid (lower) value. This allows a consumer sending a very high rate of requests to bypass rate limiting controls.  One way to avoid this problem is to put a “lock” around the key in Redis, preventing any other process from reading or writing the counter.  However, this would quickly become a major performance bottleneck and add latency, as each process must wait until another process releases its lock.
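The per-key lock idea can be demonstrated in a single process with threads standing in for concurrent rate-limiter nodes. This is a local sketch of the concept, not a distributed Redis lock; all names here are illustrative.

```python
import threading

LIMIT = 5
counters = {}
key_locks = {}
registry_lock = threading.Lock()

def lock_for(key):
    # One lock per counter key, mirroring a per-key lock around a Redis key.
    with registry_lock:
        return key_locks.setdefault(key, threading.Lock())

def allow_request(key):
    # Holding the lock makes the read-increment-store cycle atomic,
    # at the cost of serializing concurrent requests for the same key.
    with lock_for(key):
        count = counters.get(key, 0)
        if count >= LIMIT:
            return False
        counters[key] = count + 1
        return True

# Ten concurrent requests for one user: exactly LIMIT of them get through.
results = []
threads = [threading.Thread(target=lambda: results.append(allow_request("user1")))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(results), counters["user1"])
```

Without the lock, two threads could both read `count == 4` and both admit a request, which is exactly the race described above.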
  • 26. WHAT ELSE IS THE SOLUTION  An eventual consistency model.  Instead of fetching the counter from the global Redis store on every request, each node (or rate limiter) keeps a local in-memory cache and reads the request count from local memory.  The counter value stored in local memory is asynchronously synced with the global store and back, so the local memories of the respective nodes are eventually updated and kept in sync.
  • 27. REFERENCE  https://dev.to/ganeshmani/designing-a-scalable-api-rate-limiter-in-node-js-application-5hg3  https://hechao.li/2018/06/25/Rate-Limiter-Part1/  https://konghq.com/blog/how-to-design-a-scalable-rate-limiting-algorithm/  https://www.youtube.com/watch?v=mhUQe4BKZXs&t=183s