3. Background
Solr and Redis datastores are under constant
● read load (serving products to users) and
● write load (indexing updates to products/documents)
5. Architecture
● Throttling engine polls metrics, e.g. median response times (RTs) for Solr
● Permitted rate of updates is calculated
● Permitted rate is pushed to a central cache
● Kafka Spout implementation in Apache Storm reads and
maintains this rate
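The flow above can be sketched roughly as follows. This is only an illustration: the in-memory dict stands in for the central cache, and `permitted_rate`, `threshold_ms`, `max_rate`, `engine_tick` and `spout_batch` are hypothetical names and values, not taken from the actual system or from the Storm/Kafka APIs.

```python
import statistics

# Stand-in for the central cache (assumption: the real system uses a shared cache).
cache = {}

def permitted_rate(median_ms, threshold_ms=50.0, max_rate=1000):
    """Hypothetical policy: full rate up to the threshold RT,
    then reduced in inverse proportion to the median RT."""
    if median_ms <= threshold_ms:
        return max_rate
    return int(max_rate * threshold_ms / median_ms)

def engine_tick(rt_samples_ms):
    """One polling cycle of the throttling engine: poll, calculate, push."""
    median_ms = statistics.median(rt_samples_ms)
    cache["permitted_rate"] = permitted_rate(median_ms)

def spout_batch(pending_updates):
    """Spout side: read the permitted rate and emit at most that many updates."""
    rate = cache.get("permitted_rate", 0)
    return pending_updates[:rate]
```

For example, after `engine_tick([100, 100, 100])` the cached rate would be half of `max_rate`, and `spout_batch` would emit at most that many pending updates in the next cycle.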
8. Test Setup
● A single Solr machine was used as the datastore
● A vertical load test was performed for read traffic
● Heavy indexing was triggered for write traffic
● The following algorithms were tested for performance:
○ AIMD Limit
○ Gradient2 Limit
○ In-house Limit Algorithm
○ TCP Vegas Limit
10. In-House Algorithm
● Developed as V1 algorithm to test out the entire system
● Uses simple mathematical functions and empirically determined limits
● Converts RTs to a load value from 0 to 1
● Converts load value to permitted percentage of tuples (0 to 100 %)
● Was highly hand-tuned for the existing setup
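The slides do not give the actual functions or limits, so the following is only a guess at the general shape of the two conversions. The linear mapping and the bounds `rt_idle_ms` and `rt_max_ms` are assumptions chosen for illustration, not the hand-tuned values used in production.

```python
def rt_to_load(rt_ms, rt_idle_ms=20.0, rt_max_ms=80.0):
    """Map a response time to a load value in [0, 1].
    Assumption: linear between two empirically chosen bounds, clamped outside."""
    span = rt_max_ms - rt_idle_ms
    return min(1.0, max(0.0, (rt_ms - rt_idle_ms) / span))

def load_to_permitted_percent(load):
    """Map a load value in [0, 1] to a permitted percentage of tuples (0 to 100).
    Assumption: the permitted share falls linearly as load rises."""
    return 100.0 * (1.0 - load)
```

With these assumed bounds, an RT of 50 ms gives a load of 0.5 and therefore 50% of tuples permitted.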
11. AIMD Limit Algorithm
● Stands for Additive Increase Multiplicative Decrease
● If RT < threshold RT ⇒ new_limit = prev_limit +1
● If RT >= threshold RT ⇒ new_limit = prev_limit * back-off_ratio
● Back-off ratio lies between 0.5 and 1
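A minimal sketch of the two rules above. The default `backoff_ratio`, the `min_limit` floor and the truncation to an integer are assumptions added so the example is runnable.

```python
def aimd_update(prev_limit, rt_ms, threshold_ms, backoff_ratio=0.9, min_limit=1):
    """Return the new limit after one RT sample.
    Additive increase below the threshold, multiplicative decrease otherwise."""
    if not 0.5 <= backoff_ratio <= 1.0:
        raise ValueError("backoff_ratio must lie between 0.5 and 1")
    if rt_ms < threshold_ms:
        return prev_limit + 1
    # Assumption: keep the limit an integer and never drop below min_limit.
    return max(min_limit, int(prev_limit * backoff_ratio))
```

Because the increase is only +1 per sample while the decrease is proportional, the limit recovers slowly after each back-off.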
12. Gradient2 Limit Algorithm
● The algorithm tracks the measure of divergence between two
exponential averages over a long and short time window
● After identifying a queueing trend, the algorithm aggressively reduces
the limit
● gradient = max(0.5, min(1.0, longRtt / shortRtt))
● newLimit = estimatedLimit * gradient + queueSize
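The two formulas above can be sketched as below. The smoothing factors `short_alpha` and `long_alpha`, the initial limit, the `queue_size` value and the `max_limit` cap are illustrative assumptions; a production implementation would likely include further safeguards that are not shown on the slide.

```python
class Gradient2Sketch:
    """Tracks short- and long-window exponential averages of RTT and
    applies gradient = max(0.5, min(1.0, longRtt / shortRtt))."""

    def __init__(self, initial_limit=20.0, queue_size=4.0,
                 short_alpha=0.5, long_alpha=0.05, max_limit=200.0):
        self.limit = initial_limit
        self.queue_size = queue_size      # headroom added on every update
        self.short_alpha = short_alpha    # assumption: fast-moving average
        self.long_alpha = long_alpha      # assumption: slow-moving average
        self.max_limit = max_limit        # assumption: cap to bound growth
        self.short_rtt = None
        self.long_rtt = None

    def update(self, rtt_ms):
        if self.short_rtt is None:
            # First sample seeds both averages.
            self.short_rtt = self.long_rtt = rtt_ms
        else:
            self.short_rtt += self.short_alpha * (rtt_ms - self.short_rtt)
            self.long_rtt += self.long_alpha * (rtt_ms - self.long_rtt)
        gradient = max(0.5, min(1.0, self.long_rtt / self.short_rtt))
        self.limit = min(self.max_limit, self.limit * gradient + self.queue_size)
        return self.limit
```

When the short average rises well above the long one, the gradient hits its floor of 0.5 and the limit is roughly halved in a single step, which is the aggressive reduction described above.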
13. TCP Vegas Algorithm
● TCP Vegas is a TCP congestion avoidance algorithm that emphasizes
packet delay, rather than packet loss
● A bottleneck queue is estimated
● queue_size = prev_limit * (1 - minRTT / sampleRTT)
● Where minRTT is the RTT at no load and sampleRTT is the current value
● queue_size and some parameters are used to update the limit
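A rough sketch of how the queue estimate might drive the limit. The slide does not say which parameters are used, so the thresholds `alpha` and `beta` and the +1/-1 step follow the classic TCP Vegas scheme and are assumptions here.

```python
def vegas_update(prev_limit, min_rtt_ms, sample_rtt_ms,
                 alpha=3.0, beta=6.0, min_limit=1):
    """Return (new_limit, queue_size) after one RTT sample.
    Assumption: grow while the estimated queue is below alpha,
    shrink when it exceeds beta, otherwise hold."""
    queue_size = prev_limit * (1 - min_rtt_ms / sample_rtt_ms)
    if queue_size < alpha:
        new_limit = prev_limit + 1
    elif queue_size > beta:
        new_limit = prev_limit - 1
    else:
        new_limit = prev_limit
    return max(min_limit, new_limit), queue_size
```

At no load (sampleRTT equal to minRTT) the estimated queue is zero, so the limit keeps growing until delay starts to build.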
14. Test Results
● In-house Algorithm -
○ 100% Tuples Processed
○ 20 mins in test setup
○ RTTs reaching 70ms at peak
15. Test Results
● AIMD Limit Algorithm -
○ 60% Tuples Processed
○ 21 mins in test setup
○ RTTs crossed 65ms at peak
16. Test Results
● Gradient2 -
○ 40% Tuples Processed
○ 23 mins in test setup
○ RTTs crossed 80ms at peak
17. Test Results
● TCP Vegas -
○ 100% Tuples Processed
○ 16 mins in test setup
○ RTTs reaching 60ms at peak
18. Test Results
● Conclusions -
○ TCP Vegas takes less time to process all updates while
maintaining similar, and sometimes lower, response times.
○ TCP Vegas performs better than AIMD because it is more reactive to
RT changes, while not being as aggressive as Gradient2.
Bursty Write Traffic before Throttling vs Controlled Flow after Throttling
19. Areas of Improvement
● Throttling Engine reliability
● Intelligent allocation of permitted limit between different types of
updates
● Failover strategies
● Metrics for Redis