Adaptive indexing throttling

Arpit Jain

How to reactively control indexing rate to preserve query performance for bursty traffic

Adaptive Throttling of
Indexing for Improved Query
Responsiveness
Arpit Jain
Search Team

Agenda
● Background
● Problem Statement
● Architecture
● Test Setup
● Test Results
● Areas of Improvement

Background
Solr and Redis datastores are under constant
● read load (serving products to users) and
● write load (indexing updates to products/documents)

Problem Statement
● Optimise reads and writes to the system.
● Remove manual indexing windows during HRDs.

Architecture
● The throttling engine polls metrics, e.g. median response times (RTs) for Solr
● A permitted rate of updates is calculated from these metrics
● The permitted rate is pushed to a central cache
● The Kafka spout implementation in Apache Storm reads this rate and
maintains it
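
The four steps above can be sketched as a small control loop. This is only an illustration under stated assumptions, not the deck's implementation: the cache is a plain dict standing in for the central cache, and the key name, the function names and the toy rate policy are all hypothetical.

```python
import statistics
from typing import Callable, Dict, List

RATE_KEY = "indexing:permitted_rate"  # hypothetical cache key


def throttling_tick(
    sample_rts_ms: Callable[[], List[float]],
    compute_rate: Callable[[float], float],
    cache: Dict[str, float],
) -> float:
    """One polling cycle: read RT samples, compute a rate, publish it."""
    median_rt = statistics.median(sample_rts_ms())
    rate = compute_rate(median_rt)
    cache[RATE_KEY] = rate  # stand-in for the push to the central cache
    return rate


def spout_budget(cache: Dict[str, float], interval_s: float, default_rate: float) -> int:
    """Tuples the consumer may emit in the next interval (rate is tuples/second)."""
    rate = cache.get(RATE_KEY, default_rate)
    return int(rate * interval_s)


def toy_policy(median_rt_ms: float) -> float:
    """Assumed policy: full rate below 50 ms, a quarter of it otherwise."""
    return 1000.0 if median_rt_ms < 50.0 else 250.0


if __name__ == "__main__":
    shared_cache: Dict[str, float] = {}
    throttling_tick(lambda: [40.0, 42.0, 70.0], toy_policy, shared_cache)
    print(spout_budget(shared_cache, interval_s=0.5, default_rate=100.0))
```

Keeping the rate calculation behind `compute_rate` lets any of the limit algorithms tested later be swapped in without touching the consumer side.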

Test Setup
● A single Solr machine was used as the datastore
● A vertical load test was performed for read traffic
● Heavy indexing was triggered for write traffic
● The following algorithms were tested for performance:
○ AIMD Limit
○ Gradient2 Limit
○ In-house Limit Algorithm
○ TCP Vegas Limit

Test Setup
The read traffic generated for test setup

In-House Algorithm
● Developed as the V1 algorithm to test out the entire system
● Uses simple mathematical functions and empirically determined limits
● Converts RTs to a load value from 0 to 1
● Converts the load value to a permitted percentage of tuples (0 to 100%)
● Was highly hand-tuned for the existing setup
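
The two conversions can be illustrated as follows. The deck does not give the actual functions or limits, so the linear mapping and the 20 ms / 70 ms bounds below are assumptions chosen only to show the shape of the idea.

```python
def rt_to_load(rt_ms: float, rt_idle_ms: float = 20.0, rt_max_ms: float = 70.0) -> float:
    """Map a response time to a load value in [0, 1].

    rt_idle_ms and rt_max_ms are hypothetical, empirically chosen limits.
    """
    load = (rt_ms - rt_idle_ms) / (rt_max_ms - rt_idle_ms)
    return min(1.0, max(0.0, load))


def load_to_permitted_pct(load: float) -> float:
    """Map load in [0, 1] to the permitted percentage of tuples (assumed linear)."""
    return 100.0 * (1.0 - load)


if __name__ == "__main__":
    for rt in (10.0, 45.0, 90.0):
        print(rt, load_to_permitted_pct(rt_to_load(rt)))
```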

AIMD Limit Algorithm
● Stands for Additive Increase Multiplicative Decrease
● If RT < threshold RT ⇒ new_limit = prev_limit + 1
● If RT >= threshold RT ⇒ new_limit = prev_limit * backoff_ratio
● The back-off ratio lies between 0.5 and 1
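
A minimal sketch of the AIMD rule as stated on this slide. The clamping bounds `min_limit` and `max_limit` and the default ratio are assumptions added so the limit stays in a sane range; they are not from the deck.

```python
def aimd_update(
    prev_limit: float,
    rt_ms: float,
    threshold_ms: float,
    backoff_ratio: float = 0.9,
    min_limit: float = 1.0,
    max_limit: float = 1000.0,
) -> float:
    """Additive increase below the RT threshold, multiplicative decrease otherwise."""
    if not 0.5 <= backoff_ratio <= 1.0:
        raise ValueError("backoff_ratio must lie between 0.5 and 1")
    if rt_ms < threshold_ms:
        new_limit = prev_limit + 1
    else:
        new_limit = prev_limit * backoff_ratio
    # Clamp to the assumed bounds.
    return min(max_limit, max(min_limit, new_limit))


if __name__ == "__main__":
    limit = 10.0
    for rt in (30.0, 35.0, 80.0, 40.0):
        limit = aimd_update(limit, rt, threshold_ms=50.0, backoff_ratio=0.5)
        print(rt, limit)
```

Recovery after a back-off is slow because each good sample adds only one unit, which is consistent with AIMD being described later as less reactive than TCP Vegas.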

Gradient2 Limit Algorithm
● The algorithm tracks the measure of divergence between two
exponential averages over a long and short time window
● After identifying a queueing trend, the algorithm aggressively reduces
the limit
● gradient = max(0.5, min(1.0, longRtt / shortRtt))
● newLimit = estimatedLimit * gradient + queueSize
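
The two formulas above can be tried out with a simplified sketch. It assumes plain exponential moving averages with hypothetical smoothing factors, a fixed `queue_size`, and clamping bounds; a production implementation would likely include further smoothing and safeguards that are omitted here.

```python
class Gradient2Sketch:
    """Simplified limit tracker based on the ratio of long to short RTT averages."""

    def __init__(self, initial_limit=20.0, queue_size=4.0,
                 short_alpha=0.5, long_alpha=0.05,
                 min_limit=1.0, max_limit=200.0):
        self.limit = initial_limit
        self.queue_size = queue_size    # allowed queueing headroom (assumed constant)
        self.short_alpha = short_alpha  # assumed smoothing for the short window
        self.long_alpha = long_alpha    # assumed smoothing for the long window
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.short_rtt = None
        self.long_rtt = None

    def update(self, rtt_ms: float) -> float:
        if self.short_rtt is None:
            # Seed both averages with the first sample.
            self.short_rtt = self.long_rtt = rtt_ms
        else:
            self.short_rtt += self.short_alpha * (rtt_ms - self.short_rtt)
            self.long_rtt += self.long_alpha * (rtt_ms - self.long_rtt)
        # gradient < 1 means recent RTTs exceed the long-term trend.
        gradient = max(0.5, min(1.0, self.long_rtt / self.short_rtt))
        new_limit = self.limit * gradient + self.queue_size
        self.limit = min(self.max_limit, max(self.min_limit, new_limit))
        return self.limit


if __name__ == "__main__":
    g = Gradient2Sketch()
    for rtt in (10.0, 10.0, 100.0, 100.0, 10.0):
        print(rtt, round(g.update(rtt), 2))
```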

TCP Vegas Algorithm
● TCP Vegas is a TCP congestion avoidance algorithm that emphasizes
packet delay, rather than packet loss
● A bottleneck queue is estimated
● queue_size = prev_limit * (1 - minRTT / sampleRTT)
● Where minRTT is the RTT at no load and sampleRTT is the current value
● queue_size and some parameters are used to update the limit
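
A sketch of a Vegas-style update using the queue estimate above. The branch structure follows the pseudocode in the Editor's Notes of this deck; the values of `alpha`, `beta` and `threshold`, the +1/-1 increase and decrease steps, and the clamping bounds are assumptions for illustration.

```python
def vegas_update(
    prev_limit: float,
    min_rtt_ms: float,
    sample_rtt_ms: float,
    alpha: float = 3.0,
    beta: float = 6.0,
    threshold: float = 1.0,
    min_limit: float = 1.0,
    max_limit: float = 1000.0,
) -> float:
    """Update the limit from the estimated bottleneck queue size."""
    if sample_rtt_ms <= 0:
        raise ValueError("sample_rtt_ms must be positive")
    queue_size = prev_limit * (1.0 - min_rtt_ms / sample_rtt_ms)
    if queue_size <= threshold:
        new_limit = prev_limit + beta  # almost no queueing: grow aggressively
    elif queue_size < alpha:
        new_limit = prev_limit + 1     # assumed increase function
    elif queue_size > beta:
        new_limit = prev_limit - 1     # assumed decrease function
    else:
        new_limit = prev_limit         # queue within [alpha, beta]: hold
    return min(max_limit, max(min_limit, new_limit))


if __name__ == "__main__":
    limit = 20.0
    for rtt in (10.0, 11.0, 12.5, 20.0):
        limit = vegas_update(limit, min_rtt_ms=10.0, sample_rtt_ms=rtt)
        print(rtt, limit)
```

Because the queue estimate scales with the current limit, the same RTT inflation triggers a decrease sooner at high limits than at low ones.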

Test Results
● In-house Algorithm -
○ 100% Tuples Processed
○ 20 mins in test setup
○ RTTs reaching 70ms at peak

Test Results
● AIMD Limit Algorithm -
○ 60% Tuples Processed
○ 21 mins in test setup
○ RTTs crossed 65ms at peak

Test Results
● Gradient2 -
○ 40% Tuples Processed
○ 23 mins in test setup
○ RTTs crossed 80ms at peak

Test Results
● TCP Vegas -
○ 100% Tuples Processed
○ 16 mins in test setup
○ RTTs reaching 60ms at peak

Test Results
● Conclusions -
○ TCP Vegas takes less time to process all updates while
maintaining similar, and sometimes lower, response times.
○ TCP Vegas performs better than AIMD because it is more reactive to
RT changes, while not being as aggressive as Gradient2.

Figure: Bursty Write Traffic before Throttling vs Controlled Flow after Throttling

Areas of Improvement
● Throttling Engine reliability
● Intelligent allocation of permitted limit between different types of
updates
● Failover strategies
● Metrics for Redis

Architecture figures (images not recovered): Current Architecture, Proposed Architecture

Q & A

Editor's Notes

if (queueSize <= threshold)
    newLimit = estimatedLimit + beta;
else if (queueSize < alpha)
    newLimit = increaseFunc.apply(estimatedLimit);
else if (queueSize > beta)
    newLimit = decreaseFunc.apply(estimatedLimit);
else
    newLimit = estimatedLimit;
