Adaptive Write-Back Destaging for Server-Side Caches
Gokul Nadathur, Venkatesh Kothakota, Chethan Kumar,
Mahesh Patil, Karthik Dwarakinath, Vanish Talwar
PernixData, Inc.
Problem Statement: We consider a virtualized server-side storage acceleration architecture in which IOs from each host are accelerated by a local cache operating in write-back mode [1]. Each host is connected to a variety of NAS and SAN storage backends with different performance capabilities. Dirty data is written to backend storage by a destager that batches a fixed number of IOs. Since reads are mostly served from the cache, this fundamentally alters the workload seen by the storage, from mixed reads and writes to a bursty, write-biased workload of varying IO sizes. Maintaining a constant batch size in the destager, agnostic of the actual capability of the storage backend, results in sub-optimal write throughput accompanied by high latencies. The problem is further compounded when multiple destagers in the cluster write to the same shared storage. Finally, every destager IO also involves reading from an acceleration tier, whose behavior must also be factored in.
Adaptive Destager: We address these problems by designing an adaptive destager that optimizes destager write throughput and latency in a diverse storage environment. The scheme also works with multiple destagers in the system without any explicit state sharing. Specifically, we introduce an adaptation module to the destager consisting of (i) a monitoring component that collects throughput and latency statistics for the flash device and the backend storage; these are fed into (ii) an adaptor component that applies a learning algorithm to dynamically pick the optimal batch size for the IOs issued by the destager; the chosen size is then enforced by (iii) a controller that actuates the batch size in the overall destager workflow.
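The sketch below illustrates one way these three components could be wired into a periodic adaptation loop. It is a minimal sketch only: the class names, method signatures, and sampling interval are assumptions, not interfaces described in this work.

```python
# Minimal sketch of the adaptation module's control loop.
# Class names, method signatures, and the sampling interval are assumptions.
import time
from dataclasses import dataclass

@dataclass
class TierStats:
    throughput_mbps: float   # observed write throughput of a tier
    latency_ms: float        # observed IO latency of a tier

class Monitor:
    """(i) Collects throughput/latency statistics for the flash device and backend storage."""
    def sample(self):
        raise NotImplementedError   # platform-specific counters

class Adaptor:
    """(ii) Applies the learning algorithm to pick the destager's write batch size."""
    def update(self, flash: TierStats, backend: TierStats) -> int:
        raise NotImplementedError   # see the OWT / knee-detection sketch below

class Controller:
    """(iii) Actuates the chosen batch size in the overall destager workflow."""
    def set_batch_size(self, batch_size: int) -> None:
        raise NotImplementedError

def adaptation_loop(monitor: Monitor, adaptor: Adaptor, controller: Controller,
                    interval_s: float = 5.0) -> None:
    """Periodically feed monitored stats to the adaptor and enforce its decision."""
    while True:
        flash, backend = monitor.sample()
        controller.set_batch_size(adaptor.update(flash, backend))
        time.sleep(interval_s)
```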
The adaptor's learning algorithm robustly quantifies and detects the optimal write throughput (OWT) available to a destager in the cluster. It runs continuously and dynamically updates the OWT to respond to changing IO conditions in the cluster. We calculate the OWT in two ways: (i) as the knee point of the batch size vs. observed write throughput curve [2], and (ii) as the knee point of the write batch size vs. observed write throughput power curve, where throughput power is the ratio of throughput to latency [3]. The latency is aggregated across the acceleration and storage tiers, so that degradation in either tier is accounted for. The batch size at the OWT is used as the optimal batch size for the destager.
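For illustration, the following sketch shows how the two OWT estimates could be computed from samples gathered while varying the batch size. The sample type and helper names are assumptions, and the knee test is a simplified Kneedle-style check rather than the full algorithm of [2].

```python
# Sketch of the two OWT estimates (assumed sample type and helper names).
from collections import namedtuple

# batch size, observed write throughput, and aggregate latency (acceleration + storage tier)
Sample = namedtuple("Sample", "batch_size throughput latency")

def knee_batch_size(batch_sizes, values):
    """Simplified Kneedle-style knee detection [2]: after normalizing both axes
    to [0, 1], pick the point that rises farthest above the straight line
    joining the first and last samples."""
    x0, x1 = batch_sizes[0], batch_sizes[-1]
    y0, y1 = values[0], values[-1]
    best_i, best_gap = 0, float("-inf")
    for i, (x, y) in enumerate(zip(batch_sizes, values)):
        xn = (x - x0) / (x1 - x0) if x1 != x0 else 0.0
        yn = (y - y0) / (y1 - y0) if y1 != y0 else 0.0
        if yn - xn > best_gap:
            best_i, best_gap = i, yn - xn
    return batch_sizes[best_i]

def owt_batch_from_throughput(samples):
    # (i) knee of the batch size vs. observed write throughput curve
    return knee_batch_size([s.batch_size for s in samples],
                           [s.throughput for s in samples])

def owt_batch_from_power(samples):
    # (ii) knee of the batch size vs. throughput power curve,
    # where power = throughput / latency [3]
    return knee_batch_size([s.batch_size for s in samples],
                           [s.throughput / s.latency for s in samples])
```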
The destager starts with a conservative write batch size, which the adaptation module then increases at regular intervals, recomputing the OWT and checking whether the knee has been reached. Once a knee is detected, the system stays at that point until certain counter conditions are triggered. When a counter condition is hit, the system backs off from the current knee and the detection process starts over. Every destager in the cluster executes this algorithm. Since changing conditions of the shared storage, such as throughput loss or latency increase, are visible to all the destagers, the destagers in the cluster reach an equilibrium without explicit state sharing. Our results so far show up to a 30% predictable improvement in throughput and a 50% reduction in latency under overloaded conditions.
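The per-destager ramp/hold/back-off cycle described above could be sketched as the state machine below. The state names, step sizes, knee test, and counter condition are stand-ins chosen for illustration; the abstract does not specify these details.

```python
# Illustrative sketch of the per-destager ramp/hold/back-off cycle.
# Thresholds and the counter condition are assumptions, not the published algorithm.
RAMPING, HOLDING = "ramping", "holding"

class BatchSizeAdaptor:
    def __init__(self, initial_batch=8, step=8, max_batch=512):
        self.batch = initial_batch        # conservative starting batch size
        self.step = step
        self.max_batch = max_batch
        self.state = RAMPING
        self.history = []                 # (batch, throughput, latency) samples
        self.knee_latency = None          # latency observed at the knee

    def _knee_reached(self):
        # Declare a knee when growing the batch no longer improves throughput
        # appreciably (a stand-in for the knee detection of [2]).
        if len(self.history) < 2:
            return False
        (_, prev_tput, _), (_, cur_tput, _) = self.history[-2], self.history[-1]
        return cur_tput < prev_tput * 1.05

    def _counter_condition(self, latency):
        # e.g. shared-storage latency creeping well above the level seen at the knee
        return self.knee_latency is not None and latency > 1.5 * self.knee_latency

    def update(self, throughput, latency):
        """Called at each adaptation interval with the latest observed stats."""
        self.history.append((self.batch, throughput, latency))
        if self.state == RAMPING:
            if self._knee_reached():
                self.batch = self.history[-2][0]   # batch size at the knee
                self.knee_latency = latency
                self.state = HOLDING
            else:
                self.batch = min(self.batch + self.step, self.max_batch)
        elif self._counter_condition(latency):
            # back off from the current knee and restart detection
            self.batch = max(self.batch // 2, 1)
            self.history.clear()
            self.state = RAMPING
        return self.batch
```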
Related Work: PARDA [4] enforces proportional-share fairness among distributed hosts accessing a storage device at a latency threshold, using flow-control mechanisms similar to FAST TCP. PARDA operates at the VM level and is not designed to optimize aggregate IO throughput. Our work achieves the best possible balance between storage backend throughput and latency using a stateless algorithm with no user intervention. Similarly, while some of the techniques from other adaptation papers may be applicable, the overall architecture and constraints are different.
[1] BHAGWAT, D., et al. A practical implementation of clustered fault tolerant write acceleration in a virtualized environment. In FAST 2015.
[2] SATOPAA, V., et al. Finding a kneedle in a haystack: Detecting knee points in system behavior. In ICDCSW 2011.
[3] KLEINROCK, L. Power and deterministic rules of thumb for probabilistic problems in computer communications. In International Conference on Communications 1979.
[4] GULATI, A., et al. PARDA: Proportional allocation of resources for distributed storage access. In FAST 2009.