Adaptable SLA-Aware Consistency Tuning for
Quorum-replicated Datastores
Subhajit Sidhanta, Wojciech Golab, Supratik Mukhopadhyay, and Saikat Basu
Abstract—Users of distributed datastores that employ quorum-based replication are burdened with the choice of a suitable client-centric
consistency setting for each storage operation. This choice is difficult to reason about, as it requires deliberating about the tradeoff between latency and staleness, i.e., how stale (old) the result is. The latency and staleness for a given operation depend
on the client-centric consistency setting applied, as well as dynamic parameters such as the current workload and network condition.
We present OptCon, a machine learning-based predictive framework, that can automate the choice of client-centric consistency setting
under user-specified latency and staleness thresholds given in the service level agreement (SLA). Under a given SLA, OptCon predicts
a client-centric consistency setting that is matching, i.e., it is weak enough to satisfy the latency threshold, while being strong enough
to satisfy the staleness threshold. While manually tuned consistency settings remain fixed unless explicitly reconfigured, OptCon tunes
consistency settings on a per-operation basis with respect to the changing workload and network state. Using decision tree learning, OptCon yields a cross-validation error of 0.14 in predicting matching consistency settings under the latency and staleness thresholds given in the SLA.
We demonstrate experimentally that OptCon is at least as effective as any manually chosen consistency setting in adapting to the SLA
thresholds for different use cases. We also demonstrate that OptCon adapts to variations in workload, whereas a given manually chosen
fixed consistency setting satisfies the SLA only for a characteristic workload.
Index Terms—distributed databases, storage management, distributed systems, high availability, distributed file systems, middleware, performance evaluation, machine learning, reliability, availability, and serviceability
✦
1 INTRODUCTION
Many quorum-based distributed data stores [1, 2, 3] allow developers to explicitly declare the desired client-
centric consistency setting (i.e., consistency observed from
the viewpoint of the client application) for an operation.
Such systems accept the client-centric consistency settings
for an operation in the form of a runtime argument, typically
referred to as the consistency level. The performance of the
system, with respect to a given operation, is affected by the
choice of the consistency level applied [4].
From the viewpoint of the user, the most important per-
formance parameters affected by the consistency level ap-
plied are the latency and client-observed consistency anoma-
lies, i.e., anomalies in the result of an operation observed
from the client application, such as stale reads. Consistency
anomalies are measured in terms of the client-centric stal-
eness [5], i.e., how stale (old) is the version of the data
item (observed from the client application) with respect
to the most recent version. According to the consistency
level applied, the system waits for coordination among a
specific number of replicas containing copies (i.e., versions)
of the data item accessed by the given operation [1]. If the
system waits for coordination among a smaller number of
• S. Sidhanta is currently with INESC-ID / IST, University of Lisbon,
Portugal. This work was done while S. Sidhanta was with Louisiana State
University.
E-mail: ssidhanta@gsd.inesc-id.pt
• S. Mukhopadhyay is with Louisiana State University, USA. Email:
supratik@csc.lsu.edu
• S. Basu was with Louisiana State University, USA. Email:
sbasu8@lsu.edu
• W. Golab is with University of Waterloo, Canada. Email: wgolab@uwaterloo.ca
Manuscript received May 5, 2005; revised October 26, 2016.
replicas, the chance of getting a stale result (i.e., an older
version) increases. Also, the latency for the given operation
depends on the waiting time for the above coordination;
hence, in turn, depends on the consistency level applied.
For example, a weak consistency level for a read operation
in Cassandra [1], like READ ONE, requires only one of the
replicas to coordinate successfully, resulting in low latency
and high chances of a stale read. Hence, while choosing the
consistency level, developers must consider how this choice
affects the latency and staleness for a given operation.
The chosen consistency level must be matching with
respect to the latency and staleness thresholds specified in
the given service level agreement (SLA), i.e., it must be weak
enough to satisfy the latency threshold, while being strong
enough to satisfy the staleness threshold. Consider a typical
use case at Netflix where a user browses recommended
movie titles. Such use cases require real-time response [6].
Hence the SLA typically comprises a low latency threshold and a higher staleness threshold. For the given SLA, workload, and
network state, the matching choice is a weak read-write
consistency level (like ONE/ANY in Cassandra). If the
developer applies a stronger consistency level, the resulting
high latency might violate the SLA.
With the current state-of-the-art, the developers have to
manually determine a matching consistency level for a given
operation at development time. Reasoning about the above
choice is difficult because of: 1) the large number of possible
SLAs, 2) unpredictable factors like changing network state
and varying workload that impact the latency and staleness,
and 3) the absence of a well-formed mathematical relation
connecting the above parameters [7]. This makes automated
consistency tuning under latency and staleness thresholds
in the SLA a highly desirable feature for quorum-based
datastores.
We present OptCon [8], a framework that automatically
determines a matching consistency level for a given opera-
tion under a given SLA. Due to the absence of a closed-form
mathematical model capturing the impact of the consistency
level, current workload, and network state, on the observed
latency and staleness, OptCon applies machine learning
[9] to train a model for predicting a matching consistency
level under the given SLA, workload, and network state.
For the Netflix use case, taking into account the read-
heavy workload, the current network state, and the SLA
thresholds, OptCon predicts a weak consistency level. The
contributions of this paper are:
• We introduce OptCon, a novel machine learning-
based framework, that can automatically predict a
matching consistency level that satisfies the latency
and staleness thresholds specified in a given SLA,
i.e., the predicted consistency level is weak enough
to satisfy the latency threshold, and strong enough
to satisfy the staleness threshold. Using decision tree
learning, OptCon yields a cross validation error of
0.14 in predicting a matching consistency level under
the given SLA.
• Experimental results demonstrate that OptCon is
at least as effective as any manually chosen con-
sistency level in adapting to the different latency
and staleness thresholds in SLAs. Furthermore, we
also demonstrate experimentally that OptCon sur-
passes any manually chosen fixed consistency level
in adapting to a varying workload, i.e., OptCon
satisfies the SLA thresholds for variations in the
workload, whereas a manually chosen consistency
level satisfies the SLA for only a subset of workload
types.
2 MOTIVATION
2.1 Choice of the SLA Parameters
Following prior research by Terry et al. [4], we include
latency and staleness in the SLA for operations on quorum-
based stores. The choice of consistency level directly affects
the latency for a given operation on a quorum-based data-
store [1]. While Terry et al. [4] use categorical attributes for
specifying the desired consistency (such as read-my-write,
eventual, etc.) in the SLA, we use a more fine-grained SLA that accepts threshold values for the client-centric staleness
[10]. Golab et al. [10] demonstrate that both the proportion and severity of stale results increase from stronger to weaker consistency levels. The use of a client-centric
staleness metric in the SLA enables the developer to specify,
on a per-operation basis, the exact degree of staleness that a given client application can tolerate.
Following Terry et al. [4], we do not include throughput
in the SLA. But it is still desirable to have throughput [11] as
a secondary optimization criterion, once the SLA thresholds
are satisfied. Depending on the SLA, a set of consistency
levels can be matching with respect to the latency and
staleness thresholds given in the SLA. Consider a real-
time application (such as an online shopping cart) that
demands moderately low latency and tolerates relatively
higher staleness in the SLA. In such cases, any weaker con-
sistency level (like ONE or TWO in Cassandra, depending on the replication factor) can yield latency and staleness
values within the given SLA thresholds. In such cases of
multiple matching consistency levels, OptCon chooses the
one that maximizes throughput. Also, we do not consider
parameters like replication factor, number of server nodes,
and certain other server-centric parameters in our model,
since our focus is optimizing client-centric performance [10].
TABLE 1: Glossary of OptCon Model Parameters
L  — Average latency, i.e., the average time delay for a storage operation over a one-minute time period
T  — Average throughput, i.e., operations/second, over a one-minute time period
S  — Client-centric staleness, given by the 95th percentile Γ score [10]
Tc — Thread count, i.e., the number of user threads spawned by the storage operations invoked from the client application
P  — Packet count, i.e., the number of network packets transmitted to execute a given operation
RW — Read-write proportion in the client workload
C  — Consistency level applied for an operation
2.2 Motivation for Automated Consistency Tuning
The large number of possible use cases [11], each comprising
different SLA thresholds for latency and staleness, makes
manual determination of a matching consistency level a
complex process. The latency and staleness are also affected
[7] by the following independent variables (parameters):
1) the packet count, i.e., the number of packets transmitted during a given operation, which represents the network state [1], 2) the read proportion (i.e., the proportion of reads in the workload), and 3) the thread count. Parameters
2 and 3 represent the workload characteristics. We list the
parameters in Table 1. Manually reasoning about all of
these parameters combined together is difficult, even for a
skilled and experienced developer. Further, unpredictable
variations in workload, network congestion, and node fail-
ures may render a particular consistency level, manually
pre-configured at development time, infeasible at runtime
[12, 4].
3 CHALLENGES
Currently, there is no closed-form mathematical model [7]
that represents the effect of the consistency level on the
observed staleness and latency, with respect to the inde-
pendent variables. Bailis et al. [7] base their work upon
the simplifying assumption that writes do not execute con-
currently with other operations. Hence, it is not clear from
[7] how to compute the latency and staleness from a real
workload. Pileus and Tuba [4, 13] are the only systems that
provide fine-grained consistency tuning using SLAs. But,
instead of predicting, these systems perform actual trials on
the system, and select the consistency level corresponding
to the SLA row that produces maximum resultant utility,
based on the trial outcomes. The trial-based technique can
produce unreliable results due to the unpredictable param-
eters like network conditions and workload that affect the
observed latency and staleness. Thus, predictions based on
the outcomes of the trial phase may be unsuitable in the
actual running time of the operation. Sivaramakrishnan et
al. [14] use a static analysis approach to determine the
weakest consistency level that satisfies the correctness con-
ditions declared in the form of a contract, i.e., a specification
of client-centric consistency properties that the application
must satisfy. They do not consider the tradeoff between
staleness and latency, and cannot dynamically adapt to
varying workload and network state.
4 DESIGN OF OPTCON
4.1 Design Overview
In the absence of a closed-form mathematical model relating
the consistency level to the observed latency and staleness,
OptCon leverages machine learning-based prediction tech-
niques [9], following the footsteps of prior research [15].
OptCon is the first work that trains on historic data, and
learns a model M relating the consistency level and the
input parameters (i.e., the workload and network state) to
the observed latency and staleness. The model M can pre-
dict a matching consistency level with respect to the given
SLA thresholds for latency and staleness, under the current
workload and network state. A matching consistency level
is weak enough to satisfy the latency threshold, and simul-
taneously strong enough to satisfy the staleness threshold.
Our definition of matching consistency level is a modified
version of the correct consistency level in QUELA [14]. For
multiple matching consistency levels, OptCon predicts the
consistency level that maximizes the throughput.
In contrast to program verification-based static (compile
time) approaches [14], OptCon provides on-the-fly predic-
tions based on dynamic (runtime) values of the input pa-
rameters, i.e., the current workload and the network state
(Section 2.2). Thus, OptCon tunes the datastore accord-
ing to any dynamic variations in the workload and any
given SLA. Furthermore, unlike the state-of-the-art trial-
based techniques that base decisions on the values of the
parameters obtained during the trial phase [4, 13], OptCon
provides reliable predictions (see Section 6.9 and 6.10), tak-
ing into consideration the actual runtime values of the input
parameters.
TABLE 2: Example subSLAs
Average Latency    Staleness
≤ 100 ms           ≤ 5 ms
≤ 50 ms            ≤ 10 ms
≤ 25 ms            ≤ 15 ms
Like Terry et al. [4], users provide latency and staleness
thresholds to OptCon in the form of an SLA, illustrated in
Table 2. Each SLA consists of rows of subSLAs, where each
subSLA row, in turn, comprises two columns containing the
thresholds for latency and staleness, respectively. Instead of
categorical attributes [4], OptCon uses threshold values for
staleness [10] in the subSLAs. Unlike Terry et al., subSLAs in
OptCon do not have a utility column. Also, subSLAs are not
necessarily ordered. If any one (or all) of the thresholds for a
particular subSLA are violated, OptCon marks the operation
as failed with respect to the given subSLA. The handling of
such failure cases will depend on the application logic, and
is not part of the scope of this paper.
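The subSLA check itself is a simple threshold comparison. The sketch below illustrates it; the SubSLA class, its fields, and its method names are hypothetical simplifications introduced here for illustration, not OptCon's actual API.

```java
// Minimal sketch of a subSLA with latency and staleness thresholds.
// Class and member names are illustrative, not OptCon's actual API.
public class SubSLA {
    private final double latencyThresholdMs;   // e.g., 100 ms (first row of Table 2)
    private final double stalenessThresholdMs; // e.g., 5 ms

    public SubSLA(double latencyThresholdMs, double stalenessThresholdMs) {
        this.latencyThresholdMs = latencyThresholdMs;
        this.stalenessThresholdMs = stalenessThresholdMs;
    }

    // An operation fails the subSLA if either threshold is violated.
    public boolean isSatisfied(double observedLatencyMs, double observedStalenessMs) {
        return observedLatencyMs <= latencyThresholdMs
            && observedStalenessMs <= stalenessThresholdMs;
    }
}
```

For example, new SubSLA(100, 5).isSatisfied(80, 3) returns true for the first subSLA row of Table 2, whereas an observed latency of 120 ms would mark the operation as failed under that subSLA.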
Instead of modifying the source code of distributed data-
stores, OptCon is designed as a wrapper over existing dis-
tributed storage systems. We have implemented and tested
OptCon on Cassandra, which follows the Dynamo [16]
model. Among the Dynamo-style systems, while Cassandra
organises the data in the form of column families, Voldemort
[17] and Riak [2] are strictly key-value stores. Also, Cassan-
dra and Riak are designed to enable faster writes, whereas
Voldemort enables faster reads. But the principles governing
the internal synchronization mechanisms are similar for all
these systems. Hence the consistency level choices affect the
observed latency and staleness in a similar fashion for all
these systems. Thus, OptCon can potentially be integrated
with any Dynamo-style system.
4.2 Architecture
Fig. 1: The architecture of OptCon: rectangles denote mod-
ules and the folded rectangle denotes a dataset.
OptCon (refer to Figure 1) consists of the following
modules: 1) The Logger module records the independent
variables (Section 2.2), the observed latency, and staleness,
obtained using experiments performed on the datastore
with benchmark workloads. It collates these parameters into
a training data corpus, which acts as the training dataset.
2) The Learner module learns the model M (refer Section
6.3) from the training dataset, and predicts a matching
consistency level that maximizes the throughput. 3) The
Client module calls the other modules, and executes the
given operation on the datastore.
During the training phase, the Client module runs a
simulation workload on the given quorum-based datastore.
The Client calls the Logger module (Figure 1) to collect
the independent variables (parameters) and the observed
parameters for the operation from the JMX interface [1],
and appends these parameters into the training data. The
Client then calls the Learner module, which trains the
model M from the training data applying machine learning
techniques. During the prediction phase, the Client calls
the Learner, with runtime arguments comprising the sub-
SLA thresholds and the current values of the independent
variables. The subSLA (and also the SLA) can be varied
on a per-operation basis. The Learner predicts a matching
consistency level from the learnt model M. The Client
performs the given operation on the datastore with the
predicted consistency level.
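To make the prediction-phase flow concrete, the sketch below shows how a client might obtain a predicted consistency level and apply it on a per-operation basis, using the DataStax Java driver (2.x/3.x series) as an example client library. The Learner interface, its predict signature, and the way the independent variables are passed in are illustrative assumptions, not OptCon's actual API.

```java
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

// Hypothetical interface exposed by the Learner module.
interface Learner {
    ConsistencyLevel predict(double latencyThresholdMs, double stalenessThresholdMs,
                             double readProportion, int threadCount, long packetCount);
}

public class OptConClient {
    private final Session session; // Cassandra session from the DataStax driver
    private final Learner learner;

    public OptConClient(Session session, Learner learner) {
        this.session = session;
        this.learner = learner;
    }

    // Executes a CQL statement with the consistency level predicted under the given subSLA.
    public void execute(String cql, double latencyThresholdMs, double stalenessThresholdMs,
                        double readProportion, int threadCount, long packetCount) {
        ConsistencyLevel level = learner.predict(latencyThresholdMs, stalenessThresholdMs,
                                                 readProportion, threadCount, packetCount);
        SimpleStatement stmt = new SimpleStatement(cql);
        stmt.setConsistencyLevel(level); // apply the predicted level on a per-operation basis
        session.execute(stmt);
    }
}
```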
4.3 Design of the Logger Module
4.3.1 Model Parameters Collected by the Logger
For the reasons explained in Section 2.1, the latency L, i.e.,
time delay for the operation, and client-centric staleness
S, are considered as the model parameters that need to
satisfy a given SLA. The throughput T is considered as a
secondary optimization criterion. As discussed in Section
2.1, server centric parameters, like replication factor and
keyspace size, are not considered. Following the reasoning
in Section 2.2, the independent variables (parameters) for
learning the model M are: 1) the read proportion (RW )
in the operation, 2) the thread count (Tc), i.e., the number
of user threads spawned by the client, 3) the packet count
(P), i.e., the number of network packets transmitted during
the operation, and 4) the consistency level C, provided as
a categorical attribute containing attribute values specific to
the given datastore.
Most operations on quorum based stores are not instan-
taneous, but are executed over certain time intervals [10].
Following [7], the Logger uses the average latency over a
time interval of one minute for measuring L. The Logger
computes S in terms of the Γ metric of Golab et al. [10]. As
demonstrated in the above work, Γ is preferred over other
client-centric staleness measures for its proven sensitivity
to workload parameters and consistency level. Section 6.4
establishes the significance of the above parameters with
respect to the OptCon model. Section 6.5 demonstrates the
accuracy of the model comprising the above parameters.
4.3.2 Computation of Γ as the Metric for Client-centric Staleness
The metric Γ is based upon Lamport’s atomicity property
[18], which states that operations appear to take effect in
some total order that reflects their “happens before” relation
in the following sense: if operation A finishes before opera-
tion B starts then the effect of A must be visible before the
effect of B. The gamma metric defines observed consistency
in terms of the responses of storage operations observed
from the clients, and does not refer directly to server-
side implementation details of the datastore, like replication
factor, etc. As a result, the metric is implementation inde-
pendent, i.e., it can be applied to any key-value datastore. Γ
is a bound on time-based staleness, i.e., every read returns
a value at most Γ time units stale. A Γ score quantifies
the bound on the timestamp difference between operations
on the same key that return different values. We say that
a trace of operations recorded by the logger is Γ-atomic
if it is atomic in Lamport’s sense, or becomes atomic after
each operation in the trace is transformed by decreasing its
starting time by Γ/2 time units, and increasing its finish
time by Γ/2 time units. In this context the start and finish
times are shifted in a mathematical sense for the purpose of
analyzing a trace, and do not imply injection of artificial
delays; the behavior of the storage system is unaffected.
The per-key Γ score quantifies the degree of client-centric
staleness incurred by operations applied to a particular key.
We considered average per-key Γ, but settled on the percentile
per-key Γ since it takes into account the skewed nature of
the workloads.
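To make the intuition concrete, the following is a simplified, illustrative per-key staleness score in the spirit of Γ: for each read that returns a value older than the newest write that finished before the read started, it records the time elapsed since that newer write finished, and reports the 95th percentile of these gaps. This is only an approximation written for this sketch; it is not the exact Γ computation of Golab et al. [10], and the Op type and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified, illustrative staleness score for a single key's trace, in the spirit of Γ.
// Not the exact algorithm of Golab et al. [10]; types and field names are illustrative.
public class StalenessSketch {
    public static class Op {
        final long startMs, finishMs;
        final String value;
        final boolean isWrite;
        public Op(long startMs, long finishMs, String value, boolean isWrite) {
            this.startMs = startMs; this.finishMs = finishMs;
            this.value = value; this.isWrite = isWrite;
        }
    }

    // Returns the 95th percentile of observed staleness gaps (in ms) for one key's trace.
    public static double percentileStaleness(List<Op> trace) {
        List<Long> gaps = new ArrayList<>();
        for (Op read : trace) {
            if (read.isWrite) continue;
            // Find the newest write that finished before this read started.
            Op newest = null;
            for (Op w : trace) {
                if (w.isWrite && w.finishMs < read.startMs
                        && (newest == null || w.finishMs > newest.finishMs)) {
                    newest = w;
                }
            }
            // If the read returned an older value, record how long ago the newer write finished.
            if (newest != null && !newest.value.equals(read.value)) {
                gaps.add(read.startMs - newest.finishMs);
            }
        }
        if (gaps.isEmpty()) return 0.0;
        Collections.sort(gaps);
        int idx = (int) Math.ceil(0.95 * gaps.size()) - 1;
        return gaps.get(idx);
    }
}
```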
4.4 Design of the Learner Module
In OptCon, building the model M from training examples
(Figure 1) is a one-time process, i.e., the same model can be
reused for predicting consistency levels for all operations.
Even with node failures, the model may not need to be
regenerated since the training data remain unchanged; only the failed predictions resulting from node failures need to
be rerun. However, the model will need to be retrained in
certain cases where the behavior of the system with respect
to the latency and staleness changes, such as when a storage
system receives a software or hardware upgrade.
4.4.1 Overview of Possible Learning Techniques
We provide a brief overview of the possible Learning tech-
niques and their applicability to OptCon. We describe the
implementation details, including the various configuration
parameters, of each learning algorithm in Section 5. Perfor-
mance analysis and insights gathered from each technique
are given in Section 6.5.2. We consider Logistic regression
and Bayesian learning as they can help visualize the signifi-
cance of the model parameters and the dependency relations
among these parameters, and thus can provide intuition to
the developer for developing complex and more accurate
models [9]. We also consider Decision Tree, Random For-
est, and Artificial Neural Networks (ANN), since they can
produce more accurate predictions, being directly computed
from the data. We consider these approaches in the order
of their performance given in Table 3, in terms of various
model selection metrics. We leave the choice of a suitable
learning technique to the developer, rendering flexibility to
the framework. Based on our evaluation of the learning
algorithms, developers can choose the learning technique
that best suits the respective application domain and use
case.
Assuming linear dependency, we start with the linear
regression technique for training the model in OptCon, in
an attempt to come up with a simple mathematical model
that expresses the effect of RW , T c, and P , on latency
and staleness for an operation, with different values of
client-side consistency level C. Linear regression can also provide preliminary insights into the relationships among the model parameters and act as a basis for designing more complicated modelling algorithms: the dependency relationships it uncovers can serve as a baseline for learning algorithms, such as Bayesian learning, which requires assuming a prior distribution over the independent parameters.
Next, with the logistic regression approach, we fit two logit (i.e., logistic) functions for the two dependent variables L and S as follows: logit(π(L)) = β0 + β1 C + β2 RW + β3 P + β4 Tc and logit(π(S)) = β5 + β6 C + β7 RW + β8 P + β9 Tc, where C is the applied consistency level,
L is the observed latency, S is the observed staleness,
RW is the proportion of reads, P is the number of packets transmitted, and Tc is the number of threads during an
operation. βi are the coefficients estimated by iterative least
square estimation, and π is the likelihood function. Using
ordinary least square estimation, we iteratively minimize a
loss function given by the difference of the observed and
estimated values, with respect to the training dataset. For
eliminating overfitting, we perform L1 regularization with the Lasso method.
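One plausible formalization of the penalized fitting step described above (written here for the latency model; the staleness model is analogous) is the standard Lasso-penalized least-squares objective below. This form is assumed for illustration; the exact objective minimized by the Matlab implementation described in Section 5 may differ in its details.

```latex
% L_i: observed latency of the i-th training example;
% \pi_\beta: fitted likelihood function; \Lambda: L1 (Lasso) penalty weight.
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \sum_{i=1}^{n} \bigl( L_i - \pi_{\beta}(C_i, RW_i, P_i, Tc_i) \bigr)^{2}
  \;+\; \Lambda \, \lVert \beta \rVert_{1}
```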
Next we consider the decision tree learning algorithm
because of its simplicity, speed, and accuracy. The given
problem can be viewed as a classification problem, which
classifies the training dataset into classes based on whether
the observed latency and staleness (corresponding to each
row) fall within the given SLA thresholds. In fact, the
problem statement, i.e., choosing a matching consistency
level, can be viewed as a classification problem. The mod-
elling technique using Decision Tree comprises the fol-
lowing phases: 1) Labelling: On-the-fly computations for maximizing throughput T (i.e., the secondary optimization criterion) in the prediction phase would introduce consider-
able overhead. Since the latency L is considered before T is
maximized, the additional overhead may result in violation
of the latency threshold in the SLA. Hence, OptCon per-
forms the computations for T in the labelling phase itself.
Using exhaustive search, we label each row in the training
dataset with the highest T corresponding to given values of RW, Tc, and P, such that the SLA parameters L and
S are within the given SLA thresholds. 2) Training: Next
we apply decision tree learning for training a model M
from the labelled dataset. Use of error pruning techniques
mitigates issues of overfitting [9].
Next, we consider the random forest algorithm that ap-
plies ensemble learning to bootstrap multiple weak decision
tree learners to generate a strong classifier. It iteratively
chooses random samples with replacement from the train-
ing dataset, and trains a decision tree on each of the samples.
We take the average of the predictions from each decision
tree to obtain the final prediction with respect to a test
dataset.
For the Bayesian approach, we make the following
simplifying assumptions to make our problem suitable for
Bayesian learning: A1) a prior logistic distribution exists for
each of RW , T c, P , and C (see Section 4), A2) the posterior
distributions for L and S are conditionally dependent on the joint distribution of ⟨RW, Tc, P, C⟩, and A3) the
target features L and S are conditionally independent. With
these approximations, we plug-in the resultant dependency
graph from the Logistic Regression approach as input to the
Bayesian learner. Despite these assumptions and approxi-
mations, the Bayesian technique can provide intuition to the
developers, that can be used to build a more accurate model.
Next, we consider Artificial Neural Networks (ANN) for
learning a model M. We consider a two-layer perceptron
network for ANN; we do not consider deep learning because 1) the number of features is manageable for a simple ANN, 2) the size of the training dataset is also manageable for an ANN, 3) a simple ANN gives satisfactory accuracy, and 4) the training delay would be considerably high with deep learning.
(since training occurs once), the overhead for the prediction
phase is still considerably high (Section 6.3) due to model
complexity.
5 IMPLEMENTATION
OptCon is developed as a Java-based wrapper over Cas-
sandra v2.1.0. The source code can be found in the github
repositories located at https://github.com/ssidhanta/. The
Logger module is implemented as an extension over the
YCSB (Yahoo Cloud Serving Benchmark) 0.1.4 [19] frame-
work. It runs as a daemon on top of the datastore, listening
for the parameters using the built-in JMX Interface. Logistic
Regression, SVM, and Neural Network implementations of
the Learning module use Matlab 2013b’s statistical toolbox.
Decision Tree and Random Forest are implemented using
Java APIs from Weka, an open source machine learning
suite. The Bayesian approach uses Infer.net, an open source
machine learning toolbox under the .NET framework. We
summarize the important configuration parameters in the
Learner implementations. Logistic regression uses a min-
imization function with additional regularization weights,
multiplied by a tunable L1 norm term Λ. We set α = 0.05
and Λ = 25. Decision tree uses a confidence threshold of 0.25
for pruning, and uses random data shuffling with seed = 1.
Random forest bootstraps 100 decision trees, and introduces
low correlation among the individual trees by selecting a
random subset of the features at each split boundary of the
component trees. The Bayesian method uses a precision of
0.1. The ANN is implemented as a two-layer perceptron,
with default weights = 0 and the default bias = 0.
5.1 Implementation Details: Decision Tree
Labelling Phase: We generate the training dataset as fol-
lows: 1) we set YCSB parameters, namely read proportion
RW and thread count T c, to simulate a characteristic
workload w, and 2) the above workload w is executed with
different applied consistency levels C. Each row k in the
training dataset corresponds to a particular characteristic
workload w with a particular consistency level Ck. Each
row k records observed values of the output parameters,
namely average latency L, staleness S, and throughput
T , over a sliding time window of 1 minute. For a given
combination of values of the input parameters, comprising
the tuple ⟨RW, Tc, P⟩, and the given subSLA thresholds,
the output parameters, L, S and T , vary according to the
applied consistency level C for the operation corresponding
to the row k in the training dataset. The training dataset is
labelled as follows: 1) the training dataset is exhaustively
searched, 2) in each iteration of the above search step, we
group the rows in the training dataset by the values of the
input parameters comprising the tuple ⟨RW, Tc, P⟩, such that each group of rows contains a unique combina-
tion of values for the above tuple, 3) for each group of rows
g, we find the row l in g that satisfies both of the following
conditions. Condition 1: The value of throughput Tl in the
row l is maximum among the values of throughput for all
rows in g. Condition 2: The values of staleness Sl and latency
Ll satisfy the thresholds set in the subSLA, 4) rows in each
group of rows g in the training dataset are labelled with the
consistency level Cl corresponding to the row l that satisfies
the above conditions.
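The grouping and labelling procedure above can be sketched as follows. The Row type and its fields are hypothetical simplifications of our training-data records, introduced only for illustration.

```java
import java.util.*;

// Hedged sketch of the labelling phase: group training rows by <RW, Tc, P>, and label each
// group with the consistency level of the row that maximizes throughput while keeping
// latency and staleness within the subSLA thresholds.
public class Labeller {
    public static class Row {
        double rw; int tc; long p;   // input parameters <RW, Tc, P>
        String consistencyLevel;     // C applied when this row was recorded
        double latency, staleness, throughput;
        String label;                // assigned below
    }

    public static void label(List<Row> rows, double latencyMax, double stalenessMax) {
        Map<String, List<Row>> groups = new HashMap<>();
        for (Row r : rows) {
            groups.computeIfAbsent(r.rw + "|" + r.tc + "|" + r.p, k -> new ArrayList<>()).add(r);
        }
        for (List<Row> group : groups.values()) {
            Row best = null;
            for (Row r : group) {
                boolean withinSla = r.latency <= latencyMax && r.staleness <= stalenessMax;
                if (withinSla && (best == null || r.throughput > best.throughput)) {
                    best = r;   // row with maximum throughput that satisfies the subSLA
                }
            }
            if (best != null) {
                for (Row r : group) r.label = best.consistencyLevel;
            }
        }
    }
}
```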
Training Phase: The J48 decision tree [9] learner is loaded
with the labelled training data. The J48 learner is an open
source implementation of the C4.5 algorithm, which can
learn a model from a training dataset comprising both
continuous and discrete attributes (i.e., model parameters).
For handling continuous attributes, C4.5 assigns a threshold
value t to each continuous attribute a. C4.5 splits the rows
in the training dataset into categories, based on whether the
values of the attribute a for a row r exceed the threshold
t or not. C4.5 uses the normalized information gain Ig,
computed from the difference in entropy, as the splitting
Fig. 2: Bayesian belief network and decision tree structure: (a) Bayesian belief network; (b) structure of the decision tree, with latency given in milliseconds.
criterion. At each node in the tree, the parameter that
produces the highest Ig is chosen to make the split. The
algorithm continues recurring along the rest of the training
dataset till the complete decision tree is generated. The
leaf nodes represent the predicted consistency level for a
given workload, characterized by a given set of values
of the input parameters ⟨RW, Tc, P⟩. Each group of
rows in the training dataset forms a path in the decision
tree (from root to a leaf node), for which the predicted
consistency level is given by the label of the leaf node. Under
a subSLA comprising a 2 ms threshold for latency and a
10 ms threshold for staleness, the generated J48 decision
tree is presented in Figure 2(b). The C4.5 algorithm chooses
staleness as the first splitting function, assigning a threshold
of 0 ms for splitting the training dataset into two categories
at the root. The right subtree r corresponds to the examples
with observed staleness exceeding 0 ms, and the left subtree
l corresponds to the rest of the examples. Next, staleness
is chosen as the criterion that further splits the examples in
the right subtree r into subtrees l1 and r1, based on whether
the observed staleness exceeds 0.5 ms or not. The examples
in the right subtree r1 are further split into subtrees l2
and r2 based on a latency threshold of 3.967 ms. Then the
right subtree r2 derived from the above split is again split
into subtrees depending on whether the observed latency
exceeds a threshold of 5.431 ms.
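For concreteness, training the J48 model on the labelled dataset can be carried out with Weka roughly as in the sketch below. The ARFF file name is illustrative, and OptCon-specific glue code (loading the labelled rows produced by the Logger) is omitted.

```java
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TrainJ48 {
    public static void main(String[] args) throws Exception {
        // Load the labelled training data (file name is illustrative).
        Instances data = DataSource.read("optcon-training.arff");
        // The label (matching consistency level) is assumed to be the last attribute.
        data.setClassIndex(data.numAttributes() - 1);

        J48 tree = new J48();
        tree.setConfidenceFactor(0.25f); // pruning confidence threshold used in Section 5
        tree.buildClassifier(data);

        System.out.println(tree);        // prints the learned decision tree
    }
}
```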
Prediction Phase: During the prediction phase under a
given subSLA with thresholds TL and TS for latency and
staleness, respectively, the Learner module traverses along a
path in the tree (Figure 2(b)), starting from the root to a leaf
node l, depending on the values of the input parameters
⟨RW, Tc, P⟩. The consistency level Cl corresponding to
the label of the leaf node l is the predicted matching con-
sistency level under the given subSLA, for the given values
of the input parameters. The predictor constructs the above
path as follows. Starting from the root node, it traverses the
depth of the tree, marking each node in the path as v. At
each step in the path, it splits the tree into branches, based
on whether the value of the attribute a corresponding to
the label of the current node v falls within a range [0, t]
or a range [t, max]; t is the assigned split threshold, and
max is the maximum value of a in the training dataset. At
each node v in the path, the predictor chooses the branch whose range contains the value of the attribute a corresponding to the label of the given node. The predictor keeps traversing the tree in the
above manner till it reaches a leaf node l, and it chooses the
label Cl of the leaf node l as the matching consistency for
the given operation.
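The traversal described above is the standard walk over a learned decision tree; in practice Weka's J48 performs it internally when classifying an instance. The generic sketch below illustrates the idea; the Node type is hypothetical and not Weka's internal representation.

```java
// Generic sketch of predicting a consistency level by walking a decision tree.
// The Node type is illustrative; Weka's J48 performs this traversal internally.
public class TreePredictor {
    public static class Node {
        String attribute;        // e.g., "RW", "Tc", or "P" at internal nodes
        double threshold;        // split threshold t for the attribute
        Node left, right;        // left: value in [0, t]; right: value in (t, max]
        String consistencyLevel; // non-null only at leaf nodes
    }

    public static String predict(Node root, java.util.Map<String, Double> inputs) {
        Node v = root;
        while (v.consistencyLevel == null) {     // descend until a leaf is reached
            double value = inputs.get(v.attribute);
            v = (value <= v.threshold) ? v.left : v.right;
        }
        return v.consistencyLevel;               // label of the leaf = predicted level
    }
}
```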
5.2 Implementation Details: Bayesian Learning
With the Bayesian [9] approach, we assume the following.
A1: A prior distribution exists for each of the input param-
eters, namely RW , T c, P , and C. The input parameter C
is a discrete variable, which takes values from a finite set of
categorical attributes, comprising the supported list of con-
sistency levels. Hence, we assume a Dirichlet distribution
[20] for the prior probabilities for C. On the other hand, the
parameters RW , T c, and P are continuous variables. We
assume that these parameters follow a Gaussian distribu-
tion. A2: The posterior distributions for output parameters,
namely L and S, are conditionally dependent on the joint
distribution of the input parameters ⟨RW, Tc, P, C⟩.
A3: The output parameters L and S are conditionally in-
dependent, allowing L and S to be expressed as the effect
of the input parameters RW , T c, P , and C. We represent
the dependency relation of the model parameters from the
equations obtained from the Logistic Regression approach
using a dependency graph. Figure 2(a) illustrates such a
graph. Our assumption that the output parameters, L and
S, are conditionally independent (though actually L and S
are dependent on each other) enables us to express them
as the effect of the input parameters. This allows us to
model the causality among the dependent and independent
variables as a simple Bayesian Network.
6 EVALUATION
6.1 Experimental Setup
Following [10], we have run our experiments on a testbed of
20 Amazon EC2 m1.small instances, located in the EC2 region
us-west-2, running Ubuntu 13.10, loaded with Cassandra
2.1.0 with a replication factor of 5, running on Oracle JDK 1.8. Our models are learnt from a training dataset comprising
23040 rows, and a testing dataset of 2560 rows, generated by
running varying YCSB 0.1.4 [19] workloads. We have used
the NTP (Network Time Protocol) implementation
from ntp.org to bring down the maximum clock skew to 5
ms.
6.2 Latency and Staleness Ranges Used in the sub-
SLAs
The results in [21] suggest that the latency for common web applications
falls in the range of 75-140 ms. The Pileus system [4], which
includes latency in subSLAs as well, uses values between
200 to 1000 ms as the latency threshold. Further reducing
the lower bound to favor availability, we choose any value
between 101 and 150 ms as a candidate for the threshold
of latency in OptCon. Following the observations from the
Riak deployment at Yammer [22], Bailis et al. [7] use a
range of 1.85-13.6 ms for the staleness bounds, given as
the t-visibility values. Hence, following [7], we choose a
value within the above range as the staleness threshold.
Rounding the bounding values for the ranges given above,
our subSLAs use values in the ranges 101-150 ms and 1-13
ms as thresholds for L and S, respectively.
6.3 OptCon Framework Overhead
Training the model being a one-time process, the latency
overhead due to the training phase is negligible during an
operation. The overhead due to training phase, which in-
cludes the time spent by the Logger in collecting the training
dataset from YCSB, varies from 6.5 ms for Linear Regression
to 27 ms for ANN. However, there is still considerable
overhead due to the prediction phase, especially with a large
training dataset. The prediction phase spans from the call
to the Learner module till the predicted consistency level
is returned by the Learner module. The average, variance
and standard deviation of the overhead are 1 ms, 0.25, and
0.37, respectively, with decision tree learning. The average
overhead with different learning techniques is given in Table
3.
TABLE 3: Model Selection Results, as per descending order of AICC and BIC, and ascending order of CV error and Overhead
Approach              Cross Validation Error   AICC    BIC     Overhead (ms)
Decision Tree         0.14                     10.73   51.44   1
Bayesian Learning     0.57                     12.85   53.57   1.2
Logistic Regression   1.98                     16.32   57.07   0.7
Random Forest         0.14                     9.51    50.24   1.3
Neural Network        0.059                    12.85   63.54   1.5
6.4 Dimensionality Reduction: Significance of the
Model Parameters
TABLE 4: Significance of the Model Parameters: significance of a parameter is greater for higher values of Rank Features and Sequential Attribute Selection, and lesser for higher values of Misclassification Error
Parameter   Rank Features Technique   Sequential Attribute Selection Technique   Misclassification Error for Random Forest
C           4.81                      1                                          0.22
RW          0.9                       1                                          1.11
P           2.78                      1                                          0.24
We apply dimensionality reduction [9] to determine the
features (parameters) that are relevant to the problem, and
have low correlation among themselves. We use the follow-
ing techniques to detect any redundant feature that can be
omitted without losing information. In the rank features
approach, we determine the most significant features on
the basis of different criterion functions. The above criterion
functions are evaluated as a filtering (or a preprocessing)
step before the training phase. On the other hand, sequential
feature selection is a wrapper technique, that sequentially
selects the features until there is no substantial improvement
(given by the deviance of the fitted models) in the predictive
power for an included feature. For the random forest model,
we evaluate the significance of the features in the ensemble
of component trees using the misclassification error. Table 4
presents the results of the above techniques on the model
parameters C, RW , and P . Sequential feature selection
outputs 1 for all the model parameters, indicating that
all the parameters are significant. Rank features ranks the
relative significance of the features in the order C, P , and
RW . Misclassification error ranks the features in the order
RW , P , and C. The order of the relative significance is
different with each dimensionality reduction technique; the
developer must choose the suitable ranking approach based
on the learning technique. For example, the misclassification
error criterion is to be used for the random forest technique.
6.5 Comparison of the Predictive Power of the Learn-
ing Techniques Using Model Selection Metrics
6.5.1 The Model Selection Metrics Used
Cross validation error (CV error) [9] is the most common
and widely used model selection metric. It represents the
accuracy of the model with respect to an unseen test dataset,
which is different from the original dataset used to train
the model. Using 10-fold cross validation, we partition the
dataset into 10 subsamples, and run 10 rounds of validation.
In each round a different subsample is used as the test
dataset, while we train the model with the rest of the dataset.
We compute the average of the mean squared error values
for all validation rounds to obtain the mean CV error (Table
3). We also use the Akaike Information Criterion (AIC)
[23] which quantifies the quality of the generated model in
terms of the information loss during the training phase. For
obtaining AIC, we first compute the likelihood of each ob-
served outcome in the test dataset with respect to the model.
We compute the maximum log likelihood L, i.e., the maxi-
mum of natural logarithms over the likelihood of each of the
observed outcomes. AICC [23] (Table 3) further improves
upon AIC, penalizing overfitting with a correction term for
finite sample size n. Thus AICC = 2k − 2 ln L + 2k(k + 1)/(n − k − 1), where k is the number of model parameters. Apart from
AICC, we also compute the Bayesian Information Criterion
(BIC) [23] that uses the Bayesian prior probability estimates
to penalize overfitting. Thus, BIC = k ln N − 2 ln L, where the additional parameter N for the sample size enables BIC to penalize model complexity more heavily than AICC as the sample grows. At smaller sample sizes, BIC assigns a lower penalty on the free parameters than AICC, whereas the penalty is higher for larger sample sizes (because it uses k ln N instead of 2k). Normally, the BIC and AICC scores (Table
3) are used together to compare the models: AICC detects the problem of overfitting and BIC indicates underfitting.
Another criterion for the choice of the algorithm is the
average prediction overhead (refer Section 6.3), which is
given in the last column of Table 3.
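As a quick reference, the two criteria can be computed directly from the maximum log likelihood ln L, the number of model parameters k, and the sample size n, as in the sketch below. This is a direct transcription of the formulas above, not part of OptCon's code.

```java
// Direct transcription of the AICC and BIC formulas used above.
public final class SelectionCriteria {
    // lnL: maximum log likelihood; k: number of model parameters; n: sample size.
    public static double aicc(double lnL, int k, int n) {
        double aic = 2.0 * k - 2.0 * lnL;
        return aic + (2.0 * k * (k + 1)) / (double) (n - k - 1);
    }

    public static double bic(double lnL, int k, int n) {
        return k * Math.log(n) - 2.0 * lnL;
    }

    private SelectionCriteria() {}
}
```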
6.5.2 Insights: Evaluation of Learning Techniques With Re-
spect to the Metrics
The above table can guide the developers in making an
informed choice regarding the correct learning technique
to use. The following analysis with respect to the Table
3 can act as lessons learnt for future practitioners and
researchers looking to apply learning techniques to solve
similar problems. Our problem can be directly cast as clas-
sification of the given training data into classes labelled
(Section 4.4) by the consistency levels. Hence, Decision Tree
yields high accuracy and speed (Table 3). We observed
near random predictions with Linear regression (hence we
omitted the results), because: 1) our problem is more of a
classification problem, as already explained, and 2) linear
regression requires the assumption of a linear model. Apart
from linear regression, logistic regression fares worst among
the approaches as indicated by the values of the metrics.
This is because it also treats the problem as a regression
problem, whereas ours is a nonlinear classification problem.
But it is still useful, since it yields a simple relation which
expresses the effect of the parameters RW , T c, and P ,
on L and S for an operation, under different consistency
levels. Also, it can be used to determine the strength of
the relationship among the parameters. Thus it can act as
a basis for complex modelling algorithms. Random forest
further eliminates errors due to overfitting and noisy data
by: 1) bootstrapping multiple learners, and 2) using random-
ized feature selection. Hence, it produces the best accuracy.
However, the prediction overhead due to model complexity might increase the latency overhead beyond the thresholds of a real-time subSLA. Similarly, though ANN yields high
accuracy (Table 3), the additional overhead due to the model
complexity (it produces the most complex model) may
overshoot the threshold for real-time use cases. The Bayesian method requires several approximating assumptions, which
result in high error scores.
6.5.3 Selection of Algorithm Based on Tradeoff Between
Accuracy And Speed
We define a measure Perf for selecting the learning tech-
nique that provides an optimal tradeoff between the accu-
racy and speed (Table 3). We assign the overhead of the
slowest learning technique (as per Table 3) as the baseline
overhead Obase . Since ANN has the maximum overhead
(1.5 ms) as per Table 3, Obase = 1.5. The speedup Speedup
obtained with a chosen learning algorithm relative to the
slowest algorithm is estimated from the observed relative
decrease in overhead for the chosen algorithm with re-
spect to the baseline Obase (i.e., for the slowest algorithm,
Speedup = 0). Thus, Speedup = (Obase − O)/Obase, where O is the
overhead of the chosen technique. We denote the prediction
error for a given learning technique as E. We assign the
error for the least accurate algorithm, as per Table 3, as the
baseline Ebase . Logistic regression has the maximum CV er-
ror (1.98 as per Table 3). Hence, Ebase = 1.98. The value of E
and Ebase depends on the choice of the error measure. The
developer can either use the CV error, or both AICC and BIC
taken together. In the latter case, E is given as the maximum
between the AICC and BIC for the chosen algorithm, and
Ebase is given as the maximum between the AICC and BIC
for the least accurate algorithm. We compute the relative
increase in accuracy (ARel ) obtained with a chosen learning
algorithm with respect to the least accurate algorithm. ARel
is estimated from the proportion of observed decrease in
error for the chosen algorithm with respect to the baseline
Ebase. Thus, ARel = (Ebase − E)/Ebase. Finally, the measure Perf is
computed as Perf = wSpeedup × Speedup + wA × ARel ,
where wSpeedup and wA are the weights (real numbers in
the closed interval [0, 1], such that wSpeedup + wA = 1)
assigned by the developers to quantify the relative im-
portance of speed and accuracy, respectively, for selecting
a learning technique. The developer can assign a larger
weight to accuracy (i.e., wA > wSpeedup ), if accuracy is
more important than speedup (and vice versa) according to
the use case. The developer chooses the learning algorithm
that produces the maximum value of Perf , with a given
choice of error measure. We provide the values of Perf for
various learning algorithms in Table 5, with CV Error as
the error measure, and equal weights assigned to accuracy
and speedup (i.e., wA = wSpeedup = 0.5). The value of
Perf is maximum for the decision tree algorithm, implying
that it produces an optimal tradeoff of accuracy and speed
according to our choice of error measure, and the assigned
values for the weights wA and wSpeedup . In most cases,
the difference in overhead between algorithms is negligibly
small, hence the Speedup is not a criterion of significance
for most cases. Thus, generally developers would assign
weights wA = 1 and wSpeedup = 0, choosing the algorithm
with the highest accuracy.
TABLE 5: Evaluation of the Accuracy-Speed Tradeoff
Decision Tree   Bayesian Learning   Logistic Regression   Random Forest   ANN
0.62            0.46                0.27                  0.52            0.49
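The Perf values can be reproduced from Table 3. The sketch below computes Perf from the overhead and CV error of a chosen algorithm and the two baselines; with wA = wSpeedup = 0.5 it yields roughly the decision tree entry of Table 5 (differences are due to rounding).

```java
// Computes Perf = wSpeedup * Speedup + wA * ARel from the values in Table 3.
public class PerfCalculator {
    public static double perf(double overhead, double error,
                              double oBase, double eBase,
                              double wSpeedup, double wA) {
        double speedup = (oBase - overhead) / oBase; // relative decrease in overhead
        double aRel = (eBase - error) / eBase;       // relative decrease in error
        return wSpeedup * speedup + wA * aRel;
    }

    public static void main(String[] args) {
        // Decision tree: overhead 1 ms, CV error 0.14; baselines: ANN 1.5 ms, logistic regression 1.98.
        System.out.println(perf(1.0, 0.14, 1.5, 1.98, 0.5, 0.5)); // ~0.63 (Table 5 reports 0.62)
    }
}
```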
6.6 Analysis of Linear Regression Approach
6.6.1 Background of ANOVA Analysis Technique
We use ANOVA [20] or Analysis of Variance on the fitted
linear regression to perform a hypothesis test for determin-
ing whether there is a significant linear relationship between the independent (input) parameters and the dependent parameters. The degree of freedom DF (refer to Table
6) is given as the number of independent observations in a
sample (training dataset) minus the number of the param-
eters that are to be estimated from the sample. The regression
mean square (MSR) for each parameter is computed as the
squared sum of errors (i.e., deviation of observed values
from the estimated values) divided by the degree of freedom
DF. The mean square error (MSE) is computed as the
squared sum of errors divided by n − 1, where n is the number
of parameters. The F statistic for each parameter is com-
puted as MSR/MSE. The p values in the ANOVA table
denote the probabilistic significance of the corresponding
parameter - a small p value will imply a small probability of
obtaining a large value of the F statistic. The null hypothesis
for linear regression states that there is no significant linear
relationship among the model parameters. If we obtain a
large value of F with a small p-value, we reject the null
hypothesis in favor of the alternative hypothesis, i.e., there
exists a significant relationship among model parameters.
TABLE 6: ANOVA results for Linear Regression
ANOVA for Latency function
Variable SumSq DF MeanSq F pValue
C 4.11E+18 1 4.11E+18 7.0851 7.82E-03
RW 1.57E+20 1 1.57E+20 269.75 1.23E-57
P 7.26E+18 1 7.26E+18 12.503 0.00041377
Tc 1.05E+23 1 3.2E+25 325.45 1.5E-65
Error 1.45E+21 2492 5.80E+17
ANOVA for Staleness function
Variable SumSq DF MeanSq F pValue
C 7.04E+08 1 7.04E+08 1.1906 0.27523
RW 7.71E+09 1 7.71E+09 480.32 1.70E-105
P 9.71E+08 1 9.71E+08 3.6821 0.05501
Tc 3.23E+12 1 2.85E+13 538.12 1.57E-123
Error 1.92E+12 2492 7.69E+08
6.6.2 The Resulting ANOVA Table
In the case of the regression equation for latency L, the p
values (refer to Table 6) are all approximately 0 (rounded
to 3 digits of precision) for the parameters C, RW , P ,
and T c, respectively, and the values of F statistic are 7.08,
269.75, 12.5, and 325.45, respectively. Such small p values
indicate that the probability of observing such large F-ratios by chance under the null hypothesis is very small. Since F is as high as 269.75 for RW,
the significance of the independent parameter RW in the
regression equation for L is very high, i.e., the probability
of L linearly varying with RW is very high. On the other
hand, the F statistics for the parameters C and P are much lower compared to RW; hence these parameters may have a relatively smaller chance of having a linear relationship
with the dependent parameter L. In the case of the fitted
regression equation for staleness S, the p values (refer to
Table 6) are approximately 0.28, 0, 0.06, and 0, for the
parameters C, RW , P , and T c, respectively, and the values
of F statistic are 1.19, 480.32, 3.68, and 538.12, respectively.
Thus, the p values for the regression equation of S are generally greater than those for L. The p values for RW and Tc remain approximately 0, with large F statistics, indicating a highly significant linear relationship with S. In contrast, the p values for C and P (approximately 0.28 and 0.06, respectively) are not small, and the corresponding F statistics are low (1.19 and 3.68, respectively). Thus, the parameters C and P have a very low chance of having a significant linear relationship with the dependent parameter in the regression equation for staleness S.
TABLE 7: Confusion matrix generated using Decision Tree
a      b      c,d,e,f   g      h        <-- classified as
6194   238    0         73     93     | a = ANY
115    1045   0         2793   495    | b = ONE
0      0      0         0      0      | c = TWO
0      0      0         0      0      | d = THREE
0      0      0         0      0      | e = LOCAL QUORUM
0      0      0         0      0      | f = EACH QUORUM
117    1073   0         2782   553    | g = QUORUM
114    1029   0         2771          | h = ALL
Fig. 3: ROC curve generated using Decision Tree: AUC=0.98
Fig. 4: ROC Curve for the Logistic Regression approach
6.7 Analysis of Confusion Matrix and ROC Curves
6.7.1 Background of Confusion Matrix and ROC Curves
True positives (TP) [20] represent examples in the training dataset that correspond to a correct positive classification. False
positives (FP) or type I errors involve incorrect labelling of
negative examples as positives. True negatives (TN), on the
other hand, are negative examples that have been classified
correctly. False negatives (FN) or type II errors involve
incorrect classification of positive examples as negatives.
A confusion matrix is a tabular visualization of the values
of TP, FP, TN, and FN [20]. True Positive Rate (TPR) and
False Positive Rate (FPR) are obtained as TPR = TP/(TP + FN) and FPR = FP/(FP + TN), respectively. Receiver Operating Characteris-
tics (ROC) curves display performance of machine learning
algorithms in terms of TPR and FPR. We plot the ROC
from the confusion matrix. The ROC curve plots the FPR and TPR along the x and y axes, respectively. The area
under the curve (AUC) quantifies the probability that a
given classifier will assign higher rank to a positive example
over a negative example. The ideal region for the ROC curve is the upper left half, which corresponds to an area of 1,
representing a perfect test; an area of 0.5 typically means a
uniformly random test.
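As a concrete illustration of these definitions, the sketch below (Python with scikit-learn) computes TPR, FPR, and the AUC; the labels and scores are hypothetical placeholders rather than our experimental data.

import numpy as np
from sklearn.metrics import confusion_matrix, roc_curve, auc

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])            # hypothetical labels
y_score = np.array([.9, .2, .7, .4, .3, .6, .8, .1])   # classifier scores
y_pred = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
tpr = tp / (tp + fn)      # True Positive Rate
fpr = fp / (fp + tn)      # False Positive Rate

fprs, tprs, _ = roc_curve(y_true, y_score)   # FPR on x axis, TPR on y axis
print(tpr, fpr, auc(fprs, tprs))             # AUC summarizes the whole curve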
Fig. 5: ROC curve generated using Bayes Net
6.7.2 Resulting Confusion Matrix and ROC Curves
Table 7 shows the confusion matrix obtained with the Decision Tree technique. From the confusion matrix, we obtain the ROC curve for the decision tree (refer to Figure 3). The area under the curve (AUC), computed using the Mann-Whitney technique [20], is 0.99, and the Gini coefficient is G1 = 2 × AUC − 1 = 2 × 0.99 − 1 = 0.98. However, the raw curve falls below the diagonal, indicating inverted predictions; hence we apply the mirroring method, which reverses the predictions with respect to the diagonal line. All predictions are reversed in this manner before the predicted values are passed to the client for performing the read/write operations.
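A minimal sketch of this mirroring step for the binary case (Python with scikit-learn; the labels and scores below are hypothetical, not the classifier output used in our experiments):

import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0])           # hypothetical labels
y_score = np.array([.2, .8, .3, .1, .7, .9])    # scores from an inverted classifier

if roc_auc_score(y_true, y_score) < 0.5:   # ROC curve falls below the diagonal
    y_score = 1.0 - y_score                # mirror (reverse) the predictions
print(roc_auc_score(y_true, y_score))      # mirrored AUC is now above 0.5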
Figure 5 presents the ROC curve for the Bayes network
approach. The diagonal corresponds to random guesses;
the prediction instances all reside above it. The area under
the ROC curve for the Bayes network is 0.9, and the Gini
coefficient G1 = 2 × 0.9 − 1 = 0.8. Figure 4 presents
the ROC curve for the Logistic Regression approach.
Fig. 6: Cross Validated Deviance of Lasso Fit
6.8 Analysis of Logistic Regression
A saturated model is a model with the maximum number of regression coefficients that can be estimated. The deviance of a model M1 is computed as twice the difference between the log-likelihood of the saturated model MS and that of M1. For a nonnegative value of Λ, lasso (we use lasso regularization with logistic regression, as discussed in Section 4.4.1) minimizes the Λ-penalized deviance, which is equivalent to maximizing the Λ-penalized log-likelihood. This deviance represents the value of the loss function, computed as the negative log-likelihood of the prediction model, averaged over the validation folds in the cross-validation procedure. We use Figure 6 to analyze the deviance obtained with Logistic Regression. Figure 6 identifies the minimum-deviance point with a green circle and a dashed line, locating the value of Λ that produces the minimal cross-validation deviance; the blue circle marks the largest value of Λ whose deviance lies within one standard error of that minimum.
Fig. 7: Residuals for the Logistic Regression approach
Figure 7 presents the residuals obtained from the lasso fit, computed as the difference between the observed and predicted values. Residuals are used to detect outliers with respect to a fitted regression curve, and in the process help test the assumptions of the logistic regression model. In Figure 7, we plot the histogram of raw residuals, i.e., the difference between the observed values and the fitted values estimated from the logistic regression equation. The histogram shows that the residuals are slightly left-skewed, indicating that the model can be further improved.
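A minimal sketch of such a cross-validated, lasso-regularized logistic fit (Python with scikit-learn, whose inverse regularization strength C corresponds to 1/Λ; the synthetic feature matrix and labels are placeholders for our training data, not the actual OptCon pipeline):

import numpy as np
from sklearn.linear_model import LogisticRegressionCV

# X: one row per operation (e.g., read proportion, throughput, network state);
# y: the matching consistency level recorded for that operation (assumed).
X = np.random.rand(500, 4)
y = np.random.choice(["ONE", "QUORUM", "ALL"], size=500)

clf = LogisticRegressionCV(Cs=np.logspace(-4, 2, 20), cv=10, penalty="l1",
                           solver="saga", scoring="neg_log_loss", max_iter=5000)
clf.fit(X, y)
print(clf.C_)   # selected inverse regularization strength(s), i.e., 1 / Lambda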
6.9 Adaptability of OptCon to Varying Workload
Varying Throughput: We run our experiments on a geo-replicated testbed of 7 Amazon EC2 m1.large instances (replica nodes), located in 3 regions (3 nodes in us-west-2a, 2 nodes in us-east, and 2 nodes in eu-west), running Ubuntu 14.04 LTS, loaded with Cassandra 2.1.2 with a replication factor of 3. We demonstrate the adaptability of OptCon to variations in throughput in the client workload. We tune the target throughput parameter (operations per second) in the YCSB client, and observe the variations in the severity of staleness (i.e., the 95th percentile Γ score) and latency against changing throughput, under a subSLA comprising a 300 ms threshold for latency (following the SLA for Amazon's shopping cart application [16]) and a 10 ms threshold for staleness. As per Table 5, we choose the Decision Tree implementation of the Learner module. Following [10], we run YCSB workloads for spans of 60 seconds with the following configuration: 50% ratio of reads and writes, 8 client threads, the “latest” request distribution, and a keyspace size of 100,000.
Fig. 8: Adaptability of OptCon to Varying Target Throughput, under a subSLA (Latency: 300 ms, Staleness: 10 ms). Each panel plots latency (milliseconds) and Gamma (milliseconds) against throughput (ops/s): (a) operations with READ ALL/WRITE ALL; (b) operations with READ/WRITE QUORUM; (c) operations with READ ONE/WRITE ONE; (d) operations with OptCon.
Fig. 9: Adaptability of OptCon to Varying Read Proportion under the subSLA SLA-1 (Latency: 250 ms, Staleness: 5 ms). Each panel plots latency (milliseconds) and staleness (milliseconds) against read proportion (0.1 to 0.9): (a) operations with OptCon satisfy the subSLA in 100% of cases; (b) operations with READ ALL/WRITE ALL satisfy the subSLA in 70% of cases; (c) operations with READ ALL/WRITE QUORUM satisfy the subSLA in 75% of cases; (d) operations with READ QUORUM/WRITE QUORUM satisfy the subSLA in 75% of cases; (e) operations with READ QUORUM/WRITE ALL satisfy the subSLA in 35% of cases; (f) operations with READ ONE/WRITE ANY satisfy the subSLA in 45% of cases.
Fig. 10: Adaptability of OptCon to Different subSLAs. Each panel plots staleness (ms) against latency (ms): (a) operations with OptCon under subSLA-1 (Latency: 250 ms, Staleness: 5 ms); (b) operations with OptCon under subSLA-2 (Latency: 100 ms, Staleness: 10 ms); (c) operations with OptCon under subSLA-3 (Latency: 20 ms, Staleness: 20 ms); (d) READ ALL/WRITE ALL; (e) READ ALL/WRITE QUORUM; (f) READ QUORUM/WRITE QUORUM; (g) READ QUORUM/WRITE ALL; (h) READ ALL/WRITE ANY; (i) READ ALL/WRITE ONE; (j) READ ONE/WRITE ALL; (k) READ QUORUM/WRITE ONE; (l) READ QUORUM/WRITE ANY; (m) READ ONE/WRITE QUORUM; (n) READ ONE/WRITE ANY; (o) chart showing M-statistic values:
subSLA-1: WRITE ANY/READ ONE 68; READ/WRITE QUORUM 41; Predicted Consistency 85
subSLA-2: READ QUORUM/WRITE ALL 70; Predicted Consistency 92
subSLA-3: READ ALL/WRITE QUORUM 0; Predicted Consistency 100
Figures 8(d), 8(a), 8(b), and 8(c) plot the 95th percentile values of
observed Γ scores and observed latency in milliseconds, against varying target throughput, with manually chosen consistency levels and with OptCon. Since the frequency of operations on each key increases with the target throughput applied from the YCSB client, the chance of observing stale results increases as well. Hence, the plots show that, with increasing target throughput, the 95th percentile Γ score increases. Stronger consistency levels (i.e., ALL and QUORUM, in Figures 8(a) and 8(b)) satisfy the staleness threshold for all values of applied target throughput. Due to the high internode communication delay between replica nodes distributed across the 3 regions under geo-replicated settings, the consistency level ALL (Figure 8(a)) violates the 300 ms latency threshold. On the other hand, the weak consistency level ONE (Figure 8(c)) satisfies both the latency and staleness thresholds for lower values of target throughput, but violates the 10 ms staleness threshold once the target throughput exceeds 400 ops/s. Hence, for throughput ≤ 400 ops/s, ONE or QUORUM is a matching choice, while only QUORUM is matching for throughput > 400 ops/s. OptCon (Figure 8(d)) applies the weak consistency level ONE for target throughput ≤ 400 ops/s, to achieve the latency threshold and maximize the observed throughput, and applies the strong consistency level QUORUM for target throughput > 400 ops/s. QUORUM (Figure 8(b)) satisfies the subSLA for all values of target throughput. However, by choosing ONE instead of QUORUM for throughput ≤ 400 ops/s, OptCon (Figure 8(d)) produces 61 ms lower latency, on average, than QUORUM (Figure 8(b)).
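To make the per-operation tuning concrete, the sketch below (Python with the DataStax cassandra-driver) shows how a predicted level such as ONE or QUORUM could be attached to an individual read; the trained learner clf, its feature vector, and the YCSB-style table layout are assumptions for illustration, not OptCon's actual interface.

from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

LEVELS = {"ONE": ConsistencyLevel.ONE,
          "QUORUM": ConsistencyLevel.QUORUM,
          "ALL": ConsistencyLevel.ALL}

session = Cluster(["127.0.0.1"]).connect("ycsb")   # hypothetical contact point

# clf is the trained Learner; the feature vector (read proportion, target
# throughput, network delay, subSLA latency/staleness thresholds) is
# illustrative only.
predicted = clf.predict([[0.5, 400, 12.0, 300, 10]])[0]    # e.g., "QUORUM"

stmt = SimpleStatement("SELECT field0 FROM usertable WHERE y_id = %s",
                       consistency_level=LEVELS[predicted])
row = session.execute(stmt, ("user4739",)).one()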
Varying Read Proportion: Next, we vary the read proportion (i.e., proportion of reads) parameter (RW) while running YCSB workloads, with OptCon and with manually chosen consistency levels (Figure 9). In Figures 9(a) through 9(f), the read proportion is plotted along the x axis, the observed latency along the primary y axis, and the staleness along the secondary y axis. We demonstrate that, for a specific read proportion, only a subset of all possible read-write consistency levels is optimal, i.e., satisfies the latency and staleness thresholds in a given subSLA. Varying the read proportion (RW) in the workload from 0.1 to 0.9, we observe that a manually chosen read-write consistency level is optimal for only a fraction of the total number of cases, under the subSLA SLA-1 (Table 2). OptCon demonstrates its adaptability to varying read proportion, applying the optimal consistency level for each read proportion, and thus satisfies the subSLA in 100% of cases. OptCon can adapt to changing read proportion due to the presence of the parameter RW as a feature in the prediction model M. As shown by the results in [10], the frequency of stale results (i.e., higher Γ scores) increases with increasing read proportion in the workload. Hence, with higher read proportions in the right half of the x axis, stronger read consistency levels are required to return less stale results. Also, since the proportion of writes is smaller, the penalty for the write latency overhead is smaller; hence stronger write consistency levels, which result in high write latency, are acceptable. Thus, for higher read proportions, combinations of strong read-write consistency levels, i.e., ALL/QUORUM, succeed in lowering staleness values to acceptable bounds (right half of Figures 9(b), 9(c), 9(d), and 9(e)). Strong consistency levels achieve a maximum success rate of 75% in satisfying the subSLA, with READ ALL/WRITE QUORUM (Figure 9(c)) and READ QUORUM/WRITE QUORUM (Figure 9(d)). Taking into account the clock skew, the staleness is effectively 0 in these cases. For the same reasons, for higher read proportions, weak read consistency levels fail to achieve stringent staleness thresholds, as demonstrated by the points in the right half of Figure 9(f). With lower read proportions in the left half of the x axes, the frequency of stale read results is smaller [10]. In this case, weaker manually chosen read-write consistency levels, i.e., ONE/ANY (left half of Figure 9(f)), are sufficient for returning consistent results. For lower read proportions, stronger consistency levels, i.e., ALL or QUORUM, are unnecessary and only result in high latency overhead. With manually chosen fixed consistency levels, we observe a maximum success rate of only 55% or less in satisfying the subSLA over the left half of Figures 9(b) through 9(e). Thus, manually chosen consistency levels show a maximum success rate of 75% in satisfying the subSLA, in contrast with the 100% success rate demonstrated by OptCon (Figure 9(a)).
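As a sketch of how such a Learner can be trained and cross-validated (Python with scikit-learn; the feature set and synthetic data are placeholders for the logged observations, so this toy run does not reproduce the 0.14 cross-validation error reported earlier):

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Features per operation (assumed): read proportion, target throughput,
# observed network delay, and the subSLA latency/staleness thresholds.
X = np.random.rand(1000, 5)
# Label: the consistency level that proved matching for that operation.
y = np.random.choice(["ANY", "ONE", "QUORUM", "ALL"], size=1000)

learner = DecisionTreeClassifier(max_depth=8)
errors = 1.0 - cross_val_score(learner, X, y, cv=10)   # per-fold error rate
print(errors.mean())                                   # cross-validation error
learner.fit(X, y)                                      # final model for prediction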
6.10 Adaptability of OptCon to Different subSLAs
We demonstrate that the consistency levels predicted by OptCon are always matching, i.e., OptCon always chooses from the optimal set of consistency levels for the given subSLA. We have integrated OptCon (using Decision Tree learning, as per Table 5) with the RUBBoS [24] benchmark, which can simulate concurrent and interleaved user access. Figures 10(a) through 10(n) plot the observed staleness along the y axis against the latency along the x axis, under the subSLAs SLA-1, SLA-2, and SLA-3 (Table 2). Figures 10(a) through 10(c) show experiments performed with OptCon, while Figures 10(d) through 10(n) correspond to experiments performed with manually chosen consistency levels. We also demonstrate a few extreme cases of occasional heavy network traffic in real-world applications using artificially introduced network delays (refer to the few boundary cases with arbitrarily high latency in Figures 10(a), 10(b), and 10(c) that violate the respective subSLAs). These boundary cases were simulated using the traffic shaping feature of Traffic Control [25], a Linux-based tool for configuring network traffic that controls the transmission rate by lowering the network bandwidth. SLA-1 (Table 2) represents systems demanding stronger consistency. Manually chosen weak read-write consistency levels, i.e., ANY/ONE, fail to satisfy the stringent staleness bound of SLA-1 (Figures 10(k) through 10(m)). On the other hand, strong consistency levels (i.e., ALL) satisfy SLA-1 (Figures 10(d) through 10(g)). OptCon satisfies SLA-1 by applying the respective optimal consistency levels (i.e., ALL in this case) for SLA-1 (Figure 10(a)). For the lower latency bound in SLA-3 (Table 2), the manually chosen strong consistency levels (ALL/QUORUM) unnecessarily produce latencies beyond the acceptable threshold (Figures 10(d) through 10(g)), whereas weaker consistency levels (i.e., ONE/ANY) prove to be optimal (Figures 10(k) through 10(n)). Again, OptCon succeeds by choosing the respective optimal consistency
levels (i.e., ONE/ANY in this case) for SLA-3 (Figure 10(c)). Similarly, with SLA-2, a subset of the manually chosen fixed consistency levels (Figures 10(h), 10(i), and 10(j)) produce optimal results, whereas OptCon successfully achieves SLA-2 for all operations (Figure 10(b)). We evaluate OptCon by a measure M, which quantifies the adaptability of the system to the subSLA. Following [4], M (chart (o) in Figure 10) computes the percentage of cases that did not violate the subSLA. For all the given subSLAs, the M values for operations performed with OptCon (Figures 10(a), 10(b), and 10(c)) exceed the M values for operations using fixed consistency levels.
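The M statistic itself reduces to a simple count over the logged operations, as in the following sketch (Python; the latency and staleness arrays and the thresholds are illustrative placeholders, not measurements from our testbed):

import numpy as np

def m_statistic(latencies_ms, staleness_ms, lat_threshold, stale_threshold):
    """Percentage of operations whose latency and staleness both satisfy the subSLA."""
    lat = np.asarray(latencies_ms)
    stale = np.asarray(staleness_ms)
    ok = (lat <= lat_threshold) & (stale <= stale_threshold)
    return 100.0 * ok.mean()

# Example under subSLA-1 (Latency: 250 ms, Staleness: 5 ms); values are illustrative.
print(m_statistic([120, 180, 320, 90], [2, 4, 1, 6], 250, 5))   # -> 50.0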
7 RELATED WORK
Wada et al. [26] and Bermbach et al. [27] analyze and
quantify consistency from a system-centric perspective that
focuses on the convergence time of the replication proto-
col. Bailis et al. [7] and Rahman et al. [28] instead con-
sider a client-centric perspective of consistency, in which
a consistency anomaly occurs only if differences in state
among replicas lead to a client application actually ob-
serving a stale read. In practice eventual consistency is
preferred over strong consistency in scenarios where the
system must maintain availability during network parti-
tions [29, 5], or when the application is latency-sensitive
and able to tolerate occasional inconsistencies [30]. The state
machine replication paradigm achieves the strongest possi-
ble form of consistency by physically serializing operations
[31]. Lamport’s Paxos protocol is a quorum-based fault-
tolerant protocol for state machine replication. Mencius
improves upon Paxos by ensuring better scalability and
higher throughput under high client load using a rotating
coordinator scheme [32]. Using variations of Paxos, a num-
ber of systems [33, 34, 35, 36] claim to provide scalability and
fault-tolerance. Relatively few systems [37, 4, 13, 14] provide
mechanisms for fine-grained control over consistency. Li et al. [12] apply static analysis for automated consistency tuning in relational databases.
8 CONCLUSIONS
OptCon automates client-centric consistency-latency tuning
in quorum-replicated stores, based on staleness and la-
tency thresholds in the SLAs. We implemented the OptCon
Learning module using various Learning algorithms and
do a comparative analysis of the accuracy and overhead
for each case. ANN produces highest accuracy, and Logistic
Regression the lowest overhead. Assigning equal weight to
accuracy and overhead, the decision tree performed best.
OptCon can adapt to variations in the workload, and is at
least as effective as any manually chosen consistency setting
in satisfying a given SLA. OptCon provides insights for
applying machine learning to similar problems.
9 ACKNOWLEDGEMENT
The project is partially supported by Army Research Office
(ARO) under Grant W911NF1010495. Any opinions, find-
ings, and conclusions or recommendations expressed in this
material are those of the authors and do not necessarily re-
flect the views of the ARO or the United States Government.
Wojciech Golab is supported in part by the Natural Sciences
and Engineering Research Council (NSERC) of Canada.
Subhajit Sidhanta is supported in part by the Amazon AWS
(Amazon Web Services) Research Grant.
REFERENCES
[1] A. Lakshman and P. Malik, “Cassandra: A decentralized
structured storage system,” SIGOPS Oper. Syst. Rev., vol. 44, no. 2,
pp. 35–40, Apr. 2010. [Online]. Available: http://doi.acm.org/10.
1145/1773912.1773922
[2] C. Meiklejohn, “Riak PG: Distributed process groups on dynamo-
style distributed storage,” in Erlang ’13.
[3] B. Calder, J. Wang, A. Ogus, N. Nilakantan, A. Skjolsvold,
S. McKelvie, Y. Xu, S. Srivastav, J. Wu, H. Simitci, J. Haridas,
C. Uddaraju, H. Khatri, A. Edwards, V. Bedekar, S. Mainali,
R. Abbasi, A. Agarwal, M. F. u. Haq, M. I. u. Haq,
D. Bhardwaj, S. Dayanand, A. Adusumilli, M. McNett,
S. Sankaran, K. Manivannan, and L. Rigas, “Windows azure
storage: A highly available cloud storage service with strong
consistency,” in Proceedings of the Twenty-Third ACM Symposium on
Operating Systems Principles, ser. SOSP ’11. New York, NY, USA:
ACM, 2011, pp. 143–157. [Online]. Available: http://doi.acm.org/
10.1145/2043556.2043571
[4] D. B. Terry, V. Prabhakaran, R. Kotla, M. Balakrishnan, M. K.
Aguilera, and H. Abu-Libdeh, “Consistency-based service level
agreements for cloud storage,” in SOSP ’13.
[5] D. B. Terry, M. M. Theimer, K. Petersen, A. J. Demers, M. J.
Spreitzer, and C. H. Hauser, “Managing update conflicts in Bayou,
a weakly connected replicated storage system,” in SOSP ’95.
[6] A. Jain, “Using cassandra for real-time analytics: Part 1,” http://
blog.markedup.com/2013/03/cassandra-real-time-analytics/,
2011.
[7] P. Bailis, S. Venkataraman, M. J. Franklin, J. M. Hellerstein,
and I. Stoica, “Probabilistically bounded staleness for practical
partial quorums,” Proc. VLDB Endow., vol. 5, no. 8, pp. 776–787,
Apr. 2012. [Online]. Available: http://dl.acm.org/citation.cfm?
id=2212351.2212359
[8] S. Sidhanta, W. Golab, S. Mukhopadhyay, and S. Basu, “OptCon:
An Adaptable SLA-Aware Consistency Tuning Framework for
Quorum-based Stores,” in CCGrid ’16.
[9] P. Flach, Machine Learning: The Art and Science of Algorithms That
Make Sense of Data. New York, NY, USA: Cambridge University
Press, 2012.
[10] W. M. Golab, M. R. Rahman, A. AuYoung, K. Keeton, J. J. Wylie,
and I. Gupta, “Client-centric benchmarking of eventual consis-
tency for cloud storage systems,” in ICDCS, 2014, p. 28.
[11] Planet Cassandra, “Apache Cassandra use cases,” http://planetcassandra.org/apache-cassandra-use-cases/, 2015.
[12] C. Li, J. Leitão, A. Clement, N. Preguiça, R. Rodrigues, and
V. Vafeiadis, “Automating the choice of consistency levels in
replicated systems,” in USENIX ATC 14.
[13] M. S. Ardekani and D. B. Terry, “A self-configurable geo-replicated
cloud storage system,” in OSDI 14.
[14] K. C. Sivaramakrishnan, G. Kaki, and S. Jagannathan, “Declarative
programming over eventually consistent data stores,” in PLDI ’15.
[15] L. Ravindranath, J. Padhye, R. Mahajan, and H. Balakrishnan,
“Timecard: Controlling user-perceived delays in server-based
mobile applications,” in Proceedings of the Twenty-Fourth ACM
Symposium on Operating Systems Principles, ser. SOSP ’13. New
York, NY, USA: ACM, 2013, pp. 85–100. [Online]. Available:
http://doi.acm.org/10.1145/2517349.2522717
[16] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati,
A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and
W. Vogels, “Dynamo: Amazon’s highly available key-value store,”
SIGOPS Oper. Syst. Rev., vol. 41, no. 6, pp. 205–220, Oct. 2007.
[Online]. Available: http://doi.acm.org/10.1145/1323293.1294281
[17] R. Sumbaly, J. Kreps, L. Gao, A. Feinberg, C. Soman, and S. Shah,
“Serving large-scale batch computed data with project Volde-
mort,” in Proc. of the 10th USENIX Conference on File and Storage
Technologies (FAST), 2012.
[18] L. Lamport, “On interprocess communication,” Distributed
Computing, vol. 1, no. 2, pp. 77–85, 1986. [Online]. Available:
http://dx.doi.org/10.1007/BF01786227
[19] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears,
“Benchmarking cloud serving systems with YCSB,” in SoCC ’10.
2332-7790 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TBDATA.2017.2656121, IEEE
Transactions on Big Data
IEEE TRANSACTIONS ON BIG DATA, VOL. 3, NO. 1, DECEMBER 2016 15
[20] S. Weerahandi, Exact Statistical Methods for Data Analysis, ser.
Springer Series in Statistics. New York, NY, USA: Springer Science
and Business Media, 2001.
[21] T. Everts. Web performance today.
[22] C. Hale and R. Kennedy, “Using riak at yammer,” http://dl.
dropbox.com/u/2744222/2011-03-22 Riak-At-Yammer.pdf, 2011.
[23] K. P. Burnham and D. R. Anderson, Model selection and multimodel
inference: a practical information-theoretic approach. Springer Science
& Business Media, 2002.
[24] ObjectWeb Consortium, “RUBBoS: Bulletin Board Benchmark,” 2002.
[25] B. Hubert, T. Graf, G. Maxwell, R. Van Mook, M. Van Oosterhout,
P. B. Schroeder, J. Spaans, and P. Larroy, Linux Advanced Routing
& Traffic Control HOWTO, Online publication, Linux Advanced
Routing & Traffic Control, http://lartc.org/, Apr. 2004. [Online].
Available: http://lartc.org/lartc.pdf
[26] H. Wada, A. Fekete, L. Zhao, K. Lee, and A. Liu, “Data consistency
properties and the trade-offs in commercial cloud storage: the
consumers’ perspective,” in Proc. Conference on Innovative Data
Systems Research (CIDR), 2011, pp. 134–143.
[27] D. Bermbach and S. Tai, “Eventual consistency: How soon is
eventual? An evaluation of Amazon S3’s consistency behavior,”
in Proc. Workshop on Middleware for Service Oriented Computing
(MW4SOC), 2011.
[28] M. R. Rahman, W. Golab, A. AuYoung, K. Keeton, and J. J. Wylie,
“Toward a principled framework for benchmarking consistency,”
in Proc. of the Eighth USENIX Conference on Hot Topics in System
Dependability, ser. HotDep’12. Berkeley, CA, USA: USENIX
Association, 2012, pp. 8–8. [Online]. Available: http://dl.acm.
org/citation.cfm?id=2387858.2387866
[29] K. Birman and R. Friedman, Trading Consistency for Availability in
Distributed Systems. Cornell University. Department of Computer
Science, 1996. [Online]. Available: https://books.google.com/
books?id=XAwDrgEACAAJ
[30] E. A. Brewer, “Towards robust distributed systems (Invited Talk),”
in Proc. of the 19th ACM SIGACT-SIGOPS Symposium on Principles
of Distributed Computing, 2000.
[31] L. Lamport, “Time, clocks and the ordering of events in a dis-
tributed system,” Communications of the ACM, vol. 21, no. 7, p. 558,
1978.
[32] Y. Mao, F. P. Junqueira, and K. Marzullo, “Mencius: Building
efficient replicated state machines for WANs,” in OSDI’08.
[33] J. Rao, E. J. Shekita, and S. Tata, “Using Paxos to build a scalable,
consistent, and highly available datastore,” PVLDB, vol. 4, no. 4,
p. 243, 2011.
[34] W. J. Bolosky, D. Bradshaw, R. B. Haagens, N. P. Kusters,
and P. Li, “Paxos replicated state machines as the basis of
a high-performance data store,” in Proc. of the 8th USENIX
Conference on Networked Systems Design and Implementation,
ser. NSDI’11. Berkeley, CA, USA: USENIX Association, 2011,
pp. 11–11. [Online]. Available: http://dl.acm.org/citation.cfm?
id=1972457.1972472
[35] T. Kraska, G. Pang, M. J. Franklin, S. Madden, and A. Fekete,
“MDCC: multi-data center consistency,” in Proc. of the 8th ACM
European Conference on Computer Systems, ser. EuroSys ’13. New
York, NY, USA: ACM, 2013, pp. 113–126. [Online]. Available:
http://doi.acm.org/10.1145/2465351.2465363
[36] J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman,
S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh,
S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura,
D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak,
C. Taylor, R. Wang, and D. Woodford, “Spanner: Google’s
globally-distributed database,” in Proc. of the 10th USENIX
Conference on Operating Systems Design and Implementation,
ser. OSDI’12. Berkeley, CA, USA: USENIX Association, 2012,
pp. 251–264. [Online]. Available: http://dl.acm.org/citation.cfm?
id=2387880.2387905
[37] H. Yu and A. Vahdat, “Building replicated internet
services using TACT: A toolkit for tunable availability
and consistency tradeoffs.” in WECWIS, 2000, pp. 75–84.
[Online]. Available: http://dblp.uni-trier.de/db/conf/wecwis/
wecwis2000.html#YuV00
Subhajit Sidhanta Subhajit Sidhanta received
his PhD from the Department of Computer Sci-
ence and Engineering LSU, affiliated with the
School of Electrical Engineering and Computer
Science at Louisiana State University, USA, in
2016. He is currently a Postdoctoral Researcher
working with the Distributed Systems Research
Group at INESC-ID research lab, affiliated with
Instituto Superior Técnico at the University of Lisbon, Portugal. His major research interests encompass consistency in distributed systems and performance modelling of distributed storage and parallel computing systems.
Supratik Mukhopadhyay Supratik Mukhopad-
hyay is a Professor in the Computer Science
Department at Louisiana State University, USA.
He received his PhD from the Max Planck Institut
fuer Informatik and the University of Saarland
in 2001, MS from the Indian Statistical Institute,
India and his BS from Jadavpur University, India.
His areas of specialization include formal verification of software, inference engines, big data analytics, distributed systems, parallel computing frameworks, and program synthesis.
Saikat Basu Saikat Basu received his PhD from
the Computer Science Department at Louisiana
State University, Baton Rouge, USA. He re-
ceived his BS in Computer Science from Na-
tional Institute of Technology, Durgapur, India
in 2011. His primary research interests include
applications of Computer Vision and Machine
Learning to satellite imagery data and using
High Performance Computing architectures to
address big data analytics problems for very
high-resolution satellite datasets. He is particu-
larly interested in the applications of various Deep Learning frameworks
to satellite image understanding and representations.
Wojciech Golab Wojciech Golab received his
PhD in Computer Science from the University of
Toronto in 2010. After a post-doctoral fellowship
at the University of Calgary, he spent two years
as a Research Scientist at Hewlett-Packard Labs
in Palo Alto, and then joined the University of
Waterloo in 2012 as an Assistant Professor in
Electrical and Computer Engineering. His re-
search agenda focuses on algorithmic problems
in distributed computing with applications to stor-
age and processing of big data. Dr. Golab's the-
oretical work on shared memory algorithms won the best paper award
at DISC 2008, and was distinguished by the ACM Computing Reviews
as one of 91 “notable computing items published in 2012”.

More Related Content

Similar to Adaptable SLA-Aware Consistency_Ayantu Debebe Ejeta.pdf

Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titlestema_solution
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titlestema_solution
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titlestema_solution
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titlestema_solution
 
IRJET- A Survey on Remote Data Possession Verification Protocol in Cloud Storage
IRJET- A Survey on Remote Data Possession Verification Protocol in Cloud StorageIRJET- A Survey on Remote Data Possession Verification Protocol in Cloud Storage
IRJET- A Survey on Remote Data Possession Verification Protocol in Cloud StorageIRJET Journal
 
Mobility Information Series - Webservice Architecture Comparison by RapidValue
Mobility Information Series - Webservice Architecture Comparison by RapidValueMobility Information Series - Webservice Architecture Comparison by RapidValue
Mobility Information Series - Webservice Architecture Comparison by RapidValueRapidValue
 
Distributed Algorithms
Distributed AlgorithmsDistributed Algorithms
Distributed Algorithms913245857
 
Modified Active Monitoring Load Balancing with Cloud Computing
Modified Active Monitoring Load Balancing with Cloud ComputingModified Active Monitoring Load Balancing with Cloud Computing
Modified Active Monitoring Load Balancing with Cloud Computingijsrd.com
 
Performance and Cost Evaluation of an Adaptive Encryption Architecture for Cl...
Performance and Cost Evaluation of an Adaptive Encryption Architecture for Cl...Performance and Cost Evaluation of an Adaptive Encryption Architecture for Cl...
Performance and Cost Evaluation of an Adaptive Encryption Architecture for Cl...Editor IJLRES
 
Analysis of quality of service in cloud storage systems
Analysis of quality of service in cloud storage systemsAnalysis of quality of service in cloud storage systems
Analysis of quality of service in cloud storage systemsijfcstjournal
 
Analysis of quality of service in cloud storage systems
Analysis of quality of service in cloud storage systemsAnalysis of quality of service in cloud storage systems
Analysis of quality of service in cloud storage systemsijfcstjournal
 
A DDS-Based Scalable and Reconfigurable Framework for Cyber-Physical Systems
A DDS-Based Scalable and Reconfigurable Framework for Cyber-Physical SystemsA DDS-Based Scalable and Reconfigurable Framework for Cyber-Physical Systems
A DDS-Based Scalable and Reconfigurable Framework for Cyber-Physical Systemsijseajournal
 
Configuration Optimization for Big Data Software
Configuration Optimization for Big Data SoftwareConfiguration Optimization for Big Data Software
Configuration Optimization for Big Data SoftwarePooyan Jamshidi
 
Real time eventual consistency
Real time eventual consistencyReal time eventual consistency
Real time eventual consistencyijfcstjournal
 
2014 IEEE JAVA CLOUD COMPUTING PROJECT A stochastic model to investigate data...
2014 IEEE JAVA CLOUD COMPUTING PROJECT A stochastic model to investigate data...2014 IEEE JAVA CLOUD COMPUTING PROJECT A stochastic model to investigate data...
2014 IEEE JAVA CLOUD COMPUTING PROJECT A stochastic model to investigate data...IEEEFINALSEMSTUDENTPROJECTS
 
SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COM...
SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COM...SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COM...
SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COM...ijgca
 
SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COM...
SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COM...SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COM...
SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COM...ijgca
 
SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COM...
SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COM...SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COM...
SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COM...ijgca
 
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS A stochastic model to investigate dat...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS A stochastic model to investigate dat...IEEE 2014 JAVA CLOUD COMPUTING PROJECTS A stochastic model to investigate dat...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS A stochastic model to investigate dat...IEEEGLOBALSOFTSTUDENTPROJECTS
 

Similar to Adaptable SLA-Aware Consistency_Ayantu Debebe Ejeta.pdf (20)

Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
Mca & diplamo java titles
Mca & diplamo java titlesMca & diplamo java titles
Mca & diplamo java titles
 
IRJET- A Survey on Remote Data Possession Verification Protocol in Cloud Storage
IRJET- A Survey on Remote Data Possession Verification Protocol in Cloud StorageIRJET- A Survey on Remote Data Possession Verification Protocol in Cloud Storage
IRJET- A Survey on Remote Data Possession Verification Protocol in Cloud Storage
 
Fault tolerance on cloud computing
Fault tolerance on cloud computingFault tolerance on cloud computing
Fault tolerance on cloud computing
 
Mobility Information Series - Webservice Architecture Comparison by RapidValue
Mobility Information Series - Webservice Architecture Comparison by RapidValueMobility Information Series - Webservice Architecture Comparison by RapidValue
Mobility Information Series - Webservice Architecture Comparison by RapidValue
 
Distributed Algorithms
Distributed AlgorithmsDistributed Algorithms
Distributed Algorithms
 
Modified Active Monitoring Load Balancing with Cloud Computing
Modified Active Monitoring Load Balancing with Cloud ComputingModified Active Monitoring Load Balancing with Cloud Computing
Modified Active Monitoring Load Balancing with Cloud Computing
 
Performance and Cost Evaluation of an Adaptive Encryption Architecture for Cl...
Performance and Cost Evaluation of an Adaptive Encryption Architecture for Cl...Performance and Cost Evaluation of an Adaptive Encryption Architecture for Cl...
Performance and Cost Evaluation of an Adaptive Encryption Architecture for Cl...
 
Analysis of quality of service in cloud storage systems
Analysis of quality of service in cloud storage systemsAnalysis of quality of service in cloud storage systems
Analysis of quality of service in cloud storage systems
 
Analysis of quality of service in cloud storage systems
Analysis of quality of service in cloud storage systemsAnalysis of quality of service in cloud storage systems
Analysis of quality of service in cloud storage systems
 
A DDS-Based Scalable and Reconfigurable Framework for Cyber-Physical Systems
A DDS-Based Scalable and Reconfigurable Framework for Cyber-Physical SystemsA DDS-Based Scalable and Reconfigurable Framework for Cyber-Physical Systems
A DDS-Based Scalable and Reconfigurable Framework for Cyber-Physical Systems
 
Configuration Optimization for Big Data Software
Configuration Optimization for Big Data SoftwareConfiguration Optimization for Big Data Software
Configuration Optimization for Big Data Software
 
Real time eventual consistency
Real time eventual consistencyReal time eventual consistency
Real time eventual consistency
 
2014 IEEE JAVA CLOUD COMPUTING PROJECT A stochastic model to investigate data...
2014 IEEE JAVA CLOUD COMPUTING PROJECT A stochastic model to investigate data...2014 IEEE JAVA CLOUD COMPUTING PROJECT A stochastic model to investigate data...
2014 IEEE JAVA CLOUD COMPUTING PROJECT A stochastic model to investigate data...
 
SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COM...
SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COM...SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COM...
SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COM...
 
SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COM...
SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COM...SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COM...
SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COM...
 
SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COM...
SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COM...SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COM...
SERVICE LEVEL AGREEMENT BASED FAULT TOLERANT WORKLOAD SCHEDULING IN CLOUD COM...
 
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS A stochastic model to investigate dat...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS A stochastic model to investigate dat...IEEE 2014 JAVA CLOUD COMPUTING PROJECTS A stochastic model to investigate dat...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS A stochastic model to investigate dat...
 

More from tadeseguchi

300-710-SNCF.pdf
300-710-SNCF.pdf300-710-SNCF.pdf
300-710-SNCF.pdftadeseguchi
 
Unmaintained-Design-Icons_v2.0.pptx
Unmaintained-Design-Icons_v2.0.pptxUnmaintained-Design-Icons_v2.0.pptx
Unmaintained-Design-Icons_v2.0.pptxtadeseguchi
 
200+ [updated] computer networking mc qs and answers 2020
200+ [updated] computer networking mc qs and answers 2020200+ [updated] computer networking mc qs and answers 2020
200+ [updated] computer networking mc qs and answers 2020tadeseguchi
 
Operating system mc qs
Operating system mc qsOperating system mc qs
Operating system mc qstadeseguchi
 
300+ top database management system questions and answers 2020
300+ top database management system questions and answers 2020300+ top database management system questions and answers 2020
300+ top database management system questions and answers 2020tadeseguchi
 
200+ [updated] computer networking mc qs and answers 2020
200+ [updated] computer networking mc qs and answers 2020200+ [updated] computer networking mc qs and answers 2020
200+ [updated] computer networking mc qs and answers 2020tadeseguchi
 
300+ top data structures and algorithms mc qs pdf 2020
300+ top data structures and algorithms mc qs pdf 2020300+ top data structures and algorithms mc qs pdf 2020
300+ top data structures and algorithms mc qs pdf 2020tadeseguchi
 

More from tadeseguchi (8)

300-710-SNCF.pdf
300-710-SNCF.pdf300-710-SNCF.pdf
300-710-SNCF.pdf
 
Unmaintained-Design-Icons_v2.0.pptx
Unmaintained-Design-Icons_v2.0.pptxUnmaintained-Design-Icons_v2.0.pptx
Unmaintained-Design-Icons_v2.0.pptx
 
200+ [updated] computer networking mc qs and answers 2020
200+ [updated] computer networking mc qs and answers 2020200+ [updated] computer networking mc qs and answers 2020
200+ [updated] computer networking mc qs and answers 2020
 
Operating system mc qs
Operating system mc qsOperating system mc qs
Operating system mc qs
 
300+ top database management system questions and answers 2020
300+ top database management system questions and answers 2020300+ top database management system questions and answers 2020
300+ top database management system questions and answers 2020
 
200+ [updated] computer networking mc qs and answers 2020
200+ [updated] computer networking mc qs and answers 2020200+ [updated] computer networking mc qs and answers 2020
200+ [updated] computer networking mc qs and answers 2020
 
300+ top data structures and algorithms mc qs pdf 2020
300+ top data structures and algorithms mc qs pdf 2020300+ top data structures and algorithms mc qs pdf 2020
300+ top data structures and algorithms mc qs pdf 2020
 
Getdataback
GetdatabackGetdataback
Getdataback
 

Recently uploaded

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 

Recently uploaded (20)

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 

Adaptable SLA-Aware Consistency_Ayantu Debebe Ejeta.pdf

  • 1. 2332-7790 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TBDATA.2017.2656121, IEEE Transactions on Big Data IEEE TRANSACTIONS ON BIG DATA, VOL. 3, NO. 1, DECEMBER 2016 1 Adaptable SLA-Aware Consistency Tuning for Quorum-replicated Datastores Subhajit Sidhanta, Wojciech Golab, Supratik Mukhopadhyay, and Saikat Basu Abstract—Users of distributed datastores that employ quorum-based replication are burdened with the choice of a suitable client-centric consistency setting for each storage operation. The above matching choice is difficult to reason about as it requires deliberating about the tradeoff between the latency and staleness, i.e., how stale (old) the result is. The latency and staleness for a given operation depend on the client-centric consistency setting applied, as well as dynamic parameters such as the current workload and network condition. We present OptCon, a machine learning-based predictive framework, that can automate the choice of client-centric consistency setting under user-specified latency and staleness thresholds given in the service level agreement (SLA). Under a given SLA, OptCon predicts a client-centric consistency setting that is matching, i.e., it is weak enough to satisfy the latency threshold, while being strong enough to satisfy the staleness threshold. While manually tuned consistency settings remain fixed unless explicitly reconfigured, OptCon tunes consistency settings on a per-operation basis with respect to changing workload and network state. Using decision tree learning, OptCon yields 0.14 cross validation error in predicting matching consistency settings under latency and staleness thresholds given in the SLA. We demonstrate experimentally that OptCon is at least as effective as any manually chosen consistency settings in adapting to the SLA thresholds for different use cases. We also demonstrate that OptCon adapts to variations in workload, whereas a given manually chosen fixed consistency setting satisfies the SLA only for a characteristic workload. Index Terms—distributed databases, storage management, distributed systems,, storage management high availability, distributed file systems, middleware, performance evaluation, machine learning, reliability, availability, and serviceability ✦ 1 INTRODUCTION Many quorum-based distributed data stores [1, 2, 3], allow the developers to explicitly declare the desired client- centric consistency setting (i.e., consistency observed from the viewpoint of the client application) for an operation. Such systems accept the client-centric consistency settings for an operation in the form of a runtime argument, typically referred to as the consistency level. The performance of the system, with respect to a given operation, is affected by the choice of the consistency level applied [4]. From the viewpoint of the user, the most important per- formance parameters affected by the consistency level ap- plied are the latency and client-observed consistency anoma- lies, i.e., anomalies in the result of an operation observed from the client application, such as stale reads. 
eness [5], i.e., how stale (old) the version of the data item observed from the client application is with respect to the most recent version. According to the consistency level applied, the system waits for coordination among a specific number of replicas containing copies (i.e., versions) of the data item accessed by the given operation [1]. If the system waits for coordination among a smaller number of replicas, the chance of getting a stale result (i.e., an older version) increases. Also, the latency for the given operation depends on the waiting time for the above coordination, and hence, in turn, on the consistency level applied.

• S. Sidhanta is currently with INESC-ID / IST, University of Lisbon, Portugal. This work was done while S. Sidhanta was with Louisiana State University. E-mail: ssidhanta@gsd.inesc-id.pt
• S. Mukhopadhyay is with Louisiana State University, USA. Email: supratik@csc.lsu.edu
• S. Basu was with Louisiana State University, USA. Email: sbasu8@lsu.edu
• W. Golab is with University of Waterloo, Canada. Email: wgolab@uwaterloo.ca
Manuscript received May 5, 2005; revised October 26, 2016.

For example, a weak consistency level for a read operation in Cassandra [1], like READ ONE, requires only one of the replicas to coordinate successfully, resulting in low latency and a high chance of a stale read. Hence, while choosing the consistency level, developers must consider how this choice affects the latency and staleness for a given operation. The chosen consistency level must be matching with respect to the latency and staleness thresholds specified in the given service level agreement (SLA), i.e., it must be weak enough to satisfy the latency threshold, while being strong enough to satisfy the staleness threshold. Consider a typical use case at Netflix where a user browses recommended movie titles. Such use cases require real-time response [6]. Hence the SLA typically comprises a low latency threshold and a higher staleness threshold. For the given SLA, workload, and network state, the matching choice is a weak read-write consistency level (like ONE/ANY in Cassandra). If the developer applies a stronger consistency level, the resulting high latency might violate the SLA. With the current state of the art, developers have to manually determine a matching consistency level for a given operation at development time. Reasoning about the above choice is difficult because of: 1) the large number of possible SLAs, 2) unpredictable factors like changing network state and varying workload that impact the latency and staleness, and 3) the absence of a well-formed mathematical relation connecting the above parameters [7]. This makes automated consistency tuning under latency and staleness thresholds in the SLA a highly desirable feature for quorum-based
  • 2. 2332-7790 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TBDATA.2017.2656121, IEEE Transactions on Big Data IEEE TRANSACTIONS ON BIG DATA, VOL. 3, NO. 1, DECEMBER 2016 2 datastores. We present OptCon [8], a framework that automatically determines a matching consistency level for a given opera- tion under a given SLA. Due to the absence of a closed-form mathematical model capturing the impact of the consistency level, current workload, and network state, on the observed latency and staleness, OptCon applies machine learning [9] to train a model for predicting a matching consistency level under the given SLA, workload, and network state. For the Netflix use case, taking into account the read- heavy workload, the current network state, and the SLA thresholds, OptCon predicts a weak consistency level. The contributions of this paper are: • We introduce OptCon, a novel machine learning- based framework, that can automatically predict a matching consistency level that satisfies the latency and staleness thresholds specified in a given SLA, i.e., the predicted consistency level is weak enough to satisfy the latency threshold, and strong enough to satisfy the staleness threshold. Using decision tree learning, OptCon yields a cross validation error of 0.14 in predicting a matching consistency level under the given SLA. • Experimental results demonstrate that OptCon is at least as effective as any manually chosen con- sistency level in adapting to the different latency and staleness thresholds in SLAs. Furthermore, we also demonstrate experimentally that OptCon sur- passes any manually chosen fixed consistency level in adapting to a varying workload, i.e., OptCon satisfies the SLA thresholds for variations in the workload, whereas a manually chosen consistency level satisfies the SLA for only a subset of workload types. 2 MOTIVATION 2.1 Choice of the SLA Parameters Following prior research by Terry et al. [4], we include latency and staleness in the SLA for operations on quorum- based stores. The choice of consistency level directly affects the latency for a given operation on a quorum-based data- store [1]. While Terry et al. [4] use categorical attributes for specifying the desired consistency (such as read-my-write, eventual, etc.) in the SLA, we use a more fine-grained SLA, that accepts threshold values for the client-centric staleness [10]. Golab et al. [10] demonstrate that both the propor- tion and severity of stale results increases from stronger to weaker consistency levels. The use of a client-centric staleness metric in the SLA enables the developer to specify, on a per-operation basis, the exact degree of staleness of results, that a given client application can tolerate. Following Terry et al. [4], we do not include throughput in the SLA. But it is still desirable to have throughput [11] as a secondary optimization criterion, once the SLA thresholds are satisfied. Depending on the SLA, a set of consistency levels can be matching with respect to the latency and staleness thresholds given in the SLA. 
Consider a real-time application (such as an online shopping cart) that demands moderately low latency and tolerates relatively higher staleness in the SLA. In such cases, any weaker consistency level (like ONE or TWO in Cassandra, depending on the replication factor) can yield latency and staleness values within the given SLA thresholds. In such cases of multiple matching consistency levels, OptCon chooses the one that maximizes throughput. Also, we do not consider parameters like replication factor, number of server nodes, and certain other server-centric parameters in our model, since our focus is optimizing client-centric performance [10].

TABLE 1: Glossary of OptCon Model Parameters
L    Average latency, i.e., average time delay for a storage operation over a one-minute time period
T    Average throughput, i.e., operations/second, over a one-minute time period
S    Client-centric staleness, given by the 95th percentile Γ score [10]
Tc   Thread count, i.e., number of user threads spawned by the storage operations invoked from the client application
P    Packet count, i.e., number of network packets transmitted to execute a given operation
RW   Read-write proportion in the client workload
C    Consistency level applied for an operation

2.2 Motivation for Automated Consistency Tuning
The large number of possible use cases [11], each comprising different SLA thresholds for latency and staleness, makes manual determination of a matching consistency level a complex process. The latency and staleness are also affected [7] by the following independent variables (parameters): 1) the packet count, i.e., the number of packets transmitted during a given operation, which represents the network state [1], 2) the read proportion, i.e., the proportion of reads in the workload, and 3) the thread count. Parameters 2 and 3 represent the workload characteristics. We list the parameters in Table 1. Manually reasoning about all of these parameters combined together is difficult, even for a skilled and experienced developer. Further, unpredictable variations in workload, network congestion, and node failures may render a particular consistency level, manually pre-configured at development time, infeasible at runtime [12, 4].

3 CHALLENGES
Currently, there is no closed-form mathematical model [7] that represents the effect of the consistency level on the observed staleness and latency, with respect to the independent variables. Bailis et al. [7] base their work upon the simplifying assumption that writes do not execute concurrently with other operations. Hence, it is not clear from [7] how to compute the latency and staleness from a real workload. Pileus and Tuba [4, 13] are the only systems that provide fine-grained consistency tuning using SLAs. But, instead of predicting, these systems perform actual trials on the system, and select the consistency level corresponding to the SLA row that produces the maximum resultant utility, based on the trial outcomes. The trial-based technique can produce unreliable results due to the unpredictable parameters, like network conditions and workload, that affect the observed latency and staleness.
Thus, predictions based on the outcomes of the trial phase may be unsuitable at the actual running time of the operation. Sivaramakrishnan et al. [14] use a static analysis approach to determine the weakest consistency level that satisfies the correctness conditions declared in the form of a contract, i.e., a specification of client-centric consistency properties that the application must satisfy. They do not consider the tradeoff between staleness and latency, and cannot dynamically adapt to varying workload and network state.

4 DESIGN OF OPTCON

4.1 Design Overview
In the absence of a closed-form mathematical model relating the consistency level to the observed latency and staleness, OptCon leverages machine learning-based prediction techniques [9], following the footsteps of prior research [15]. OptCon is the first work that trains on historic data and learns a model M relating the consistency level and the input parameters (i.e., the workload and network state) to the observed latency and staleness. The model M can predict a matching consistency level with respect to the given SLA thresholds for latency and staleness, under the current workload and network state. A matching consistency level is weak enough to satisfy the latency threshold, and simultaneously strong enough to satisfy the staleness threshold. Our definition of a matching consistency level is a modified version of the correct consistency level in QUELA [14]. For multiple matching consistency levels, OptCon predicts the consistency level that maximizes the throughput. In contrast to program verification-based static (compile time) approaches [14], OptCon provides on-the-fly predictions based on dynamic (runtime) values of the input parameters, i.e., the current workload and the network state (Section 2.2). Thus, OptCon tunes the datastore according to any dynamic variations in the workload and any given SLA. Furthermore, unlike the state-of-the-art trial-based techniques that base decisions on the values of the parameters obtained during the trial phase [4, 13], OptCon provides reliable predictions (see Sections 6.9 and 6.10), taking into consideration the actual runtime values of the input parameters.

TABLE 2: Example subSLAs
Average Latency   Staleness
≤ 100 ms          ≤ 5 ms
≤ 50 ms           ≤ 10 ms
≤ 25 ms           ≤ 15 ms

Like Terry et al. [4], users provide latency and staleness thresholds to OptCon in the form of an SLA, illustrated in Table 2. Each SLA consists of rows of subSLAs, where each subSLA row, in turn, comprises two columns containing the thresholds for latency and staleness, respectively. Instead of categorical attributes [4], OptCon uses threshold values for staleness [10] in the subSLAs. Unlike Terry et al., subSLAs in OptCon do not have a utility column. Also, subSLAs are not necessarily ordered. If any one (or all) of the thresholds for a particular subSLA are violated, OptCon marks the operation as failed with respect to the given subSLA.
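To make the per-operation subSLA check concrete, the sketch below issues a Cassandra read at an explicitly chosen consistency level and tests the observed latency against one subSLA row of Table 2. This is only an illustration, not OptCon code: the class and field names (SubSla, latencyMs, stalenessMs), the keyspace, and the query are hypothetical, the staleness value is a placeholder for a Logger-supplied Γ score, and the calls assume the DataStax Java driver 3.x API.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class SubSlaSketch {
    // Hypothetical holder for one subSLA row of Table 2.
    static class SubSla {
        final long latencyMs, stalenessMs;
        SubSla(long latencyMs, long stalenessMs) { this.latencyMs = latencyMs; this.stalenessMs = stalenessMs; }
    }

    public static void main(String[] args) {
        SubSla sla = new SubSla(100, 5);                       // first row of Table 2
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("demo_ks")) {
            SimpleStatement read = new SimpleStatement("SELECT value FROM kv WHERE key = 'k1'");
            // The consistency level is a per-operation runtime argument;
            // OptCon would substitute its predicted level here instead of ONE.
            read.setConsistencyLevel(ConsistencyLevel.ONE);
            long start = System.nanoTime();
            session.execute(read);
            long latencyMs = (System.nanoTime() - start) / 1_000_000;
            long stalenessMs = 0;                              // placeholder: the Gamma score would come from the Logger
            boolean failed = latencyMs > sla.latencyMs || stalenessMs > sla.stalenessMs;
            System.out.println("latency=" + latencyMs + " ms, subSLA failed=" + failed);
        }
    }
}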
The handling of such failure cases will depend on the application logic, and is not part of the scope of this paper. Instead of modifying the source code of distributed data- stores, OptCon is designed as a wrapper over existing dis- tributed storage systems. We have implemented and tested OptCon on Cassandra, which follows the Dynamo [16] model. Among the Dynamo-style systems, while Cassandra organises the data in the form of column families, Voldemort [17] and Riak [2] are strictly key-value stores. Also, Cassan- dra and Riak are designed to enable faster writes, whereas Voldemort enables faster reads. But the principles governing the internal synchronization mechanisms are similar for all these systems. Hence the consistency level choices affect the observed latency and staleness in a similar fashion for all these systems. Thus, OptCon can potentially be integrated with any Dynamo-style system. 4.2 Architecture Fig. 1: The architecture of OptCon: rectangles denote mod- ules and the folded rectangle denotes a dataset. OptCon (refer to Figure 1) consists of the following modules: 1) The Logger module records the independent variables (Section 2.2), the observed latency, and staleness, obtained using experiments performed on the datastore with benchmark workloads. It collates these parameters into a training data corpus, which acts as the training dataset. 2) The Learner module learns the model M (refer Section 6.3) from the training dataset, and predicts a matching consistency level that maximizes the throughput. 3) The Client module calls the other modules, and executes the given operation on the datastore. During the training phase, the Client module runs a simulation workload on the given quorum-based datastore. The Client calls the Logger module (Figure 1) to collect the independent variables (parameters) and the observed parameters for the operation from the JMX interface [1], and appends these parameters into the training data. The Client then calls the Learner module, which trains the model M from the training data applying machine learning techniques. During the prediction phase, the Client calls the Learner, with runtime arguments comprising the sub- SLA thresholds and the current values of the independent variables. The subSLA (and also the SLA) can be varied on a per-operation basis. The Learner predicts a matching consistency level from the learnt model M. The Client performs the given operation on the datastore with the predicted consistency level. 4.3 Design of the Logger Module 4.3.1 Model Parameters Collected by the Logger For the reasons explained in Section 2.1, the latency L, i.e., time delay for the operation, and client-centric staleness
  • 4. 2332-7790 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TBDATA.2017.2656121, IEEE Transactions on Big Data IEEE TRANSACTIONS ON BIG DATA, VOL. 3, NO. 1, DECEMBER 2016 4 S, are considered as the model parameters that need to satisfy a given SLA. The throughput T is considered as a secondary optimization criterion. As discussed in Section 2.1, server centric parameters, like replication factor and keyspace size, are not considered. Following the reasoning in Section 2.2, the independent variables (parameters) for learning the model M are: 1) the read proportion (RW ) in the operation, 2) the thread count (Tc), i.e., the number of user threads spawned by the client, 3) the packet count (P), i.e., the number of network packets transmitted during the operation, and 4) the consistency level C, provided as a categorical attribute containing attribute values specific to the given datastore. Most operations on quorum based stores are not instan- taneous, but are executed over certain time intervals [10]. Following [7], the Logger uses the average latency over a time interval of one minute for measuring L. The Logger computes S in terms of the Γ metric of Golab et al. [10]. As demonstrated in the above work, Γ is preferred over other client-centric staleness measures for its proven sensitivity to workload parameters and consistency level. Section 6.4 establishes the significance of the above parameters with respect to the OptCon model. Section 6.5 demonstrates the accuracy of the model comprising the above parameters. 4.3.2 Computation of Γ as the Metric for Client-centric Stal- eness The metric Γ is based upon Lamport’s atomicity property [18], which states that operations appear to take effect in some total order that reflects their “happens before” relation in the following sense: if operation A finishes before opera- tion B starts then the effect of A must be visible before the effect of B. The gamma metric defines observed consistency in terms of the responses of storage operations observed from the clients, and does not refer directly to server- side implementation details of the datastore, like replication factor, etc. As a result, the metric is implementation inde- pendent, i.e., it can be applied to any key-value datastore. Γ is a bound on time-based staleness, i.e., every read returns a value at most Γ time units stale. A Γ score quantifies the bound on the timestamp difference between operations on the same key that return different values. We say that a trace of operations recorded by the logger is Γ-atomic if it is atomic in Lamport’s sense, or becomes atomic after each operation in the trace is transformed by decreasing its starting time by Γ/2 time units, and increasing its finish time by Γ/2 time units. In this context the start and finish times are shifted in a mathematical sense for the purpose of analyzing a trace, and do not imply injection of artificial delays; the behavior of the storage system is unaffected. The per-key Γ score quantifies the degree of client-centric staleness incurred by operations applied to a particular key. 
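As an illustration with hypothetical timestamps (not taken from our traces): suppose a write of version v2 to key k occupies the interval [90 ms, 100 ms], and a later read of k occupies [112 ms, 115 ms] but still returns the older version v1. Since the write finishes before the read starts, Lamport atomicity requires the read to return v2, so the trace is not atomic. After decreasing every start time by Γ/2 and increasing every finish time by Γ/2, the two operations overlap once 112 − Γ/2 ≤ 100 + Γ/2, i.e., once Γ ≥ 12 ms; the per-key Γ score for this trace is therefore 12 ms, bounding the staleness observed by the read at 12 ms.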
We considered average per-key Γ, but settled for percentile per-key Γ since it takes into account the skewed nature of the workloads. 4.4 Design of the Learner Module In OptCon, building the model M from training examples (Figure 1) is a one-time process, i.e., the same model can be reused for predicting consistency levels for all operations. Even with node failures, the model may not need to be regenerated since the training data remain unchanged, only the failed predictions, resulting due to node failure, need to be rerun. However, the model will need to be retrained in certain cases where the behavior of the system with respect to the latency and staleness changes, such as when a storage system receives a software or hardware upgrade. 4.4.1 Overview of Possible Learning Techniques We provide a brief overview of the possible Learning tech- niques and their applicability to OptCon. We describe the implementation details, including the various configuration parameters, of each learning algorithm in Section 5. Perfor- mance analysis and insights gathered from each technique are given in Section 6.5.2. We consider Logistic regression and Bayesian learning as they can help visualize the signifi- cance of the model parameters and the dependency relations among these parameters, and thus can provide intuition to the developer for developing complex and more accurate models [9]. We also consider Decision Tree, Random For- est, and Artificial Neural Networks (ANN), since they can produce more accurate predictions, being directly computed from the data. We consider these approaches in the order of their performance given in Table 3, in terms of various model selection metrics. We leave the choice of a suitable learning technique to the developer, rendering flexibility to the framework. Based on our evaluation of the learning algorithms, developers can choose the learning technique that best suits the respective application domain and use case. Assuming linear dependency, we start with the linear regression technique for training the model in OptCon, in an attempt to come up with a simple mathematical model that expresses the effect of RW , T c, and P , on latency and staleness for an operation, with different values of client side consistency level C. Also, linear regression can provide preliminary insights on the relationship among the model parameters, and act as a basis for designing further complicated modelling algorithms, i.e., it can help us un- derstand the dependency relationships between parameters that can be used as a baseline for learning algorithms, for example, Bayesian learning which require assuming some prior distribution of independent parameters. Next, with logistic regression approach, we fit two logit (i.e., logistic) functions for the two dependent variables L and S as follows: logit (π (L)) = β0 + β1C + β2RW + β3P + β4T c and logit (π (S)) = β4 + β5C + β6RW + β7P + β8T c, where C is the applied consistency level, L is the observed latency, S is the observed staleness, RW is the proportion of reads, P is number of packets transmitted, and T c is the number of threads during an operation. βi are the coefficients estimated by iterative least square estimation, and π is the likelihood function. Using ordinary least square estimation, we iteratively minimize a loss function given by the difference of the observed and estimated values, with respect to the training dataset. For eliminating overfitting, we perform L1 regularization with the Lasso method. 
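For reference, the lasso step can be written in its standard penalized-likelihood form (the exact objective is not spelled out above, so this is the textbook formulation rather than a quote of the implementation; it matches the Λ-penalized deviance description used later in Section 6.8): the coefficients are chosen as

    β̂ = argmin_β [ −(1/n) Σ_{i=1..n} log π_i(β) + Λ Σ_j |β_j| ],

where n is the number of training examples and π_i(β) is the likelihood of the i-th observation. Larger values of the regularization weight Λ drive more coefficients exactly to zero, which is what mitigates overfitting here.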
Next we consider the decision tree learning algorithm because of its simplicity, speed, and accuracy. The given problem can be viewed as a classification problem, which
  • 5. 2332-7790 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TBDATA.2017.2656121, IEEE Transactions on Big Data IEEE TRANSACTIONS ON BIG DATA, VOL. 3, NO. 1, DECEMBER 2016 5 classifies the training dataset into classes based on whether the observed latency and staleness (corresponding to each row) fall within the given SLA thresholds. In fact, the problem statement, i.e., choosing a matching consistency level, can be viewed as a classification problem. The mod- elling technique using Decision Tree comprises the fol- lowing phases: 1) Labelling: On the fly computations for maximizing throughput T (i.e., the secondary optimization criteria) in the prediction phase would introduce consider- able overhead. Since the latency L is considered before T is maximized, the additional overhead may result in violation of the latency threshold in the SLA. Hence, OptCon per- forms the computations for T in the labelling phase itself. Using exhaustive search, we label each row in the training dataset with the highest T corresponding to a given values of RW , Tc and P , such that the SLA parameters L and S are within the given SLA thresholds. 2) Training: Next we apply decision tree learning for training a model M from the labelled dataset. Use of error pruning techniques mitigates issues of overfitting [9]. Next, we consider the random forest algorithm that ap- plies ensemble learning to bootstrap multiple weak decision tree learners to generate a strong classifier. It iteratively chooses random samples with replacement from the train- ing dataset, and trains a decision tree on each of the samples. We take the average of the predictions from each decision tree to obtain the final prediction with respect to a test dataset. For the Bayesian approach, we make the following simplifying assumptions to make our problem suitable for Bayesian learning: A1) a prior logistic distribution exists for each of RW , T c, P , and C (see Section 4), A2) the posterior distributions for L, and S are conditionally dependent on the joint distribution of hRW, T c, P, Ci, and A3) the target features L and S are conditionally independent. With these approximations, we plug-in the resultant dependency graph from the Logistic Regression approach as input to the Bayesian learner. Despite these assumptions and approxi- mations, the Bayesian technique can provide intuition to the developers, that can be used to build a more accurate model. Next, we consider Artificial Neural Networks (ANN) for learning a model M. We consider a two-layer perceptron network for ANN; we do not consider Deep Learning be- cause 1) the number of features are manageable for a simple ANN, 2) the size of training dataset is also manageable for ANN, and 3) simple ANN gives satisfactory accuracy, and 4) training delay would be considerably high with deep learning. Though the training phase for ANN is not a factor (since training occurs once), the overhead for the prediction phase is still considerably high (Section 6.3) due to model complexity. 5 IMPLEMENTATION OptCon is developed as a Java-based wrapper over Cas- sandra v2.1.0. 
The source code can be found in the github repositories located at https://github.com/ssidhanta/. The Logger module is implemented as an extension over the YCSB (Yahoo Cloud Serving Benchmark) 0.1.4 [19] frame- work. It runs as a daemon on top of the datastore, listening for the parameters using the built-in JMX Interface. Logistic Regression, SVM, and Neural Network implementations of the Learning module use Matlab 2013b’s statistical toolbox. Decision Tree and Random Forest are implemented using Java APIs from Weka, an open source machine learning suite. The Bayesian approach uses Infer.net, an open source machine learning toolbox under the .NET framework. We summarize the important configuration parameters in the Learner implementations. Logistic regression uses a min- imization function with additional regularization weights, multiplied by a tunable L1 norm term Λ. We set α = 0.05 and Λ = 25. Decision tree uses a confidence threshold of 0.25 for pruning, and uses random data shuffling with seed = 1. Random forest bootstraps 100 decision trees, and introduces low correlation among the individual trees by selecting a random subset of the features at each split boundary of the component trees. The Bayesian method uses a precision of 0.1. The ANN is implemented as a two-layer perceptron, with default weights = 0 and the default bias = 0. 5.1 Implementation Details: Decision Tree Labelling Phase: We generate the training dataset as fol- lows: 1) we set YCSB parameters, namely read proportion RW and thread count T c, to simulate a characteristic workload w, and 2) the above workload w is executed with different applied consistency levels C. Each row k in the training dataset corresponds to a particular characteristic workload w with a particular consistency level Ck. Each row k records observed values of the output parameters, namely average latency L, staleness S, and throughput T , over a sliding time window of 1 minute. For a given combination of values of the input parameters, comprising the tuple hRW, T c, P i, and the given subSLA thresholds, the output parameters, L, S and T , vary according to the applied consistency level C for the operation corresponding to the row k in the training dataset. The training dataset is labelled as follows: 1) the training dataset is exhaustively searched, 2) in each iteration of the above search step, we group the rows in the training dataset by the values of the input parameters comprising the tuple hRW, T c, P i, such that each group of rows contains an unique combina- tion of values for the above tuple, 3) for each group of rows g, we find the row l in g that satisfies both of the following conditions. Condition 1: The value of throughput Tl in the row l is maximum among the values of throughput for all rows in g. Condition 2: The values of staleness Sl and latency Ll satisfy the thresholds set in the subSLA, 4) rows in each group of rows g in the training dataset are labelled with the consistency level Cl corresponding to the row l that satisfies the above conditions. Training Phase The J48 decision tree [9] learner is loaded with the labelled training data. The J48 learner is an open source implementation of the C4.5 algorithm, which can learn a model from a training dataset comprising both continuous and discrete attributes (i.e., model parameters). For handling continuous attributes, C4.5 assigns a threshold value t to each continuous attribute a. 
C4.5 splits the rows in the training dataset into categories, based on whether the values of the attribute a for a row r exceed the threshold t or not. C4.5 uses the normalized information gain Ig, computed from the difference in entropy, as the splitting
  • 6. 2332-7790 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TBDATA.2017.2656121, IEEE Transactions on Big Data IEEE TRANSACTIONS ON BIG DATA, VOL. 3, NO. 1, DECEMBER 2016 6 (a) Bayesian belief network (b) Structure of decision tree with latency given in millisec- onds Fig. 2: Bayesian belief network and decision tree structure criterion. At each node in the tree, the parameter that produces the highest Ig is chosen to make the split. The algorithm continues recurring along the rest of the training dataset till the complete decision tree is generated. The leaf nodes represent the predicted consistency level for a given workload, characterized by a given set of values of the input parameters hRW, T c, P i. Each group of rows in the training dataset forms a path in the decision tree (from root to a leaf node), for which the predicted consistency level is given by the label of the leaf node. Under a subSLA comprising a 2 ms threshold for latency and a 10 ms threshold for staleness, the generated J48 decision tree is presented in Figure 2(b). The C4.5 algorithm chooses staleness as the first splitting function, assigning a threshold of 0 ms for splitting the training dataset into two categories at the root. The right subtree r corresponds to the examples with observed latency exceeding 0 ms, and the left subtree l corresponds to the rest of the examples. Next, staleness is chosen as the criterion that further splits the examples in the right subtree r into subtrees l1 and r1, based on whether the observed staleness exceeds 0.5 ms or not. The examples in the right subtree r1 are further split into subtrees l2 and r2 based on a latency threshold of 3.967 ms. Then the right subtree r2 derived from the above split is again split into subtrees depending on whether the observed latency exceeds a threshold of 5.431 ms. Prediction Phase During the prediction phase under a given subSLA with thresholds TL and TS for latency and staleness, respectively, the Learner module traverses along a path in the tree (Figure 2(b)), starting from the root to a leaf node l, depending on the values of the input parameters hRW, T c, P i. The consistency level Cl corresponding to the label of the leaf node l is the predicted matching con- sistency level under the given subSLA, for the given values of the input parameters. The predictor constructs the above path as follows. Starting from the root node, it traverses the depth of the tree, marking each node in the path as v. At each step in the path, it splits the tree into branches, based on whether the value of the attribute a corresponding to the label of the current node v falls within a range [0, t] or a range [t, max]; t is the assigned split threshold, and max is the maximum value of a in the training dataset. At each node v in the path, the predictor chooses the branch for which the value of the attribute a, corresponding to the label of the given node, falls within which of the above ranges. The predictor keeps on traversing the tree in the above manner till it reaches a leaf node l, and it chooses the label Cl of the leaf node l as the matching consistency for the given operation. 
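As a rough illustration of the training and prediction phases above, the following sketch builds a J48 tree from a labelled training file and queries it for one operation. It is a hedged example, not the OptCon source: the file names (training.arff, current.arff), the use of ARFF files, and the assumption that the last attribute holds the consistency-level label are ours; only the pruning confidence of 0.25 follows Section 5.

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class J48LearnerSketch {
    public static void main(String[] args) throws Exception {
        // Labelled training data: one row per <RW, Tc, P> observation, class = labelled consistency level.
        Instances train = new DataSource("training.arff").getDataSet();
        train.setClassIndex(train.numAttributes() - 1);

        J48 tree = new J48();
        tree.setConfidenceFactor(0.25f);   // pruning confidence threshold from Section 5
        tree.buildClassifier(train);

        // Prediction phase: classify the current <RW, Tc, P> observation.
        Instances test = new DataSource("current.arff").getDataSet();
        test.setClassIndex(test.numAttributes() - 1);
        double labelIndex = tree.classifyInstance(test.instance(0));
        String predictedLevel = train.classAttribute().value((int) labelIndex);
        System.out.println("Predicted consistency level: " + predictedLevel);
    }
}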
5.2 Implementation Details: Bayesian Learning With the Bayesian [9] approach, we assume the following. A1: A prior distribution exists for each of the input param- eters, namely RW , T c, P , and C. The input parameter C is a discrete variable, which takes values from a finite set of categorical attributes, comprising the supported list of con- sistency levels. Hence, we assume a Dirichlet distribution [20] for the prior probabilities for C. On the other hand, the parameters RW , T c, and P are continuous variables. We assume that these parameters follow a Gaussian distribu- tion. A2: The posterior distributions for output parameters, namely L and S, are conditionally dependent on the joint distribution of the input parameters hRW, T c, P, Ci. A3: The output parameters L and S are conditionally in- dependent, allowing L and S to be expressed as the effect of the input parameters RW , T c, P , and C. We represent the dependency relation of the model parameters from the equations obtained from the Logistic Regression approach using a dependency graph. Figure 2(a) illustrates such a graph. Our assumption that the output parameters, L and S, are conditionally independent (though actually L and S are dependent on each other) enables us to express them as the effect of the input parameters. This allows us to model the causality among the dependent and independent variables as a simple Bayesian Network. 6 EVALUATION 6.1 Experimental Setup Following [10], we have run our experiments on a testbed of 20 Amazon Ec2 m1-small instances, located in the Ec2 region us-west-2, running Ubuntu 13.10, loaded with Cassandra 2.1.0 with a replication factor of 5, running on Oracle JDK 1.8 Our models are learnt from a training dataset comprising 23040 rows, and a testing dataset of 2560 rows, generated by running varying YCSB 0.1.4 [19] workloads. We have used the NTP (Network Time Protocol) protocol implementation
from ntp.org to bring down the maximum clock skew to 5 ms.

6.2 Latency and Staleness Ranges Used in the subSLAs
The measurements in [21] suggest that the latency for common web applications falls in the range of 75-140 ms. The Pileus system [4], which includes latency in subSLAs as well, uses values between 200 and 1000 ms as the latency threshold. Further reducing the lower bound to favor availability, we choose any value between 101 and 150 ms as a candidate for the latency threshold in OptCon. Following the observations from the Riak deployment at Yammer [22], Bailis et al. [7] use a range of 1.85-13.6 ms for the staleness bounds, given as the t-visibility values. Hence, following [7], we choose a value within the above range as the staleness threshold. Rounding the bounding values for the ranges given above, our subSLAs use values in the ranges 101-150 ms and 1-13 ms as thresholds for L and S, respectively.

6.3 OptCon Framework Overhead
Training the model being a one-time process, the latency overhead due to the training phase is negligible during an operation. The overhead due to the training phase, which includes the time spent by the Logger in collecting the training dataset from YCSB, varies from 6.5 ms for linear regression to 27 ms for ANN. However, there is still considerable overhead due to the prediction phase, especially with a large training dataset. The prediction phase spans from the call to the Learner module till the predicted consistency level is returned by the Learner module. The average, variance, and standard deviation of the overhead are 1 ms, 0.25, and 0.37, respectively, with decision tree learning. The average overhead with different learning techniques is given in Table 3.

TABLE 3: Model Selection Results: as per descending order of AICC and BIC, and ascending order of CV error and overhead
Approach              Cross Validation Error   AICC    BIC     Overhead (ms)
Decision Tree         0.14                     10.73   51.44   1
Bayesian Learning     0.57                     12.85   53.57   1.2
Logistic Regression   1.98                     16.32   57.07   0.7
Random Forest         0.14                     9.51    50.24   1.3
Neural Network        0.059                    12.85   63.54   1.5

6.4 Dimensionality Reduction: Significance of the Model Parameters

TABLE 4: Significance of the Model Parameters: significance of a parameter is greater for higher values of Rank Features and Sequential Attribute Selection, and lesser for higher values of Misclassification Error
Parameter   Rank Features Technique   Sequential Attribute Selection Technique   Misclassification Error for Random Forest
C           4.81                      1                                          0.22
RW          0.9                       1                                          1.11
P           2.78                      1                                          0.24

We apply dimensionality reduction [9] to determine the features (parameters) that are relevant to the problem and have low correlation among themselves. We use the following techniques to detect any redundant feature that can be omitted without losing information. In the rank features approach, we determine the most significant features on the basis of different criterion functions. The above criterion functions are evaluated as a filtering (or preprocessing) step before the training phase. On the other hand, sequential feature selection is a wrapper technique that sequentially selects the features until there is no substantial improvement (given by the deviance of the fitted models) in the predictive power for an included feature. For the random forest model, we evaluate the significance of the features in the ensemble of component trees using the misclassification error. Table 4 presents the results of the above techniques on the model parameters C, RW, and P. Sequential feature selection outputs 1 for all the model parameters, indicating that all the parameters are significant. Rank features ranks the relative significance of the features in the order C, P, and RW. Misclassification error ranks the features in the order RW, P, and C. The order of the relative significance is different with each dimensionality reduction technique; the developer must choose the suitable ranking approach based on the learning technique. For example, the misclassification error criterion is to be used for the random forest technique.
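The rankings in Table 4 were obtained with Matlab's statistical toolbox (Section 5). Purely as an analogous Java illustration, and not the procedure used for Table 4, an information-gain ranking over the same attributes could be produced with Weka's attribute-selection classes (file name training.arff is hypothetical, as before):

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class FeatureRankingSketch {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("training.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new InfoGainAttributeEval());  // information gain as the ranking criterion
        selector.setSearch(new Ranker());                     // rank all attributes rather than search subsets
        selector.SelectAttributes(data);

        // Attribute indices ordered by decreasing information gain.
        for (int idx : selector.selectedAttributes()) {
            System.out.println(data.attribute(idx).name());
        }
    }
}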
6.5 Comparison of the Predictive Power of the Learning Techniques Using Model Selection Metrics

6.5.1 The Model Selection Metrics Used
Cross validation error (CV error) [9] is the most common and widely used model selection metric. It represents the accuracy of the model with respect to an unseen test dataset, which is different from the original dataset used to train the model. Using 10-fold cross validation, we partition the dataset into 10 subsamples and run 10 rounds of validation. In each round a different subsample is used as the test dataset, while we train the model with the rest of the dataset. We compute the average of the mean squared error values for all validation rounds to obtain the mean CV error (Table 3). We also use the Akaike Information Criterion (AIC) [23], which quantifies the quality of the generated model in terms of the information loss during the training phase. For obtaining AIC, we first compute the likelihood of each observed outcome in the test dataset with respect to the model. We compute the maximum log likelihood L, i.e., the maximum of natural logarithms over the likelihood of each of the observed outcomes. AICC [23] (Table 3) further improves upon AIC, penalizing overfitting with a correction term for finite sample size n. Thus AICC = 2k − 2 ln L + 2k(k+1)/(n − k − 1), where k is the number of model parameters. Apart from AICC, we also compute the Bayesian Information Criterion (BIC) [23], which uses the Bayesian prior probability estimates to penalize overfitting. Thus, BIC = 2k ln N − 2 ln L, where the additional parameter N for the sample size enables BIC to assign a greater penalty on the sample size than AICC. At smaller sample sizes, BIC assigns a lower penalty on the free parameters than AICC, whereas the penalty is higher for larger sample sizes (because it uses k ln N instead of k). Normally, the BIC and AICC scores (Table
  • 8. 2332-7790 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TBDATA.2017.2656121, IEEE Transactions on Big Data IEEE TRANSACTIONS ON BIG DATA, VOL. 3, NO. 1, DECEMBER 2016 8 3) are used together to compare the models, AICC detects the problem of overfitting and BIC indicates underfitting. Another criterion for the choice of the algorithm is the average prediction overhead (refer Section 6.3), which is given in the last column of Table 3. 6.5.2 Insights: Evaluation of Learning Techniques With Re- spect to the Metrics The above table can guide the developers in making an informed choice regarding the correct learning technique to use. The following analysis with respect to the Table 3 can act as lessons learnt for future practitioners and researchers looking to apply learning techniques to solve similar problems. Our problem can be directly cast as clas- sification of the given training data into classes labelled (Section 4.4) by the consistency levels. Hence, Decision Tree yields high accuracy and speed (Table 3). We observed near random predictions with Linear regression (hence we omitted the results), because: 1) our problem is more of a classification problem, as already explained, and 2) linear regression requires the assumption of a linear model. Apart from linear regression, logistic regression fares worst among the approaches as indicated by the values of the metrics. This is because it also treats the problem as a regression problem, whereas ours is a nonlinear classification problem. But it is still useful, since it yields a simple relation which expresses the effect of the parameters RW , T c, and P , on L and S for an operation, under different consistency levels. Also, it can be used to determine the strength of the relationship among the parameters. Thus it can act as a basis for complex modelling algorithms. Random forest further eliminates errors due to overfitting and noisy data by: 1) bootstrapping multiple learners, and 2) using random- ized feature selection. Hence, it produces the best accuracy. However, the prediction overhead due to model complexity, might increase the latency overhead beyond the thresholds of real-time subSLA. Similarly, though ANN yields high accuracy (Table 3), the additional overhead due to the model complexity (it produces the most complex model) may overshoot the threshold for real-time use cases. Bayesian method requires several approximating assumptions which result in high error scores. 6.5.3 Selection of Algorithm Based on Tradeoff Between Accuracy And Speed We define a measure Perf for selecting the learning tech- nique that provides an optimal tradeoff between the accu- racy and speed (Table 3). We assign the overhead of the slowest learning technique (as per Table 3) as the baseline overhead Obase . Since ANN has the maximum overhead (1.5 ms) as per Table 3, Obase = 1.5. The speedup Speedup obtained with a chosen learning algorithm relative to the slowest algorithm is estimated from the observed relative decrease in overhead for the chosen algorithm with re- spect to the baseline Obase (i.e., for the slowest algorithm, Speedup= 0). 
Thus, Speedup = (Obase − O)/Obase, where O is the overhead of the chosen technique. We denote the prediction error for a given learning technique as E. We assign the error for the least accurate algorithm, as per Table 3, as the baseline Ebase. Logistic regression has the maximum CV error (1.98 as per Table 3). Hence, Ebase = 1.98. The values of E and Ebase depend on the choice of the error measure. The developer can either use the CV error, or both AICC and BIC taken together. In the latter case, E is given as the maximum between the AICC and BIC for the chosen algorithm, and Ebase is given as the maximum between the AICC and BIC for the least accurate algorithm. We compute the relative increase in accuracy (ARel) obtained with a chosen learning algorithm with respect to the least accurate algorithm. ARel is estimated from the proportion of observed decrease in error for the chosen algorithm with respect to the baseline Ebase. Thus, ARel = (Ebase − E)/Ebase. Finally, the measure Perf is computed as Perf = wSpeedup × Speedup + wA × ARel, where wSpeedup and wA are the weights (real numbers in the closed interval [0, 1], such that wSpeedup + wA = 1) assigned by the developers to quantify the relative importance of speed and accuracy, respectively, for selecting a learning technique. The developer can assign a larger weight to accuracy (i.e., wA > wSpeedup) if accuracy is more important than speedup (and vice versa) according to the use case. The developer chooses the learning algorithm that produces the maximum value of Perf, with a given choice of error measure. We provide the values of Perf for various learning algorithms in Table 5, with CV error as the error measure, and equal weights assigned to accuracy and speedup (i.e., wA = wSpeedup = 0.5). The value of Perf is maximum for the decision tree algorithm, implying that it produces an optimal tradeoff of accuracy and speed according to our choice of error measure and the assigned values for the weights wA and wSpeedup. In most cases, the difference in overhead between the algorithms is negligibly small, so Speedup is rarely a significant criterion. Thus, developers would generally assign the weights wA = 1 and wSpeedup = 0, choosing the algorithm with the highest accuracy.

TABLE 5: Evaluation of the Accuracy-Speed Tradeoff (Perf per learning technique)
Decision Tree   Bayesian Learning   Logistic Regression   Random Forest   ANN
0.62            0.46                0.27                  0.52            0.49
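As a worked illustration of the Perf measure, using only the CV-error and overhead columns of Table 3 and equal weights: for the decision tree, Speedup = (1.5 − 1)/1.5 ≈ 0.33 and ARel = (1.98 − 0.14)/1.98 ≈ 0.93, so Perf ≈ 0.5 × 0.33 + 0.5 × 0.93 ≈ 0.63, which is in line with the decision-tree entry of Table 5 (0.62) up to rounding of the intermediate values.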
6.6 Analysis of Linear Regression Approach

6.6.1 Background of ANOVA Analysis Technique
We use ANOVA [20], or Analysis of Variance, on the fitted linear regression to perform a hypothesis test for determining whether there is a significant linear relationship among the model parameters. The degree of freedom DF (refer to Table 6) is given as the number of independent observations in a sample (training dataset) minus the number of the parameters that are to be estimated from the sample. The regression mean square (MSR) for each parameter is computed as the squared sum of errors (i.e., the deviation of the observed values from the estimated values) divided by the degree of freedom DF. The mean square error (MSE) is computed as the squared sum of errors divided by n − 1, where n is the number of parameters. The F statistic for each parameter is computed as MSR/MSE. The p values in the ANOVA table denote the probabilistic significance of the corresponding parameter: a small p value implies a small probability of obtaining such a large value of the F statistic under the null hypothesis. The null hypothesis for linear regression states that there is no significant linear relationship among the model parameters. If we obtain a large value of F with a small p-value, we reject the null hypothesis in favor of the alternative hypothesis, i.e., there exists a significant relationship among the model parameters.

TABLE 6: ANOVA results for Linear Regression

ANOVA for Latency function
Variable   SumSq      DF     MeanSq     F        pValue
C          4.11E+18   1      4.11E+18   7.0851   7.82E-03
RW         1.57E+20   1      1.57E+20   269.75   1.23E-57
P          7.26E+18   1      7.26E+18   12.503   0.00041377
Tc         1.05E+23   1      3.2E+25    325.45   1.5E-65
Error      1.45E+21   2492   5.80E+17

ANOVA for Staleness function
Variable   SumSq      DF     MeanSq     F        pValue
C          7.04E+08   1      7.04E+08   1.1906   0.27523
RW         7.71E+09   1      7.71E+09   480.32   1.70E-105
P          9.71E+08   1      9.71E+08   3.6821   0.05501
Tc         3.23E+12   1      2.85E+13   538.12   1.57E-123
Error      1.92E+12   2492   7.69E+08

6.6.2 The Resulting ANOVA Table
In the case of the regression equation for latency L, the p values (refer to Table 6) are all approximately 0 (rounded to 3 digits of precision) for the parameters C, RW, P, and Tc, respectively, and the values of the F statistic are 7.08, 269.75, 12.5, and 325.45, respectively. Such small p values indicate that the probability of observing such large F-ratios by chance is very small. Since F is as high as 269.75 for RW, the significance of the independent parameter RW in the regression equation for L is very high, i.e., the probability of L linearly varying with RW is very high. On the other hand, the F statistics for the parameters C and P are much lower compared to RW; hence these parameters may have a relatively smaller chance of having a linear relationship with the dependent parameter L. In the case of the fitted regression equation for staleness S, the p values (refer to Table 6) are approximately 0.28, 0, 0.06, and 0 for the parameters C, RW, P, and Tc, respectively, and the values of the F statistic are 1.19, 480.32, 3.68, and 538.12, respectively. Thus, while the p values for the regression equation of S are greater than those for L, they are still very small (approximately 0, when rounded). Hence, the probability of observing high values of the F statistics is very small, especially for C and P (F is 1.19 and 3.68, respectively, for C and P). Thus, the parameters C and P have a very low chance of being independent parameters in the regression equation for staleness S.

TABLE 7: Confusion matrix generated using Decision Tree
a      b      c,d,e,f   g      h      <-- classified as
6194   238    0         73     93     a = ANY
115    1045   0         2793   495    b = ONE
0      0      0         0      0      c = TWO
0      0      0         0      0      d = THREE
0      0      0         0      0      e = LOCAL QUORUM
0      0      0         0      0      f = EACH QUORUM
117    1073   0         2782   553    g = QUORUM
114    1029   0         2771          h = ALL

Fig. 3: ROC curve generated using Decision Tree: AUC=0.98
Fig. 4: ROC Curve for the Logistic Regression approach

6.7 Analysis of Confusion Matrix and ROC Curves

6.7.1 Background of Confusion Matrix and ROC Curves
True positives (TP) [20] represent examples in the training dataset that correspond to a correct true classification.
False positives (FP), or type I errors, involve incorrect labelling of negative examples as positives. True negatives (TN), on the other hand, are negative examples that have been classified correctly. False negatives (FN), or type II errors, involve incorrect classification of positive examples as negatives. A confusion matrix is a tabular visualization of the values of TP, FP, TN, and FN [20]. The True Positive Rate (TPR) and False Positive Rate (FPR) are obtained as TPR = TP/N and FPR = FP/N, respectively. Receiver Operating Characteristic (ROC) curves display the performance of machine learning algorithms in terms of TPR and FPR. We plot the ROC from the confusion matrix. The ROC curve plots the TPR and FPR along the x and the y axis, respectively. The area under the curve (AUC) quantifies the probability that a given classifier will assign a higher rank to a positive example than to a negative example. The ideal region for an ROC curve is the upper left half, which corresponds to an area of 1, representing a perfect test; an area of 0.5 typically means a uniformly random test.
Fig. 5: ROC curve generated using Bayes Net

6.7.2 Resulting Confusion Matrix and ROC Curves
Table 7 shows the confusion matrix obtained with the Decision Tree technique. From the confusion matrix, we obtain the ROC curve for the decision tree (refer to Figure 3). The area under the curve (AUC), computed using the Mann-Whitney technique [20], is 0.99. The Gini coefficient G1 is computed as G1 = 2 × AUC − 1 = 2 × 0.99 − 1 = 0.97. But the curve falls below the diagonal, indicating negative predictions; hence we follow the mirrored method, which mirrors, i.e., reverses, its predictions with reference to the diagonal line. All predictions obtained are reversed using the above technique before passing the predicted values to the client for performing the read/write operations. Figure 5 presents the ROC curve for the Bayes network approach. The diagonal corresponds to random guesses; the prediction instances all reside above it. The area under the ROC curve for the Bayes network is 0.9, and the Gini coefficient G1 = 2 × 0.9 − 1 = 0.8. Figure 4 presents the ROC curve for the Logistic Regression approach.

Fig. 6: Cross Validated Deviance of Lasso Fit
Fig. 7: Residuals for the Logistic Regression approach

6.8 Analysis of Logistic Regression
A saturated model is a model with the maximum number of regression coefficients that can be estimated. The deviance for a model M1 is computed as twice the difference between the log-likelihood of the model M1 and the saturated model MS. For a nonnegative value of Λ, lasso (we use lasso regularization with logistic regression, as discussed in Section 4.4.1) minimizes the Λ-penalized deviance, which is equivalent to maximizing the Λ-penalized log likelihood. The above deviance represents the value of the loss function, computed as the negative log-likelihood for the prediction model, averaged over the validation folds in the cross-validation procedure. We use Figure 6 to analyze the deviance obtained with Logistic Regression. The green circle and dashed line locate the Λ term that produces the minimal cross-validation error. Figure 6 plots the value of Λ with minimum cross-validated MSE, and the highest value of Λ that is within one standard error of the minimum MSE. Figure 6 identifies the minimum-deviance point with a green circle, and the dashed line presents the variation of the regularization parameter Λ. The blue circled point has minimum Λ with standard deviation at most 1. Figure 7 presents the residuals obtained with the Lasso plot, computed as the difference between the observed and predicted values. Residuals are used to detect outliers to a fitted regression curve, and in the process help test the assumptions of the logistic regression model. In Figure 7, we plot the histogram of raw residuals, i.e., the difference between the observed and the fitted values of L estimated from the logistic regression equation. The histogram shows that the residuals are slightly left skewed, indicating that the model can be further improved.
6.9 Adaptability of OptCon to Varying Workload
Varying Throughput: We run our experiments on a geo-replicated testbed of 7 Amazon EC2 m1.large instances (replica nodes) located in 3 regions (3 nodes in us-west-2a, 2 nodes in us-east, and 2 nodes in eu-west), running Ubuntu 14.04 LTS, loaded with Cassandra 2.1.2 with a replication factor of 3. We demonstrate the adaptability of OptCon to variations in throughput in the client workload. We tune the target throughput parameter (operations per second) in the YCSB client, and observe the variations in the severity of staleness (i.e., the 95th percentile Γ score) and latency against changing throughput, under a subSLA comprising a 300 ms threshold for latency (following the SLA for Amazon's shopping cart application [16]) and a 10 ms threshold for staleness. As per Table 5, we choose the Decision Tree implementation of the Learner module. Following [10], we run YCSB workloads for spans of 60 seconds with the following configuration: 50% ratio of reads and writes, 8 client threads,
Fig. 8: Adaptability of OptCon to Varying Target Throughput, under a subSLA (Latency: 300 ms, Staleness: 10 ms). Each panel plots latency (ms) and Gamma (ms) against target throughput (ops/s): (a) operations with READ ALL/WRITE ALL; (b) operations with READ/WRITE QUORUM; (c) operations with READ ONE/WRITE ONE; (d) operations with OptCon.
Fig. 9: Adaptability of OptCon to Varying Read Proportion under the subSLA SLA-1 (Latency: 250 ms, Staleness: 5 ms). Each panel plots latency (ms) and staleness (ms) against read proportion: (a) operations with OptCon satisfy the subSLA in 100% of cases; (b) READ ALL/WRITE ALL, 70%; (c) READ ALL/WRITE QUORUM, 75%; (d) READ QUORUM/WRITE QUORUM, 75%; (e) READ QUORUM/WRITE ALL, 35%; (f) READ ONE/WRITE ANY, 45%.
Fig. 10: Adaptability of OptCon to Different subSLAs. Each panel plots staleness (ms) against latency (ms): (a)-(c) operations with OptCon under subSLA-1 (Latency: 250 ms, Staleness: 5 ms), subSLA-2 (Latency: 100 ms, Staleness: 10 ms), and subSLA-3 (Latency: 20 ms, Staleness: 20 ms); (d) READ ALL/WRITE ALL; (e) READ ALL/WRITE QUORUM; (f) READ QUORUM/WRITE QUORUM; (g) READ QUORUM/WRITE ALL; (h) READ ALL/WRITE ANY; (i) READ ALL/WRITE ONE; (j) READ ONE/WRITE ALL; (k) READ QUORUM/WRITE ONE; (l) READ QUORUM/WRITE ANY; (m) READ ONE/WRITE QUORUM; (n) READ ONE/WRITE ANY; (o) chart showing M-statistic values:
subSLA-1: WRITE ANY/READ ONE, M = 68; READ/WRITE QUORUM, M = 41; Predicted Consistency, M = 85.
subSLA-2: READ QUORUM/WRITE ALL, M = 70; Predicted Consistency, M = 92.
subSLA-3: READ ALL/WRITE QUORUM, M = 0; Predicted Consistency, M = 100.
“latest” distribution, and keyspace size of 100,000. Figures 8(d), 8(a), 8(b), and 8(c) plot the 95th percentile values of observed Γ scores and observed latency in milliseconds against varying target throughput, with manually chosen consistency levels and with OptCon. Since the frequency of operations on each key increases with the target throughput applied from the YCSB client, the chance of observing stale results increases as well. Hence, the plots show that the 95th percentile Γ score increases with increasing target throughput. Stronger consistency levels (i.e., ALL and QUORUM, in Figures 8(a) and 8(b)) satisfy the staleness threshold for all values of applied target throughput. Due to the high internode communication delay between replica nodes distributed across the 3 regions under geo-replicated settings, the consistency level ALL (Figure 8(a)) violates the 300 ms latency threshold. On the other hand, the weak consistency level ONE (Figure 8(c)) satisfies both the latency and staleness thresholds for lower values of target throughput, but violates the 10 ms staleness threshold when the target throughput exceeds 400 ops/s. Hence, for throughput ≤ 400 ops/s, ONE or QUORUM is the matching choice, while QUORUM is matching for throughput > 400 ops/s. OptCon (Figure 8(d)) applies the weak consistency level ONE for target throughput ≤ 400 ops/s, to achieve the latency threshold and to maximize the observed throughput, and applies the strong consistency level QUORUM for target throughput > 400 ops/s. QUORUM (Figure 8(b)) satisfies the subSLA for all values of target throughput. However, by choosing ONE instead of QUORUM for throughput ≤ 400 ops/s, OptCon (Figure 8(d)) produces, on average, 61 ms lower latency than QUORUM (Figure 8(b)).
Varying Read Proportion: Next, we vary the read proportion parameter (RW) while running YCSB workloads, with OptCon and with manually chosen consistency levels (Figure 9(a)). In Figures 9(a) through 9(f), the read proportion is plotted along the x axis, the observed latency along the primary y axis, and the staleness along the secondary y axis. We demonstrate that, for a specific read proportion, only a subset of all possible read-write consistency levels is optimal, i.e., satisfies the latency and staleness thresholds in a given subSLA. Varying the read proportion (RW) in the workload from 0.1 to 1, we observe that a manually chosen read-write consistency level is optimal for only a fraction of the total number of cases under the subSLA SLA-1 (Table 2). OptCon demonstrates its adaptability to varying read proportion, applying the optimal consistency level for each read proportion, and thus satisfying the subSLA in 100% of cases. OptCon can adapt to changing read proportion because the parameter RW is a feature in the prediction model M.
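To make the role of workload features such as RW concrete, the sketch below shows one way a decision-tree learner could map per-operation features (read proportion, target throughput, and a network latency estimate) to a consistency level, and how a client could apply the predicted level through the DataStax Python driver for Cassandra. The feature set, toy training data, labels, keyspace, and table names are illustrative assumptions of ours; this is not OptCon's actual Learner, feature extraction, or schema.

```python
# Sketch: a decision-tree "learner" mapping workload features to a consistency
# level, and a client applying the predicted level per operation.
# Training data, feature names, and the keyspace/table are hypothetical.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement
from sklearn.tree import DecisionTreeClassifier

# Features: [read proportion RW, target throughput (ops/s), observed RTT (ms)]
X_train = [
    [0.9, 200, 20], [0.9, 600, 20], [0.5, 300, 20],
    [0.1, 500, 20], [0.5, 700, 80], [0.1, 100, 80],
]
y_train = ["QUORUM", "QUORUM", "ONE", "ONE", "QUORUM", "ONE"]  # toy labels

learner = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

LEVELS = {"ONE": ConsistencyLevel.ONE,
          "QUORUM": ConsistencyLevel.QUORUM,
          "ALL": ConsistencyLevel.ALL}

def read_with_predicted_level(session, key, features):
    """Predict a consistency level for the current workload state, then issue the read."""
    predicted = learner.predict([features])[0]
    stmt = SimpleStatement("SELECT value FROM demo.kv WHERE key = %s",
                           consistency_level=LEVELS[predicted])
    return session.execute(stmt, (key,)).one()

if __name__ == "__main__":
    session = Cluster(["127.0.0.1"]).connect()   # assumes a reachable Cassandra node
    print(read_with_predicted_level(session, "user-42", [0.5, 450, 20]))
```

Because the prediction happens per operation, a change in the observed read proportion or throughput simply changes the feature vector passed to the model, which is how a learned predictor can track a shifting workload without being reconfigured.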
As shown by the results in [10], the frequency of stale results (i.e., higher Γ scores) increases with increasing read proportion in the workload. Hence, for the higher read proportions in the right half of the x axis, stronger read consistency levels are required to return less stale results. Also, since the proportion of writes is smaller, the penalty for the write latency overhead is lower; hence stronger write consistency levels, which result in high write latency, are acceptable. Thus, for higher read proportions, combinations of strong read-write consistency levels, i.e., ALL/QUORUM, succeed in lowering staleness to acceptable bounds (right half of Figures 9(b), 9(c), 9(d), and 9(e)). Strong consistency levels achieve a maximum success rate of 75% in satisfying the subSLA, with READ ALL/WRITE QUORUM (Figure 9(c)) and READ QUORUM/WRITE ALL (Figure 9(d)). Taking into account the clock skew, the staleness is effectively 0 in these cases. For the same reasons, for higher read proportions, weak read consistency levels fail to achieve stringent staleness thresholds, as demonstrated by the points in the right half of Figure 9(f). With lower read proportions, in the left half of the x axes, the frequency of stale read results is smaller [10]. In this case, weaker manually chosen read-write consistency levels, i.e., ONE/ANY (left half of Figure 9(f)), are sufficient for returning consistent results. For lower read proportions, stronger consistency levels, i.e., ALL or QUORUM, are unnecessary and only result in high latency overhead; with these levels, we observe a maximum success rate of only 55% in satisfying the subSLA (left half of Figures 9(b) through 9(e)), the best being READ ALL/WRITE ALL (Figure 9(b)). Thus, manually chosen consistency levels show a maximum success rate of 75% in satisfying the subSLA, in contrast with the 100% success rate demonstrated by OptCon (Figure 9(a)).

6.10 Adaptability of OptCon to Different subSLAs
We demonstrate that the consistency levels predicted by OptCon are always matching, i.e., OptCon always chooses from the optimal set of consistency levels for the given subSLA. We have integrated OptCon (using Decision Tree learning, as per Table 5) with the RUBBoS [24] benchmark, which can simulate concurrent and interleaved user access. Figures 10(a) through 10(n) plot the observed staleness along the y axis and the latency along the x axis, under the subSLAs SLA-1, SLA-2, and SLA-3 (Table 2). Figures 10(a) through 10(c) demonstrate experiments performed with OptCon. Figures 10(d) through 10(n) correspond to experiments done with manually chosen consistency levels. We demonstrate a few extreme cases of occasional heavy network traffic in real-world applications with artificially introduced network delays (refer to the few boundary cases with arbitrarily high latency in Figures 10(a), 10(b), and 10(c) that violate the respective subSLAs). These boundary cases were simulated using the traffic shaping feature of Traffic Control [25], a Linux-based tool for configuring network traffic that controls the transmission rate by lowering the network bandwidth. SLA-1 (Table 2) represents systems demanding stronger consistency. Manually chosen weak read-write consistency levels, i.e., ONE/ANY, fail to satisfy the stringent staleness bound of SLA-1 (Figures 10(k) through 10(m)). On the other hand, strong consistency levels (i.e., ALL) satisfy SLA-1 (Figures 10(d) through 10(g)).
OptCon satisfies SLA-1 by applying the optimal consistency levels for that subSLA (i.e., ALL in this case) (Figure 10(a)). For the lower latency bound in SLA-3 (Table 2), the manually chosen strong consistency levels (ALL/QUORUM) unnecessarily produce latencies beyond the acceptable threshold (Figures 10(d) through 10(g)). However, weaker consistency levels (i.e., ONE/ANY) prove to be optimal (Figures 10(k) through 10(n)). Again, OptCon succeeds by choosing the respective optimal consistency
levels (i.e., ONE/ANY in this case) for SLA-3 (Figure 10(c)). Similarly, with SLA-2, a subset of the manually chosen fixed consistency levels (Figures 10(h), 10(i), and 10(j)) produces optimal results, whereas OptCon successfully achieves SLA-2 for all operations (Figure 10(b)). We evaluate OptCon by a measure M, which quantifies how well the system adapts to the subSLA. Following [4], M (Table 10(o)) computes the percentage of cases that did not violate the subSLA. For all the given subSLAs, the M values for operations performed with OptCon (Figures 10(a), 10(b), and 10(c)) exceed the M values for operations using fixed consistency levels.

7 RELATED WORK
Wada et al. [26] and Bermbach et al. [27] analyze and quantify consistency from a system-centric perspective that focuses on the convergence time of the replication protocol. Bailis et al. [7] and Rahman et al. [28] instead consider a client-centric perspective of consistency, in which a consistency anomaly occurs only if differences in state among replicas lead to a client application actually observing a stale read. In practice, eventual consistency is preferred over strong consistency in scenarios where the system must maintain availability during network partitions [29, 5], or when the application is latency-sensitive and able to tolerate occasional inconsistencies [30]. The state machine replication paradigm achieves the strongest possible form of consistency by physically serializing operations [31]. Lamport's Paxos protocol is a quorum-based fault-tolerant protocol for state machine replication. Mencius improves upon Paxos by ensuring better scalability and higher throughput under high client load using a rotating coordinator scheme [32]. Using variations of Paxos, a number of systems [33, 34, 35, 36] claim to provide scalability and fault tolerance. Relatively few systems [37, 4, 13, 14] provide mechanisms for fine-grained control over consistency. Li et al. [12] apply static analysis for automated consistency tuning in relational databases.

8 CONCLUSIONS
OptCon automates client-centric consistency-latency tuning in quorum-replicated stores, based on staleness and latency thresholds in the SLAs. We implemented the OptCon Learning module using various learning algorithms and performed a comparative analysis of the accuracy and overhead in each case. ANN produces the highest accuracy, and Logistic Regression the lowest overhead. Assigning equal weight to accuracy and overhead, the decision tree performed best. OptCon can adapt to variations in the workload, and is at least as effective as any manually chosen consistency setting in satisfying a given SLA. OptCon provides insights for applying machine learning to similar problems.

9 ACKNOWLEDGEMENT
The project is partially supported by the Army Research Office (ARO) under Grant W911NF1010495.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the ARO or the United States Government. Wojciech Golab is supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada. Subhajit Sidhanta is supported in part by the Amazon AWS (Amazon Web Services) Research Grant.

REFERENCES
[1] A. Lakshman and P. Malik, “Cassandra: A decentralized structured storage system,” SIGOPS Oper. Syst. Rev., vol. 44, no. 2, pp. 35–40, Apr. 2010. [Online]. Available: http://doi.acm.org/10.1145/1773912.1773922
[2] C. Meiklejohn, “Riak PG: Distributed process groups on dynamo-style distributed storage,” in Erlang ’13.
[3] B. Calder, J. Wang, A. Ogus, N. Nilakantan, A. Skjolsvold, S. McKelvie, Y. Xu, S. Srivastav, J. Wu, H. Simitci, J. Haridas, C. Uddaraju, H. Khatri, A. Edwards, V. Bedekar, S. Mainali, R. Abbasi, A. Agarwal, M. F. u. Haq, M. I. u. Haq, D. Bhardwaj, S. Dayanand, A. Adusumilli, M. McNett, S. Sankaran, K. Manivannan, and L. Rigas, “Windows Azure Storage: A highly available cloud storage service with strong consistency,” in Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, ser. SOSP ’11. New York, NY, USA: ACM, 2011, pp. 143–157. [Online]. Available: http://doi.acm.org/10.1145/2043556.2043571
[4] D. B. Terry, V. Prabhakaran, R. Kotla, M. Balakrishnan, M. K. Aguilera, and H. Abu-Libdeh, “Consistency-based service level agreements for cloud storage,” in SOSP ’13.
[5] D. B. Terry, M. M. Theimer, K. Petersen, A. J. Demers, M. J. Spreitzer, and C. H. Hauser, “Managing update conflicts in Bayou, a weakly connected replicated storage system,” in SOSP ’95.
[6] A. Jain, “Using Cassandra for real-time analytics: Part 1,” http://blog.markedup.com/2013/03/cassandra-real-time-analytics/, 2011.
[7] P. Bailis, S. Venkataraman, M. J. Franklin, J. M. Hellerstein, and I. Stoica, “Probabilistically bounded staleness for practical partial quorums,” Proc. VLDB Endow., vol. 5, no. 8, pp. 776–787, Apr. 2012. [Online]. Available: http://dl.acm.org/citation.cfm?id=2212351.2212359
[8] S. Sidhanta, W. Golab, S. Mukhopadhyay, and S. Basu, “OptCon: An adaptable SLA-aware consistency tuning framework for quorum-based stores,” in CCGrid ’16.
[9] P. Flach, Machine Learning: The Art and Science of Algorithms That Make Sense of Data. New York, NY, USA: Cambridge University Press, 2012.
[10] W. M. Golab, M. R. Rahman, A. AuYoung, K. Keeton, J. J. Wylie, and I. Gupta, “Client-centric benchmarking of eventual consistency for cloud storage systems,” in ICDCS, 2014, p. 28.
[11] P. Cassandra, “Apache Cassandra use cases,” http://planetcassandra.org/apache-cassandra-use-cases/, 2015.
[12] C. Li, J. Leitão, A. Clement, N. Preguiça, R. Rodrigues, and V. Vafeiadis, “Automating the choice of consistency levels in replicated systems,” in USENIX ATC 14.
[13] M. S. Ardekani and D. B. Terry, “A self-configurable geo-replicated cloud storage system,” in OSDI 14.
[14] K. C. Sivaramakrishnan, G. Kaki, and S. Jagannathan, “Declarative programming over eventually consistent data stores,” in PLDI ’15.
[15] L. Ravindranath, J. Padhye, R. Mahajan, and H. Balakrishnan, “Timecard: Controlling user-perceived delays in server-based mobile applications,” in Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, ser. SOSP ’13. New York, NY, USA: ACM, 2013, pp. 85–100. [Online]. Available: http://doi.acm.org/10.1145/2517349.2522717
[16] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, “Dynamo: Amazon’s highly available key-value store,” SIGOPS Oper. Syst. Rev., vol. 41, no. 6, pp. 205–220, Oct. 2007. [Online]. Available: http://doi.acm.org/10.1145/1323293.1294281
[17] R. Sumbaly, J. Kreps, L. Gao, A. Feinberg, C. Soman, and S. Shah, “Serving large-scale batch computed data with project Voldemort,” in Proc. of the 10th USENIX Conference on File and Storage Technologies (FAST), 2012.
[18] L. Lamport, “On interprocess communication,” Distributed Computing, vol. 1, no. 2, pp. 77–85, 1986. [Online]. Available: http://dx.doi.org/10.1007/BF01786227
[19] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears, “Benchmarking cloud serving systems with YCSB,” in SoCC ’10.
[20] S. Weerahandi, Exact Statistical Methods for Data Analysis, ser. Springer Series in Statistics. New York, NY, USA: Springer Science and Business Media, 2001.
[21] T. Everts. Web performance today.
[22] C. Hale and R. Kennedy, “Using Riak at Yammer,” http://dl.dropbox.com/u/2744222/2011-03-22 Riak-At-Yammer.pdf, 2011.
[23] K. P. Burnham and D. R. Anderson, Model selection and multimodel inference: a practical information-theoretic approach. Springer Science & Business Media, 2002.
[24] OW2 Consortium, “RUBBoS: Bulletin Board Benchmark,” 2002.
[25] B. Hubert, T. Graf, G. Maxwell, R. Van Mook, M. Van Oosterhout, P. B. Schroeder, J. Spaans, and P. Larroy, Linux Advanced Routing & Traffic Control HOWTO, online publication, Linux Advanced Routing & Traffic Control, http://lartc.org/, Apr. 2004. [Online]. Available: http://lartc.org/lartc.pdf
[26] H. Wada, A. Fekete, L. Zhao, K. Lee, and A. Liu, “Data consistency properties and the trade-offs in commercial cloud storage: the consumers’ perspective,” in Proc. Conference on Innovative Data Systems Research (CIDR), 2011, pp. 134–143.
[27] D. Bermbach and S. Tai, “Eventual consistency: How soon is eventual? An evaluation of Amazon S3’s consistency behavior,” in Proc. Workshop on Middleware for Service Oriented Computing (MW4SOC), 2011.
[28] M. R. Rahman, W. Golab, A. AuYoung, K. Keeton, and J. J. Wylie, “Toward a principled framework for benchmarking consistency,” in Proc. of the Eighth USENIX Conference on Hot Topics in System Dependability, ser. HotDep’12. Berkeley, CA, USA: USENIX Association, 2012, pp. 8–8. [Online]. Available: http://dl.acm.org/citation.cfm?id=2387858.2387866
[29] K. Birman and R. Friedman, Trading Consistency for Availability in Distributed Systems. Cornell University, Department of Computer Science, 1996. [Online]. Available: https://books.google.com/books?id=XAwDrgEACAAJ
[30] E. A. Brewer, “Towards robust distributed systems (Invited Talk),” in Proc. of the 19th ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, 2000.
[31] L. Lamport, “Time, clocks and the ordering of events in a distributed system,” Communications of the ACM, vol. 21, no. 7, p. 558, 1978.
[32] Y. Mao, F. P. Junqueira, and K. Marzullo, “Mencius: Building efficient replicated state machines for WANs,” in OSDI ’08.
[33] J. Rao, E. J. Shekita, and S. Tata, “Using Paxos to build a scalable, consistent, and highly available datastore,” PVLDB, vol. 4, no. 4, p. 243, 2011.
[34] W. J. Bolosky, D. Bradshaw, R. B. Haagens, N. P. Kusters, and P. Li, “Paxos replicated state machines as the basis of a high-performance data store,” in Proc. of the 8th USENIX Conference on Networked Systems Design and Implementation, ser. NSDI’11. Berkeley, CA, USA: USENIX Association, 2011, pp. 11–11. [Online]. Available: http://dl.acm.org/citation.cfm?id=1972457.1972472
[35] T. Kraska, G. Pang, M. J. Franklin, S. Madden, and A. Fekete, “MDCC: multi-data center consistency,” in Proc. of the 8th ACM European Conference on Computer Systems, ser. EuroSys ’13. New York, NY, USA: ACM, 2013, pp. 113–126. [Online]. Available: http://doi.acm.org/10.1145/2465351.2465363
[36] J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor, R. Wang, and D. Woodford, “Spanner: Google’s globally-distributed database,” in Proc. of the 10th USENIX Conference on Operating Systems Design and Implementation, ser. OSDI’12. Berkeley, CA, USA: USENIX Association, 2012, pp. 251–264. [Online]. Available: http://dl.acm.org/citation.cfm?id=2387880.2387905
[37] H. Yu and A. Vahdat, “Building replicated internet services using TACT: A toolkit for tunable availability and consistency tradeoffs,” in WECWIS, 2000, pp. 75–84. [Online]. Available: http://dblp.uni-trier.de/db/conf/wecwis/wecwis2000.html#YuV00

Subhajit Sidhanta received his PhD from the Department of Computer Science and Engineering, affiliated with the School of Electrical Engineering and Computer Science at Louisiana State University, USA, in 2016. He is currently a Postdoctoral Researcher working with the Distributed Systems Research Group at the INESC-ID research lab, affiliated with Instituto Superior Técnico at the University of Lisbon, Portugal. His major research interests encompass consistency in distributed systems and performance modelling of distributed storage and parallel computing systems.

Supratik Mukhopadhyay is a Professor in the Computer Science Department at Louisiana State University, USA. He received his PhD from the Max Planck Institut fuer Informatik and the University of Saarland in 2001, his MS from the Indian Statistical Institute, India, and his BS from Jadavpur University, India. His areas of specialization include formal verification of software, inference engines, big data analytics, distributed systems, parallel computing frameworks, and program synthesis.

Saikat Basu received his PhD from the Computer Science Department at Louisiana State University, Baton Rouge, USA. He received his BS in Computer Science from the National Institute of Technology, Durgapur, India, in 2011. His primary research interests include applications of Computer Vision and Machine Learning to satellite imagery data and using High Performance Computing architectures to address big data analytics problems for very high-resolution satellite datasets. He is particularly interested in the applications of various Deep Learning frameworks to satellite image understanding and representations.

Wojciech Golab received his PhD in Computer Science from the University of Toronto in 2010. After a post-doctoral fellowship at the University of Calgary, he spent two years as a Research Scientist at Hewlett-Packard Labs in Palo Alto, and then joined the University of Waterloo in 2012 as an Assistant Professor in Electrical and Computer Engineering. His research agenda focuses on algorithmic problems in distributed computing with applications to storage and processing of big data. Dr. Golab’s theoretical work on shared memory algorithms won the best paper award at DISC 2008, and was distinguished by ACM Computing Reviews as one of 91 “notable computing items published in 2012”.