Queuing model estimating response time goals feasibility
Anatoliy Rikun, PhD
Full Capture Solutions, Inc.
Anatoliy_Rikun@yahoo.com
Customers expect a fast response from their systems, but what are the limits of the system? This paper analyzes a multi-class queuing model that evaluates whether job response time goals are reachable and, if they are not, what the optimal alternatives would be in a well-defined sense. This queuing optimization model may be used to estimate the limits of tuning, to compare hardware upgrade and tuning alternatives, or to estimate the minimal level of hardware upgrade needed to reach a given set of response time goals.
1. Introduction
Response time is an important characteristic of system performance, and response time metrics may be part of a service level agreement.
Suppose we have a set of workloads competing for system resources, with known response time targets and known levels of system resource consumption. Based on this information, how can we estimate whether these performance goals are achievable? And, if the performance goals are not achievable, what is the “best” possible solution we can get from the system?
This paper is organized as follows. Section 2 describes the “fairness” or “equitability” concept for finding a good substitute in the case of unachievable performance goals [GNT4, L9, BGTV3]. This concept reflects, to some extent, the logic used in IBM WLM Goal Mode [BB9, SBG9], and it leads to a lexicographically minimal response time distribution when the response time goals are not reachable. Section 3 gives a more formal model definition and the necessary assumptions. A greedy algorithm is presented in Section 4. Section 5 gives some examples and estimates for comparing tuning and upgrade options, and Section 6 summarizes the paper.
2. Concept of Fairness, and What Can Be a
Reasonable Substitution For Unreachable
Goals?
Let us consider a CPU-bound queuing system of n job classes. For the sake of simplicity, suppose all the job classes have the same importance level and the same service times, but may have different CPU utilizations, {Ui}, and different performance goals, {Gi}, i=1,…,n. Job response time goals may be defined in different ways: they may be, in particular, average response time goals, deadlines, percentile response time goals, execution velocities and others. Even though only average response time goals are considered in this paper, some of the conclusions can be extended to the other metrics, too.
To evaluate system performance, it is natural to use the relative performance, or performance index, metric [SBG9], pi:
pi = Ri/Gi (1)
Thus, if all the performance indices are less than or equal to 1, each job meets its performance goal. Now, suppose that in this system the goals {Gi} are not reachable, i.e. for any feasible response time vector R = {Ri} ∈ X, at least one job's average response time exceeds its goal Gi. This may be expressed in terms of the “worst case” performance index, γ1:
γ1 = max{ pi | i=1,…,n } > 1 (2)
What could be the best possible response time distribution when system capacity is not sufficient to reach all the performance goals? A traditional approach would be to minimize the average, or some weighted average, over all job classes' response times:
R*avg = min { Σj=1..n cjRj | R ∈ X } (3)
Here the cj are job-class weight coefficients¹, X is the set of all reachable combinations of the response time vector R, and R*avg corresponds to the minimal possible weighted average response time which can be attained in the system. A well-known analytical approach to solving problems like (3), the so-called cµ rule, is very simple: it recommends giving higher priority to the jobs with the smaller ratio Uj/cj (“small” job first)².
¹ In the calculations below cj = 1/Gj, i.e. (3) minimizes the average performance index. But cj may also be a cost associated with delay for job class j.
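As a tiny illustration (not from the paper; the class names and numbers below are made up), the cµ ordering amounts to sorting job classes by the ratio Uj/cj:

```python
# Sketch: ordering job classes by the c-mu rule, i.e. ascending U_j / c_j.
# Values below are illustrative only.
jobs = {"A": {"U": 0.10, "c": 1.0},   # c_j could be 1/G_j or a delay cost
        "B": {"U": 0.50, "c": 1.0},
        "C": {"U": 0.20, "c": 4.0}}

priority_order = sorted(jobs, key=lambda j: jobs[j]["U"] / jobs[j]["c"])
print(priority_order)   # classes with smaller U/c come first (higher priority)
```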
But, in spite of the elegance of the cµ rule, the underlying optimization model (3) may lead to undesirable results in the goal mode context. The reason is that the minimal average response time approach may lead to solutions where “small” jobs outperform their goals at the expense of “big” jobs, which do not meet their goals. An example of this situation is considered in Table 1. In that example, all the jobs have the same response time goal, 1 sec, but the optimal average response time proves to be 0.17 for the small jobs at the expense of the big job, whose response time is 1.67 sec. At the same time, all the goals are reachable in this example. Thus, minimization of the weighted average response time leads to unfair solutions.
What we actually need is to reach all the response time goals or, if this is not possible, to approach all of them as closely as possible. In particular [IBM00], IBM WLM tries to make all the workloads within a given importance level have a similar performance index, and when an important workload is missing its goal, WLM tries to provide it with additional resources. Mathematically, this is called mini-max or lexicographical optimization [BB9³, GNT4, BGTV3, L9]. In other words, we try to reach a solution in which all the jobs reach their goals and have the same performance index, γ. If this is not possible, the jobs which miss their goals most (and have the highest performance index, γ1) receive the highest priority; then we try to reach the same performance index, γ2, for the rest of the system; if this is not possible, the most troubled jobs in the second group again receive the highest priority, and so on. The algorithm details, with simple calculations, are presented in Section 4.
A comparison between the lexicographical optimization, which mimics IBM WLM behavior, and the weighted average response time minimization is presented in Tables 1 and 2 for a simple example of N=5 jobs. In this example the weight factors in the “average response time minimization” model (3) are in inverse proportion to the response time goals, cj = 1/Gj; in this way jobs with more challenging response time goals have higher weight in the optimization. This approach leads to the minimal average performance index.
² The cµ rule leads to the optimal solution of (3) if the feasible area X satisfies the conservation law [CY1]. In the case of different service times and identical weights cj, the cµ rule recommends the short-job-first (SJF) priority.
³ Prof. E. Bolker and Prof. J. Buzen did not use this terminology explicitly in their paper, but de facto their goal mode modeling, which was inspired by IBM WLM, is equivalent to lexicographical optimization, and all the results presented in Tables 1 and 2 could be obtained using their approach, too.
In the example below, the first four jobs are “small” (each has utilization 0.1) and the last job is a “big” one (its utilization is 0.5); we consider an M/M/1 preemptive-priority queuing system. Let all the jobs have the same response time goal, Gi = 1.0 sec, i=1,…,5, and the same service time, Si = 0.1 sec:
Table 1 Minimal Average vs. “Fair” Solution when response time goals are reachable.

                        Minimal Average        Lexicographically
                        Response Time          Minimal Resp. Time
Job #    Ui    Gi       Ri(avg)   pi(avg)      Ri*      pi*
1        0.1   1        0.17      0.17         1        1
2        0.1   1        0.17      0.17         1        1
3        0.1   1        0.17      0.17         1        1
4        0.1   1        0.17      0.17         1        1
5        0.5   1        1.67      1.67         1        1
R(avg)                  0.47      0.47         1        1
γ1*                               1.67                  1
If all the response time goals are identical, the solution of (3) is defined by the “small job first” priority distribution πS = (1,1,1,1,2), which runs the small jobs with the highest priority 1 and gives lower priority to the big job (#5). As a result, the small jobs' response time is ten times shorter than the big job's: the response time for the small jobs (0.17) is more than 5 times better than their goals, while the big job's response time is almost 70% greater than its goal.
In some situations this kind of “optimization” is not acceptable because, for example, a user may not feel the difference between system responses of 0.3 and 0.1 sec but may be sensitive if the system response is 3 sec instead of 1 sec.
At the same time, all the goals in this example are reachable, and the algorithm of lexicographical minimization, which will be presented in Section 4, can find such a solution. To check this, consider the response time vector which corresponds to the “big job first” priority rule, πB = (2,1,1,1,1). In this case the big job response time is R5 = S5/(1 - U5) = 0.1/0.5 = 0.2, and for the small jobs Rj = Sj/((1 - U5)(1 - Σi=1..5 Ui)) = 0.1/(0.5·0.1) = 2.0, j=1,…,4. Thus, if we use the priority rule πS about 55% of the time and the πB priority rule about 45% of the time, the average response time in the system will match the response time goal for each job.
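These numbers can be reproduced with a short sketch (an illustration added here, not the author's code) using the standard M/M/1 preemptive-resume priority formula for identical service times, Rk = S/((1 - Uhigher)(1 - Uhigher - Ulevel)):

```python
# Sketch (not from the paper): response times under preemptive-resume
# priorities in an M/M/1 system with identical service times, and the
# time-mix of pi_S and pi_B that meets the 1-second goals of Table 1.
S = 0.1                          # common service time, sec
U = [0.1, 0.1, 0.1, 0.1, 0.5]    # utilizations of classes 1..5

def resp_times(prio, U, S):
    """prio[i] = priority level of class i (1 = highest).
    R_i = S / ((1 - U_higher) * (1 - U_higher - U_same_level))."""
    R = []
    for i, p in enumerate(prio):
        u_higher = sum(u for j, u in enumerate(U) if prio[j] < p)
        u_level = sum(u for j, u in enumerate(U) if prio[j] <= p)
        R.append(S / ((1.0 - u_higher) * (1.0 - u_level)))
    return R

R_small_first = resp_times([1, 1, 1, 1, 2], U, S)  # ~[0.17]*4 + [1.67]
R_big_first   = resp_times([2, 1, 1, 1, 1], U, S)  # ~[2.0]*4  + [0.2]

# Fraction of time p to run pi_S so that the small jobs' average hits G = 1:
G = 1.0
p = (G - R_big_first[0]) / (R_small_first[0] - R_big_first[0])
print(round(p, 3))          # ~0.545, i.e. ~55% pi_S / ~45% pi_B
print([round(p * a + (1 - p) * b, 2)
       for a, b in zip(R_small_first, R_big_first)])  # all ~1.0
```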
In the last two rows of Table 1 we present the weighted average response times and the “worst case” performance index. As expected, the minimal weighted average response time (Ravg = 0.47) is smaller than the weighted average for the “fair” solution (1.0), but its “most missed goal” index γ1 is worse: γ1* = 1.67 > 1.
Now consider the case when the performance goals are not reachable. In Table 2, presented below, all the jobs' utilizations and service times are the same as in the previous case, but their goals are different. In this case the average response time minimization approach suggests using the priority π = (1,2,3,5,4), which corresponds to sorting 1/(UiGi) in decreasing order.
Even though in this case all the response times are quite different and the goals are not reachable, one can see that lexicographical optimization leads to a “fair” solution, in which jobs {1,2,3,5} are all missing their goals by 19%, while job 4 significantly outperforms its goal⁴.
Table 2 Minimal Average vs. “Fair” Solution when response time goals are NOT reachable.

                           Minimal Average        Lexicographically
                           Response Time          Minimal Resp. Time
Job #    Ui    Gi          Ri(avg)   pi(avg)      Ri*      pi*
1        0.1   0.125       0.11      0.889        0.15     1.185
2        0.1   0.250       0.14      0.556        0.30     1.185
3        0.1   0.500       0.18      0.357        0.59     1.185
4        0.1   10.00       5.00      0.500        5.00     0.500
5        0.5   0.50        0.71      1.429        0.59     1.185
R(avg)                     1.67      0.47         1.33     1.05
γ1*                                  1.43                  1.19
At the same time, the average response time solution leads to quite different performance indices: the small jobs' response times are 11-64% below their goals, while the big job exceeds its goal by 43%.
⁴ Job 4 outperforms its goal only because its goal G4 = 10 sec is not restrictive at all (R4 = 5 in both solutions, and it corresponds to job 4 running with the lowest priority).
3. Model Definition
3.1 Conservation Law and Two Equivalent Approaches to Describing the Set of Achievable Solutions. Our goal is to find an algorithm which evaluates whether the response time goals are reachable and, if they are not, suggests a solution which is (in relative terms) as close to the goals as possible.
However, the problem of modeling priority queuing systems in the general case is very complicated. It becomes significantly more tractable when the so-called conservation law holds. There exist numerous formulations of the conservation law for different types of queues [CY1, BGTV3, LK5, GM0, BB9, FG8]. For example, if we have a preemptive-priority M/M/m queue and all jobs have the same service time, changing priorities does not change the total queue (or the average response time) in the system. Intuitively, this sounds obvious: a higher priority and a shorter response time for one job lead to a comparable delay for the other jobs. On the other hand, if the service times of different job classes are significantly different, giving higher priority to shorter jobs leads to a smaller average queue than, e.g., FIFO, and the conservation law does not hold [BB9]. There are a few general conditions which are necessary for the conservation law. For example, server performance should be the same over time, and the server is not allowed to be idle when there are jobs waiting to be served. Besides, scheduling should not be anticipating, i.e. it should not be based on the service time requirements of the customers [CY1].
It should be clarified that the restriction to conservative queuing models is not related to IBM Workload Manager logic; we need the conservation law only to justify the algorithm below. This algorithm is exact if the conservation law holds, and leads to an approximate solution in the case of a moderate deviation from this law. As noted in [BB9], “...normally, conservation law is quite robust, meaning that it likely to be very close to correct, even when assumption of equal (average) service times across is violated. Time slicing and other preemption mechanisms also contribute to the robustness of the conservation law by reducing the likelihood of very long processing burst”.
As an illustration of the conservation law in action for quite different resource management tools, one can look at Figs. 3, 10 and 13 in [BD0], which show measured total queues⁵ for SUN SRM, HP's Process Resource Manager and IBM's Workload Manager for AIX. In all these cases different workloads received different CPU shares, but the resulting total queue was approximately constant, independently of which workload received the bigger share.
⁵ In [BD0] the authors present the “weighted average response time”, which is proportional to the total queue.
Now, consider a description of all achievable performance states for a system of n job classes. Let the vector q = {qi} denote the jobs' average queues, µ the (common) service rate, λ = {λj} the arrival rates, u = {uj} the utilizations, and R = {Ri} the response time vector. Let N = {1,…,n} denote the set of all jobs; for each subset of job classes S ⊆ N we denote the total utilization of the jobs in this group as US = Σj∈S uj, and QS = Q(US) is the M/M/m queue corresponding to this utilization under FCFS scheduling. According to the conservation law, the set X of all feasible combinations of queues may be described by a set of 2^n - 1 constraints: the inequalities (4) for each possible subset S of jobs, and the equality (5):
Σj∈S qj ≥ QS (4)
And, for the total queue in the system, QN (when S = N in (4)), this inequality becomes an equality:
Σj∈N qj = QN (5)
This amazingly simple description of all possible states of the queuing system is made possible by the conservation law. The constraints (4)-(5) have a very simple explanation: for any group of jobs S, the minimal possible total queue in this group, Σj∈S qj = QS, corresponds to the case when all the jobs from this group run with the highest priority. Due to preemptive scheduling, their total queue is then not affected by any other jobs in the system; it may be calculated as the FCFS queue defined by the total utilization of the jobs in this group, US. In all other cases, when some of the other jobs j ∉ S run with the same or higher priority as the jobs from group S, they affect the total queue of group S and we have a strict inequality: Σj∈S qj > QS. Combining these two observations, we obtain inequality (4).
Let us illustrate this approach with an example of n=2 jobs, which have utilizations u = (0.25, 0.50) on a single-processor CPU. The total queue in the system is QN = (u1+u2)/(1-u1-u2) = 3; if job 1 runs with the highest priority, its queue will be Q1 = u1/(1-u1) = 0.33. When job 2 has the highest priority, its queue will be Q2 = u2/(1-u2) = 1. Thus, all possible states q = (q1, q2) of this system are described by a system of three constraints:
q1 ≥ Q1
q2 ≥ Q2
q1 + q2 = QN
For example, in this system the queue state (q1, q2) = (1.5, 1.5) is feasible because all the constraints above are fulfilled. Of course, in the case of two jobs it is also clear how to reach this state: job 1 should run with high priority (QN - q1 - Q2)/(QN - Q1 - Q2) ≈ 30% of the time, and the rest of the time, ≈70%, it should have low priority.
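For small n, the conservation-law description can be checked directly by brute force. The sketch below (an illustration under the single-server assumption Q(U) = U/(1-U); the function names are mine) tests a queue vector against (4)-(5):

```python
from itertools import combinations

# Sketch: brute-force feasibility check of a queue vector q against the
# conservation-law constraints (4)-(5) for a single-server (M/M/1) system.
def Q(U):                       # number in system at total utilization U
    return U / (1.0 - U)

def feasible(q, u, tol=1e-9):
    n = len(u)
    for k in range(1, n + 1):
        for S in combinations(range(n), k):
            lhs = sum(q[i] for i in S)
            rhs = Q(sum(u[i] for i in S))
            if k < n and lhs < rhs - tol:         # inequality (4)
                return False
            if k == n and abs(lhs - rhs) > 1e-6:  # equality (5)
                return False
    return True

print(feasible([1.5, 1.5], [0.25, 0.50]))   # True: matches the 2-job example
print(feasible([0.2, 2.8], [0.25, 0.50]))   # False: job 1 below its Q1 = 1/3
```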
This approach of representing all possible states of a queuing system as a combination of the queues corresponding to static priority rules was used in the pioneering paper [GM0], and for conservative queuing systems it is equivalent to the representation (4)-(5) [BB9, CY1].
But in any case, whether the feasibility area is described by inequalities like (4)-(5) or by priorities, the number of inequalities, or of possible priority orderings, grows enormously when n becomes large: the description (4)-(5) contains ~2^n inequalities, and there exist n! possible priority combinations. For example, for n=20 this leads to ~10^6 and ~10^18 objects, respectively. The algorithm below is based on some intrinsic features of the conservation law⁶ and is very fast: all it needs is just n calculations of M/M/m queues.
3.2 Formal Definition of the “Fair” (Lexicographically Optimal) Solution
As already discussed, the measure of success in reaching the vector of performance goals {Gi} is the vector of performance indices {pi} defined by (1). Ideally, when all goals are reachable and the solution is perfectly “fair”, all the jobs have identical performance indices, all less than 1:
p1 = p2 = … = pn = γ1 < 1 (5)
But what should we do when some or all of the goals are not reachable? Tuning/resource-balancing systems try to give more resources to the jobs which miss their goals most, thus improving the “worst” performance index γ1; then, when further improvement of γ1 is not possible, the next most troubled jobs receive their share to improve their performance index γ2, and so on. The result of such activity, a vector of performance indices p = {p1,…,pn}, should also be evaluated component-wise, comparing the worst (largest) components first.
Thus, if we have two competing tuning systems, and in a comparable situation one leads to the performance index vector p¹ = {p1¹,…,pn¹} and the other to the vector p² = {p1²,…,pn²}, these results should be compared lexicographically.
⁶ The so-called submodularity property of the function Q(US) [F1, CY1].
First, the performance indices should be sorted in descending order. Let the corresponding sorted vectors be γ¹ = {γ1¹,…,γn¹} and γ² = {γ1²,…,γn²}. If γ1¹ < γ1², the first performance result γ¹ is “better”, or more preferable, than γ² (γ¹ ≺ γ²); otherwise, if γ1¹ > γ1², the second result is better (γ² ≺ γ¹). If γ1¹ = γ1², the next pair of indices, γ2¹ and γ2², should be compared, and so forth.
As an illustration, consider the performance vectors from Table 2. The minimization of the average response time leads to the performance vector p¹ = (0.89, 0.56, 0.36, 0.5, 1.43), and after sorting we have γ¹ = (1.43, 0.89, 0.56, 0.5, 0.36). The alternative vector is p² = (1.19, 1.19, 1.19, 0.5, 1.19), with γ² = (1.19, 1.19, 1.19, 1.19, 0.5). Thus, because γ1² = 1.19 < γ1¹ = 1.43, the second solution is preferable.
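A minimal sketch of this comparison rule (the function name is made up):

```python
# Sketch: compare two performance-index vectors lexicographically,
# worst (largest) component first; returns the preferred vector.
def better(p1, p2):
    g1, g2 = sorted(p1, reverse=True), sorted(p2, reverse=True)
    return p1 if g1 <= g2 else p2   # Python compares lists element by element

p_avg = [0.89, 0.56, 0.36, 0.50, 1.43]   # minimal-average solution (Table 2)
p_lex = [1.19, 1.19, 1.19, 0.50, 1.19]   # lexicographically minimal solution
print(better(p_avg, p_lex) is p_lex)     # True: 1.19 < 1.43 on the worst index
```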
Our goal is to find a feasible solution, a vector of queues q = {qj} satisfying the conservation law (4)-(5) and leading to the lexicographically minimal performance index vector γ⁷.
⁷ An amazing and general fact of submodularity theory, discovered by Fujishige [F0], is that solving the lexicographical optimization problem is equivalent to minimizing a certain quadratic function (in our case, Σ qj²/Gj).
4. Algorithm Description
The algorithm is based on the following simple observations⁸. Suppose all the jobs are sorted by their response time goals in increasing order:
G1 ≤ G2 ≤ … ≤ Gn (6)
Because all the jobs' service times are identical, it is clear that job 1, which has the most challenging goal, should run with the highest priority at least part of the time; correspondingly, job 2 should run at least some of the time with priority 1 or 2, and so forth.
On the other hand, because goal G1 is the most challenging one, it is natural to expect that job 1 will have the “worst” performance index γ1, job 2 will have either performance index γ1 or the next worst, γ2, and so on, i.e.:
γ1 ≥ γ2 ≥ … ≥ γn (7)
Suppose a group of jobs S has the same performance index γ, i.e. Ri = γGi ∀i ∈ S. Using Little's Law, qi = λiRi = γλiGi. Summing these equalities, we obtain:
γS = Σj∈S qj / Σj∈S λjGj = Q(S)/ω(S) (8)
where Q(S) and ω(S) = Σj∈S λjGj are the total queue and the weighted goal of the job group S.
⁸ A formal proof of the algorithm is presented in [R9].
Now, all we need is to find the right partition of the set of all jobs {1,…,n} into m subsets S1,…,Sm, m ≤ n, so that the jobs in the same group k have the same performance index γk (and γ1 > γ2 > … > γm).
Algorithm
Initialization:
First, sort all the jobs by their goals, as in (6). Set m=1 and include the current job j:=1 into the current group m: Sm = {1}. Set the higher-groups utilization Uh := 0 and the total utilization Utot := u1. Using the M/M/m formula, evaluate the total queue in the group, Q(S1) = Q(Utot) - Q(Uh) = Q(u1); set ω(S1) := λ1G1 and γ1 := Q(S1)/ω(S1).
Iteration: Repeat while j < n:
Update the current job index, j := j+1.
First, test whether job j belongs to the current job set Sm: tentatively set Ũtot := Utot + uj; re-evaluate the current group total queue Q̃m = Q(Ũtot) - Q(Uh); set ω̃m := ω(Sm) + λjGj and the updated performance index p̃ = Q̃m/ω̃m.
1. If p̃ ≥ γm, then the current job j belongs to the current group:
1.a Set Sm := Sm ∪ {j}, γm := p̃, ωm := ω̃m, Utot := Ũtot, and update the response times of each job i in the current group: Ri := γmGi ∀i ∈ Sm.
1.b If m > 1, check the monotonicity (7): if γm > γm-1, groups m and m-1 should be merged (Sm-1 := Sm ∪ Sm-1, ωm-1 := ωm-1 + ωm, Qm-1 := Qm + Qm-1, γm-1 := Qm-1/ωm-1, m := m-1), and this merging operation should be repeated until γ1,…,γm form a decreasing sequence.
2. If p̃ < γm, a new group m := m+1 is started: Uh := Utot; Utot := Ũtot; Qm := Q(Utot) - Q(Uh); ωm := λjGj; γm := Qm/ωm; Rj := γmGj.
If j < n, the iteration is repeated.
After the algorithm completes, we have the lexicographically optimal response times for each job and m different values of the performance indices γk, k=1,…,m.
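The sketch below is one possible Python implementation of this greedy algorithm, following my reading of the steps above (it is not the author's code). It assumes a single server, Q(U) = U/(1-U), and job classes already sorted by increasing goal:

```python
# Sketch of the greedy grouping algorithm above, assuming a single server
# (Q(U) = U/(1 - U)) and job classes already sorted by increasing goal G.
def Q(U):
    return U / (1.0 - U)

def lex_response_times(u, G, lam):
    """u, G, lam: per-class utilization, response time goal and arrival rate.
    Returns (response times R, group performance indices, job groups)."""
    n = len(u)
    groups = [[0]]                    # S_1 starts with the first job
    U_h, U_tot = 0.0, u[0]            # utilization above / including current group
    Qg = [Q(U_tot) - Q(U_h)]          # queue attributed to each group
    w = [lam[0] * G[0]]               # omega(S_k) = sum of lambda_j * G_j
    for j in range(1, n):
        U_new = U_tot + u[j]
        q_try = Q(U_new) - Q(U_h)     # group queue if job j joins the last group
        w_try = w[-1] + lam[j] * G[j]
        if q_try / w_try >= Qg[-1] / w[-1]:          # step 1: join current group
            groups[-1].append(j)
            Qg[-1], w[-1], U_tot = q_try, w_try, U_new
            # step 1.b: merge backwards while monotonicity (7) is violated
            while len(Qg) > 1 and Qg[-1] / w[-1] > Qg[-2] / w[-2]:
                q_last, w_last, g_last = Qg.pop(), w.pop(), groups.pop()
                Qg[-1] += q_last; w[-1] += w_last; groups[-1] += g_last
            U_h = U_tot - sum(u[i] for i in groups[-1])
        else:                                        # step 2: open a new group
            groups.append([j])
            Qg.append(Q(U_new) - Q(U_tot))           # queue above previous groups
            w.append(lam[j] * G[j])
            U_h, U_tot = U_tot, U_new
    gammas = [q / wk for q, wk in zip(Qg, w)]
    R = [0.0] * n
    for k, S in enumerate(groups):
        for i in S:
            R[i] = gammas[k] * G[i]
    return R, gammas, groups

# Jobs of Table 2/3 (service time 0.1 s, so lam = u / 0.1), sorted by goal;
# expected result: gammas ~ [1.19, 0.5], R ~ [0.15, 0.30, 0.59, 0.59, 5.0].
u = [0.1, 0.1, 0.1, 0.5, 0.1]
G = [0.125, 0.25, 0.5, 0.5, 10.0]
lam = [x / 0.1 for x in u]
print(lex_response_times(u, G, lam))
```

The merge loop implements step 1.b; because the group queues are differences Q(Utot) - Q(Uh) over nested utilization levels, they simply add up when adjacent groups are merged.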
To illustrate how this algorithm works, consider its application to the example from Table 2.
Table 3 Algorithm iteration results for 5 jobs

j    Uj    Gj       m    γm      Response time estimates R1,…,Rj
1    0.1   0.125    1    0.89    0.11
2    0.1   0.25     2    0.56    0.11, 0.14
3    0.1   0.50     3    0.36    0.11, 0.14, 0.18
4    0.5   0.50     1    1.19    0.15, 0.30, 0.59, 0.59
5    0.1   10.0     2    0.50    0.15, 0.30, 0.59, 0.59, 5.0
Table 3 presents the utilizations and goals of the five jobs from Table 2⁹, as well as the algorithm iteration results: the number of groups m, the performance index γm of the current group, and the estimated response times of the jobs processed so far, Ri = γkGi, after each iteration j. As one can see from this table, until the iteration with the “big” job, j=4, each job has a different performance index, and each job group contains just one job: S1={1}, S2={2}, S3={3}. At the iteration with the big job (row j=4) all these groups are merged into one group, S = {1,2,3,4}, with performance index γ1 ≈ 1.19 at step 1.b. At the final iteration, with the “unchallenging” goal G5, one more job group is created (γ2 = 0.5).
This example illustrates a few important properties of the splitting of all jobs into groups with identical performance indices, S1,…,Sm. This splitting corresponds to the “best” possible priority distribution. Each job from group Sk should, on average, run with higher priority than any job from groups Sk+1,…,Sm. Thus, even if all the jobs from groups k+1,…,m were removed from the system, there would be no way to further improve the performance results for the jobs j ∈ S1 ∪ … ∪ Sk. Besides, this grouping gives a way to simplify, or aggregate, the original performance problem. If instead of the n original jobs we consider m aggregated jobs with arrival rate λkᵃ and goal Gkᵃ defined as the total arrival rate and the arrival-weighted goal of group Sk:
λkᵃ = Σj∈Sk λj,   Gkᵃ = Σj∈Sk λjGj / Σj∈Sk λj (8)
then the solution of the “aggregated” problem leads to the same set of performance indices γ1,…,γm as in the original non-aggregated problem of n jobs.
⁹ All 5 jobs are sorted by their goals, so jobs #4 and #5 are swapped compared to Table 2.
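As a quick check of this aggregation property (reusing the lex_response_times sketch from Section 4, under the same single-server assumption):

```python
# Sketch: aggregate the Table 2/3 jobs into the two groups found above and
# verify that the aggregated problem yields the same performance indices.
u_agg = [0.1 + 0.1 + 0.1 + 0.5, 0.1]                     # group utilizations
lam_agg = [1 + 1 + 1 + 5, 1]                             # total arrival rates
G_agg = [(1*0.125 + 1*0.25 + 1*0.5 + 5*0.5) / 8, 10.0]   # arrival-weighted goals
_, gammas, _ = lex_response_times(u_agg, G_agg, lam_agg)
print([round(g, 2) for g in gammas])                     # ~[1.19, 0.5], as before
```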
The presented algorithm reproduces the results of the original paper [BB9], devoted to goal mode scheduling and motivated by IBM WLM. The major difference is that this algorithm avoids scanning the ~2^n - 1 constraints (4)-(5) and is therefore scalable for large n.
In this algorithm all the jobs have the same importance level, but the case of different importance levels may easily be accommodated.
In the case of different but relatively close service times, the algorithm becomes approximate and can be used after rescaling the job parameters. However, when the service rates are significantly different, more general methods [FG8] or further research are needed. This algorithm may also be used for a more general G/M/m system with preemption when all the job classes have the same exponential service-time distribution¹⁰, but, of course, in this case the calculation of the functions Q(U) may become much more complicated.
5. Tuning vs. Upgrade and Other Examples
Using tuning software tools may be an option to avoid or postpone a hardware upgrade. Thus, it is important to evaluate the “limits of tuning”: can we reach the performance goals without upgrading the system?
Besides, when the goals are not reachable without a hardware upgrade, it may be important to evaluate the upgrade + tuning option: what is the minimal hardware upgrade level which allows meeting the performance goals if the system is perfectly tuned? Answering this question may lead to the conclusion that the performance goals themselves are not realistic or not cost effective.
To simplify the analysis, let us start with the case of one job, which has response time goal G, actual response time R and total utilization U. Suppose the goal is not reached and the performance index γ = R/G > 1. What should be the minimal upgrade level υ > 1 such that the response time goal becomes reachable on a server which is υ times faster than the existing one?
Using Little's law and the fact that the throughput on the faster server (for open systems) will be the same, the equation for the upgrade factor υ is:
Q( U/υ ) = Q( U )/γ (9)
In the case when the original and the “upgraded” servers have the same number of processors, m, equation (9) may be rewritten as (10):
Q( m, U/υ ) = Q( m, U )/γ (10)
¹⁰ G/M/m with the same average service time satisfies the conservation law (4)-(5) [CY1].
For M/M/m queues, it is easy to solve (10) analytically for small m, or numerically for arbitrary m. In particular, for m=1 the solution of (10) is:
υ = U + γ(1 - U) (11)
In the case of m=2 the performance improvement factor υ is the solution of a quadratic equation (ρ = U/m is the per-processor utilization):
υ² - ρ² = γ υ (1 - ρ²) (12)
It is interesting to analyze the solution υ of equation (10) as a function of γ. For any combination of the parameters m, ρ this function υ(γ) is monotonically increasing. However, its behavior is quite different in the areas of low and high utilization. For small utilization υ ≈ γ, which reflects the obvious fact that at small utilization there is no queuing problem: we need a faster server just to make the service time smaller. If utilization is high (ρ ≈ 1) this dependency is much weaker (e.g. for m=1,2: ∂υ/∂γ ~ 1-ρ), which reflects the fact that at high utilization even a relatively small improvement in CPU speed can have a significant effect.
It is important to mention that modern technology sometimes allows avoiding a “physical” upgrade: customers can use (and pay for) additional processors during periods of peak load [SB2]. In this case, when we try to solve performance problems by increasing the number of processors from m1 to m2, the analog of (10) is:
m2 = min{ k | Q(k, U) ≤ Q(m1, U)/γ } (13)
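Equations (10)-(13) are easy to solve numerically. The sketch below (my own helper functions; the paper gives no code) uses the Erlang-C based mean number in system for M/M/m, bisection for (10), and a simple search for (13):

```python
import math

# Sketch: numerical solution of (10) and (13) for M/M/m, using the standard
# Erlang-C formula to get the mean number in system Q(m, U).
def Q_mm(m, U):
    """Mean number in system for M/M/m at total utilization U (U/m < 1)."""
    rho, a = U / m, U                      # per-server utilization, offered load
    p0 = 1.0 / (sum(a**k / math.factorial(k) for k in range(m))
                + a**m / (math.factorial(m) * (1 - rho)))
    wait_prob = a**m / (math.factorial(m) * (1 - rho)) * p0     # Erlang C
    return wait_prob * rho / (1 - rho) + a                      # Lq + in service

def upgrade_factor(m, U, gamma, tol=1e-9):
    """Solve Q(m, U/v) = Q(m, U)/gamma for v by bisection, as in (10)."""
    target = Q_mm(m, U) / gamma
    lo, hi = 1.0, 1.0
    while Q_mm(m, U / hi) > target:        # grow the bracket
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if Q_mm(m, U / mid) > target else (lo, mid)
    return hi

def processors_needed(m1, U, gamma):
    """Smallest k with Q(k, U) <= Q(m1, U)/gamma, as in (13)."""
    target, k = Q_mm(m1, U) / gamma, max(m1, int(U) + 1)
    while Q_mm(k, U) > target:
        k += 1
    return k

# m=1 sanity check against the closed form (11): v = U + gamma*(1 - U)
print(round(upgrade_factor(1, 0.8, 1.5), 4), 0.8 + 1.5 * (1 - 0.8))  # both ~1.1
print(processors_needed(1, 0.8, 1.5))                                # e.g. 2
```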
These estimates were made for the simplest case of one “aggregated” job in the system. However, this simple back-of-the-envelope formula (9) can be used for an approximate estimate of the necessary upgrade if we aggregate all the jobs which do not meet their goals.
Besides, the algorithm presented in the previous section allows estimating the minimal necessary upgrade level in the tuning + upgrade analysis. Suppose that after applying the algorithm from the previous section to n jobs we found that the best possible performance indices achievable by tuning are γ1* ≥ γ2* ≥ … ≥ γn*, and suppose that for k of the jobs the goals are not reachable, γk* > 1. In this case we can consider one aggregated job with total utilization Uᵃ and performance index γᵃ, defined by (14):
Uᵃ = Σj=1..k λj/µ,   γᵃ = Σj=1..k λjγj* / Σj=1..k λj (14)
Solving equation (9) for the aggregated parameters Uᵃ, γᵃ gives an estimate of the necessary upgrade factor υ. If the response times Rj* behind (14) correspond to the lexicographically optimal solution, (14) gives a good estimate of the minimally necessary upgrade level (see the tuning + upgrade columns of Tables 6 and 7). The high accuracy of the aggregation may be explained by the fact that the aggregated jobs 1,…,k are not affected by the jobs k+1,…,n, which have lower priorities. In the general case, aggregation gives a reasonable but not very accurate approximation. Thus, in the example presented in Table 5, using the aggregated formulas (14) and (11) instead of the detailed information on the average priority distribution leads to the estimate υ ≈ 1.4 instead of υ ≈ 1.6.
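A minimal sketch of this aggregated estimate for a single-server system, combining (14) with the m=1 closed form (11) (the job values below are hypothetical):

```python
# Sketch: aggregate the jobs that miss their goals and estimate the upgrade
# factor from (14) and the m=1 closed form (11): v = U_a + gamma_a*(1 - U_a).
def aggregated_upgrade(lam, u, gamma_star):
    miss = [j for j, g in enumerate(gamma_star) if g > 1.0]
    U_a = sum(u[j] for j in miss)
    gamma_a = (sum(lam[j] * gamma_star[j] for j in miss)
               / sum(lam[j] for j in miss))
    return U_a + gamma_a * (1.0 - U_a)

# Hypothetical example: two of three jobs miss their goals after tuning.
print(round(aggregated_upgrade([1.0, 2.0, 0.5],
                               [0.2, 0.4, 0.1],
                               [1.3, 1.3, 0.7]), 3))   # ~1.12
```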
Now, to compare the tuning and hardware upgrade options, let us consider the following illustrative example.
Suppose we have 3 jobs running on a single-processor CPU, and each job has a unit service time. The parameters of these jobs (called, according to their utilizations, Light, Medium and Heavy) are presented in Table 4. The job response times in Table 4 correspond to the mix of priority orderings π1={2,1,0}, π2={2,0,1}, π3={0,1,2} with probabilities p={0.75, 0.15, 0.1}.
Table 4 Original job utilizations and response times

Jobs          Utilization   Actual Resp. Time   Resp. Time Goal
Light (L)     0.1           10.97               2
Medium (M)    0.2           4.30                4
Heavy (H)     0.45          2.65                6
In Table 5 we compare two approaches: “tuning” and a CPU upgrade. The tuning results correspond to the lexicographically optimal solution obtained with the algorithm from the previous section. As one can see from Table 5, the goals are reachable: the system can be tuned in such a way that all three jobs have the same performance index, pj ≈ 0.8 (almost 20% better than the goals). At the same time, reaching the goals with the existing priority distribution would require a processor almost 60% faster than the existing one (υ ≈ 1.58).
The significant difference between the existing and the upgraded CPU (almost 60%) may be explained in this example by the fact that the original priorities did not follow the short-job-first rule (the most resource-consuming Heavy job had the highest priority 75% of the time). Putting it another way, there is a high (≈97%) correlation between the goals and the loads.
Table 5 Tuning and upgrade options comparison for correlated loads and goals (60% faster CPU), G={2,4,6}

           Tuning                   Upgrade (υ≈1.58)
Jobs       Response   Resp/Goal     Response   Resp/Goal
L          1.85       0.81          2.00       1.00
M          3.69       0.81          1.35       0.34
H          4.62       0.81          1.48       0.25
In the case of high correlation between the utilizations and the goals (when all the service times are the same) one can expect especially significant results from lexicographical optimization. The opposite is true as well: when loads and goals are highly negatively correlated, the potential effect of tuning may be insignificant.
To illustrate this, we compared tuning vs. upgrade for the same model, but with the goals of the Light and Heavy jobs swapped, making utilizations and goals negatively (≈ -0.97) correlated. In this case of goals G={6,4,2} tuning alone is not sufficient (γ ≈ 1.3) and we need some upgrade (υ ≈ 1.08).
Table 6 Tuning and upgrade options comparison for balanced (negatively correlated) loads and goals, G={6,4,2}

                           Tuning + upgrade (υ≈1.08)    Upgrade (υ≈1.21)
Jobs   Resp. Time Goals    Response   Resp/Goal         Response   Resp/Goal
L      6                   6.0        1.0               4.54       0.76
M      4                   4.0        1.0               2.45       0.61
H      2                   2.0        1.0               1.99       1.0
As one can see from Table 6, in the case of “balanced” goals, when the tight goals correspond to the small jobs, tuning does not help much: the necessary upgrade without tuning is ~21% and with tuning ~8%. This effect is even more evident for multiprocessor systems. But, of course, the major factor defining the necessary upgrade level is how challenging the goals are.
In Table 7 we present results when all 3 goals are not reachable and are much more challenging than in the previous examples (Gi = 1). In this case the “un-tuned” system needs a 2.75 times faster processor to reach the goals, while the tuning + upgrade combination requires only a 1.75 times faster CPU (υ = 1.75). Note that the estimates of the upgrade factor υ (1.08 and 1.75) in Tables 6 and 7 were made using the aggregated formulas (10) and (14) with the lexicographical solutions, and they prove to be quite accurate.
Table 7 Tuning and upgrade options comparison in the case of challenging goals

                           Tuning + upgrade (υ≈1.75)    Upgrade (υ≈2.75)
Jobs   Resp. Time Goals    Response   Resp/Goal         Response   Resp/Goal
L      1                   1.00       1.00              0.66       0.66
M      1                   1.00       1.00              0.55       0.55
H      1                   1.00       1.00              1.00       1.00
6. Summary
Lexicographical optimization is an important approach to analyzing performance problems. This relatively simple approach, which reflects the behavior of complicated real tuning systems, finds a feasible solution when the response time goals are achievable, and leads to fair, lexicographically minimal solutions otherwise.
It was also shown that a straightforward optimization approach based on the weighted average of the response times is not very helpful in analyzing such systems.
Also presented is a simplified back-of-the-envelope formula for estimating the necessary level of upgrade in the cases when the performance goals are not reachable.
References
[GNT4] Georgiadis, L., Nikolaou, C., Thomasian, A. “A fair workload allocation policy for heterogeneous systems”, Journal of Parallel and Distributed Computing, v. 64, Issue 4, April 2004, pp. 507-519.
[L9] Luss, H. “On equitable resource allocation problems: a lexicographic minimax approach”, Operations Research, v. 47, No. 3, pp. 361-376, 1999.
[BGTV3] Bhattacharya, P.P., Georgiadis, L., Tsoucas, P., Viniotis, I. “Adaptive Lexicographic Optimization in Multi-class M/GI/1 Queues”, Mathematics of Operations Research, v. 18, No. 3, 1993, pp. 705-740.
[LK5] Kleinrock, L. Queueing Systems, v. 1 and v. 2, 1975.
[GM0] Coffman, E., Mitrani, I. “A Characterization of Waiting Time Performance Realizable by Single Server Queues”, Operations Research, v. 28 (1980), pp. 810-821.
[CY1] Chen, H., Yao, D. Fundamentals of Queueing Networks, Springer, 2001.
[BD0] Bolker, E., Ding, Y. “On the Performance Impact of Fair Share Scheduling”, Proc. CMG 2000, pp. 71-81.
[BB9] Bolker, E., Buzen, J. “Goal Mode: Part 1 - Theory”, CMG Transactions, v. 96, May 1999, pp. 9-15.
[SBG9] Shum, A., Buzen, J., Ginis, B. “Goal Mode: Part 2 - Practice”, CMG Transactions, v. 96, May 1999, pp. 16-21.
[IBM00] IBM AIX V4.3.3 Workload Manager. Technical References, February 2000 Update.
[F1] Fujishige, S. Submodular Functions and Optimization, Annals of Discrete Mathematics, v. 47, 1991, 270 p.
[R9] Rikun, A. “A Polynomial Algorithm for Evaluation of Achievable Performance Solutions for Multi-class Queues”, submitted for publication.
[FG8] Federgruen, A., Groenevelt, H. “Characterization and Optimization of Achievable Performance in General Queueing Systems”, Operations Research, v. 36, No. 5, 1988, pp. 733-741.
[SB2] Shum, A., Buzen, J. “Industry-wide Implications of the ‘Capacity on Demand’ & ‘Pay-As-You-Go’ Phenomena: Intertwining of IT budget and capacity planning”, Journal of Computer Resource Management, 2002, pp. 66-88.
More Related Content

What's hot

Learning scheduler parameters for adaptive preemption
Learning scheduler parameters for adaptive preemptionLearning scheduler parameters for adaptive preemption
Learning scheduler parameters for adaptive preemptioncsandit
 
Optimization of Patrol Manpower Allocation Using Goal Programming Approach -A...
Optimization of Patrol Manpower Allocation Using Goal Programming Approach -A...Optimization of Patrol Manpower Allocation Using Goal Programming Approach -A...
Optimization of Patrol Manpower Allocation Using Goal Programming Approach -A...IJERA Editor
 
A tricky task scheduling technique to optimize time cost and reliability in m...
A tricky task scheduling technique to optimize time cost and reliability in m...A tricky task scheduling technique to optimize time cost and reliability in m...
A tricky task scheduling technique to optimize time cost and reliability in m...eSAT Publishing House
 
EFFICIENT SCHEDULING STRATEGY USING COMMUNICATION AWARE SCHEDULING FOR PARALL...
EFFICIENT SCHEDULING STRATEGY USING COMMUNICATION AWARE SCHEDULING FOR PARALL...EFFICIENT SCHEDULING STRATEGY USING COMMUNICATION AWARE SCHEDULING FOR PARALL...
EFFICIENT SCHEDULING STRATEGY USING COMMUNICATION AWARE SCHEDULING FOR PARALL...ijdpsjournal
 
A FUZZY MATHEMATICAL MODEL FOR PEFORMANCE TESTING IN CLOUD COMPUTING USING US...
A FUZZY MATHEMATICAL MODEL FOR PEFORMANCE TESTING IN CLOUD COMPUTING USING US...A FUZZY MATHEMATICAL MODEL FOR PEFORMANCE TESTING IN CLOUD COMPUTING USING US...
A FUZZY MATHEMATICAL MODEL FOR PEFORMANCE TESTING IN CLOUD COMPUTING USING US...ijseajournal
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
A lognormal reliability design model
A lognormal reliability design modelA lognormal reliability design model
A lognormal reliability design modeleSAT Journals
 
Some Studies on Multistage Decision Making Under Fuzzy Dynamic Programming
Some Studies on Multistage Decision Making Under Fuzzy Dynamic ProgrammingSome Studies on Multistage Decision Making Under Fuzzy Dynamic Programming
Some Studies on Multistage Decision Making Under Fuzzy Dynamic ProgrammingWaqas Tariq
 
Task scheduling methodologies for high speed computing systems
Task scheduling methodologies for high speed computing systemsTask scheduling methodologies for high speed computing systems
Task scheduling methodologies for high speed computing systemsijesajournal
 
Smooth-and-Dive Accelerator: A Pre-MILP Primal Heuristic applied to Scheduling
Smooth-and-Dive Accelerator: A Pre-MILP Primal Heuristic applied to SchedulingSmooth-and-Dive Accelerator: A Pre-MILP Primal Heuristic applied to Scheduling
Smooth-and-Dive Accelerator: A Pre-MILP Primal Heuristic applied to SchedulingAlkis Vazacopoulos
 

What's hot (12)

Learning scheduler parameters for adaptive preemption
Learning scheduler parameters for adaptive preemptionLearning scheduler parameters for adaptive preemption
Learning scheduler parameters for adaptive preemption
 
Optimization of Patrol Manpower Allocation Using Goal Programming Approach -A...
Optimization of Patrol Manpower Allocation Using Goal Programming Approach -A...Optimization of Patrol Manpower Allocation Using Goal Programming Approach -A...
Optimization of Patrol Manpower Allocation Using Goal Programming Approach -A...
 
A tricky task scheduling technique to optimize time cost and reliability in m...
A tricky task scheduling technique to optimize time cost and reliability in m...A tricky task scheduling technique to optimize time cost and reliability in m...
A tricky task scheduling technique to optimize time cost and reliability in m...
 
EFFICIENT SCHEDULING STRATEGY USING COMMUNICATION AWARE SCHEDULING FOR PARALL...
EFFICIENT SCHEDULING STRATEGY USING COMMUNICATION AWARE SCHEDULING FOR PARALL...EFFICIENT SCHEDULING STRATEGY USING COMMUNICATION AWARE SCHEDULING FOR PARALL...
EFFICIENT SCHEDULING STRATEGY USING COMMUNICATION AWARE SCHEDULING FOR PARALL...
 
A FUZZY MATHEMATICAL MODEL FOR PEFORMANCE TESTING IN CLOUD COMPUTING USING US...
A FUZZY MATHEMATICAL MODEL FOR PEFORMANCE TESTING IN CLOUD COMPUTING USING US...A FUZZY MATHEMATICAL MODEL FOR PEFORMANCE TESTING IN CLOUD COMPUTING USING US...
A FUZZY MATHEMATICAL MODEL FOR PEFORMANCE TESTING IN CLOUD COMPUTING USING US...
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
A lognormal reliability design model
A lognormal reliability design modelA lognormal reliability design model
A lognormal reliability design model
 
Some Studies on Multistage Decision Making Under Fuzzy Dynamic Programming
Some Studies on Multistage Decision Making Under Fuzzy Dynamic ProgrammingSome Studies on Multistage Decision Making Under Fuzzy Dynamic Programming
Some Studies on Multistage Decision Making Under Fuzzy Dynamic Programming
 
Task scheduling methodologies for high speed computing systems
Task scheduling methodologies for high speed computing systemsTask scheduling methodologies for high speed computing systems
Task scheduling methodologies for high speed computing systems
 
Smooth-and-Dive Accelerator: A Pre-MILP Primal Heuristic applied to Scheduling
Smooth-and-Dive Accelerator: A Pre-MILP Primal Heuristic applied to SchedulingSmooth-and-Dive Accelerator: A Pre-MILP Primal Heuristic applied to Scheduling
Smooth-and-Dive Accelerator: A Pre-MILP Primal Heuristic applied to Scheduling
 
MOMENTUM and ENERGY
MOMENTUM and ENERGYMOMENTUM and ENERGY
MOMENTUM and ENERGY
 
Ce25481484
Ce25481484Ce25481484
Ce25481484
 

Viewers also liked

Viewers also liked (11)

AMND16 Roses
AMND16 RosesAMND16 Roses
AMND16 Roses
 
AMND20 Fairies
AMND20 FairiesAMND20 Fairies
AMND20 Fairies
 
AMND06 Dreams
AMND06 DreamsAMND06 Dreams
AMND06 Dreams
 
fema 3
fema 3fema 3
fema 3
 
Piano pickup
Piano pickupPiano pickup
Piano pickup
 
Gestão De Ativos Imobilizados & Tecnologia RFID
Gestão De Ativos Imobilizados & Tecnologia RFIDGestão De Ativos Imobilizados & Tecnologia RFID
Gestão De Ativos Imobilizados & Tecnologia RFID
 
AMND21 Craftsmen
AMND21 CraftsmenAMND21 Craftsmen
AMND21 Craftsmen
 
fema 1
fema 1fema 1
fema 1
 
Multipliers of resilience
Multipliers of resilienceMultipliers of resilience
Multipliers of resilience
 
Mentaal kapitaal
Mentaal kapitaalMentaal kapitaal
Mentaal kapitaal
 
UK government digital transformation infographic
UK government digital transformation infographicUK government digital transformation infographic
UK government digital transformation infographic
 

Similar to Queuing model estimating response time goals feasibility_CMG_Proc_2009_9097

A Hybrid Evolutionary Optimization Model for Solving Job Shop Scheduling Prob...
A Hybrid Evolutionary Optimization Model for Solving Job Shop Scheduling Prob...A Hybrid Evolutionary Optimization Model for Solving Job Shop Scheduling Prob...
A Hybrid Evolutionary Optimization Model for Solving Job Shop Scheduling Prob...iosrjce
 
Stochastic scheduling
Stochastic schedulingStochastic scheduling
Stochastic schedulingSSA KPI
 
Schedulability of Rate Monotonic Algorithm using Improved Time Demand Analysi...
Schedulability of Rate Monotonic Algorithm using Improved Time Demand Analysi...Schedulability of Rate Monotonic Algorithm using Improved Time Demand Analysi...
Schedulability of Rate Monotonic Algorithm using Improved Time Demand Analysi...IJECEIAES
 
A New Approach for Job Scheduling Using Hybrid GA-ST Optimization-Crimson Pub...
A New Approach for Job Scheduling Using Hybrid GA-ST Optimization-Crimson Pub...A New Approach for Job Scheduling Using Hybrid GA-ST Optimization-Crimson Pub...
A New Approach for Job Scheduling Using Hybrid GA-ST Optimization-Crimson Pub...CrimsonPublishersRDMS
 
Solution manual real time system bt jane w s liu solution manual
Solution manual real time system bt jane w s liu solution manual   Solution manual real time system bt jane w s liu solution manual
Solution manual real time system bt jane w s liu solution manual neeraj7svp
 
Full solution manual real time system by jane w s liu solution manual
Full solution manual real time system by jane w s liu solution manualFull solution manual real time system by jane w s liu solution manual
Full solution manual real time system by jane w s liu solution manualneeraj7svp
 
Full solution manual real time system by jane w s liu solution manual
Full solution manual real time system by jane w s liu solution manualFull solution manual real time system by jane w s liu solution manual
Full solution manual real time system by jane w s liu solution manualneeraj7svp
 
An Optimum Time Quantum Using Linguistic Synthesis for Round Robin Cpu Schedu...
An Optimum Time Quantum Using Linguistic Synthesis for Round Robin Cpu Schedu...An Optimum Time Quantum Using Linguistic Synthesis for Round Robin Cpu Schedu...
An Optimum Time Quantum Using Linguistic Synthesis for Round Robin Cpu Schedu...ijsc
 
AN OPTIMUM TIME QUANTUM USING LINGUISTIC SYNTHESIS FOR ROUND ROBIN CPU SCHEDU...
AN OPTIMUM TIME QUANTUM USING LINGUISTIC SYNTHESIS FOR ROUND ROBIN CPU SCHEDU...AN OPTIMUM TIME QUANTUM USING LINGUISTIC SYNTHESIS FOR ROUND ROBIN CPU SCHEDU...
AN OPTIMUM TIME QUANTUM USING LINGUISTIC SYNTHESIS FOR ROUND ROBIN CPU SCHEDU...ijsc
 
mrcspbayraksan
mrcspbayraksanmrcspbayraksan
mrcspbayraksanYifan Liu
 
A MULTI-POPULATION BASED FROG-MEMETIC ALGORITHM FOR JOB SHOP SCHEDULING PROBLEM
A MULTI-POPULATION BASED FROG-MEMETIC ALGORITHM FOR JOB SHOP SCHEDULING PROBLEMA MULTI-POPULATION BASED FROG-MEMETIC ALGORITHM FOR JOB SHOP SCHEDULING PROBLEM
A MULTI-POPULATION BASED FROG-MEMETIC ALGORITHM FOR JOB SHOP SCHEDULING PROBLEMacijjournal
 
Schedulability Analysis for a Combination of Non-Preemptive Strict Periodic T...
Schedulability Analysis for a Combination of Non-Preemptive Strict Periodic T...Schedulability Analysis for a Combination of Non-Preemptive Strict Periodic T...
Schedulability Analysis for a Combination of Non-Preemptive Strict Periodic T...IOSR Journals
 
jurnal of occupational safety and health
jurnal of occupational safety and healthjurnal of occupational safety and health
jurnal of occupational safety and healthSiti Mastura
 
Temporal workload analysis and its application to power aware scheduling
Temporal workload analysis and its application to power aware schedulingTemporal workload analysis and its application to power aware scheduling
Temporal workload analysis and its application to power aware schedulingijesajournal
 
Temporal workload analysis and its application to power aware scheduling
Temporal workload analysis and its application to power aware schedulingTemporal workload analysis and its application to power aware scheduling
Temporal workload analysis and its application to power aware schedulingijesajournal
 
SWARM INTELLIGENCE SCHEDULING OF SOFT REAL-TIME TASKS IN HETEROGENEOUS MULTIP...
SWARM INTELLIGENCE SCHEDULING OF SOFT REAL-TIME TASKS IN HETEROGENEOUS MULTIP...SWARM INTELLIGENCE SCHEDULING OF SOFT REAL-TIME TASKS IN HETEROGENEOUS MULTIP...
SWARM INTELLIGENCE SCHEDULING OF SOFT REAL-TIME TASKS IN HETEROGENEOUS MULTIP...ecij
 

Similar to Queuing model estimating response time goals feasibility_CMG_Proc_2009_9097 (20)

A Hybrid Evolutionary Optimization Model for Solving Job Shop Scheduling Prob...
A Hybrid Evolutionary Optimization Model for Solving Job Shop Scheduling Prob...A Hybrid Evolutionary Optimization Model for Solving Job Shop Scheduling Prob...
A Hybrid Evolutionary Optimization Model for Solving Job Shop Scheduling Prob...
 
C017611624
C017611624C017611624
C017611624
 
Stochastic scheduling
Stochastic schedulingStochastic scheduling
Stochastic scheduling
 
Schedulability of Rate Monotonic Algorithm using Improved Time Demand Analysi...
Schedulability of Rate Monotonic Algorithm using Improved Time Demand Analysi...Schedulability of Rate Monotonic Algorithm using Improved Time Demand Analysi...
Schedulability of Rate Monotonic Algorithm using Improved Time Demand Analysi...
 
A New Approach for Job Scheduling Using Hybrid GA-ST Optimization-Crimson Pub...
A New Approach for Job Scheduling Using Hybrid GA-ST Optimization-Crimson Pub...A New Approach for Job Scheduling Using Hybrid GA-ST Optimization-Crimson Pub...
A New Approach for Job Scheduling Using Hybrid GA-ST Optimization-Crimson Pub...
 
Solution manual real time system bt jane w s liu solution manual
Solution manual real time system bt jane w s liu solution manual   Solution manual real time system bt jane w s liu solution manual
Solution manual real time system bt jane w s liu solution manual
 
Full solution manual real time system by jane w s liu solution manual
Full solution manual real time system by jane w s liu solution manualFull solution manual real time system by jane w s liu solution manual
Full solution manual real time system by jane w s liu solution manual
 
Full solution manual real time system by jane w s liu solution manual
Full solution manual real time system by jane w s liu solution manualFull solution manual real time system by jane w s liu solution manual
Full solution manual real time system by jane w s liu solution manual
 
K017446974
K017446974K017446974
K017446974
 
Table
TableTable
Table
 
An Optimum Time Quantum Using Linguistic Synthesis for Round Robin Cpu Schedu...
An Optimum Time Quantum Using Linguistic Synthesis for Round Robin Cpu Schedu...An Optimum Time Quantum Using Linguistic Synthesis for Round Robin Cpu Schedu...
An Optimum Time Quantum Using Linguistic Synthesis for Round Robin Cpu Schedu...
 
AN OPTIMUM TIME QUANTUM USING LINGUISTIC SYNTHESIS FOR ROUND ROBIN CPU SCHEDU...
AN OPTIMUM TIME QUANTUM USING LINGUISTIC SYNTHESIS FOR ROUND ROBIN CPU SCHEDU...AN OPTIMUM TIME QUANTUM USING LINGUISTIC SYNTHESIS FOR ROUND ROBIN CPU SCHEDU...
AN OPTIMUM TIME QUANTUM USING LINGUISTIC SYNTHESIS FOR ROUND ROBIN CPU SCHEDU...
 
mrcspbayraksan
mrcspbayraksanmrcspbayraksan
mrcspbayraksan
 
A MULTI-POPULATION BASED FROG-MEMETIC ALGORITHM FOR JOB SHOP SCHEDULING PROBLEM
A MULTI-POPULATION BASED FROG-MEMETIC ALGORITHM FOR JOB SHOP SCHEDULING PROBLEMA MULTI-POPULATION BASED FROG-MEMETIC ALGORITHM FOR JOB SHOP SCHEDULING PROBLEM
A MULTI-POPULATION BASED FROG-MEMETIC ALGORITHM FOR JOB SHOP SCHEDULING PROBLEM
 
Schedulability Analysis for a Combination of Non-Preemptive Strict Periodic T...
Schedulability Analysis for a Combination of Non-Preemptive Strict Periodic T...Schedulability Analysis for a Combination of Non-Preemptive Strict Periodic T...
Schedulability Analysis for a Combination of Non-Preemptive Strict Periodic T...
 
K017126670
K017126670K017126670
K017126670
 
jurnal of occupational safety and health
jurnal of occupational safety and healthjurnal of occupational safety and health
jurnal of occupational safety and health
 
Temporal workload analysis and its application to power aware scheduling
Temporal workload analysis and its application to power aware schedulingTemporal workload analysis and its application to power aware scheduling
Temporal workload analysis and its application to power aware scheduling
 
Temporal workload analysis and its application to power aware scheduling
Temporal workload analysis and its application to power aware schedulingTemporal workload analysis and its application to power aware scheduling
Temporal workload analysis and its application to power aware scheduling
 
SWARM INTELLIGENCE SCHEDULING OF SOFT REAL-TIME TASKS IN HETEROGENEOUS MULTIP...
SWARM INTELLIGENCE SCHEDULING OF SOFT REAL-TIME TASKS IN HETEROGENEOUS MULTIP...SWARM INTELLIGENCE SCHEDULING OF SOFT REAL-TIME TASKS IN HETEROGENEOUS MULTIP...
SWARM INTELLIGENCE SCHEDULING OF SOFT REAL-TIME TASKS IN HETEROGENEOUS MULTIP...
 

Queuing model estimating response time goals feasibility_CMG_Proc_2009_9097

  • 1. Queuing model estimating response time goals feasibility Anatoliy Rikun, PhD Full Capture Solutions, Inc. Anatoliy_Rikun@yahoo.com Customers expect a fast response from their systems, but what are the limits of the system? The paper analyzes a multi-class queuing model, which evaluates if jobs response time goals are reachable, and if not, what would be, in some sense, optimal alternatives. This queuing optimization model may be useful to estimate the limits of tuning, or to compare alternatives between hardware upgrade and tuning, or estimate a minimal necessary level of the hardware upgrade to reach a given set of response time goals. 1. Introduction Response time is important characteristics of system performance, and response time metrics may be a part of service level agreement. Suppose we have a set of workload competing for system resources with known response time targets and known levels of system resources consumption. Based on this information, how to estimate, if these performance goals are achievable? And, if the performance goals are not achievable, what is the “best” possible solution we can get from the system? This paper is organized as follows. Section 2 describes the “fairness” or “equitability” concept in finding a good substitution in the case of unachievable performance goals [GNT4, L9, BGTV3]. This concept somehow reflects the logic used in IBM WLM Goal Mode [BB9, SBG9], and it leads to lexicographically minimal response times distribution in the cases, when the response time goals are not reachable. Section 3 gives more formal model definition and necessary assumptions. A greedy algorithm is presented in Section 4. Section 5 gives some examples, and estimation for comparison of tuning and upgrade options and Sections 6 summarizes this paper. 2. Concept of Fairness, and What Can Be a Reasonable Substitution For Unreachable Goals? Let us consider a CPU-bounded queuing system of n jobs classes. For the sake of simplicity, suppose all the job classes have the same importance level, the same service times, but may have different CPU utilizations, {Ui}, and different performance goals, {Gi}, i=1,…,n. The job response time goals may be defined in different ways: in may be, in particular, average response time goals, deadlines, percentile response time goals, execution velocity and others. Even though in this paper only average response time goals are considered, some of the conclusions can be extended to the other metrics, too. To evaluate system performance, it is natural to use relative performance, or performance index metric [SBG9], pi : pi = Ri/Gi (1) Thus, if all the performance indices are less or equal to 1, then each job meet its performance goal. Now, suppose that in this system, goals {Gi} are not reachable, i.e. for any feasible response time vector, R={Ri} ∈ X , at least one job average response time exceeds its goal, Gi . This may be expressed in the term of the “worst” case performance index, γl : γ1 = max{ pi | i=1,…n} > 1 (2) What could be a best possible response time distribution in the case, when system capacity is not sufficient to reach all the performance goals? A traditional approach would be to minimize average – or somehow weighted average over all job classes’ response times: Ravg * = min { ∑i=1 n cjRj | R ∈ X } (3) Here ci – some job classes weighted coefficients1 , X – all reachable combinations of the response time vector R, 1 In the calculations below ci =1/Gi i.e. (3) minimizes the average performance index. 
But ci may be a cost associated with delay for job class i.
  • 2. and Ravg * - corresponds to a minimal possible weighted average response time, which can be attained in the system. A well known analytical approach to solving problems like (3), so called cµ rule is very simple: it recommends giving higher utilization to the jobs with smaller ratio Uj/cj (“small” job first) 2 . But, in spite of the beauty of cµ rule, the underlying optimization model (3) may lead to undesired results in the goal mode context. The reason is that minimal average response time approach may lead to solutions, when “small” jobs outperforms their goals at the expense of “big” jobs, who do not meet their goals. An example of this situation considered in the Table 1. In that example, all the jobs have the same response time goals, 1 sec., but, the optimal average response time proves to be 0.17 for small jobs at the expense of a big job, which has response time 1.7 sec. At the same time, all the goals are reachable in this example. Thus, minimization of weighted average response time leads to un-fair solutions. At the same time, we need to reach all the response time goals, or if this is not possible, to approach all of them, as close as possible. In particular [IBM00], IBM WLM tries to make all the workloads within a given importance level to have similar performance index, and when an important workload is missing its goal WLM tries to provide it with the additional resources. What this mean mathematically is called a mini-max or lexicographical optimization [BB03 , GNT4, BGTV3, L9]. In other words, we are trying to reach a solution, when all the jobs reach their goals, and have the same performance index, γ. If this is not possible, jobs which miss their goal most (and have the highest performance index, γ.1) receive the highest priority; then we are trying to reach the same performance index, γ.2 for the rest of the system; if this not possible – again most troubled jobs in the second group receive the highest priority and so on. The algorithm details with simple calculations are presented in section 4. A comparison between the lexicographical optimization, mimicking IBM WLM behavior, and the weighted average response time minimization, is 2 cµ rule lead to optimal solution in (3) if feasible area X satisfies to the conservation law [CY1]. In case of different service times and identical weights, cj the cµ rule recommend short-job-first (SJF) priority. 3 Prof. E. Bolker and Prof. J. Buzen, in their paper did not use explicitly this terminology, but de-facto their goal mode modeling, which was inspired by IBM WLM is equivalent to the lexicographical optimization, and all the results presented at Tables 1, 2 could be received using their approach, too. presented in the Tables 1, 2 for a simple example of N=5 jobs. In this example the corresponding weight factors in the “average response time minimization” model (3) are in inverse proportion to the response time goals, ci = 1/Gj – in this way jobs with more challenging response time goals would have higher weight in the optimization. This approach leads to the minimal average performance index. In the example below, the first 4 jobs are “small” - each of them has utilization 0.1 and the last job is a “big” one – its utilization is 0.5 and we consider M/M/1 queuing preemptive priority system. Let all the jobs have the same response time goals, Gi = 1.0 sec, i=1,…,5 and their service times, are the same, Si =0.1 sec, too: Table 1 Minimal Average vs. “Fair” Solution when response time goals are reachable. Job Util. 
Table 1. Minimal average vs. “fair” solution when the response time goals are reachable.

                           Minimal average          Lexicographically
                           response time            minimal resp. time
 Job #   Ui     Gi         Ri        pi             Ri*       pi*
 1       0.1    1.0        0.17      0.17           1.0       1.0
 2       0.1    1.0        0.17      0.17           1.0       1.0
 3       0.1    1.0        0.17      0.17           1.0       1.0
 4       0.1    1.0        0.17      0.17           1.0       1.0
 5       0.5    1.0        1.67      1.67           1.0       1.0
 R(avg)                    0.47      0.47           1.0       1.0
 γ1 (worst pi)                       1.67                     1.0

If all the response time goals are identical, the solution of (3) is given by the “small job first” priority assignment πS = (1,1,1,1,2), which runs the small jobs with the highest priority 1 and gives lower priority to the big job (#5). As a result, the small jobs’ response time is ten times shorter than the big job’s: the small jobs’ response time (0.17 sec) is more than 5 times better than their goal, while the big job’s response time is almost 70% greater than its goal. In some situations this kind of “optimization” is not acceptable because, for example, a user may not feel the difference between system responses of 0.3 and 0.1 sec but may well be sensitive to a 3 sec response instead of 1 sec.

At the same time, all the goals in this example are reachable, and the algorithm of lexicographic minimization presented in Section 4 can find such a solution. To check this, consider the response time vector corresponding to the “big job first” priority rule, πB = (2,1,1,1,1). In this case the big job’s response time is R5 = S5/(1 − U5) = 0.1/0.5 = 0.2 sec, and for the small jobs Ri = Si/((1 − U5)·(1 − ∑j=1..5 Uj)) = 0.1/(0.5·0.1) = 2.0 sec, i = 1,…,4. Thus, if we use the priority rule πS approximately 54.5% of the time and the rule πB the remaining 45.5% of the time, the average response time of each job matches its goal.
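The numbers above are easy to reproduce. The short sketch below (our illustration, not code from the paper) applies the standard mean response time formula for a preemptive-resume M/M/1 priority queue with identical service times, R = S/((1 − σhigher)(1 − σhigher+own)), to the two priority rules πS and πB, and then computes the fraction of time πS must be used so that every job exactly meets its goal. The function and variable names are ours.

```python
# Sketch: reproduce the Table 1 numbers for the 5-job M/M/1 preemptive-priority example.
# Assumes the standard preemptive-resume formula with identical service times:
# R_k = S / ((1 - sigma_{k-1}) * (1 - sigma_k)).

S = 0.1                                  # common service time, sec
U = [0.1, 0.1, 0.1, 0.1, 0.5]            # per-class utilizations
G = [1.0] * 5                            # common response time goal, sec

def response_times(priorities):
    """priorities[i] = priority level of job i (1 = highest)."""
    R = [0.0] * len(U)
    for i, p in enumerate(priorities):
        higher = sum(U[j] for j in range(len(U)) if priorities[j] < p)
        same_or_higher = higher + sum(U[j] for j in range(len(U)) if priorities[j] == p)
        R[i] = S / ((1.0 - higher) * (1.0 - same_or_higher))
    return R

R_small_first = response_times([1, 1, 1, 1, 2])   # pi_S: R = (0.167, ..., 1.667)
R_big_first   = response_times([2, 1, 1, 1, 1])   # pi_B: R = (2.0, ..., 0.2)

# Fraction alpha of time to run pi_S so that job 1 exactly meets its goal G = 1:
alpha = (G[0] - R_big_first[0]) / (R_small_first[0] - R_big_first[0])
mixed = [alpha * rs + (1 - alpha) * rb for rs, rb in zip(R_small_first, R_big_first)]
print(alpha)   # ~0.545 -> pi_S about 54.5% of the time
print(mixed)   # ~[1.0, 1.0, 1.0, 1.0, 1.0] -> every job meets its goal
```

The computed mixing fraction, about 0.545, is the 54.5%/45.5% split mentioned above.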
In the last two rows of Table 1 we also report the weighted average response time and the “worst case” performance index. As expected, the minimal weighted average response time (R(avg) = 0.47) is smaller than the weighted average of the “fair” solution (1.0), but its worst-missed-goal index is worse: γ1 = 1.67 > 1.

Now consider a case in which the performance goals are not reachable. In Table 2 below, all job utilizations and service times are the same as in the previous case, but the goals are different. Here the average-response-time minimization approach suggests the priority assignment π = (1,2,3,5,4), which corresponds to sorting 1/(UiGi) in decreasing order. Even though the goals are quite different and not all of them are reachable, one can see that lexicographic optimization leads to a “fair” solution in which jobs {1,2,3,5} all miss their goals by the same 19%, while job 4 significantly outperforms its goal (see footnote 4).

Footnote 4: Job 4 outperforms its goal simply because its goal G4 = 10 sec is not restrictive at all (R4 = 5 in both solutions, which corresponds to job 4 running with the lowest priority).

Table 2. Minimal average vs. “fair” solution when the response time goals are NOT reachable.

                           Minimal average          Lexicographically
                           response time            minimal resp. time
 Job #   Ui     Gi         Ri        pi             Ri*       pi*
 1       0.1    0.125      0.11      0.889          0.15      1.185
 2       0.1    0.250      0.14      0.556          0.30      1.185
 3       0.1    0.500      0.18      0.357          0.59      1.185
 4       0.1    10.00      5.00      0.500          5.00      0.500
 5       0.5    0.50       0.71      1.429          0.59      1.185
 R(avg)                    1.67      0.47           1.33      1.05
 γ1 (worst pi)                       1.43                     1.19

At the same time, the average-response-time solution leads to quite different performance indices: the small jobs with challenging goals come in 11–64% below their goals, while the big job exceeds its goal by 43%.

3. Model Definition

3.1 Conservation Law and two equivalent approaches to describing the set of achievable solutions

Our goal is to find an algorithm that can evaluate whether the response time goals are reachable and, if they are not, suggest a solution that is (in relative terms) as close to the goals as possible. The problem of modeling priority queuing systems in the general case is very complicated. It becomes significantly more tractable when the so-called conservation law holds. There are numerous formulations of the conservation law for different types of queues [CY1, BGTV3, LK5, GM0, BB9, FG8]. For example, in a preemptive-priority M/M/m queue in which all jobs have the same service time, changing the priorities does not change the total queue (or the average response time) in the system. Intuitively this sounds obvious: a higher priority and a shorter response time for one job lead to a comparable delay for the other jobs. On the other hand, if the service times of different job classes are significantly different, giving higher priority to shorter jobs leads to a smaller average queue than, e.g., FIFO, and the conservation law does not hold [BB9]. A few general conditions are necessary for the conservation law: server performance should be the same over time; the server is not allowed to be idle when there are jobs waiting to be served; and scheduling should not be anticipating, i.e., it should not be based on the service time requirements of the customers [CY1].
It should be clarified that the restriction to conservative queuing models is not related to the IBM Workload Manager logic itself. We need the conservation law only to justify the algorithm below: the algorithm is exact when the conservation law holds and gives an approximate solution under moderate deviations from it. As noted in [BB9], “...normally, the conservation law is quite robust, meaning that it is likely to be very close to correct, even when the assumption of equal (average) service times across [job classes] is violated. Time slicing and other preemption mechanisms also contribute to the robustness of the conservation law by reducing the likelihood of very long processing bursts.” As an illustration of the conservation law in action for quite different resource management tools, one can look at Figs. 3, 10, and 13 in [BD0], which show measured total queues (see footnote 5) for Sun SRM, HP’s Process Resource Manager, and IBM’s Workload Manager for AIX. In all these cases different workloads received different CPU shares, but the resulting total queue remained approximately constant regardless of which workload received the bigger share.

Footnote 5: In [BD0] the authors present a “weighted average response time”, which is proportional to the total queue.
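As a quick numerical illustration of this invariance (a sketch we add here, not taken from [BD0]), the following snippet computes per-class queue lengths in an M/M/1 preemptive-priority system with identical service times for several priority orderings; the individual queues change with the ordering, but their sum always equals U/(1 − U).

```python
from itertools import permutations

# Conservation-law check for an M/M/1 preemptive-priority queue with identical
# service times: per-class queues depend on the priority order, the total does not.
S = 0.1                               # common service time
U = [0.1, 0.1, 0.1, 0.1, 0.5]         # class utilizations (arrival rates = U/S)

def class_queues(order):
    """order = tuple of class indices, highest priority first; returns q_i = lambda_i * R_i."""
    q, sigma = [0.0] * len(U), 0.0
    for cls in order:
        sigma_new = sigma + U[cls]
        R = S / ((1.0 - sigma) * (1.0 - sigma_new))   # mean response time of this class
        q[cls] = (U[cls] / S) * R                     # Little's law
        sigma = sigma_new
    return q

for order in list(permutations(range(5)))[:5]:        # a few of the 5! orderings
    q = class_queues(order)
    print(order, [round(x, 3) for x in q], round(sum(q), 3))
# The last column is always 0.9/0.1 = 9.0, the M/M/1 total queue for U = 0.9.
```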
Now consider the description of all achievable performance states for a system of n job classes. Let the vector q = {qi} denote the jobs’ average queues, µ the service rates, λ = {λj} the arrival rates, u = {uj} the utilizations, and R = {Ri} the response time vector. Let N = {1,…,n} denote the set of all job classes and, for each subset S ⊆ N, let US = ∑j∈S uj be the total utilization of the jobs in this group and QS = Q(US) the M/M/m queue corresponding to this utilization under FCFS scheduling.

According to the conservation law, the set X of all feasible combinations of queues may be described by 2^n − 1 constraints: the inequalities (4) for each proper subset S and the equality (5) for S = N:

∑j∈S qj ≥ QS ,   (4)

and for the total queue in the system (i.e., when S = N in (4)) the inequality becomes an equality:

∑j∈N qj = QN .   (5)

This remarkably simple description of all possible states of the queuing system is made possible by the conservation law. Inequalities (4)–(5) have a very simple explanation: for any group of jobs S, the minimal possible total queue of the group, ∑j∈S qj = QS, corresponds to the case when all the jobs of the group run with the highest priority. Due to preemptive scheduling their total queue is then not affected by any other job in the system; it may be calculated as the FCFS queue determined by the total utilization US of the group. In all other cases, when some other jobs j ∉ S run with the same or higher priority as jobs from S, they increase the total queue of group S and we have the strict inequality ∑j∈S qj > QS. Combining these two observations gives (4).

Let us illustrate this approach with an example of n = 2 jobs with utilizations u = (0.25, 0.50) on a single-processor CPU. The total queue in the system is QN = (u1 + u2)/(1 − u1 − u2) = 3; if job 1 runs with the highest priority, its queue is Q1 = u1/(1 − u1) = 0.33; if job 2 has the highest priority, its queue is Q2 = u2/(1 − u2) = 1. Thus all possible states q = (q1, q2) of this system are described by the system of three relations:

q1 ≥ Q1
q2 ≥ Q2
q1 + q2 = QN

For example, the queue state (q1, q2) = (1.5, 1.5) is feasible because all the relations above are satisfied. Of course, in the case of two jobs it is also clear how to reach this state: job 1 should run with high priority (QN − q1 − Q2)/(QN − Q1 − Q2) ≈ 30% of the time and with low priority the remaining 70% of the time.

This representation of all possible states of a queuing system as a mixture of the queues corresponding to static priority rules was used in the pioneering paper [GM0]; for conservative queuing systems it is equivalent to the representation (4)–(5) [BB9, CY1]. In either case, whether the feasible area is described by inequalities like (4)–(5) or by priority mixtures, the number of inequalities or of possible priority orderings grows enormously with n: description (4)–(5) contains 2^n inequalities, and there are n! possible priority orderings. For n = 20 this already leads to ~10^6 and ~10^18 objects, respectively. The algorithm below is based on an intrinsic feature of the conservation law (see footnote 6) and is very fast: all it needs is n evaluations of the M/M/m queue formula.

Footnote 6: The so-called submodularity property of the function Q(US) [F1, CY1].
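For small n, the feasibility of a given queue vector can be checked directly against (4)–(5) by brute force, as in the following sketch (an illustration with names of our own choosing, written for the single-processor case where Q(U) = U/(1 − U)); the exponential number of subsets is exactly why the fast algorithm of Section 4 is needed.

```python
from itertools import combinations

# Brute-force feasibility check of a queue vector q against constraints (4)-(5),
# for a single-processor (M/M/1) system: Q(U) = U / (1 - U).
def Q(util):
    return util / (1.0 - util)

def is_feasible(q, u, tol=1e-9):
    n = len(u)
    # Equality (5): the total queue must equal Q(U_N).
    if abs(sum(q) - Q(sum(u))) > 1e-6:
        return False
    # Inequalities (4): every proper, nonempty subset S.
    for k in range(1, n):
        for S in combinations(range(n), k):
            if sum(q[j] for j in S) + tol < Q(sum(u[j] for j in S)):
                return False
    return True

u = [0.25, 0.50]
print(is_feasible([1.5, 1.5], u))   # True  -- the state discussed above
print(is_feasible([0.2, 2.8], u))   # False -- violates q1 >= Q1 = 0.33
```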
3.2 Formal Definition of a “Fair” (Lexicographically Optimal) Solution

As already discussed, the measure of success in reaching a vector of performance goals {Gi} is the vector of performance indices {pi} defined by (1). Ideally, when all goals are reachable and the solution is perfectly “fair”, all jobs have identical performance indices not exceeding 1:

p1 = p2 = … = pn = γ ≤ 1

But what should we do when some or all of the goals are not reachable? Tuning and resource-balancing systems try to give more resources to the jobs that miss their goals the most, thus improving the “worst” performance index γ1; when further improvement of γ1 is not possible, the next most troubled jobs receive their share of resources to improve their performance index γ2, and so on. The result of such activity, a vector of performance indices p = {p1,…,pn}, should therefore also be evaluated component-wise, with the worst (largest) components compared first. Thus, if two competing tuning systems, in comparable situations, lead to performance index vectors p′ = {p′1,…,p′n} and p″ = {p″1,…,p″n}, these results should be compared lexicographically.
First, the performance indices of each vector are sorted in descending order; let the sorted vectors be γ′ = {γ′1,…,γ′n} and γ″ = {γ″1,…,γ″n}. If γ′1 < γ″1, the first result γ′ is “better”, or preferable to, γ″ (written γ′ ≺ γ″); otherwise, if γ′1 > γ″1, the second result is better (γ″ ≺ γ′). If γ′1 = γ″1, the next pair of indices, γ′2 and γ″2, is compared, and so on.

As an illustration, consider the performance vectors from Table 2. Minimization of the average response time leads to the performance vector p′ = (0.89, 0.56, 0.36, 0.5, 1.43); after sorting we get γ′ = (1.43, 0.89, 0.56, 0.5, 0.36). The alternative vector is p″ = (1.19, 1.19, 1.19, 0.5, 1.19), with γ″ = (1.19, 1.19, 1.19, 1.19, 0.5). Because γ″1 = 1.19 < γ′1 = 1.43, the second solution is preferable.

Our goal is thus to find a feasible solution, a vector of queues q = {qj} satisfying the conservation law (4)–(5), that leads to the lexicographically minimal performance index vector γ (see footnote 7).

Footnote 7: An amazing and general fact of submodularity theory, discovered by Fujishige [F1], is that the lexicographic optimization problem is equivalent to the minimization of a certain quadratic function (in our case, ∑j qj²/Gj).
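The comparison rule is easy to state in code. The helper below (a hypothetical function of our own, added for illustration) relies on the fact that Python compares lists element by element, i.e., lexicographically.

```python
# Compare two performance-index vectors lexicographically on their sorted
# (descending) components: the vector whose worst index is smaller wins,
# ties are broken by the next-worst index, and so on.
def lex_better(p_a, p_b):
    """Return True if vector p_a is preferable to p_b."""
    return sorted(p_a, reverse=True) < sorted(p_b, reverse=True)

p1 = [0.89, 0.56, 0.36, 0.50, 1.43]   # minimal average response time (Table 2)
p2 = [1.19, 1.19, 1.19, 0.50, 1.19]   # lexicographically minimal solution (Table 2)
print(lex_better(p2, p1))             # True: 1.19 < 1.43 on the worst component
```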
4. Algorithm Description

The algorithm is based on the following simple observations (see footnote 8). Suppose all the jobs are sorted by their response time goals in increasing order:

G1 ≤ G2 ≤ … ≤ Gn   (6)

Because all the jobs’ service times are identical, it is clear that at least part of the time job 1, which has the most challenging goal, should run with the highest priority; correspondingly, job 2 should run at least some of the time with priority 1 or 2, and so on. On the other hand, because goal G1 is the most challenging, it is natural to expect that job 1 will have the “worst” performance index γ1, job 2 will have either the index γ1 or the next worst index γ2, and so on, i.e.:

γ1 ≥ γ2 ≥ … ≥ γn   (7)

Footnote 8: A formal proof of the algorithm is presented in [R9].

Suppose a group of jobs S has a common performance index γ, i.e. Ri = γGi for all i ∈ S. Using Little’s law, qi = λiRi = γλiGi. Summing these equalities over the group, we obtain

γ = ∑j∈S qj / ∑j∈S λjGj = Q(S)/ω(S) ,   (8)

where Q(S) and ω(S) are the total queue and the weighted goal of the job group S. Now all we need is to find the right partition of the set of all jobs 1,…,n into m subsets S1,…,Sm, m ≤ n, so that all jobs of the same group k have the same performance index γk (and γ1 > γ2 > … > γm).

Algorithm

Initialization: Sort all the jobs by their goals as in (6). Set m := 1 and include the current job j := 1 into the current group: Sm = {1}. Set the higher-groups utilization Uh := 0 and the total utilization Utot := u1. Evaluate, using the M/M/m queue formula, the total queue of the group, Q(S1) = Q(Utot) − Q(Uh) = Q(u1); set ω(S1) := λ1G1 and γ1 := Q(S1)/ω(S1).

Iteration: Repeat while j < n. Update the current job index, j := j + 1, and first test whether job j belongs to the current group Sm: tentatively set Utot~ := Utot + uj; re-evaluate the current group total queue Qm~ := Q(Utot~) − Q(Uh), the weighted goal ωm~ := ω(Sm) + λjGj and the updated performance index p~ := Qm~/ωm~.

1. If p~ ≥ γm, the current job j belongs to the current group:
1.a Set Sm := Sm ∪ {j}, γm := p~, ωm := ωm~, Qm := Qm~, Utot := Utot~, and update the response times of every job i in the current group: Ri := γmGi for all i ∈ Sm.
1.b If m > 1, check the monotonicity condition (7). If γm > γm−1, groups m and m−1 must be merged (Sm−1 := Sm ∪ Sm−1, ωm−1 := ωm−1 + ωm, Qm−1 := Qm−1 + Qm, γm−1 := Qm−1/ωm−1, m := m − 1; after a merge, Uh is reset to the total utilization of the groups remaining above the new current group), and the merging is repeated until γ1,…,γm form a decreasing sequence; the response times of the merged group are then updated as in 1.a.

2. If p~ < γm, a new group is started, m := m + 1: Uh := Utot; Utot := Utot~; Qm := Q(Utot) − Q(Uh); ωm := λjGj; γm := Qm/ωm; Rj := γmGj.

If j < n, the iteration is repeated. After the algorithm completes, we have the lexicographically optimal response times for each job and m distinct values of the performance indices γk, k = 1,…,m.
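Below is a compact sketch of this greedy algorithm for the single-processor case, where Q(U) = U/(1 − U); for m processors Q would be replaced by the M/M/m number-in-system formula. The code is our illustration (the function name lex_optimal is ours, not the paper's) and uses an equivalent stack formulation of steps 1–2: each job first forms its own group, and adjacent groups are merged while the monotonicity condition (7) is violated.

```python
# Sketch of the greedy grouping algorithm of Section 4 for the single-processor
# case, where Q(U) = U / (1 - U) is the M/M/1 number in system.  For m processors
# Q would be replaced by the corresponding M/M/m formula.

def Q(util):
    return util / (1.0 - util)

def lex_optimal(jobs, service_time):
    """jobs: list of (utilization, goal) pairs; identical service times are assumed.
    Returns (performance index per job, response time per job) in the input order."""
    order = sorted(range(len(jobs)), key=lambda i: jobs[i][1])   # sort by goal, as in (6)
    groups = []            # stack of [group queue Q, group weighted goal omega, member indices]
    cum_util = 0.0
    for i in order:
        u, goal = jobs[i]
        lam = u / service_time
        q_own = Q(cum_util + u) - Q(cum_util)   # queue of this job below all groups formed so far
        cum_util += u
        groups.append([q_own, lam * goal, [i]])
        # Merge adjacent groups while the monotonicity condition (7) is violated;
        # this is equivalent to steps 1-2 (tentative join followed by merging).
        while len(groups) > 1 and groups[-1][0] / groups[-1][1] >= groups[-2][0] / groups[-2][1]:
            q_top, w_top, idx_top = groups.pop()
            groups[-1][0] += q_top
            groups[-1][1] += w_top
            groups[-1][2] += idx_top
    p = [0.0] * len(jobs)
    R = [0.0] * len(jobs)
    for q_tot, w_tot, members in groups:
        gamma = q_tot / w_tot                   # common performance index of the group, eq. (8)
        for i in members:
            p[i], R[i] = gamma, gamma * jobs[i][1]
    return p, R

# The five jobs of Table 2 (service time 0.1 sec):
jobs = [(0.1, 0.125), (0.1, 0.25), (0.1, 0.5), (0.1, 10.0), (0.5, 0.5)]
p, R = lex_optimal(jobs, 0.1)
print([round(x, 3) for x in p])   # [1.185, 1.185, 1.185, 0.5, 1.185]
print([round(x, 2) for x in R])   # [0.15, 0.3, 0.59, 5.0, 0.59]
```

Running it on the five jobs of Table 2 reproduces the performance indices 1.185 and 0.5 and the response times shown in the final row of Table 3 below.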
To illustrate how this algorithm works, consider its application to the example from Table 2.

Table 3. Algorithm iteration results for 5 jobs.

 j    Uj     Gj      m    γm     Response time estimates R1,…,Rj
 1    0.1    0.125   1    0.89   0.11
 2    0.1    0.25    2    0.56   0.11, 0.14
 3    0.1    0.50    3    0.36   0.11, 0.14, 0.18
 4    0.5    0.50    1    1.19   0.15, 0.30, 0.59, 0.59
 5    0.1    10.0    2    0.50   0.15, 0.30, 0.59, 0.59, 5.0

Table 3 shows the utilizations and goals of the five jobs from Table 2 (see footnote 9), together with the algorithm iteration results: the value of m (the number of groups), γm (the performance index of the current group) and the estimated job response times after each iteration j. As one can see from the table, until the iteration with the “big” job, j = 4, each job has a different performance index and each group contains just one job: S1 = {1}, S2 = {2}, S3 = {3}. At the iteration with the big job all these groups are merged, at step 1.b, into a single group S = {1,2,3,4} with performance index γ1 ≈ 1.19. At the final iteration, with the “unchallenging” goal G5, one more group is created (γ2 = 0.5).

Footnote 9: All 5 jobs are sorted by their goals, so jobs #4 and #5 are swapped relative to Table 2.

This example illustrates a few important properties of the partition of the jobs into groups with identical performance indices, S1,…,Sm. The partition corresponds to the “best” possible priority distribution: each job of group Sk should, on average, run with higher priority than any job of the groups Sk+1,…,Sm. Thus, even if all the jobs of groups k+1,…,m were removed from the system, there would be no way to further improve the performance of the jobs j ∈ S1 ∪ … ∪ Sk. Besides, this grouping provides a way to simplify, or aggregate, the original performance problem: if instead of the n original jobs we consider m aggregated jobs with arrival rate λk^a and goal Gk^a defined as the total arrival rate and the arrival-weighted average goal of group Sk,

λk^a = ∑j∈Sk λj ,    Gk^a = ∑j∈Sk λjGj / ∑j∈Sk λj ,

then the solution of the “aggregated” problem leads to the same set of performance indices γ1,…,γm as the original, non-aggregated problem of n jobs (a numerical check of this property is sketched below).

The presented algorithm reproduces the results of the original paper [BB9] devoted to goal-mode scheduling and motivated by IBM WLM. The major difference is that this algorithm avoids scanning the ~2^n − 1 constraints (4)–(5) and is therefore scalable to large n. Here all the jobs have the same importance level, but different importance levels may easily be accommodated in the algorithm. In the case of different, but relatively close, service times the algorithm becomes approximate and can be used after rescaling the job parameters. However, when the service rates are significantly different, more general methods [FG8] or further research are needed. The algorithm may also be used for the more general G/M/m system with preemption when all the job classes have the same exponential service-time distribution (see footnote 10), although in this case the calculation of the functions Q(U) may become much more complicated.

Footnote 10: A G/M/m queue with the same average service time for all classes satisfies the conservation law (4)–(5) [CY1].
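As a quick check of the aggregation property (our illustration, reusing the lex_optimal sketch from Section 4), replacing group 1 of Table 3 by a single aggregated job reproduces the same pair of performance indices:

```python
# Check of the aggregation property, assuming the lex_optimal sketch above is in scope.
# Group 1 of Table 3 contains the jobs with goals (0.125, 0.25, 0.5, 0.5) and
# utilizations (0.1, 0.1, 0.1, 0.5); with service time 0.1 the arrival rates are (1, 1, 1, 5).
S = 0.1
lam = [1.0, 1.0, 1.0, 5.0]
G   = [0.125, 0.25, 0.5, 0.5]

lam_a = sum(lam)                                         # aggregated arrival rate = 8
G_a   = sum(l * g for l, g in zip(lam, G)) / lam_a       # arrival-weighted goal ~= 0.4219
u_a   = lam_a * S                                        # aggregated utilization = 0.8

# Aggregated problem: one job for group 1, plus the original "unchallenging" job (u=0.1, G=10).
p, _ = lex_optimal([(u_a, G_a), (0.1, 10.0)], S)
print([round(x, 3) for x in p])    # [1.185, 0.5] -- the same indices as the 5-job problem
```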
5. Tuning vs. Upgrade and Other Examples

Using tuning software tools may be an option to avoid or postpone a hardware upgrade. It is therefore important to evaluate the “limits of tuning”, i.e., can we reach the performance goals without upgrading the system? Besides, when the goals are not reachable without a hardware upgrade, it may be important to evaluate the upgrade + tuning option: what is the minimal hardware upgrade level that allows the performance goals to be met if the system is perfectly tuned? Answering this question may lead to the conclusion that the performance goals themselves are not realistic or not cost effective.

To simplify the analysis, let us start with the case of a single job with response time goal G, actual response time R and total utilization U. Suppose the goal is not reached, so that the performance index γ = R/G > 1. What is the minimal upgrade level υ > 1 such that the response time goal becomes reachable on a server that is υ times faster than the existing one? Using Little’s law and the fact that the throughput on the faster server (for an open system) remains the same, the equation for the upgrade factor υ is:

Q(U/υ) = Q(U)/γ    (9)

When the original and the “upgraded” servers have the same number of processors m, equation (9) may be written as (10):

Q(m, U/υ) = Q(m, U)/γ    (10)
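Equation (10) is straightforward to solve numerically. The sketch below (our illustration; all function names are ours) evaluates the M/M/m number in system through the Erlang C formula, finds υ by bisection, and also implements the processor-count variant (13) discussed below. For m = 1 the bisection agrees with the closed form (11) given below.

```python
import math

# Number of jobs in an M/M/m system with offered load a = U (total utilization),
# i.e. per-processor utilization rho = U / m, computed via the Erlang C formula.
def mmm_queue(m, U):
    rho = U / m
    if rho >= 1.0:
        return float("inf")
    s = sum(U**k / math.factorial(k) for k in range(m))
    tail = U**m / (math.factorial(m) * (1 - rho))
    erlang_c = tail / (s + tail)                 # probability that an arrival must wait
    return U + erlang_c * rho / (1 - rho)

# Solve Q(m, U/upsilon) = Q(m, U) / gamma for the upgrade factor upsilon, as in (10).
def upgrade_factor(m, U, gamma, tol=1e-9):
    target = mmm_queue(m, U) / gamma
    lo, hi = 1.0, 1.0
    while mmm_queue(m, U / hi) > target:         # bracket the root
        hi *= 2.0
    while hi - lo > tol:                         # bisection on the monotone function
        mid = 0.5 * (lo + hi)
        if mmm_queue(m, U / mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Minimal number of processors k with Q(k, U) <= Q(m, U) / gamma, as in (13).
def processors_needed(m, U, gamma):
    target = mmm_queue(m, U) / gamma
    if target <= U:
        raise ValueError("unreachable: the queue cannot drop below U by adding processors")
    k = m
    while mmm_queue(k, U) > target:
        k += 1
    return k

print(round(upgrade_factor(1, 0.75, 4.0), 3))    # 1.75, equal to the closed form 0.75 + 4*0.25
print(processors_needed(1, 0.75, 2.0))           # 2 processors suffice when gamma = 2
```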
For M/M/m queues it is easy to solve (10) analytically for small m, or numerically for arbitrary m. In particular, for m = 1 the solution of (10) is

υ = U + γ(1 − U)    (11)

and for m = 2 the performance improvement factor υ is the solution of the quadratic equation (with ρ = U/m the per-processor utilization)

υ² − ρ² = γ υ (1 − ρ²)    (12)

It is interesting to analyze the solution υ of equation (10) as a function of γ. For any combination of the parameters m and ρ this function υ(γ) is monotonically increasing, but its behavior is quite different at small and at high utilizations. For small utilization υ ≈ γ, which reflects the obvious fact that at small utilization there is no queuing problem and we need a faster server only to shrink the service time itself. If the utilization is high (ρ ≈ 1) the dependence is much weaker (e.g., for m = 1, 2: ∂υ/∂γ ~ 1 − ρ), reflecting the fact that at high utilization even a relatively small improvement in CPU speed can have a significant effect.

It is worth mentioning that modern technology sometimes allows a “physical” upgrade to be avoided: customers can use (and pay for) additional processors during periods of peak load [SB2]. In this case, when we try to solve the performance problem by increasing the number of processors from m to m2, the analog of (10) is

m2 = min{ k | Q(k, U) ≤ Q(m, U)/γ }    (13)

All these estimates were made for the simplest case of one “aggregated” job in the system. However, this simple back-of-the-envelope formula (9) can be used for an approximate estimate of the necessary upgrade if we aggregate all the jobs that do not meet their goals. Moreover, the algorithm of the previous section allows the minimal necessary upgrade level to be estimated in a tuning + upgrade analysis. Suppose that, after applying the algorithm to n jobs, we find that the best performance indices achievable by tuning are γ1* ≥ γ2* ≥ … ≥ γn*, and that for k of the jobs the goals are not reachable, γk* > 1. We can then consider one aggregated job with total utilization U^a and performance index γ^a defined by (14):

U^a = ∑j=1..k λj/µ ,    γ^a = ∑j=1..k λjγj* / ∑j=1..k λj    (14)

Solving equation (9) for the aggregated parameters U^a, γ^a gives an estimate of the necessary upgrade factor υ. If the response times Rj* used in (14) correspond to the lexicographically optimal solution, (14) gives a good estimate of the minimally necessary upgrade level (see Tables 6 and 7, tuning + upgrade columns). The high accuracy of the aggregation in this case may be explained by the fact that the aggregated jobs 1,…,k are not affected by the jobs k+1,…,n, which have lower priorities. In the general case the aggregation gives a reasonable but not very accurate approximation: in the example of Table 5, using the aggregated formulas (14) and (11) instead of the detailed information on the average priority distribution leads to the estimate υ ≈ 1.4 instead of υ ≈ 1.6.
Now, to compare the tuning and hardware upgrade options, let us consider the following illustrative example. Suppose we have 3 jobs running on a one-processor CPU, each with a unit service time. The parameters of these jobs (called, according to their utilizations, Light, Medium and Heavy) are presented in Table 4. The job response times in Table 4 correspond to a mix of the priority orderings π1 = {2,1,0}, π2 = {2,0,1}, π3 = {0,1,2} with probabilities p = {0.75, 0.15, 0.1}.

Table 4. Original job utilizations and response times.

 Job          Utilization    Actual resp. time    Goal
 Light (L)    0.1            10.97                2
 Medium (M)   0.2            4.30                 4
 Heavy (H)    0.45           2.65                 6

In Table 5 we compare the two approaches, “tuning” and a CPU upgrade. The tuning results correspond to the lexicographically optimal solution obtained with the algorithm of the previous section. As one can see from Table 5, the goals are reachable: the system can be tuned so that all three jobs have the same performance index, pj ≈ 0.8 (almost 20% better than the goals). At the same time, reaching the goals with the existing priority distribution would require a processor almost 60% faster than the existing one (υ ≈ 1.58).
The significant difference between the existing and the upgraded CPU (almost 60%) may be explained in this example by the fact that the original priorities do not follow the short-job-first rule (the most resource-consuming Heavy job had the highest priority 75% of the time). Putting it another way, there is a high (≈97%) correlation between the goals and the loads.

Table 5. Tuning and upgrade options for correlated loads and goals (60% faster CPU), G = {2, 4, 6}.

              Tuning                        Upgrade (υ ≈ 1.58)
 Job     Response    Resp/Goal         Response    Resp/Goal
 L       1.85        0.81              2.00        1.00
 M       3.69        0.81              1.35        0.34
 H       4.62        0.81              1.48        0.25

In the case of high correlation between the utilizations and the goals (with identical service times) one can expect especially significant gains from lexicographic optimization. The opposite is true as well: when loads and goals are strongly negatively correlated, the potential effect of tuning may be insignificant. To illustrate this, we compared tuning vs. upgrade for the same model but with the goals of the Light and Heavy jobs swapped, making utilizations and goals negatively (≈ −0.97) correlated. With the goals G = {6, 4, 2} tuning alone is not sufficient (γ ≈ 1.3) and some upgrade is needed (υ ≈ 1.08).

Table 6. Tuning and upgrade options for balanced (negatively correlated) loads and goals, G = {6, 4, 2}.

                            Tuning + upgrade (υ ≈ 1.08)    Upgrade (υ ≈ 1.21)
 Job     Resp. time goal    Response    Resp/Goal          Response    Resp/Goal
 L       6                  6.0         1.0                4.54        0.76
 M       4                  4.0         1.0                2.45        0.61
 H       2                  2.0         1.0                1.99        1.0

As one can see from Table 6, in the case of “balanced” goals, when the tight goals correspond to the small jobs, tuning does not help much: the necessary upgrade is ~21% without tuning and ~8% with tuning. This effect is even more pronounced for multiprocessor systems. But of course the major factor defining the necessary upgrade level is how challenging the goals are. Table 7 presents results for the case when all 3 goals are unreachable and much more challenging than in the previous examples (Gi = 1). Here the “un-tuned” system needs a 2.75 times faster processor to reach the goals, while the tuning + upgrade combination needs a much more modest speedup (υ = 1.75). Note that the estimates of the upgrade factor υ (1.08 and 1.75) in Tables 6 and 7 were obtained from the aggregated formulas (10) and (14) applied to the lexicographically optimal solution, and they prove to be quite accurate.

Table 7. Tuning and upgrade options in the case of challenging goals.

                            Tuning + upgrade (υ ≈ 1.75)    Upgrade (υ ≈ 2.75)
 Job     Resp. time goal    Response    Resp/Goal          Response    Resp/Goal
 L       1                  1.00        1.00               0.66        0.66
 M       1                  1.00        1.00               0.55        0.55
 H       1                  1.00        1.00               1.00        1.00
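As a closing check (our illustration, reusing the lex_optimal and upgrade_factor sketches from the previous sections), the tuning + upgrade estimates quoted in Tables 6 and 7 follow directly from the aggregated formulas (14) and (9):

```python
# Check of the tuning + upgrade estimates of Tables 6 and 7, assuming the
# lex_optimal and upgrade_factor sketches defined earlier are in scope.
S = 1.0
jobs_util = [0.1, 0.2, 0.45]                      # Light, Medium, Heavy

for goals, label in [([6.0, 4.0, 2.0], "Table 6"), ([1.0, 1.0, 1.0], "Table 7")]:
    jobs = list(zip(jobs_util, goals))
    p, _ = lex_optimal(jobs, S)                   # best indices achievable by tuning
    lam = [u / S for u in jobs_util]
    missing = [i for i in range(3) if p[i] > 1.0]
    U_a = sum(jobs_util[i] for i in missing)                                   # (14)
    gamma_a = sum(lam[i] * p[i] for i in missing) / sum(lam[i] for i in missing)
    print(label, round(upgrade_factor(1, U_a, gamma_a), 2))
# Prints ~1.08 for Table 6 and ~1.75 for Table 7, the values quoted above.
```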
6. Summary

Lexicographic optimization is an important approach to analyzing performance problems. This relatively simple approach, which reflects the behavior of complicated real tuning systems, finds a feasible solution when the response time goals are achievable and leads to fair, lexicographically minimal solutions otherwise. It was also shown that a straightforward optimization approach based on the weighted average of the response times is not very helpful in analyzing such systems. Simplified, back-of-the-envelope formulas were also presented for estimating the necessary level of upgrade in the cases when the performance goals are not reachable.

References

[GNT4] Georgiadis, L., Nikolaou, C., Thomasian, A. “A fair workload allocation policy for heterogeneous systems”, Journal of Parallel and Distributed Computing, v. 64, No. 4, April 2004, pp. 507-519.
[L9] Luss, H. “On equitable resource allocation problems: a lexicographic minimax approach”, Operations Research, v. 47, No. 3, 1999, pp. 361-376.
[BGTV3] Bhattacharya, P.P., Georgiadis, L., Tsoucas, P., Viniotis, I. “Adaptive Lexicographic Optimization in Multi-class M/GI/1 Queues”, Mathematics of Operations Research, v. 18, No. 3, 1993, pp. 705-740.
[LK5] Kleinrock, L. Queueing Systems, v. 1 and v. 2, 1975.
[GM0] Coffman, E.G., Jr., Mitrani, I. “A Characterization of Waiting Time Performance Realizable by Single-Server Queues”, Operations Research, v. 28, 1980, pp. 810-821.
[CY1] Chen, H., Yao, D. Fundamentals of Queueing Networks, Springer, 2001.
[BD0] Bolker, E., Ding, Y. “On the Performance Impact of Fair Share Scheduling”, Proc. CMG 2000, pp. 71-81.
[BB9] Bolker, E., Buzen, J. “Goal Mode: Part 1 - Theory”, CMG Transactions, v. 96, May 1999, pp. 9-15.
[SBG9] Shum, A., Buzen, J., Ginis, B. “Goal Mode: Part 2 - Practice”, CMG Transactions, v. 96, May 1999, pp. 16-21.
[IBM00] IBM AIX V4.3.3 Workload Manager. Technical References, February 2000 Update.
[F1] Fujishige, S. Submodular Functions and Optimization, Annals of Discrete Mathematics, v. 47, 1991, 270 p.
[R9] Rikun, A. “A Polynomial Algorithm for Evaluation Achievable Performance Solutions for Multi-class Queues”, submitted for publication.
[FG8] Federgruen, A., Groenevelt, H. “Characterization and Optimization of Achievable Performance in General Queueing Systems”, Operations Research, v. 36, No. 5, 1988, pp. 733-741.
[SB2] Shum, A., Buzen, J. “Industry-wide Implications of the ‘Capacity on Demand’ & ‘Pay-As-You-Go’ Phenomena: Intertwining of IT Budget and Capacity Planning”, Journal of Computer Resource Management, 2002, pp. 66-88.