The Component Balancer:
Optimization of Component-Based Applications
Jim Fontana, Viraj Byakod
Unisys Corporation
Jim.Fontana@unisys.com, Viraj.Byakod@unisys.com
Abstract
The Component Balancer establishes and maintains
response time goals for selected business logic contained
in methods in component-based applications without the
need to modify application code. The premise for this
optimization is that meaningful business logic in
component methods can be driven to response time goals
by controlling the calling rate of other methods, based on
analysis of the application workload. For example, if two
methods are accessing the same database, one method’s
response time can be improved if the other method is
selected to withstand longer delays. Other benefits of the
Component Balancer include business-level control and
optimization and, for heavily loaded systems, smoother
resource utilization and increased scalability.
1. Introduction
The Component Balancer establishes and maintains
response time goals on business logic contained in
component methods. As the load on the system varies, the
Component Balancer attempts to keep methods with
response time goals at a specified target performance level
at the expense of lower priority methods.
Additional benefits of the Component Balancer are
business-level control and optimization and, for
overloaded systems, smoother resource utilization and
increased scalability. The case studies below expand on
these benefits.
The Component Balancer has two phases of operation: the
analysis phase provides optimization recommendations,
and the optimization phase, using these recommendations,
self-tunes under variable load. Additionally, the
Component Balancer works across machines, applications
and application servers and requires no changes to the
application.
A key capability is the ability to wrap component methods
with “conditioning” code. The inserted conditioning code
is called a conditioner. Supported components, such as
COM+, are called managed types. Conditioners and
managed types are plug-ins to a framework.
Conditioners can be added and removed while an
application is running. No component source code is
needed and no recompilation or reconfiguration of the
components is required. Multiple conditioners can be
applied to a method. In addition to executing pre- and
post-processing code, conditioners have access to the
parameters of a method, can capture and throw
exceptions, can bypass the method execution and can
abort lower-priority conditioners. The overhead is
minimal: 35 microseconds per method call, plus the cost
of the conditioners themselves, on a 700 MHz system.
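The product's conditioners are interception plug-ins for managed types such as COM+, but the wrapping idea itself can be sketched with a Python decorator. This is a minimal illustration, not the product's API; all names (conditioner, pre, post, bypass) are hypothetical.

```python
import functools
import time

def conditioner(pre=None, post=None, bypass=None):
    """Wrap a method with conditioning code, analogous to the paper's
    conditioners: the pre/post hooks see the call's arguments, and an
    optional bypass predicate can skip the wrapped method entirely."""
    def wrap(method):
        @functools.wraps(method)
        def wrapped(*args, **kwargs):
            if pre:
                pre(args, kwargs)                # pre-processing hook
            if bypass and bypass(args, kwargs):  # skip method execution
                return None
            result = method(*args, **kwargs)
            if post:
                post(args, kwargs, result)       # post-processing hook
            return result
        return wrapped
    return wrap

# A hypothetical delay conditioner: sleep 5 ms before the method runs.
@conditioner(pre=lambda args, kwargs: time.sleep(0.005))
def check_balance(account):
    return {"account": account, "balance": 100}
```

In the real system the wrapping is done at the component level without source changes or recompilation; the decorator here merely shows the pre/post/bypass structure of a conditioner.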
2. Analysis and Optimization Details
The details of the analysis and optimization algorithms are
given in this section. One of the design goals was to
expose as few “moving parts” as possible to the end user –
keeping the parameters that control the optimization
process hidden and letting the self-tuning do the work. As
we gain a better understanding of the optimization
process, some of these parameters may be exposed for an
advanced user.
2.1 Analysis Detail
Analysis occurs over a time interval that is specified by
the user. The overall analysis time is divided into a
number of analysis periods. Each period is treated as an
independent set of statistics representing the system as
observed during that period. Analysis results during each
period are accumulated into an analysis report.
During each analysis period, pairwise calculations are
done on methods to determine which methods are affected
by other methods. For example, deploying analysis
conditioning for methods A, B and C results in pairs AB,
BA, AC, CA, BC and CB being analyzed.
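The ordered-pair enumeration can be sketched with the standard library (the enumeration order differs from the listing above, but the same six ordered pairs are produced):

```python
from itertools import permutations

def method_pairs(methods):
    """All ordered pairs of distinct methods: for A, B, C this yields
    the six pairs AB, AC, BA, BC, CA, CB (order matters, since the
    overlap analysis is not symmetrical)."""
    return list(permutations(methods, 2))

pairs = method_pairs(["A", "B", "C"])  # six ordered pairs
```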
A statistical Analysis of Variance is then applied for each
of the pairs to determine if the average response times
differ depending on the coincident execution of the
methods. For example, when pair AB is analyzed, the
average response time for all calls to method A during the
sample period when it ran “by itself” is calculated, that is,
SPECTS '04 363 ISBN: 1-56555-284-9
without method B running at the same time. Then the
average response time is calculated for all calls to method
A during the sample period when it ran overlapped with
method B. Method A must start after method B to be
considered overlapped. This is based on the
assumption that the first method “in” will get any
contended resource.
The individual response time measurements are used to
calculate the F statistic, which determines whether the
mean values of two random variables are statistically
different at a given level of significance. The statistic is
useful because it applies a proven technique to an area of
uncertainty: if the two random variables each have a high
level of variance, the decision on whether their means are
equal depends both on the difference between the sample
means and on the amount of variance. We can accept or
reject the hypothesis that the random variables have equal
means at a given level of “significance.”
Therefore, given a level of significance we can then
determine if the average response time for method A
during the sampling interval is significantly changed when
running coincidentally with method B.
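A two-group one-way ANOVA F statistic of this kind can be computed with a few lines of standard-library Python. This is a generic sketch of the statistic, not the product's implementation; the sample values are invented.

```python
from statistics import mean

def f_statistic(group_a, group_b):
    """One-way ANOVA F for two samples: between-group variance over
    within-group variance. A large F suggests the means differ."""
    n_a, n_b = len(group_a), len(group_b)
    m_a, m_b = mean(group_a), mean(group_b)
    grand = mean(group_a + group_b)
    ss_between = n_a * (m_a - grand) ** 2 + n_b * (m_b - grand) ** 2
    ss_within = (sum((x - m_a) ** 2 for x in group_a)
                 + sum((x - m_b) ** 2 for x in group_b))
    df_between, df_within = 1, n_a + n_b - 2
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical response times (ms) for method A running alone vs.
# overlapped with method B; the large F hints the means differ.
alone = [50, 52, 48, 51, 49]
overlapped = [80, 83, 78, 82, 81]
f = f_statistic(alone, overlapped)
```

The F value would then be compared against the critical value for the chosen level of significance (67% in the paper) to accept or reject equality of the means.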
For COM+, methods that invoke other methods in a
calling sequence are identified and excluded from
consideration. If A calls B (or B is elsewhere in A's
calling-sequence chain), B affects the performance of A,
but delaying B will not improve the performance of A.
Since the method overlap algorithm is not symmetrical,
the F-value is also calculated for method B to see if it is
being affected by method A. The total number of calls to
A, the number of calls to A only, and the number of calls
to A while overlapped with B, along with the F-value for
each method pair, are stored in the raw analysis report
during each period.
The final analysis report is created from the raw analysis
report in two steps. The first step generates an
intermediate F statistic for each method pair and
accumulates the number of statistically significant method
pairs. From experimentation, it was found that a 67%
level of significance could yield improvements in the
system. Sample sizes for non-overlapped and overlapped
operations must each be greater than 10. Therefore, if the
average response times are statistically different at a
significance level of 67%, we have found that there is a
possibility that the performance of one method can be
improved by delaying the other.
The level of significance has the single biggest effect on
the analysis results. Raising the value too high will
eliminate potentially valid results while lending more
credence to the results that remain significant. A value
that is too low will most likely admit results that are not
valid or that yield little benefit.
The second step involves processing the method pairs.
Significant differences for methods affected by other
methods are considered positive and are assigned a score
of +1. Significant differences for methods affecting other
methods are considered negative and assigned a score of
-1. A number of independent time samples are used in
the analysis process. During the second step, the scores
from each of these time samples are accumulated. If the
resulting value is positive, the method is considered a
candidate for optimization. If the resulting value is
negative then the method is considered a candidate for
delay.
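The scoring step can be sketched as a simple accumulation over the significant pairs found in each time sample. The pair data below is invented for illustration; only the +1/-1 scoring scheme comes from the text.

```python
from collections import defaultdict

def score_methods(significant_pairs):
    """Accumulate +1 for each sample in which a method was significantly
    affected by another (optimization candidate) and -1 for each sample
    in which it significantly affected another (delay candidate)."""
    scores = defaultdict(int)
    for affected, affecting in significant_pairs:
        scores[affected] += 1   # candidate for optimization
        scores[affecting] -= 1  # candidate for delay
    return dict(scores)

# Hypothetical per-sample results as (affected, affecting) pairs.
pairs = [("checkBalance", "deposit"),
         ("checkBalance", "deposit"),
         ("withdraw", "deposit")]
scores = score_methods(pairs)
# Positive totals mark optimization candidates; negative totals mark
# delay candidates.
```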
A simple COM+ application was developed to test and
verify analysis results. The application has one
component with four methods: getSpecialOffers, deposit,
withdraw and checkBalance. The application internals
were as follows: deposit directly affects checkBalance;
withdraw calls checkBalance; and getSpecialOffers runs
without interfering with, or being interfered with by, the
other methods.
A 25-minute run was made on a 700 MHz laptop, with
eight users per method issuing calls at random intervals
between 0 and 900 milliseconds.
The results are given in Figure 1 below.
Figure 1. Analysis Report
The top three lines in the report show the final scores for
each method, while the bottom four lines show
intermediate scores for the various method pairs. As
expected, deposit (-16 = -12 –6 +1 +1) is shown as the
only potential delay, with checkBalance (11 = 12 –1) a
good potential for optimization, followed by withdraw (5
= 6 - 1). Relative numbers like these are expected, as
withdraw is indirectly affected by deposit.
The deployment algorithm starts with the largest positive
and largest negative numbers and moves inward,
deploying optimization and delay conditioning, until the
list is exhausted or until an internal limit is reached on the
number of deployed conditioners.
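The extremes-inward deployment order can be sketched as follows, using the final scores from the example above. The limit value is hypothetical; the paper only says an internal limit exists.

```python
def deployment_plan(scores, limit=4):
    """Deploy conditioning from the extremes inward: the largest
    positive score gets optimization, the largest negative score gets
    delay, until the list is exhausted or the limit is reached."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    plan = []
    lo, hi = 0, len(ranked) - 1
    while lo <= hi and len(plan) < limit:
        name, score = ranked[lo]      # next-largest positive score
        if score > 0:
            plan.append(("optimize", name))
        lo += 1
        if lo > hi or len(plan) >= limit:
            break
        name, score = ranked[hi]      # next-largest negative score
        if score < 0:
            plan.append(("delay", name))
        hi -= 1
    return plan

plan = deployment_plan({"checkBalance": 11, "withdraw": 5,
                        "deposit": -16})
# checkBalance and deposit are paired first, then withdraw.
```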
A planned enhancement is to consider stopping if there is
a large relative gap in the totals of methods in the final
report – positive or negative. This would mean that the
highest impact conditioning is deployed. Another
improvement that is being considered is to take into
account methods that affect each other – that is, A affects
B and B affects A. In this situation, depending on the
distribution of the workload, their final numbers may
cancel each other out and make them indistinguishable
from methods that just have low final numbers.
Another point to note is that a method is not analyzed
against itself in the Component Balancer analysis. For
other purposes, this analysis could be useful.
2.2 Optimization Detail
Response time data from methods being optimized is
processed in real time as it comes in from the target
machines and placed in a working set. Optimization
calculations are done every few seconds for each group in
the working set.
A delay increment is calculated using fuzzy logic for each
method targeted for optimization in the group. The delay
increment can be positive or negative. The largest delay
increment for a method in the group is chosen and added
to the total delay. The total delay is not allowed to exceed
a maximum of 300 milliseconds or go below zero. The
delay conditioner is given a user-selectable maximum
delay factor between 1 and 3 that it uses to calculate the
delay it actually applies to the method, so the maximum
any method can be delayed is 900 milliseconds.
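The clamped accumulation and scaling described above can be sketched in a few lines. The function name and signature are illustrative only; the 300 ms cap and the 1-3 delay factor come from the text.

```python
def apply_delay_increment(total_delay_ms, increment_ms, factor=1,
                          max_delay_ms=300):
    """Add the chosen (possibly negative) delay increment to the running
    total, clamp the total to [0, max_delay_ms], then scale by the
    user-selected delay factor (1-3); with factor 3 the effective
    delay therefore tops out at 900 ms."""
    total = max(0, min(max_delay_ms, total_delay_ms + increment_ms))
    return total, total * factor

total, effective = apply_delay_increment(280, 50, factor=3)
# The total clamps at 300 ms, so the effective delay is 900 ms.
```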
The calculated delay is pushed out to the machines where
the delayed methods are located by writing a registry key
entry on those machines.
If no activity has occurred in a working set for 10 seconds
– that is, no conditioned methods were called - the delay is
set to zero. If one hour elapses with no entries for a
method, it is removed from the working set.
The fuzzy logic takes three parameters: the current
average response time, the current average calls per
second (cps) and the previous average calls per second.
The two calls-per-second values are used to determine
the load trend. All parameters are normalized against the
current response time and cps ranges before they are
passed to the fuzzy logic.
The parameters are “fuzzified” – response time is
converted to fast, ok and slow and the calls per second are
converted to low, medium and high. The fuzzy logic
executes a set of rules given these parameters. For
example, if the response time is ok and the current cps are
high and the previous cps are low, then the fuzzy logic
outputs a positive delay increment. This means that, since
the load has increased, a response time increase is
anticipated and a bigger delay most likely is needed.
The delay value is then “defuzzified”, that is, converted
back to a number in the accepted range and returned to the
optimizer. The delay increment moves over its allowed
range based on how “ok” the response time is, how “high”
the current cps are and how “low” the previous cps were.
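One rule of this fuzzy controller can be sketched with triangular membership functions over normalized inputs. The membership shapes, the rule-strength combination via min, and the scaling constant are all assumptions for illustration; only the rule itself ("response time ok, cps jumped from low to high, so output a positive increment") comes from the text.

```python
def membership(x):
    """Triangular memberships for a value normalized to [0, 1]:
    returns membership degrees for (low, medium, high)."""
    low = max(0.0, 1.0 - 2.0 * x)
    high = max(0.0, 2.0 * x - 1.0)
    medium = max(0.0, 1.0 - abs(2.0 * x - 1.0))
    return low, medium, high

def delay_increment(resp_time, cps_now, cps_prev, max_step=20.0):
    """One rule from the paper's description: if response time is 'ok'
    and the load jumped from low to high cps, anticipate a response
    time increase and output a positive delay increment."""
    _, rt_ok, _ = membership(resp_time)
    _, _, cps_high = membership(cps_now)
    cps_low, _, _ = membership(cps_prev)
    strength = min(rt_ok, cps_high, cps_low)  # rule firing strength
    return strength * max_step                # crude defuzzification

inc = delay_increment(0.5, 0.9, 0.1)  # load surged: positive increment
```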
The target response time is the current minimum response
time if the maximum goal is chosen. No optimization is
done if no goal is chosen. For any goal in-between, the
target response time is in a relative range between the
current lower and upper response times. For example, if
the goal is 75% and the response time range is between 50
to 150 milliseconds, then the response time goal will be
75 milliseconds.
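The goal interpolation can be written directly from the example: the target sits the goal percentage of the way from the current upper response time toward the current lower one, with 100% meaning the current minimum. A minimal sketch (function name is illustrative):

```python
def target_response_time(goal_pct, lower_ms, upper_ms):
    """Interpolate the response-time goal between the current lower and
    upper observed response times; a 100% goal means the current
    minimum response time."""
    return upper_ms - (goal_pct / 100.0) * (upper_ms - lower_ms)

target = target_response_time(75, 50, 150)
# Matches the paper's example: a 75% goal over a 50-150 ms range
# yields a 75 ms target.
```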
The upper response time value and lower cps value are
reset every minute. Doing this forces the optimization to
squeeze towards an optimum target response time. The
lower response time value ratchets down and indicates the
current best value that the method can run at. The upper
cps value ratchets up and indicates the current best
throughput for the method.
The cost of Component Balancer conditioning in COM+
is approximately 900 microseconds for each analyzed
method, 200 microseconds for each delayed method and
300 microseconds for each optimized method.
3. Case Studies
In all case studies, optimization was based on analysis
results.
3.1 ISD Benchmark
The ISD (Inventory, Sales and Delivery) benchmark,
developed at Unisys, is loosely based on TPC-C and is
used for MS SQL Server and Oracle database tuning.
This benchmark uses an in-process COM+ component
with 7 methods to drive the database and comes with a
driver script. We ran the benchmark with a 2-processor
500 MHz database server running SQL Server and a
2-processor 1 GHz middle-tier machine.
We did an analysis run and deployed the recommended
optimizations with a maximum optimization goal. Only
one method, ItemReport, was running at a high response
time; all other methods were well below 500 milliseconds.
With one method delayed, the ItemReport average
response time dropped from 5.38 to 2.06 seconds (161%)
with a 5% overall loss in throughput. With two methods
delayed, the ItemReport response time dropped to 1.06
seconds (408%) with an overall loss in throughput of
12%. The CPU utilization on the database server dropped
from 100% to 93% in the one method delayed case and to
85% in the two methods delayed case. This optimization
trades throughput for improved response time. The other,
non-delayed methods were also improved by the
optimization.
By allocating one more processor to the database server,
the ItemReport response time went back to its “normal”
sub-second value. The business benefit here is that the
extra processor on a consolidated database server can be
reclaimed for some other purpose, with the Component
Balancer managing ISD performance during peak loads.
3.2 Nile Bookstore
A case study was done using the Doculabs-developed Nile
benchmark, which simulates an online bookstore web
application.
Measurements were done using Microsoft’s Web
Application Stress tool to drive a 4-processor, 700 MHz
system running IIS 5 using the Nile C++ COM+
components (2 components with 14 total methods) and a
back-end MS SQL Server database on a 4-processor,
548 MHz system. We drove the Nile benchmark with scripts
that had a 4:1 ratio of browse versus browse and buy
transactions. The browsing workload was equally split
between short and long browses and the buying workload
was equally split between users that buy 1 book and buy 5
books. Results from a 200 user run are given in Figure 2
below.
Our optimization, being downstream from IIS, which is
the bottleneck in this configuration, does not significantly
change the performance profile: a 16% improvement in
the logon response time and 6% more throughput.
However, buyers now have priority over browsers. This
shows that we can control where time is spent in the
application in a meaningful fashion.
[Figure: bar chart of average time per script (sec) for Short Browse,
Long Browse, Buy 1 Book and Buy 5 Books, unoptimized vs. optimized.]
Figure 2. Nile Response Times By Script
The normal configuration for the Nile would be a load-
balanced middle-tier server farm that puts a larger load
onto MS SQL Server. To simulate this, we moved the
backend database to an underpowered server (a
one-processor 400 MHz system) and injected extra load to
simulate a “medium” and “heavy” MS SQL Server load.
Now, response times are affected by MS SQL Server
performance and optimization can have a larger effect.
Results in Figure 3 below are for 100 users.
[Figure: welcome screen average response time (ms) under medium and
heavy database load, for unoptimized, optimized and optimized+ runs.]
Figure 3. Optimization Results with Database Bottleneck
The “Optimized+” runs added a custom conditioner written
to cache the results of the GetSpecials method – which
returns the list of bookstore specials and is always
displayed on the welcome screen after logon. Both
optimizations improve the response times of the first two
screens the user sees without affecting the overall
throughput (logon screen response times are not shown).
Just deploying the cache GetSpecials custom conditioner
by itself resulted in a drop in the number of transactions
the database had to perform, as seen in Figure 4 below.
[Figure: database transactions per second over time, dropping at the
point where the cache GetSpecials conditioner is deployed.]
Figure 4. Caching GetSpecials Custom Conditioner
We also did testing that overloaded the database. This
caused the database to thrash and erratic behavior to
appear. The Purchase method randomly spiked up to one
second and its call rate became quite variable. With
optimization, we were able to keep the Purchase response
time at 60 milliseconds with a steady call rate, drop the
CPU utilization on the database server from 100% to 93%
and increase the number of database transactions per
second by 26%. In this scenario, the Component Balancer
smoothed response times and scaled database
performance during an unexpected surge in load.
4. Conclusion
The Component Balancer improves response times of
high priority methods when under load. Additionally, it
can change and improve the business level profile of an
application. The Component Balancer is available as part
of the Application Sentinel for Resource Management
product from Unisys Corporation.
Acknowledgments
We would like to acknowledge the following people:
Russ Cole and Praveen Nagarajan – initial prototyping
and feasibility
Ron Neubauer and Paul Koerber – automation
Mark Tadman and Peter Partch – COM+ conditioning
Jack Chang and Joyce Liu – Weblogic conditioning
Bob Walker – Nile investigation and set up
Doug Tolbert – statistical consultation
Michael Salsburg – detailed review and editorial advice