Harmonic Mean Aggregation

Copyright © 2013 Performance Dynamics

Aggregating Monitored Rate Data Using the Harmonic Mean
Progressive notes developed in response to remarks that arose during the Monitorama Conference, Boston MA, March 28-29, 2013
Neil J. Gunther
Performance Dynamics Company

N.J. Gunther

Last updated November 24, 2013


Contents
1 Monitoring as Motivation
2 Meaning of the Means
3 Visual Explanation
4 Checking HM Correctness
5 Application to Time Series
6 Weighted Harmonic Mean
7 Accommodating Zero Rates
8 Conclusions

1 Monitoring as Motivation

During the presentations at Monitorama, we saw any number of
monitored metrics displayed as a time series, like Fig. 1.
[Plot: Metric vs. Time]
Figure 1: Typical time series display of a collected metric
Eventually, we need to aggregate these data.


Aggregation
Aggregation refers to averaging the monitored data at the boundary of some time period, T. Such boundaries might occur daily, weekly, monthly, etc.
A more important question (one that is often overlooked) is: what do we mean by averaging?
The usual assumption is that aggregation means taking the statistical mean or, what is the same thing, taking the arithmetic average of all the metric values occurring in each period T.
This may or may not be a valid assumption, depending on 2 things:
1. The type of metric being monitored
2. Whether the metric is sampled or an event
Remark 1. The distinction b/w sampled metrics and event metrics was never delineated in any of the Monitorama presentations. More on this later.

Types of Metrics
There are only 3 types of metrics (see my Keynote):
1. Time: the fundamental performance metric. Dimension $[T]$.
Example measurement units: ns, weeks.
2. Counts: integer or decimal number. Dimensionless $[\varphi]$.
Example measurement units: subscriptions, RSS.
3. Rate: inverse time. Dimension $[1/T]$ or $[T^{-1}]$.
Example measurement units: Gbps, MIPS.
Definition 1. The throughput (X) is a rate metric type. It's the number of work units completed (C) per unit time (T):
$$X = \frac{C}{T} \tag{1}$$
Example 1. A web server handling C = 30,000 HTTP GETs every minute has an average throughput of X = 30000/60 = 500 GETs per second.
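As a quick check of eqn. (1), here is a minimal R sketch of Example 1. The function name `throughput()` and its argument names are illustrative choices, not part of base R or any package; the argument is deliberately not called `T`, which in R is an alias for `TRUE`.

```r
# Throughput, eqn. (1): completed work units C divided by the period T.
# throughput() is a hypothetical helper name used only for illustration.
throughput <- function(counts, period) {
  counts / period
}

# Example 1: 30,000 GETs completed in one minute (60 seconds)
throughput(30000, 60)
```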


Graphite Workshop
During the Graphite workshop, aggregating monitored rate data was
mentioned. This caused me to interject the cautionary comment:

The correct way to average rates (inverse-time metrics) is to apply
the harmonic mean, not the arithmetic mean.

At least that’s what the classic computer performance books tell you.
See, e.g., Allen (Academic Press 1990) and Jain (Wiley 1991).
I wasn’t emphatic about it b/c the examples in those textbooks do not
refer to time series. Good thing b/c the usual form of the harmonic mean
doesn’t work for time series!
That’s what I’m going to address here. Goggle up; science ahead.

2 Meaning of the Means

Meaning of the Means – AM
Definition 2 (Arithmetic Mean). The sum of the numbers (iid rvs) divided by the number of numbers:
$$\mathrm{AM} = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{1}{N}\sum_{k=1}^{N} X_k \tag{2}$$
Example 2 (Arithmetic mean of the first 100 integers).
$$\mathrm{AM} = \frac{1 + 2 + \cdots + 100}{100} = \frac{50 \times 101}{100} = 50.50$$

In R, the arithmetic mean is calculated simply as:
> mean(1:100)
[1] 50.5


Meaning of the Means – HM
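Definition 3 (Harmonic Mean). The inverse of the arithmetic mean of the inverses (iid rvs):
$$\mathrm{HM} = \frac{1}{\frac{1}{N}\left(\frac{1}{X_1} + \frac{1}{X_2} + \cdots + \frac{1}{X_N}\right)} = \left(\frac{1}{N}\sum_{k=1}^{N}\frac{1}{X_k}\right)^{-1} \tag{3}$$
Example 3 (Harmonic mean of the first 100 integers).
$$\mathrm{HM} = \frac{100}{1 + \frac{1}{2} + \cdots + \frac{1}{100}} = 19.28$$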

Since the harmonic mean is not defined in the base R pkg, we write:
> 100/sum(1/1:100) # matches Example 3
[1] 19.27756
or
> 1/mean(1/1:100) # matches eqn.(3)
[1] 19.27756


The Ad Nauseam Example
But how do we know when to apply the harmonic mean?
The example used to illustrate the application of HM ad nauseam is a
vehicle covering the same distance at different speeds.
Example 4 (Variable speed trip). Suppose a car travels 100 miles from
city A to city B at 100 mph. But, on the return journey the weather is
bad, so the car is forced to travel at the slower speed of 50 mph. What is
the average speed for the round trip?
The total round-trip time is 3 hrs b/c it takes 1 hr to go from A to B and 2 hrs to return at half the speed.
If we assume the arithmetic mean of the speeds, the average speed is $\mathrm{AM} = \frac{1}{2}(100 + 50) = 75$ mph. But covering 200 miles at an average speed of 75 mph would take 2 hrs 40 mins, not 3 hrs. Oops!


If, however, we apply the harmonic mean:
$$\mathrm{HM} = \frac{1}{\frac{1}{2}\left(\frac{1}{100} + \frac{1}{50}\right)}$$
we get an average speed of $66\tfrac{2}{3}$ mph. And covering 200 miles at an average speed of $66\tfrac{2}{3}$ mph does take 3 hrs.
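The round-trip arithmetic can be checked in a few lines of R. This is a sketch using only base R; the variable names are illustrative. It shows that total distance over total time agrees with the harmonic mean of the two speeds, while the arithmetic mean overshoots.

```r
# Example 4: the same distance (count) covered at two different speeds (rates)
dist  <- c(100, 100)      # miles per leg: A -> B, then B -> A
speed <- c(100, 50)       # mph on each leg
hours <- dist / speed     # time per leg: 1 hr and 2 hrs

sum(dist) / sum(hours)    # total miles / total hours
1 / mean(1 / speed)       # harmonic mean of the speeds, eqn. (3)
mean(speed)               # arithmetic mean of the speeds: 75 mph, too fast
```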
Remark 2. Notice that HM < AM. This is always true unless all the values are equal, in which case HM = AM.
In my Graphite workshop mini-talk, I gave the example of database reads
and writes as corresponding to the two different IOPS rates or speeds
executing the same number of IOs, analogous to the same distance.
Proposition 1. The harmonic mean applies when the same amount of
work is done at different rates.
Another common example would be where you want to average the
different throughput rates of the same benchmark measured on different
speed processor systems.
But benchmarking is not monitoring.

3 Visual Explanation

Visual Explanation
[Plot: Metric vs. Time]
Figure 2: Invariant areas
The blue and red areas are equal: 3 high × 1 wide = 3 wide × 1 high = 3 squares each.
The areas represent the same count metric (C): distance, IOs, etc.

The AM Doesn’t Work
[Plot: Metric vs. Time, with the AM level and a "gap?" annotation]
Figure 3: Yellow area corresponds to height AM = 2
Since the yellow area of 6 squares, corresponding to a height AM = 2 [$\mathrm{AM} = \frac{1}{2}(3 + 1)$], is only 3 squares wide while the total time spans 4 squares, there is a gap 1 square wide.

Correcting the AM Area
[Plot: Metric vs. Time, with the AM and HM levels marked]
Figure 4: Squashing the yellow area into the green area
The green area of 6 squares, corresponding to a height HM = 1.5
[HM = 2 × 3/(3 + 1)], now has the correct width (total time).

Covering All the Columns
[Plot: Metric vs. Time, with the AM and HM levels marked]
Figure 5: Harmonic column height (HM) of width 4 units
The original blue and red areas correspond to histogram columns of
different widths. The green HM column has the correct total width.

Does AM Ever Work?
Yes. The AM is applicable when columns have uniform width.
[Plot: two panels of Metric vs. Time, each with the AM level marked]
Figure 6: AM works for uniform column widths
This is the most common case, and it is why statisticians use the AM as the statistical mean. It is also why the HM is not in the base R package.


Time Bin Widths
The count per unit time constitutes a rate metric (X = C/T ).
Proposition 2. The harmonic mean (HM) applies to histograms with columns having the same areas (counts) but different widths. In the case of monitored data, these different widths constitute different time bins. This case is most likely to occur with asynchronous event data.
Proposition 3. Since the event counts (C) occur in time (T) on the x-axis, the y-axis must be a rate metric, e.g., throughput X = C/T, i.e., events per unit time.
Proposition 4. The arithmetic mean (AM) applies to histograms with
columns having the same widths but different areas (counts). That
turns out to be the most common case b/c the monitored data are
sampled on equal periodic boundaries, like the ticks of a metronome.
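Proposition 2 can be checked numerically on the columns of Figure 2 (3 high × 1 wide and 1 high × 3 wide). This base-R sketch treats column heights as rates and widths as time bins; the variable names are illustrative.

```r
# Figure 2 columns: equal areas (counts), different widths (time bins)
heights <- c(3, 1)            # column heights: the rate metric (y-axis)
widths  <- c(1, 3)            # column widths: time bins (x-axis)
areas   <- heights * widths   # counts: 3 squares each

sum(areas) / sum(widths)      # total counts / total time: 6/4 = 1.5
1 / mean(1 / heights)         # HM of the heights, eqn. (3): also 1.5
mean(heights)                 # AM = 2, which leaves the gap in Figure 3
```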

4 Checking HM Correctness

Checking the Correctness of HM
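Recalling eqn. (3) for N periods:
$$\mathrm{HM} = \frac{N}{\frac{1}{X_1} + \frac{1}{X_2} + \cdots + \frac{1}{X_N}} = \frac{NC}{\frac{C}{X_1} + \frac{C}{X_2} + \cdots + \frac{C}{X_N}} \tag{4}$$
We've simply multiplied the numerator and each term in the denominator by the constant count per period, C, as is appropriate for HM.
Since $C/X_k = T_k$ by the definition of throughput in eqn. (1), this produces:
$$\mathrm{HM} = \frac{NC}{T_1 + T_2 + \cdots + T_N} \tag{5}$$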

which agrees with the notion
$$\text{Average (harmonic) rate} = \frac{\text{Total counts}}{\text{Total time}} \tag{6}$$

Remark 3. The same counts per period (C), completed at different rates
(Xk ) in the denominator of eqn. (4), are responsible for producing the
nonuniform time intervals (Tk ) in the denominator of HM in eqn. (5).
Theorem 1 (When is HM = AM?). If the $T_k$ intervals are all the same ($T_k = T$), as they are with sampled data, the counts per sample ($C_k$) will generally differ, i.e., each sample will have a different rate $X_k = C_k/T$, and HM reduces to AM.
Proof 1. Under these conditions, the total-counts-over-total-time form of the HM, eqn. (6), becomes
$$\frac{C_1 + C_2 + \cdots + C_N}{T + T + \cdots + T} = \frac{C_1 + C_2 + \cdots + C_N}{NT} = \frac{1}{N}\left(\frac{C_1}{T} + \frac{C_2}{T} + \cdots + \frac{C_N}{T}\right) = \frac{X_1 + X_2 + \cdots + X_N}{N}$$
But this is precisely the definition of AM given by eqn. (2).
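A small numerical illustration of Theorem 1, with made-up sample counts (the 60-second period and the counts are illustrative values, not monitored data). With equal-width bins, total counts over total time equals the plain arithmetic mean of the per-sample rates, whereas applying eqn. (3) directly to those rates gives a smaller, incorrect value.

```r
# Sampled data: every interval has the same width T, counts C_k differ
period <- 60                            # seconds per sample (illustrative)
counts <- c(1200, 900, 3000, 450)       # events counted per sample (illustrative)
rates  <- counts / period               # X_k = C_k / T

sum(counts) / (length(counts) * period) # total counts / total time, eqn. (6)
mean(rates)                             # arithmetic mean, eqn. (2): same value
1 / mean(1 / rates)                     # eqn. (3) applied blindly: too small
```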


Checking the Examples
We can use eqn. (6) to check that HM is the right type of average.
1. Example 4 with N = 2 speeds (X1 = 100 mph, X2 = 50 mph) over
the same distance (C = 100 miles):
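$$\mathrm{HM} = \frac{1}{\frac{1}{2}\left(\frac{1}{100} + \frac{1}{50}\right)} = 66\tfrac{2}{3}\ \text{mph}, \qquad \frac{\text{Total counts}}{\text{Total time}} = \frac{200\ \text{miles}}{3\ \text{hrs}} = 66.67\ \text{mph}$$
2. Visual HM example with different column widths:
$$\mathrm{HM} = \frac{1}{\frac{1}{2}\left(\frac{1}{3} + 1\right)} = \frac{3}{2}\ \text{units high}, \qquad \frac{\text{Total counts}}{\text{Total time}} = \frac{6\ \text{squares}}{4\ \text{units}} = 1.5\ \text{units high}$$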

5 Application to Time Series

Monitored Subscription Rates
[Plot: Rate vs. Time]
Figure 7: Real data: subscription rates over 33 days
Days       Rate
0          0.00
9.24932    1081.16
18.663     1062.28
27.4192    1142.05
30.2493    3533.40
33.0007    3634.56
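Assuming each rate is simply the 10,000-subscription trigger count divided by the days elapsed since the previous trigger (which is what event triggering implies), the Rate column can be recomputed from the Days column. This sketch is not from the original notes; the match is only to within about 0.1 because the Days values shown are themselves rounded.

```r
# Subscription data (Figure 7): elapsed days at each trigger point
days    <- c(0, 9.24932, 18.663, 27.4192, 30.2493, 33.0007)
trigger <- 10000                        # subscriptions counted per period

# Rate for each period = count / elapsed days; the first point is the origin
rates <- c(0, trigger / diff(days))
round(rates, 2)
```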

Irregular Time Boundaries
[Plot: Rate vs. Time]
Figure 8: Since the time-series data are not sampled but triggered
on 10,000 subscriptions, the data points do not fall on regular time
boundaries.


Rates as Column Heights
[Plot: Rate vs. Time, drawn as columns]
Figure 9: Irregular time intervals are more easily discerned in a
columnated format. We want to aggregate these data into a single
datum.


The numerical subscription rates ($X_k$) are:
X1      X2        X3        X4        X5        X6
0.00    1081.16   1062.28   1142.05   3533.40   3634.56

Using R, the AM and HM are:
> hmean <- function(vals) { 1/mean(1/vals) }
> rates <- c(0.00, 1081.16, 1062.28, 1142.05, 3533.40, 3634.56)
> mean(rates) # AM
[1] 1742.242
> hmean(rates) # HM
[1] 0

The AM evaluates but the HM fails. Why? From eqn. (3) the HM is
$$\mathrm{HM} = \frac{1}{\frac{1}{6}\left(\frac{1}{0.0} + \frac{1}{1081.16} + \frac{1}{1062.28} + \frac{1}{1142.05} + \frac{1}{3533.40} + \frac{1}{3634.56}\right)} \tag{7}$$

But the first term in the denominator is infinite and dominates all the
other values. The final inversion “1/∞” produces HM = 0.
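The failure can be traced step by step in R, which follows IEEE floating-point arithmetic: 1/0 evaluates to Inf rather than raising an error, so the problem passes silently. This is an illustrative sketch using only base arithmetic.

```r
# How R evaluates eqn. (7): IEEE arithmetic turns 1/0 into Inf
rates <- c(0.00, 1081.16, 1062.28, 1142.05, 3533.40, 3634.56)
inv   <- 1 / rates    # the first element is Inf
mean(inv)             # Inf: one infinite term swamps all the others
1 / mean(inv)         # 1/Inf is exactly 0, so HM = 0
```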


Let’s Try That Again
We don’t need the first data-point. Treat it as the origin of the time
period associated with the X2 data point. To drop it in R, we write:
> rates[-1]
[1] 1081.16 1062.28 1142.05 3533.40 3634.56
> hmean(rates[-1])
[1] 1515.118

which is non-zero and less than AM. That’s encouraging.
Alternatively, we can evaluate HM explicitly as
> length(rates[-1])/sum(1/rates[-1])
[1] 1515.118

Note that the numerator is now 5 rather than 6
> length(rates[-1])
[1] 5

due to dropping the first value.


Check the HM Value
The measured rates were triggered on a count of 10,000 per period.
The total count is therefore C = 5 × 10,000 = 50,000 subscriptions.
The total time period is T = 33.0007 days.(a)
From eqn. (6) the time-averaged harmonic rate is:
$$X_{HM} = \frac{C}{T} = \frac{50{,}000}{33.0007} = 1515.12$$
which agrees with hmean(rates[-1]) computed above.
Alternatively, only the HM gives the correct total time window
$$T = \frac{C}{X_{HM}} = \frac{50{,}000}{1515.12} = 33.0007$$
in agreement with the concept shown in Figure 5.
(a) Don't pay too much attention to the decimal digits. I'm only displaying them for consistency and readability.
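The same check can be run in R. In this sketch the total-time variable is called `Tot` (an illustrative name) to avoid masking R's `T` alias for `TRUE`. It also shows that the AM of the same five rates would imply a window far shorter than 33 days.

```r
# Check eqn. (6) against the HM of the five nonzero rates
rates <- c(1081.16, 1062.28, 1142.05, 3533.40, 3634.56)
C     <- 5 * 10000           # total subscriptions
Tot   <- 33.0007             # total days in the aggregation window

xhm <- 1 / mean(1 / rates)   # harmonic mean of the rates
C / Tot                      # total counts / total time
C / xhm                      # window length implied by the HM
C / mean(rates)              # window length implied by the AM: too short
```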


AM and HM for Subscription Data
[Plot: Rate vs. Time, with the AM and HM levels marked]
Figure 10: The AM and HM are two different estimates of the average subscription rate and therefore sit at different heights on the y-axis. But only the HM gives the correct total time window of 33 days.


The Aggregated HM Value
[Plot: Rate vs. Time]
Figure 11: The HM is the big blue dot that correctly replaces these
subscription-rate data for this time bin (33 days) when they are
aggregated

6 Weighted Harmonic Mean

Weighted Harmonic Mean
Recalling Example 4, we consider the following generalization of the HM.
Definition 4 (Weighted Harmonic Mean).
$$\mathrm{WHM} = \frac{1}{\frac{1}{W}\left(\frac{w_1}{X_1} + \frac{w_2}{X_2} + \cdots + \frac{w_N}{X_N}\right)} \tag{8}$$
where the total weight $W = \sum_k w_k$.
Example 5 (Variable speed over different distances). A car travels 50 miles at 40 mph, 60 miles at 50 mph and 40 miles at 60 mph. What is the average speed of the trip?
The distance weights are: w1 = 50, w2 = 60, w3 = 40. Substituting into eqn. (8) yields:
$$\mathrm{WHM} = \frac{50 + 60 + 40}{\frac{50}{40} + \frac{60}{50} + \frac{40}{60}} = 48.13\ \text{mph}$$

Significance of the WHM
Check the preceding calculation in R:
> wts <- c(50, 60, 40)
> rates <- c(40, 50, 60)
> sum(wts)/(sum(wts/rates))
[1] 48.12834

The counts per period were constant in both Example 4 ($C_k$ = 100 miles) and the example in Section 5 ($C_k$ = 10,000 subscriptions).
Proposition 5. The WHM allows us to calculate HM when counts per period are distributed arbitrarily within the aggregation time window.
Eqn. (8) can be rewritten with weights as percentages:
$$\mathrm{WHM} = \frac{1}{\frac{w_1\%}{X_1} + \frac{w_2\%}{X_2} + \cdots + \frac{w_N\%}{X_N}} \tag{9}$$
where $w_k\% = w_k/W$.

Determining the Percentage Weights
The percentage weights can be obtained directly from monitored data
using the following steps:
1. Each rate data point $r_k$ has an associated time increment $\Delta t_k$
2. The product $w_k = r_k \times \Delta t_k$ is the raw weight (area) for data point k
3. The total weight is $W = \sum_k w_k$ (total area)
4. The percentage weight is $w_k\% = w_k/W$ (fraction of total area)

In R, we can write the above calculation as a function with 2 args:
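wtspc <- function(rates, tdeltas) {
  weights <- rates * tdeltas    # raw weight (area) of each column, step 2
  totalwt <- sum(weights)       # total weight W (total area), step 3
  return(weights / totalwt)     # percentage weights w_k%, step 4
}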
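As a sanity check, here is a sketch (not from the original notes) that applies the percentage weights to the subscription data of Section 5, taking each time increment as the difference between successive Days values. Because every period carries roughly the same 10,000 count, all weights come out close to 1/5 and the WHM of eqn. (9) essentially coincides with the plain HM found earlier.

```r
# Percentage weights, as defined by wtspc() above
wtspc <- function(rates, tdeltas) {
  weights <- rates * tdeltas
  weights / sum(weights)
}

days   <- c(0, 9.24932, 18.663, 27.4192, 30.2493, 33.0007)
rates  <- c(1081.16, 1062.28, 1142.05, 3533.40, 3634.56)
deltas <- diff(days)                  # time increment for each rate

wpc <- wtspc(rates, deltas)           # each close to 0.2: equal counts
whm <- 1 / sum(wpc / rates)           # WHM, eqn. (9)
whm
sum(rates * deltas) / sum(deltas)     # total counts / total time: same value
```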


Application of WHM to Time Series
[Plot: Rate vs. Time]
Figure 12: Monitored rates for application “GAM”
Aggregation window size is 60 samples with T = 558.83 units


> gamrates
[1] 18.68 10.77 16.60 19.69 1.95 22.53 4.99 2.50 7.91 5.21 9.73 5.67
[13] 6.51 6.80 22.19 4.35 3.90 3.16 9.98 5.25 48.49 5.26 1.95 8.49
[25] 8.30 6.16 11.93 63.95 21.63 11.37 5.31 5.48 3.49 4.96 3.88 8.86
[37] 3.35 3.69 6.18 17.51 21.79 8.99 11.83 8.26 4.54 2.71 4.02 6.94
[49] 14.93 6.38 4.21 3.25 31.02 17.10 20.49 3.85 10.66 4.58 5.08 3.70

> gamdeltas
[1] 3.03 4.95 3.59 2.88 30.12 2.66 11.98 21.35 6.47 11.30 5.32 8.94
[13] 8.95 7.42 2.70 12.48 14.06 15.99 5.98 10.68 1.16 10.48 29.67 6.55
[25] 6.40 9.17 4.23 0.85 2.57 4.87 9.67 10.14 16.40 11.39 13.24 6.05
[37] 16.44 16.08 9.41 3.25 2.32 5.67 4.60 7.12 12.96 20.98 12.67 7.48
[49] 3.60 8.18 12.65 16.33 1.89 2.95 2.58 14.48 5.19 12.70 9.87 15.77

Using eqn. (9) and our R function wtspc() we find:
> (whm.gam <- 1 / sum(wtspc(gamrates, gamdeltas) / gamrates))
[1] 5.913534

Check that the WHM value produces the correct total time T = 558.83 units:
> sum(gamdeltas)
[1] 558.827
> sum(gamdeltas*gamrates) / whm.gam
[1] 558.827
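Why does that check work? Substituting the area weights $w_k\% = r_k\,\Delta t_k/W$ into eqn. (9) gives a one-line derivation (a worked sketch in the notation of the steps above, assuming every $r_k > 0$):

$$\mathrm{WHM} = \left(\sum_{k}\frac{w_k\%}{r_k}\right)^{-1} = \left(\sum_{k}\frac{r_k\,\Delta t_k}{W\,r_k}\right)^{-1} = \frac{W}{\sum_k \Delta t_k} = \frac{\sum_k r_k\,\Delta t_k}{T}$$

So the WHM is total area (counts) over total time, and dividing the total area by the WHM must return T. The derivation needs every $r_k > 0$, which is exactly the condition that zero rates violate.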


WHM Aggregation Result
[Plot: Rate vs. Time, with the WHM level marked]
Figure 13: WHM aggregation of monitored “GAM” rates in Fig. 12

7 Accommodating Zero Rates

Handling Zeros in the Time Series
WHM in Sect. 6 worked b/c those data did not contain any zero rate values.
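However, with the HM of eqn. (7) we already saw that
$$\frac{1}{X} \to \infty \quad \text{as } X \to 0$$
Since that single term dominates all the other nonzero terms in the denominator of HM, the final inversion produces an overall zero value:
$$\mathrm{HM} = \frac{N}{\frac{1}{X_1} + \cdots + \frac{1}{X} + \cdots + \frac{1}{X_N}} \to 0 \quad \text{as } X \to 0$$
The same is true for WHM in eqn. (9).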

This dooms the algorithmic use of WHM for general time series.

Since monitored rate metrics can be expected to include zero values in any
aggregation period, we need a way to accommodate them.

Example 6 (Toy sample rates with zero values).
X1 = 0, X2 = 100, X3 = 100, X4 = 0, X5 = 100
The standard harmonic mean (3) produces the result HM = 0.
> zr <- c(0,100,100,0,100)
> hmean(zr)
[1] 0

Some possible remedies:
Ignore zero values: Pretend the zeros don’t exist and there are only 3
(positive) data values.
$$\mathrm{HM}_{3,3} = \frac{3/3}{\frac{1/3}{100} + \frac{1/3}{100} + \frac{1/3}{100}} = 100 \tag{10}$$

Drop zero values: Retain 3 of 5 positive values with weights of 1/5.
$$\mathrm{HM}_{3,5} = \frac{3/5}{\frac{1/5}{100} + \frac{1/5}{100} + \frac{1/5}{100}} = 100 \tag{11}$$

Surprise! Ignoring == Dropping
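The surprise has a one-line explanation: the common factor of 1/5 cancels, leaving the plain HM of the positive values. Writing the sums over the 3 positive rates:

$$\mathrm{HM}_{3,5} = \frac{3/5}{\frac{1}{5}\sum_{k=1}^{3}\frac{1}{X_k}} = \frac{3}{\sum_{k=1}^{3}\frac{1}{X_k}} = \frac{3/3}{\frac{1}{3}\sum_{k=1}^{3}\frac{1}{X_k}} = \mathrm{HM}_{3,3}$$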


But Wait! It Gets Worse
HM3,3 and HM3,5 are both identical to the arithmetic mean!
We can check this in R:
> zr[zr != 0] # drop zeros
[1] 100 100 100
> zpos <- zr[zr != 0]
> hmean(zpos) # HM
[1] 100
> mean(zpos) # AM
[1] 100

Proposition 6. Naively including zero rates produces HM = 0. FAIL
Proposition 7. Naively dropping zero rates produces the AM. FAIL


A More Careful Approach
We want to find an algorithm that produces 0 < HM < 100 for
Example 6 by accounting for all 5 data points, but not overbiasing due
to the presence of zero values.
Conjecture 1. The zeros in X1 , X4 have weights 1/5 each. Ignore those
terms in the harmonic sum but redistribute their weights across the
weights of the remaining non-zero terms X2 , X3 , X5 .
Each term in the harmonic sum has a weight of 1/5. The 2 zero terms have a total weight of 2/5. Adding a third of that total zero-term weight to each of the positive-term weights produces a new weight:
$$\frac{1}{3}\cdot\frac{2}{5} + \frac{1}{5}$$

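Now, eqn. (11) becomes
$$\frac{3/5}{\frac{(2/5)/3 + 1/5}{100} + \frac{(2/5)/3 + 1/5}{100} + \frac{(2/5)/3 + 1/5}{100}} \tag{12}$$
In addition, each weight simplifies further as
$$\frac{1}{3}\cdot\frac{2}{5} + \frac{1}{5} = \frac{1}{3}\left(\frac{2}{5} + \frac{3}{5}\right) = \frac{1}{3}$$
Hence, (12) reduces to
$$\frac{3/5}{\frac{1/3}{100} + \frac{1/3}{100} + \frac{1/3}{100}} = 60 \tag{13}$$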

which is less than the AM, but not zero, and thus meets our requirement.
Eqn. (13) for the zero-renormalized harmonic mean has the form
$$\mathrm{ZRHM}_{5,2} = \frac{3}{5}\,\mathrm{HM}_{3,3} \tag{14}$$
where $\mathrm{HM}_{3,3}$ is the same as eqn. (10).
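The weight redistribution of Conjecture 1 can be replayed in R for Example 6. This sketch (variable names illustrative) computes the redistributed weight, evaluates eqn. (12)/(13) directly, and confirms that it equals the factored form of eqn. (14).

```r
# Example 6: redistribute the zero-term weights over the positive terms
zr  <- c(0, 100, 100, 0, 100)
nw  <- length(zr)                  # N_W = 5 data points
pos <- zr[zr > 0]                  # the N_Z = 3 positive rates
nz  <- length(pos)

w <- 1/nw + ((nw - nz)/nw) / nz    # 1/5 + (2/5)/3 = 1/3 per positive term
(nz/nw) / sum(w / pos)             # eqn. (12), which reduces to (13)
(nz/nw) * (1 / mean(1 / pos))      # eqn. (14): (3/5) * HM_{3,3}
```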


The ZRHM Theorem
Since the 2nd factor in the RHS of eqn. (14) is the usual HM, it could
also be extended to include weighted terms (w%) for irregular counts per
time interval as defined by the WHM. See eqn. (9) in Section 6.
We can now write a general formula for calculating the harmonic mean
of arbitrary rate data.
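Theorem 2 (Zero Renormalized Harmonic Mean).
$$\mathrm{ZRHM} = \frac{N_Z}{N_W}\left(\sum_{k=1}^{N_Z}\frac{w_k\%}{X_k}\right)^{-1} \tag{15}$$
where $N_W$ is the total number of data points in the aggregation window, $N_0$ is the number of zeros, $N_Z = N_W - N_0$ is the number of nonzero rates $X_k$, and the percentage weights $w_k\%$ of eqn. (9) are taken over those $N_Z$ nonzero rates. With uniform weights, $w_k\% = 1/N_Z$, the bracketed factor is the usual HM of eqn. (3) applied to the nonzero rates, and eqn. (15) reduces to the form of eqn. (14).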
Proof 2. See preceding discussion.
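The notes give code only for the uniform-weight case (next page). Below is a hypothetical sketch of the weighted form of eqn. (15), assuming the area weights of Section 6 are computed over the nonzero rates only; the name `zrwhm()` is an illustrative choice. It reproduces the uniform-weight result for Example 6 and, for a series with no zeros, the WHM of Example 5. For an all-zero window it returns NaN.

```r
# Hypothetical sketch of eqn. (15) using the area weights of Section 6.
# Assumes rates >= 0 and that tdeltas[k] is the time increment of rates[k].
zrwhm <- function(rates, tdeltas) {
  nw  <- length(rates)               # N_W: all points in the window
  pos <- rates > 0                   # mask of the N_Z nonzero rates
  nz  <- sum(pos)
  wts <- rates[pos] * tdeltas[pos]   # raw weights (areas) of nonzero rates
  wpc <- wts / sum(wts)              # percentage weights w_k%
  (nz / nw) / sum(wpc / rates[pos])
}

# Example 6 with equal time increments: matches the uniform-weight ZRHM
zrwhm(c(0, 100, 100, 0, 100), rep(1, 5))

# Example 5 (no zeros): time per leg = distance / speed
speeds <- c(40, 50, 60)
zrwhm(speeds, c(50, 60, 40) / speeds)
```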


The ZRHM Algorithm
The following R function implements eqn. (15) of Thm 2 with uniform
weights.
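zrhm <- function(tsrates) {
  ndatas  <- length(tsrates)                # N_W: all points in the window
  nzeros  <- length(which(tsrates == 0))    # N_0: number of zero rates
  pozdata <- tsrates[which(tsrates != 0)]   # the N_Z nonzero rates
  nozwt   <- (ndatas - nzeros) / ndatas     # renormalizing factor N_Z/N_W
  nozhm   <- 1 / mean(1 / pozdata)          # usual HM of the nonzero rates
  return(nozwt * nozhm)
}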

It takes an arbitrary time series, tsrates, of monitored rate data as its
argument (including zero values) and returns the ZRHM.


Test Cases
Toy rate data: From Example 6
> zr
[1]   0 100 100   0 100
> zrhm(zr)
[1] 60
which agrees with the manually calculated result.
Subscription data: From Section 5
> sub.rates
[1]    0.00 1081.16 1062.28 1142.05 3533.40 3634.56
> hmean(sub.rates)
[1] 0
> hmean(sub.rates[-1])
[1] 1515.118
> zrhm(sub.rates)
[1] 1262.599

The result, $\mathrm{HM}_{-1}$ = 1515.118, is obtained by not including the zero value at the origin. When that value is included, ZRHM < $\mathrm{HM}_{-1}$, as expected, but ZRHM > 0, unlike HM = 0.

Arbitrary Time Series
Fig. 14 shows a time series of 1000 rate values ranging b/w 0 and 100.
It contains 7 zero values whose locations in time are not known a priori.
[Plot: Rate vs. Time]
Figure 14: AM = 50.93, HM = 0, ZRHM = 22.03
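The data behind Figure 14 are not given, so the numbers below will not match its caption. This sketch builds a synthetic series of the same shape (1000 uniform rates on 0 to 100 with 7 zeros planted at random times, all choices being assumptions) and compares the three averages. `hmean()` and `zrhm()` are restated compactly from the definitions above, assuming the series has no NA values.

```r
hmean <- function(vals) { 1/mean(1/vals) }
zrhm  <- function(tsrates) {
  nozwt <- sum(tsrates != 0) / length(tsrates)   # N_Z / N_W
  nozwt * hmean(tsrates[tsrates != 0])           # renormalized HM of nonzero rates
}

set.seed(42)                          # reproducible synthetic data
x <- runif(1000, min = 0, max = 100)  # runif() never returns exactly 0 here
x[sample(1000, 7)] <- 0               # plant 7 zeros at random times

c(AM = mean(x), HM = hmean(x), ZRHM = zrhm(x))
```

Whatever the seed, HM is 0, while ZRHM lands strictly between 0 and the AM.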


ZRHM Summary
• ZRHM is especially useful if a threshold is defined as a lower bound,
e.g., cache hit-rate, video bit-rate, b/c ZRHM is biased toward
smaller rather than larger values.
• A string of contiguous zero values can be treated as boundaries b/w smaller aggregation windows. Take the 1st zero as defining the end of an aggregation window, and the last zero as the beginning of the next aggregation window.
• There is no longer any need to confirm the total time T from subareas.
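One way to act on the windowing suggestion is sketched below. The notes do not say how long a "string" of zeros must be, so the helper `split_at_zero_runs()` (a hypothetical name) assumes that runs of at least `min_run` consecutive zeros are boundaries, while shorter runs stay inside their window for ZRHM to handle. It uses only base R's `rle()`, `cumsum()` and `split()`.

```r
# Hypothetical helper: split a rate series into aggregation windows,
# treating any run of at least `min_run` consecutive zeros as a boundary.
split_at_zero_runs <- function(tsrates, min_run = 2) {
  runs     <- rle(tsrates == 0)                      # runs of zero / nonzero
  boundary <- runs$values & runs$lengths >= min_run  # runs acting as boundaries
  run_id   <- rep(seq_along(runs$lengths), runs$lengths)
  win      <- cumsum(boundary)[run_id]  # window index advances past each boundary
  keep     <- !boundary[run_id]         # drop the boundary zeros themselves
  unname(split(tsrates[keep], win[keep]))
}

zrhm <- function(tsrates) {             # compact restatement of zrhm() above
  (sum(tsrates != 0) / length(tsrates)) / mean(1 / tsrates[tsrates != 0])
}

x <- c(5, 0, 5, 0, 0, 0, 10, 10, 0, 0, 4)
windows <- split_at_zero_runs(x)        # three windows
sapply(windows, zrhm)                   # one ZRHM per window
```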

8 Conclusions

What We Have Learned
• We compared AM vs HM averaging for monitored rate data. Conventional wisdom says HM is the correct way to average rate metrics. [See Example 4] But, for monitored data...
• HM assumes counts in each time bin are equal but bins have different widths. This is the case for async (intermittent) event data triggered on a common count criterion, e.g., every 1000 subscriptions.
• Otherwise, if time bins have the same width, as with data collected on the same sample interval, HM = AM. [See Thm 1]
• HM fails if any rate measurement is zero. [See Section 7] Compensate by using ZRHM. [See Thm 2]
• Since HM < AM, ZRHM is useful for detecting when a monitored rate falls to a lower bound.


When Should I Use the Harmonic Mean?
You should use the HM, or more accurately ZRHM, to aggregate
monitored data when all of the following criteria apply:
R — Rate metric
A — Async time intervals
T — Too low data values are of interest
E — Event data, not sampled data
Example metrics:
• Cache-hit rate
• Video bit-rate
• Call center service

JGrass-NewAge LongWave radiation Balance
 
Class lectures on Hydrology by Rabindra Ranjan Saha Lecture 3
Class lectures on Hydrology by Rabindra Ranjan Saha  Lecture 3Class lectures on Hydrology by Rabindra Ranjan Saha  Lecture 3
Class lectures on Hydrology by Rabindra Ranjan Saha Lecture 3
 
Lecture 01 Measurements
Lecture 01 MeasurementsLecture 01 Measurements
Lecture 01 Measurements
 
Fpga implementation of optimal step size nlms algorithm and its performance a...
Fpga implementation of optimal step size nlms algorithm and its performance a...Fpga implementation of optimal step size nlms algorithm and its performance a...
Fpga implementation of optimal step size nlms algorithm and its performance a...
 
Fpga implementation of optimal step size nlms algorithm and its performance a...
Fpga implementation of optimal step size nlms algorithm and its performance a...Fpga implementation of optimal step size nlms algorithm and its performance a...
Fpga implementation of optimal step size nlms algorithm and its performance a...
 

Recently uploaded

2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your QueriesExploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your QueriesSanjay Willie
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 

Recently uploaded (20)

2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your QueriesExploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 

Harmonic Mean for Monitored Rate Data

  • 1. Harmonic Mean Aggregation. Aggregating Monitored Rate Data Using the Harmonic Mean. Progressive notes developed in response to remarks that arose during the Monitorama Conference, Boston MA, March 28-29, 2013. Neil J. Gunther, Performance Dynamics Company. Copyright © 2013 Performance Dynamics. Last updated November 24, 2013.
  • 2. Contents: 1 Monitoring as Motivation (p. 3); 2 Meaning of the Means (p. 8); 3 Visual Explanation (p. 13); 4 Checking HM Correctness (p. 20); 5 Application to Time Series (p. 24); 6 Weighted Harmonic Mean (p. 33); 7 Accommodating Zero Rates (p. 40); 8 Conclusions (p. 51).
  • 3. Section 1: Monitoring as Motivation
  • 4. During the presentations at Monitorama, we saw any number of monitored metrics displayed as a time series, like Fig. 1. [Figure 1: Typical time series display of a collected metric (Metric vs. Time)] Eventually, we need to aggregate these data.
  • 5. Aggregation. Aggregation refers to averaging the monitored data at the boundary of some time period, T. Such boundaries might occur daily, weekly, monthly, etc. A more important question, and one that is often overlooked, is: what do we mean by averaging? The usual assumption is that aggregation means taking the statistical mean or, what is the same thing, the arithmetic average of all the metric values occurring in each period T. This may or may not be a valid assumption, depending on two things: 1. the type of metric being monitored; 2. whether the metric is sampled or an event. Remark 1. The distinction between sampled metrics and event metrics was never delineated in any Monitorama presentation. More on this later.
  • 6. Types of Metrics. There are only 3 types of metrics (see my Keynote): 1. Time, the fundamental performance metric. Dimension [T]. Example measurement units: ns, weeks. 2. Counts, an integer or decimal number. Dimensionless [φ]. Example measurement units: subscriptions, RSS. 3. Rate, inverse time. Dimension [1/T] or [T^{-1}]. Example measurement units: Gbps, MIPS. Definition 1. The throughput (X) is a rate metric type. It is the number of work units completed (C) per unit time (T): $X = \frac{C}{T}$ (1). Example 1. A web server handling C = 30,000 httpGets every minute has an average throughput of X = 30000/60 = 500 Gets per second.
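Eqn. (1) as a minimal R sketch, assuming counts and times are given in consistent units; the helper name `throughput()` is hypothetical, not from these notes:

```r
# Hypothetical helper for eqn. (1): throughput X = C / T.
# count: completed work units; secs: elapsed time in seconds.
throughput <- function(count, secs) {
  stopifnot(secs > 0)  # a rate needs a positive time interval
  count / secs
}

# Example 1: C = 30,000 httpGets in one minute (60 s)
x <- throughput(30000, 60)
print(x)  # 500 Gets per second
```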
  • 7. Graphite Workshop. During the Graphite workshop, aggregating monitored rate data was mentioned. This caused me to interject the cautionary comment: The correct way to average rates (inverse-time metrics) is to apply the harmonic mean, not the arithmetic mean. At least, that's what the classic computer performance books tell you; see, e.g., Allen (Academic Press 1990) and Jain (Wiley 1991). I wasn't emphatic about it because the examples in those textbooks do not refer to time series. Good thing, because the usual form of the harmonic mean doesn't work for time series! That's what I'm going to address here. Goggle up; science ahead.
  • 8. Section 2: Meaning of the Means
  • 9. Meaning of the Means: AM. Definition 2 (Arithmetic Mean). The sum of the numbers (iid rvs) divided by the number of numbers: $AM = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{1}{N}\sum_{k=1}^{N} X_k$ (2). Example 2 (Arithmetic mean of the first 100 integers). $AM = \frac{1 + 2 + \cdots + 100}{100} = \frac{50 \times 101}{100} = 50.50$. In R, the arithmetic mean is calculated simply as:
        > mean(1:100)
        [1] 50.5
  • 10. Meaning of the Means: HM. Definition 3 (Harmonic Mean). The inverse of the arithmetic mean of the inverses (iid rvs): $HM = \frac{1}{\frac{1}{N}\left(\frac{1}{X_1} + \frac{1}{X_2} + \cdots + \frac{1}{X_N}\right)} = \left[\frac{1}{N}\sum_{k=1}^{N}\frac{1}{X_k}\right]^{-1}$ (3). Example 3 (Harmonic mean of the first 100 integers). $HM = \frac{100}{1 + \frac{1}{2} + \cdots + \frac{1}{100}} = 19.28$. Since the harmonic mean is not defined in the base R package, we write:
        > 100/sum(1/1:100)   # matches Example 3
        [1] 19.27756
    or
        > 1/mean(1/1:100)    # matches eqn. (3)
        [1] 19.27756
  • 11. The Ad Nauseam Example. But how do we know when to apply the harmonic mean? The example used ad nauseam to illustrate the HM is a vehicle covering the same distance at different speeds. Example 4 (Variable speed trip). Suppose a car travels 100 miles from city A to city B at 100 mph. On the return journey, however, the weather is bad, so the car is forced to travel at the slower speed of 50 mph. What is the average speed for the round trip? The total round-trip time is 3 hrs, because it takes 1 hr to go from A to B and 2 hrs to return at half the speed. If we assume the arithmetic mean of the speeds, the average speed is $AM = \frac{1}{2}(100 + 50) = 75$ mph. But covering 200 miles at an average speed of 75 mph would take 2 hrs 40 mins, not 3 hrs. Oops!
  • 12. If, however, we apply the harmonic mean, $HM = \frac{1}{\frac{1}{2}\left(\frac{1}{100} + \frac{1}{50}\right)}$, we get an average speed of 66 2/3 mph. And covering 200 miles at an average speed of 66 2/3 mph does take 3 hrs. Remark 2. Notice that HM < AM. For positive values, HM ≤ AM always holds, with equality only when all the values are equal. In my Graphite workshop mini-talk, I gave the example of database reads and writes as corresponding to two different IOPS rates (speeds) executing the same number of IOs, analogous to the same distance. Proposition 1. The harmonic mean applies when the same amount of work is done at different rates. Another common example is averaging the throughput rates of the same benchmark measured on processor systems of different speeds. But benchmarking is not monitoring.
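Example 4 can be checked numerically. A minimal R sketch; `hmean()` follows the one-line definition used later in these notes, and the other names are illustrative:

```r
# Harmonic mean, eqn. (3)
hmean <- function(vals) 1 / mean(1 / vals)

speeds <- c(100, 50)               # mph, outbound and return
dist   <- 100                      # miles each way
total_time <- sum(dist / speeds)   # 1 h + 2 h = 3 h

am <- mean(speeds)                 # 75 mph
hm <- hmean(speeds)                # 66.67 mph

# Time to cover the full 200 miles at each "average" speed
print(2 * dist / am)               # about 2.67 h: too short
print(2 * dist / hm)               # 3 h: matches total_time
```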
  • 13. Section 3: Visual Explanation
  • 14. Visual Explanation. [Figure 2: Invariant areas (Metric vs. Time)] The blue and red areas are equal: 3 high × 1 wide = 3 wide × 1 high = 3 squares each. The areas represent the same count metric (C): distance, IOs, etc.
  • 15. The AM Doesn't Work. [Figure 3: Yellow area corresponds to height AM = 2] Since the yellow area of 6 squares, corresponding to a height $AM = \frac{1}{2}(3 + 1) = 2$, is only 3 squares wide, there is a gap 1 square wide.
  • 16. Correcting the AM Area. [Figure 4: Squashing the yellow area into the green area] The green area of 6 squares, corresponding to a height $HM = \frac{2 \times 3}{3 + 1} = 1.5$, now has the correct width (total time).
  • 17. Covering All the Columns. [Figure 5: Harmonic column height (HM) of width 4 units] The original blue and red areas correspond to histogram columns of different widths. The green HM column has the correct total width.
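The column picture in Figs. 2 to 5 reduces to a few lines of arithmetic. A minimal R sketch, with heights and widths read off Fig. 2; variable names are illustrative:

```r
heights <- c(3, 1)            # blue and red column heights (rate)
widths  <- c(1, 3)            # column widths (time)
areas   <- heights * widths   # counts: 3 squares each

total_area  <- sum(areas)     # 6 squares
total_width <- sum(widths)    # 4 time units

am <- mean(heights)           # 2
hm <- 1 / mean(1 / heights)   # 1.5

# Width needed to hold all 6 squares at each candidate height
print(total_area / am)        # 3: leaves the 1-unit gap of Fig. 3
print(total_area / hm)        # 4: spans every column, as in Fig. 5
```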
  • 18. Does AM Ever Work? Yes. The AM is applicable when columns have uniform width. [Figure 6: AM works for uniform column widths (two panels, Metric vs. Time)] This is the most common case, which is why statisticians use the AM as the statistical mean, and why the HM is not in the base R package.
  • 19. Time Bin Widths. The count per unit time constitutes a rate metric (X = C/T). Proposition 2. The harmonic mean (HM) applies to histograms with columns having the same areas (counts) but different widths. In the case of monitored data, these different widths constitute different time bins. This case is most likely to occur with asynchronous event data. Proposition 3. Since the event counts (C) occur in time (T) on the x-axis, the y-axis must be a rate metric, e.g., throughput X = C/T: events per unit time. Proposition 4. The arithmetic mean (AM) applies to histograms with columns having the same widths but different areas (counts). That turns out to be the most common case, because monitored data are sampled on equal periodic boundaries, like the ticks of a metronome.
  • 20. Section 4: Checking HM Correctness
  • 21. Checking the Correctness of HM. Recalling eqn. (3) for N periods: $HM = \frac{N}{\frac{1}{X_1} + \frac{1}{X_2} + \cdots + \frac{1}{X_N}} = \frac{NC}{\frac{C}{X_1} + \frac{C}{X_2} + \cdots + \frac{C}{X_N}}$ (4). We have simply multiplied the numerator and every term of the denominator by the constant count C, as is appropriate for the HM, where each period completes the same count. Since eqn. (1) gives $T_k = C/X_k$, substituting produces: $HM = \frac{NC}{T_1 + T_2 + \cdots + T_N}$ (5), which agrees with the notion: Average (harmonic) rate = Total counts / Total time (6).
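A quick numerical check of eqns. (3) to (5), using made-up rates (not data from these notes) and a hypothetical constant count C = 100 per period:

```r
hmean <- function(vals) 1 / mean(1 / vals)   # eqn. (3)

C  <- 100                  # same count in every period (illustrative)
X  <- c(20, 50, 25, 10)    # illustrative rates, counts per time unit
Tk <- C / X                # eqn. (1) rearranged: T_k = C / X_k, i.e. 5, 2, 4, 10
N  <- length(X)

print(N * C / sum(Tk))     # eqn. (5): 400 / 21, about 19.05
print(hmean(X))            # eqn. (3): same value
```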
  • 22. Remark 3. The same counts per period (C), completed at different rates ($X_k$) in the denominator of eqn. (4), are responsible for producing the nonuniform time intervals ($T_k$) in the denominator of the HM in eqn. (5). Theorem 1 (When is HM = AM?). If the $T_k$ intervals are all the same, as they are with sampled data, the counts per sample will generally differ, i.e., each sample will have a different rate, and the HM reduces to the AM. Proof 1. Under these conditions ($T_k = T$, counts $C_k$), eqn. (6), i.e., eqn. (5) with the total count $C_1 + C_2 + \cdots + C_N$ in place of NC, becomes $\frac{C_1 + C_2 + \cdots + C_N}{T + T + \cdots + T} = \frac{C_1 + C_2 + \cdots + C_N}{NT} = \frac{1}{N}\left(\frac{C_1}{T} + \frac{C_2}{T} + \cdots + \frac{C_N}{T}\right) = \frac{X_1 + X_2 + \cdots + X_N}{N}$. But this is precisely the definition of the AM given by eqn. (2).
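The proof of Theorem 1 can be checked the same way. The sampling period and counts below are illustrative values, not monitored data:

```r
Tlen <- 60                        # common sampling period, seconds (illustrative)
Ck   <- c(1200, 300, 900, 600)    # counts per sample (illustrative)
Xk   <- Ck / Tlen                 # per-sample rates: 20, 5, 15, 10

print(sum(Ck) / (length(Ck) * Tlen))  # total count / total time: 12.5
print(mean(Xk))                       # arithmetic mean, eqn. (2): 12.5
```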
  • 23. Checking the Examples. We can use eqn. (6) to check that the HM is the right type of average. 1. Example 4 with N = 2 speeds ($X_1$ = 100 mph, $X_2$ = 50 mph) over the same distance (C = 100 miles): $HM = \frac{1}{\frac{1}{2}\left(\frac{1}{100} + \frac{1}{50}\right)} = 66\tfrac{2}{3}$ mph, and $\frac{\text{Total counts}}{\text{Total time}} = \frac{200 \text{ miles}}{3 \text{ hrs}} = 66.67$ mph. 2. Visual HM example with different column widths: $HM = \frac{1}{\frac{1}{2}\left(\frac{1}{3} + \frac{1}{1}\right)} = \frac{3}{2}$ units high, and $\frac{\text{Total counts}}{\text{Total time}} = \frac{6 \text{ squares}}{4 \text{ units}} = 1.5$ units high.
  • 24. Section 5: Application to Time Series
  • 25. Monitored Subscription Rates. [Figure 7: Real data: subscription rates over 33 days (Rate vs. Time)] Days: 0, 9.24932, 18.663, 27.4192, 30.2493, 33.0007. Rate: 0.00, 1081.16, 1062.28, 1142.05, 3533.40, 3634.56.
  • 26. Irregular Time Boundaries. [Figure 8: Since the time-series data are not sampled but triggered each time another 10,000 subscriptions accrue, the data points do not fall on regular time boundaries.]
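The rates in Figure 7 can be recovered from the trigger times alone. A minimal R sketch, assuming (as the table suggests) that day 0 is the origin and that one point is logged per 10,000 subscriptions; the variable names are illustrative. Because the tabulated days are rounded, the recomputed rates agree with the table only to within about 0.05.

```r
# Trigger times (days) from the table under Figure 7
days  <- c(0, 9.24932, 18.663, 27.4192, 30.2493, 33.0007)
count <- 10000                 # subscriptions per trigger (assumed)

dt    <- diff(days)            # irregular interval lengths (days)
rates <- count / dt            # subscriptions per day for X2..X6

hmean <- function(vals) 1 / mean(1 / vals)

print(round(rates, 2))         # within about 0.05 of the tabulated X2..X6
print(hmean(rates))            # about 1515.12
print(length(rates) * count / tail(days, 1))  # total count / total time: same value
```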
  • 27. Rates as Column Heights. [Figure 9: Irregular time intervals are more easily discerned in a columnated format.] We want to aggregate these data into a single datum.
  • 28. The numerical subscription rates ($X_k$) are: $X_1$ = 0.00, $X_2$ = 1081.16, $X_3$ = 1062.28, $X_4$ = 1142.05, $X_5$ = 3533.40, $X_6$ = 3634.56. Using R, the AM and HM are:
        > hmean <- function(vals) { 1/mean(1/vals) }
        > rates <- c(0.00, 1081.16, 1062.28, 1142.05, 3533.40, 3634.56)
        > mean(rates)   # AM
        [1] 1742.242
        > hmean(rates)  # HM
        [1] 0
    The AM evaluates but the HM fails. Why? From eqn. (3), the HM is
    $HM = \frac{1}{\frac{1}{6}\left(\frac{1}{0.0} + \frac{1}{1081.16} + \frac{1}{1062.28} + \frac{1}{1142.05} + \frac{1}{3533.40} + \frac{1}{3634.56}\right)}$ (7)
    But the first term in the denominator is infinite (R evaluates 1/0 as Inf) and dominates all the other values. The final inversion "1/∞" produces HM = 0.
  • 29. Let's Try That Again. We don't need the first data point. Treat it as the origin of the time period associated with the $X_2$ data point. To drop it in R, we write:
        > rates[-1]
        [1] 1081.16 1062.28 1142.05 3533.40 3634.56
        > hmean(rates[-1])
        [1] 1515.118
    which is non-zero and less than the AM. That's encouraging. Alternatively, we can evaluate the HM explicitly as
        > length(rates[-1])/sum(1/rates[-1])
        [1] 1515.118
    Note that the numerator is now 5 rather than 6
        > length(rates[-1])
        [1] 5
    due to dropping the first value.
  • 30. Check the HM Value. The measured rates were triggered on a count of 10,000 per period. The total count is therefore C = 5 × 10,000 = 50,000 subscriptions, and the total time period is T = 33.0007 days (a). From eqn. (6), the time-averaged harmonic rate is $X_{HM} = \frac{C}{T} = \frac{50{,}000}{33.0007} = 1515.12$, which agrees with hmean(rates[-1]) on the previous page. Conversely, only the HM gives the correct total time window, $T = \frac{C}{X_{HM}} = \frac{50{,}000}{1515.12} = 33.0007$, in agreement with the concept shown in Figure 5. (a) Don't pay too much attention to the decimal digits. I'm only displaying them for consistency and readability.
  • 31. AM and HM for Subscription Data. [Figure 10: The AM and HM are two different averages of the subscription rate and therefore sit at different positions on the y-axis. But only the HM gives the correct total time window of 33 days.]
  • 32. The Aggregated HM Value. [Figure 11: The HM is the big blue dot that correctly replaces these subscription-rate data for this time bin (33 days) when they are aggregated.]
  • 33. Section 6: Weighted Harmonic Mean
  • 34. Weighted Harmonic Mean. Recalling Example 4, we consider the following generalization of the HM. Definition 4 (Weighted Harmonic Mean). $WHM = \frac{1}{\frac{1}{W}\left(\frac{w_1}{X_1} + \frac{w_2}{X_2} + \cdots + \frac{w_N}{X_N}\right)}$ (8), where the total weight is $W = \sum_{k} w_k$. Example 5 (Variable speed over different distances). A car travels 50 miles at 40 mph, 60 miles at 50 mph and 40 miles at 60 mph. What is the average speed of the trip? The distance weights are $w_1 = 50$, $w_2 = 60$, $w_3 = 40$. Substituting into eqn. (8) yields: $WHM = \frac{50 + 60 + 40}{\frac{50}{40} + \frac{60}{50} + \frac{40}{60}} = 48.13$ mph.
  • 35. Significance of the WHM. Check the preceding calculation in R:
        > wts <- c(50, 60, 40)
        > rates <- c(40, 50, 60)
        > sum(wts)/(sum(wts/rates))
        [1] 48.12834
    The counts per period were constant in both Example 4 ($C_k$ = 100 miles) and the example in Section 5 ($C_k$ = 10,000 subscribers). Proposition 5. The WHM allows us to calculate the HM when counts per period are distributed arbitrarily within the aggregation time window. Eqn. (8) can be rewritten with weights as percentages: $WHM = \frac{1}{\frac{w_1\%}{X_1} + \frac{w_2\%}{X_2} + \cdots + \frac{w_N\%}{X_N}}$ (9), where $w_k\% = w_k/W$.
  • 36. Determining the Percentage Weights. The percentage weights can be obtained directly from monitored data using the following steps: 1. Each rate data point $r_k$ has an associated time increment $\Delta t_k$. 2. The product $w_k = r_k \times \Delta t_k$ is the raw weight (area) for data point k. 3. The total weight is $W = \sum_k w_k$ (total area). 4. The percentage weight is $w_k\% = w_k/W$ (fraction of total area). In R, we can write the above calculation as a function with 2 args:
        wtspc <- function(rates, tdeltas) {
          weights <- rates * tdeltas
          totalwt <- sum(weights)
          return(weights / totalwt)
        }
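Steps 1 to 4 can be exercised on Example 5 by treating each leg of the trip as a time-series point: its rate is the speed and its time increment is distance/speed. A sketch restating the `wtspc()` function above so the block runs on its own; the framing of Example 5 as a time series is our assumption, not part of the original example:

```r
# wtspc() as on this slide: fraction of total area per data point
wtspc <- function(rates, tdeltas) {
  weights <- rates * tdeltas   # raw weight (area) per point
  totalwt <- sum(weights)      # total area W
  weights / totalwt            # percentage weights w_k%
}

# Example 5 as a time series (assumption): rate = speed,
# time increment = distance / speed
speeds <- c(40, 50, 60)              # mph
hours  <- c(50, 60, 40) / speeds     # 1.25, 1.2, 0.667 hours
wpc    <- wtspc(speeds, hours)       # 50/150, 60/150, 40/150

whm <- 1 / sum(wpc / speeds)         # eqn. (9)
print(whm)                           # about 48.13 mph
print(sum(speeds * hours) / sum(hours))  # total miles / total hours: same value
```

Because each area $r_k \Delta t_k$ is just the distance of that leg, the percentage weights recover the distance weights of Example 5.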
  • 37. Application of WHM to Time Series. [Figure 12: Monitored rates for application "GAM" (Rate vs. Time). Aggregation window size is 60 samples with T = 558.83 units.]
  • 38.
        > gamrates
         [1] 18.68 10.77 16.60 19.69  1.95 22.53  4.99  2.50  7.91  5.21  9.73  5.67
        [13]  6.51  6.80 22.19  4.35  3.90  3.16  9.98  5.25 48.49  5.26  1.95  8.49
        [25]  8.30  6.16 11.93 63.95 21.63 11.37  5.31  5.48  3.49  4.96  3.88  8.86
        [37]  3.35  3.69  6.18 17.51 21.79  8.99 11.83  8.26  4.54  2.71  4.02  6.94
        [49] 14.93  6.38  4.21  3.25 31.02 17.10 20.49  3.85 10.66  4.58  5.08  3.70
        > gamdeltas
         [1]  3.03  4.95  3.59  2.88 30.12  2.66 11.98 21.35  6.47 11.30  5.32  8.94
        [13]  8.95  7.42  2.70 12.48 14.06 15.99  5.98 10.68  1.16 10.48 29.67  6.55
        [25]  6.40  9.17  4.23  0.85  2.57  4.87  9.67 10.14 16.40 11.39 13.24  6.05
        [37] 16.44 16.08  9.41  3.25  2.32  5.67  4.60  7.12 12.96 20.98 12.67  7.48
        [49]  3.60  8.18 12.65 16.33  1.89  2.95  2.58 14.48  5.19 12.70  9.87 15.77
    Using eqn. (9) and our R function wtspc() we find:
        > (whm.gam <- 1 / sum(wtspc(gamrates, gamdeltas) / gamrates))
        [1] 5.913534
    Check that the WHM value produces the correct total time, T = 558.83 units:
        > sum(gamdeltas)
        [1] 558.827
        > sum(gamdeltas*gamrates) / whm.gam
        [1] 558.827
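Why the check works: with area weights $w_k\% = r_k \Delta t_k / W$, each term of eqn. (9) is $w_k\%/r_k = \Delta t_k / W$, so the WHM collapses to $W / \sum_k \Delta t_k$, which is total count over total time, eqn. (6) again. A small R sketch confirming the identity on the first seven GAM samples (rounded, as printed); the names `r`, `dt`, `whm`, and `tavg` are illustrative:

```r
wtspc <- function(rates, tdeltas) {
  weights <- rates * tdeltas   # area (count) per sample
  weights / sum(weights)       # percentage weights
}

# First seven GAM samples as printed (rounded to 2 decimals)
r  <- c(18.68, 10.77, 16.60, 19.69, 1.95, 22.53, 4.99)
dt <- c(3.03, 4.95, 3.59, 2.88, 30.12, 2.66, 11.98)

whm  <- 1 / sum(wtspc(r, dt) / r)   # eqn. (9) with area weights
tavg <- sum(r * dt) / sum(dt)       # total count / total time, eqn. (6)

print(all.equal(whm, tavg))         # TRUE: algebraically identical
```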
  • 39. WHM Aggregation Result. [Figure 13: WHM aggregation of monitored "GAM" rates in Fig. 12]
  • 40. 7 Accommodating Zero Rates
  • 41. Handling Zeros in the Time Series
    The WHM in Sect. 6 worked b/c those data did not contain any zero rate values. However, with the HM of eqn. (7), we already saw that
        $\frac{1}{X} \to \infty$ as $X \to 0$.
    Since that single term dominates all the other nonzero terms in the denominator of the HM, the final inversion produces an overall zero value:
        $\mathrm{HM} \sim \frac{1}{1/X} \to 0$ as $X \to 0$.
    The same is true for the WHM in eqn. (9). This dooms the algorithmic use of the WHM for general time series. Since monitored rate metrics can be expected to include zero values in any aggregation period, we need a way to accommodate them.
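R's IEEE floating-point arithmetic reproduces this collapse directly: `1/0` evaluates to `Inf`, the mean of the reciprocals becomes `Inf`, and the final inversion returns 0. The helper `hm_sketch()` below is a hypothetical stand-in for the `hmean()` function used in these notes (eqn. (3)), not its original definition.

```r
# Hypothetical stand-in for the plain harmonic mean of eqn. (3)
hm_sketch <- function(x) 1 / mean(1 / x)

x_pos  <- c(20, 50, 80)        # all rates positive
x_zero <- c(20, 50, 80, 0)     # the same window plus one zero rate

print(1 / x_zero)              # the last reciprocal is Inf
print(hm_sketch(x_pos))        # finite, positive HM (about 36.36)
print(hm_sketch(x_zero))       # 1 / Inf = 0: the single zero dominates
```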
  • 42. Example 6 (Toy sample rates with zero values).
    $X_1 = 0, X_2 = 100, X_3 = 100, X_4 = 0, X_5 = 100$
    The standard harmonic mean (3) produces the result HM = 0:
        > zr <- c(0,100,100,0,100)
        > hmean(zr)
        [1] 0
    Some possible remedies:
    Ignore zero values: Pretend the zeros don't exist and there are only 3 (positive) data values:
        $\mathrm{HM}_{3,3} = \dfrac{3/3}{\frac{1/3}{100} + \frac{1/3}{100} + \frac{1/3}{100}} = 100$   (10)
    Drop zero values: Retain the 3 positive values out of all 5, each with weight 1/5:
        $\mathrm{HM}_{3,5} = \dfrac{3/5}{\frac{1/5}{100} + \frac{1/5}{100} + \frac{1/5}{100}} = 100$   (11)
    Surprise! Ignoring == Dropping
  • 43. But Wait! It Gets Worse
    $\mathrm{HM}_{3,3}$ and $\mathrm{HM}_{3,5}$ are both identical to the arithmetic mean! We can check this in R:
        > zr[-which(zr==0)]   # drop zeros
        [1] 100 100 100
        > zpos <- zr[-which(zr==0)]
        > hmean(zpos)         # HM
        [1] 100
        > mean(zpos)          # AM
        [1] 100
    Proposition 6. Naively including zero rates produces HM = 0. FAIL
    Proposition 7. Naively dropping zero rates produces the AM. FAIL
  • 44. A More Careful Approach
    We want to find an algorithm that produces 0 < HM < 100 for Example 6 by accounting for all 5 data points, but without overbiasing due to the presence of zero values.
    Conjecture 1. The zeros in $X_1$, $X_4$ have weights 1/5 each. Ignore those terms in the harmonic sum, but redistribute their weights across the weights of the remaining nonzero terms $X_2$, $X_3$, $X_5$.
    Each term in the harmonic sum has a weight of 1/5. The 2 zero terms have a total weight of 2/5. Adding a third of that total zero-term weight to each of the positive-term weights produces a new weight:
        $\frac{1}{3}\left(\frac{2}{5}\right) + \frac{1}{5}$
  • 45. Now, eqn. (11) becomes
        $\dfrac{3/5}{\frac{(2/5)/3 + 1/5}{100} + \frac{(2/5)/3 + 1/5}{100} + \frac{(2/5)/3 + 1/5}{100}}$   (12)
    In addition, each weight simplifies further as
        $\frac{1}{3}\cdot\frac{2}{5} + \frac{1}{5} = \frac{2}{3}\cdot\frac{1}{5} + \frac{3}{3}\cdot\frac{1}{5} = \frac{5}{3}\cdot\frac{1}{5} = \frac{1}{3}$
    Hence, (12) reduces to
        $\dfrac{3/5}{\frac{1/3}{100} + \frac{1/3}{100} + \frac{1/3}{100}} = 60$   (13)
    which is less than the AM, but not zero, and thus meets our requirement. Eqn. (13) for the zero-renormalized harmonic mean has the form
        $\mathrm{ZRHM}_{5,2} = \frac{3}{5}\,\mathrm{HM}_{3,3}$   (14)
    where $\mathrm{HM}_{3,3}$ is the same as eqn. (10).
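The arithmetic of eqns. (12) to (14) can be checked numerically. This is a sketch in base R; the variable names are ours, and the data are those of Example 6.

```r
# Example 6: five rate samples, two of them zero
x  <- c(0, 100, 100, 0, 100)
nw <- length(x)            # N_W = 5 points in the window
n0 <- sum(x == 0)          # N_0 = 2 zero rates
nz <- nw - n0              # N_Z = 3 nonzero rates

# Conjecture 1: the zero terms carry total weight n0/nw = 2/5;
# split it equally over the nz nonzero terms, on top of their own 1/5
w_new <- (n0 / nw) / nz + 1 / nw
print(w_new)               # 0.3333333

# Eqns. (12)/(13): numerator 3/5, denominator sums w_new / X_k over nonzero X_k
xpos    <- x[x != 0]
zrhm_12 <- (nz / nw) / sum(w_new / xpos)
print(zrhm_12)             # 60

# Eqn. (14): the same value as (3/5) * HM_{3,3}
hm33 <- 1 / mean(1 / xpos)
print((nz / nw) * hm33)    # 60
```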
  • 46. The ZRHM Theorem
    Since the 2nd factor on the RHS of eqn. (14) is the usual HM, it could also be extended to include weighted terms (w%) for irregular counts per time interval, as defined by the WHM (see eqn. (9) in Section 6).
    We can now write a general formula for calculating the harmonic mean of arbitrary rate data.
    Theorem 2 (Zero Renormalized Harmonic Mean).
        $\mathrm{ZRHM} = \frac{N_Z}{N_W}\left[\sum_{k=1}^{N_Z} \frac{w_k\%}{X_k}\right]^{-1}$   (15)
    where $N_W$ is the total number of data points in the aggregation window, $N_0$ is the number of zeros, $N_Z = N_W - N_0$, and the sum runs over the nonzero values. With uniform weights $w_k\% = 1/N_Z$, the second factor is the usual HM of the nonzero values (cf. eqn. (3)).
    Proof 2. See preceding discussion.
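Thm 2 states the weighted case only in general terms. One possible reading, sketched below under our own assumptions, keeps the $N_Z/N_W$ prefactor of eqn. (15) and uses WHM-style area weights (as in `wtspc()`) computed over the nonzero samples only; zero-rate samples have zero area, so they are excluded before normalizing. The function `zrhm_w()` is hypothetical, not part of these notes.

```r
# Hypothetical weighted ZRHM: (N_Z / N_W) * WHM of the nonzero samples.
# Assumption: weights are area fractions r_k * dt_k / W over nonzero samples.
zrhm_w <- function(rates, tdeltas) {
  stopifnot(length(rates) == length(tdeltas))
  nw   <- length(rates)                 # N_W: all points in the window
  keep <- rates != 0                    # nonzero samples
  nz   <- sum(keep)                     # N_Z
  if (nz == 0) return(0)                # assumption: all-zero window -> 0
  r    <- rates[keep]
  dt   <- tdeltas[keep]
  wpc  <- (r * dt) / sum(r * dt)        # percentage weights, as in wtspc()
  whm  <- 1 / sum(wpc / r)              # WHM of eqn. (9), nonzero samples only
  (nz / nw) * whm
}

# Hypothetical window: one zero rate among three samples
print(zrhm_w(c(0, 10, 20), c(5, 6, 3)))  # (2/3) * (120/9) = 8.888889
```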
  • 47. The ZRHM Algorithm
    The following R function implements eqn. (15) of Thm 2 with uniform weights.
        zrhm <- function(tsrates) {
          ndatas  <- length(tsrates)                # N_W: points in the window
          nzeros  <- length(which(tsrates == 0))    # N_0: zero rates
          pozdata <- tsrates[which(tsrates != 0)]   # the N_Z nonzero rates
          nozwt   <- (ndatas - nzeros) / ndatas     # prefactor N_Z / N_W
          nozhm   <- 1 / mean(1 / pozdata)          # usual HM of the nonzero rates
          return(nozwt * nozhm)
        }
    It takes an arbitrary time series, tsrates, of monitored rate data (including zero values) as its argument and returns the ZRHM.
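One edge case worth noting: if every rate in the window is zero, `pozdata` is empty, `mean()` of an empty vector is `NaN`, and `zrhm()` returns `NaN`. The hypothetical variant below computes the same quantity with logical indexing and adds a guard that returns 0 for an all-zero window; treating that case as 0 is our assumption, not part of Thm 2.

```r
# Hypothetical variant of zrhm() (eqn. (15), uniform weights) with one added guard.
# Assumption: a window containing only zeros aggregates to 0 rather than NaN.
zrhm_guarded <- function(tsrates) {
  ndatas  <- length(tsrates)              # N_W
  pozdata <- tsrates[tsrates != 0]        # nonzero rates
  if (length(pozdata) == 0) return(0)     # all-zero window
  nozwt <- length(pozdata) / ndatas       # N_Z / N_W
  nozhm <- 1 / mean(1 / pozdata)          # plain HM of the nonzero rates
  nozwt * nozhm
}

print(zrhm_guarded(c(0, 100, 100, 0, 100)))  # 60, as in Example 6
print(zrhm_guarded(c(0, 0, 0)))              # 0 instead of NaN
```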
  • 48. Test Cases
    Toy rate data from Example 6:
        > zr
        [1] 0 100 100 0 100
        > zrhm(zr)
        [1] 60
    which agrees with the manually calculated result.
    Subscription data from Section 5:
        > sub.rates
        [1] 0.00 1081.16 1062.28 1142.05 3533.40 3634.56
        > hmean(sub.rates)
        [1] 0
        > hmean(sub.rates[-1])
        [1] 1515.118
        > zrhm(sub.rates)
        [1] 1262.599
    The result $\mathrm{HM}_{-1}$ = 1515.118 is obtained by excluding the zero value at the origin. When that value is included, ZRHM < $\mathrm{HM}_{-1}$, as expected, but ZRHM > 0, unlike HM = 0.
  • 49. Arbitrary Time Series
    Fig. 14 shows a time series of 1000 rate values ranging b/w 0 and 100. It contains 7 zero values whose locations in time are not known a priori.
    [Figure 14: AM = 50.93, HM = 0, ZRHM = 22.03]
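To reproduce the qualitative behavior of Fig. 14, the sketch below builds a hypothetical synthetic series (uniform random rates with 7 zeros planted at random positions; this is not the data behind the figure) and computes the three averages.

```r
# Hypothetical synthetic window, NOT the data behind Fig. 14:
# 1000 uniform rates in (0, 100) with 7 zeros at random positions
set.seed(42)
n     <- 1000
rates <- runif(n, min = 0, max = 100)   # runif does not return the endpoints here
rates[sample.int(n, 7)] <- 0            # 7 distinct positions set to zero

am     <- mean(rates)                   # arithmetic mean
hm     <- 1 / mean(1 / rates)           # plain HM: 1/0 = Inf drives it to 0
pos    <- rates[rates != 0]
zr_val <- (length(pos) / n) / mean(1 / pos)   # eqn. (15), uniform weights

print(c(AM = am, HM = hm, ZRHM = zr_val))
```

The exact numbers depend on the seed and will not match Fig. 14. What carries over is the ordering HM = 0 < ZRHM ≤ AM, which holds for any window with at least one nonzero rate: the AM of the whole window equals $(N_Z/N_W)$ times the AM of the nonzero values, and the HM of those values never exceeds their AM.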
  • 50. ZRHM Summary
    • ZRHM is especially useful if a threshold is defined as a lower bound, e.g., cache hit-rate or video bit-rate, b/c ZRHM is biased toward smaller rather than larger values.
    • A string of contiguous zero values can be treated as a boundary b/w smaller aggregation windows: take the 1st zero as defining the end of one aggregation window and the last zero as the beginning of the next.
    • There is no longer any need to confirm the total time T from the subareas.
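The window-splitting rule for strings of zeros can be sketched with run-length encoding. The helpers `split_at_zero_runs()` and `zrhm_sketch()` are hypothetical, and two choices are our own assumptions: a "string" means a run of at least `minrun` zeros, and the zero samples inside such a run are assigned to neither window (isolated zeros stay inside their window and are handled by the ZRHM).

```r
# Hypothetical helper: split a rate series into aggregation windows at runs
# of contiguous zeros (assumed: runs of length >= minrun act as boundaries
# and their zero samples belong to neither window).
split_at_zero_runs <- function(x, minrun = 2) {
  r <- rle(x == 0)
  boundary <- r$values & r$lengths >= minrun   # which runs act as boundaries
  in_bnd <- rep(boundary, r$lengths)           # per sample: inside a boundary run?
  # window id increases by 1 each time a boundary run starts
  win_id <- cumsum(c(0, diff(in_bnd)) == 1)
  split(x[!in_bnd], win_id[!in_bnd])
}

# ZRHM of one window (eqn. (15), uniform weights); all-zero window -> 0 (assumed)
zrhm_sketch <- function(x) {
  pos <- x[x != 0]
  if (length(pos) == 0) return(0)
  (length(pos) / length(x)) / mean(1 / pos)
}

x <- c(50, 0, 60, 0, 0, 0, 40, 80)
windows <- split_at_zero_runs(x)
print(windows)                        # windows c(50, 0, 60) and c(40, 80)
print(sapply(windows, zrhm_sketch))   # about 36.36 and 53.33
```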
  • 51. 8 Conclusions
  • 52. What We Have Learned
    • We compared AM vs. HM averaging for monitored rate data. Conventional wisdom says the HM is the correct way to average rate metrics [see Example 4]. But, for monitored data...
    • The HM assumes that the counts in each time bin are equal but the bins have different widths. This is the case for asynchronous (intermittent) event data triggered on a common count criterion, e.g., every 1000 subscriptions.
    • Otherwise, if the time bins have the same width, as with data collected on the same sample interval, HM = AM [see Thm 1].
    • The HM fails if any rate measurement is zero [see slide 41]. Compensate by using the ZRHM [see Thm 2].
    • Since HM ≤ AM, the ZRHM is useful for detecting when a monitored rate falls to a lower bound.
  • 53. When Should I Use the Harmonic Mean?
    You should use the HM, or more accurately the ZRHM, to aggregate monitored data when all of the following criteria apply:
    R: Rate metric
    A: Async time intervals
    T: Too-low data values are of interest
    E: Event data, not sampled data
    Example metrics:
    • Cache-hit rate
    • Video bit-rate
    • Call center service