Monitorama14: A Melange of Methods for Manipulating Monitored Data

Discusses The Greatest Scatter Plot (Hubble 1929), Irregular Time Series (Harmonic Mean), Zipf’s Law of Words, Oracle Query Times, and Eleventh Hour Spikes.

  1. A Melange of Methods for Manipulating Monitored Data: Converging on Consistency. Neil Gunther (@DrQz), en.wikipedia.org/wiki/Neil_J._Gunther, Performance Dynamics. Monitorama PDX, May 6, 2014. © 2014 Performance Dynamics.
  2. Introductions
  9. I didn’t do Monitorama Berlin. I didn’t get the memo about plane crashes. Sorry... Deal with it. SFO runway 28L, 11:28 a.m., July 6, 2013: Asiana Airlines Flight 214 landing arse-backwards (sans tail).
  10. “Asiana pilots appear to be overly reliant on instrument-guided landings and lack the training to touch down manually.” (SFO Commissioner Eleanor Johns)
  11. A Message from Your Sponsors: Don’t be too reliant on your instruments (strip charts, colored dials, shiny things).
  18. Consistency: (1) It’s not about pretty pictures. (2) It’s not about whiz-bang tools. (3) It’s not about fancy math. (4) Data are usually trying to tell you something. (5) Your interpretation has to be consistent with other data. (6) Your interpretation has to be consistent with other information. This talk is about converging on consistency by example.
  19. Topics: (1) The Greatest Scatter Plot. (2) Irregular Time Series. (3) The Power of Power Laws: Zipf’s Law of Words, Database Query Times, Eleventh Hour Spikes.
  20. The Greatest Scatter Plot
  21. Goggle up! Science ahead...
  22. Some Monitored Data. [Figure: two time-series panels, Metric 1 vs. Time and Metric 2 vs. Time.] Two time series, two metrics: Metric 1 and Metric 2.
  23. Scatter Plot. [Figure: Metric 2 vs. Metric 1.] Are Metric 1 and Metric 2 related in any way?
  24. Linear Regression. [Figure: Metric 2 vs. Metric 1 with fitted line.] LSQ fit: Metric2 = 423.94 × Metric1, with R² = 0.82.
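The fit quoted on the regression slide has no intercept term, so one plausible reading is a least-squares line forced through the origin. The sketch below shows that calculation under that assumption. The function names and the sample points are hypothetical (they are not the data plotted on the slides), and the uncentered R² used here is only one common convention for no-intercept fits.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope b for the model y = b * x (no intercept)."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx


def r_squared_origin(xs, ys, slope):
    """Uncentered R^2: 1 - SS_res / sum(y^2), a convention for no-intercept fits."""
    ss_res = sum((y - slope * x) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum(y * y for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical (Metric 1, Metric 2) pairs, NOT the data shown on the slides.
metric1 = [0.2, 0.5, 0.9, 1.1, 1.7, 2.0]
metric2 = [90.0, 180.0, 420.0, 430.0, 760.0, 850.0]

slope = fit_through_origin(metric1, metric2)
r2 = r_squared_origin(metric1, metric2, slope)
print(f"Metric2 = {slope:.2f} * Metric1, R^2 = {r2:.3f}")
```

A fit with an intercept would use centered sums instead; which form the slide used is not stated.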
  25. This is Not the End. This is just the beginning. Need to reach consistency: (1) Is the linear fit still a reasonable choice? (2) What is the meaning of the slope? (3) Are you willing to extrapolate this model into the future?
  26. The most important scatter plot in history (1929). [Figure: clipping of an article about Hubble’s 1929 PNAS paper on the expanding universe (Hubble, E. P. (1929) Proc. Natl. Acad. Sci. USA 15, ...); the surrounding article text is cropped.] Fig. 1. Velocity–distance relation among extra-galactic nebulae. Radial velocities, corrected for solar motion (but labeled in the wrong units), are plotted against distances estimated from involved stars and mean luminosities of nebulae in a cluster. The black discs and full line represent the solution for solar motion by using the nebulae individually; the circles and broken line represent the solution combining the nebulae into groups; the cross represents the mean velocity corresponding to the mean distance of 22 nebulae whose distances could not be estimated individually. [Reproduced with permission from ref. 1 (Copyright 1929, The Huntington Library, Art Collections and Botanical Gardens).] Metric 1 (x-axis) = distance to the observed star (r). Metric 2 (y-axis) = recessional velocity of the star (v). $10^6$ parsecs ≡ 1 Mpc = 3.3 million light years.
  30. Astronomer Edwin Hubble 1929. (1) Is the linear fit still a reasonable choice? Edwin Hubble suspected $v \sim r$. Supports Big Bang hypothesis. (2) What does the slope mean? Slope: $\frac{v}{r} = \frac{r}{t} \times \frac{1}{r} = \frac{1}{t} \equiv H_0$ (Hubble’s constant). The inverse Hubble constant has units of time: $t_H = 1/H_0$. $t_H$ is the expansion time = Age of Universe! (3) Small problem: Hubble calculated $t_H \approx 2$ billion years, but the age of the Earth is $t_E \approx 3$–$5$ billion years (Oops!). Not consistent. Whaddya gonna do?
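To see how a slope in km/s per Mpc turns into an age, here is a minimal unit-conversion sketch. The function name is hypothetical. The first input, 423.94, is the slope from the linear-regression slide; the second, 70.4, is an assumed modern-era value chosen only for illustration and is not taken from the slides.

```python
KM_PER_MPC = 3.0857e19        # kilometres in one megaparsec
SECONDS_PER_GY = 3.15576e16   # seconds in 10^9 Julian years


def hubble_time_gy(h0_km_s_mpc):
    """Return t_H = 1/H0 in billions of years, for H0 given in km/s/Mpc."""
    h0_per_second = h0_km_s_mpc / KM_PER_MPC   # km cancels against Mpc, leaving 1/s
    return 1.0 / h0_per_second / SECONDS_PER_GY


# 423.94: slope quoted on the regression slide; 70.4: assumed illustrative value.
for h0 in (423.94, 70.4):
    print(f"H0 = {h0:7.2f} km/s/Mpc  ->  t_H = {hubble_time_gy(h0):.2f} Gy")
```

Under these constants the steep 1929-style slope gives roughly 2.3 Gy, while a slope near 70 gives roughly 13.9 Gy, so a smaller slope means an older universe.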
  31. [Figure: Hubble’s 1929 Corrected Data; recessional velocity (km/s) vs. galactic distance (Mpc).] Hubble even corrected for so-called peculiar velocity (black dots).
  32. [Figure: Hubble’s 1929 Corrected Data; recessional velocity (km/s) vs. galactic distance (Mpc).] Slope moved the wrong way.
  33. Pay Day 2003. Fig. 3. The Hubble diagram for type Ia supernovae. From the compilation of well observed type Ia supernovae by Jha (29). The scatter about the line corresponds to statistical distance errors of <10% per object. The small red region in the lower left marks the span of Hubble’s original Hubble diagram. Hubble’s (linear) Law: $v = H_0 r$ out to 2.3 billion light years.
  35. Consistency: (1) Hubble took some static for his 1929 paper. (2) Couldn’t reach consistency and had to gamble. (3) Best measurements (telescopes) at the time. (4) Telescopes and measurements improved. (5) Converged toward consistency over the next decades. (6) $t_H = 2.36$ Gy (1929) → $t_H = 13.89$ Gy (2003). The data were wrong but his interpretation (model) was correct. Guerrilla Mantra 1.16: Treating data as something divine is a sin.
  37. Irregular Time Series
  38. Aggregating Time Series. (1) Regular sample intervals: samples on the tick of a metronome, e.g., computer performance metrics, weather data. (2) Irregular sample intervals: missing data (e.g., stock exchanges); unequal sampling due to events, subscriptions (e.g., every 10,000 sign-ups), or occasional measurement (e.g., personal weight).
  40. Back to Monitorama Boston 2013. Aggregation always assumes the arithmetic mean (AM). Aggregation of irregular time series came up in @mleinart’s talk. NJG: “Should aggregate rate data using the harmonic mean (HM).” But how to apply the harmonic mean to a time series is not obvious. It cost me a month after Monitorama Boston to figure it out. See my blog post and detailed slides of April 9, 2013: Harmonic Averaging of Monitored Rate Data. Which is why Monitorama is cool.
  41. Equal Intervals. [Figure: Metric vs. Time, two rectangles of equal width.] Heights: $h_{\text{blue}} = 2$ and $h_{\text{red}} = 1$.
  42. Arithmetic Mean of Heights. [Figure: the same rectangles with the AM level marked.] $\text{AM} = \frac{1}{2} h_{\text{blue}} + \frac{1}{2} h_{\text{red}} = \frac{1}{2}(2 + 1) = 1.5$.
  43. Unequal Intervals (Area = 6). [Figure: Metric vs. Time, two rectangles of unequal width.] Heights: $h_{\text{blue}} = 3$ and $h_{\text{red}} = 1$.
  44. AM Leaves a Gap (Area = 6). [Figure: AM level marked, with a gap.] $\text{AM} = \frac{1}{2} h_{\text{blue}} + \frac{1}{2} h_{\text{red}} = \frac{1}{2}[3 + 1] = 2.0$.
  45. Stretch the Rectangle (Area = 6, Width = 4). [Figure: AM and HM levels marked.] $\text{HM} = 1.5$, so the area is $1.5 \times 4 = 6$.
  46. Lowers the Height. [Figure: AM and HM levels marked.] Theorem: $\text{HM} \le \text{AM}$. For positive samples the harmonic mean is never larger than the arithmetic mean of the same samples, and the two are equal only when all samples are equal.
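The unequal-interval rectangle example can be checked by hand. The widths are not stated on the slides; the values below are inferred from the stated heights (3 and 1), total area (6), and total width (4), so treat them as an assumption.

```latex
\begin{aligned}
3\,w_{\text{blue}} + 1\,w_{\text{red}} &= 6, \quad w_{\text{blue}} + w_{\text{red}} = 4
  \;\Rightarrow\; w_{\text{blue}} = 1,\; w_{\text{red}} = 3 \\
\text{AM} &= \tfrac{1}{2}(3 + 1) = 2, \quad \text{AM} \times 4 = 8 \neq 6 \\
\text{HM} &= \frac{2}{\tfrac{1}{3} + \tfrac{1}{1}} = \frac{2}{4/3} = 1.5, \quad \text{HM} \times 4 = 6
\end{aligned}
```

With these widths each rectangle has area 3, so the two samples represent equal counts. That is the case in which the harmonic mean of the two rates, stretched over the full width, reproduces the total area.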
  47. Monitored Subscription Rates. Samples occur only when the subscription count reaches 10,000. Sampling intervals are unevenly spaced in time over 33 days. [Figure: Rate vs. Time (days), with AM and HM levels marked.] AM and HM are (different) averaged subscription rates. Only HM gives the correct total time window of 33 days.
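The claim that only HM recovers the 33-day window can be illustrated with a small sketch. The gap values below are hypothetical (chosen only so that they sum to 33 days) and are not the data behind the slide; the function and variable names are likewise illustrative.

```python
def harmonic_mean(rates):
    """Harmonic mean of positive rates."""
    return len(rates) / sum(1.0 / r for r in rates)


def arithmetic_mean(rates):
    """Plain arithmetic mean."""
    return sum(rates) / len(rates)


BATCH = 10_000  # subscriptions per sample, as stated on the slide

# Hypothetical gaps (days) between successive samples; they sum to 33 days.
gaps_days = [2.0, 5.0, 3.0, 10.0, 4.0, 9.0]
rates = [BATCH / gap for gap in gaps_days]   # subscriptions per day in each gap

total_subscriptions = BATCH * len(gaps_days)
window_hm = total_subscriptions / harmonic_mean(rates)
window_am = total_subscriptions / arithmetic_mean(rates)
print(f"HM window: {window_hm:.1f} days, AM window: {window_am:.1f} days")
```

Because every sample covers the same count, dividing the total count by the harmonic mean of the rates returns the sum of the gaps. The arithmetic mean overweights the short, fast intervals and implies a window that is too short.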
  48. Consistency. Use HM to aggregate monitored data when the following criteria apply: R: Rate metric (on y-axis); A: Async time intervals (on x-axis); T: Threshold is low vs. high; E: Event data. Example metrics: cache-hit rate, video bit-rate, call rate. Please send in your examples.
  50. The Power of Power Laws
  51. Zipf’s Law of Words. Example 1: Zipf’s Law. Ranked data is the 1000 most common wordforms in UK English, based on 29 works of literature by 18 authors (i.e., 4.6 million words). Wordform: English word. Abs: absolute frequency (total number of occurrences). Data format:

      > td <- read.table("~/../Power Laws/zipf1000.txt",header=TRUE)
      > head(td)
        Rank Wordform    Abs  r      mod
      1    1      the 225300 29 223066.9
      2    2      and 157486 29 156214.4
      3    3       to 134478 29 134044.8
      4    4       of 126523 29 125510.2
      5    5        a 100200 29  99871.2
      6    6        I  91584 29  86645.5
  52. Linear Axes. [Figure: Ranked 1000 UK English Words; frequency of occurrence (F) vs. ranked words (W) on linear axes, with sample words labeled from “the” to “art”.]
  53. Log-Log Axes. [Figure: the same data on log-log axes, with sample words labeled from “the” to “dare”.]
  54. Regression Fit. [Figure: the same log-log plot with a fitted regression line.]
  56. Consistency. Log axes are word frequency (y) and ranked word order (x): $\log(y) = -1.13 \log(x)$, i.e., $y = x^{-1.13} = \frac{1}{x^{1.13}}$. Here, “power” refers to x to the power −1.13 (the exponent). Power laws differ from standard statistical distributions. Power laws carry most of the information in their tail. A fatter tail corresponds to stronger correlations than usual. Power laws imply persistent correlations that have to be explained. Zipf’s law correlations arise from grammatical rules.
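A straight line on log-log axes can be fitted by ordinary least squares on the logged values. The sketch below assumes a model of the form freq = C × rank^slope; the prefactor C is included because a fitted line generally has an intercept as well as a slope. The data are synthetic, generated from an exact power law, and are not the contents of zipf1000.txt. All function names are hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def power_law_fit(ranks, freqs):
    """Fit freq = C * rank**slope by regressing log(freq) on log(rank)."""
    a, slope = linear_fit([math.log(r) for r in ranks],
                          [math.log(f) for f in freqs])
    return math.exp(a), slope


# Synthetic ranked data from an exact power law (NOT the zipf1000.txt data).
ranks = list(range(1, 1001))
freqs = [225300.0 * r ** -1.13 for r in ranks]

C, slope = power_law_fit(ranks, freqs)
print(f"freq ~ {C:.0f} * rank^{slope:.2f}")
```

On real ranked data the points scatter around the line, so the recovered exponent depends on which range of ranks is included in the fit.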
  57. Database Query Times. Example 2: Database Query Times. [Figure: orad$Elapstime vs. Index.] Like Zipf’s law, data must be ranked by frequency of occurrence.
  58. Visualize Ranked Data. [Figure: Ranked SQL Times; otr vs. Index.] Impossible to tell the functional form of this curve.
  59. Try Double-Log Visualization. [Figure: Log-Log SQL Times; otr vs. Index.] Clearly not a power law overall, but the first 100 queries do appear to follow a power law.
  60. Three Data Windows. [Figure: three panels: (A) Log-Log of SQL-A Times, etA vs. Index; (B) Log-Lin of SQL-B Times, etB vs. Index; (C) Log-Lin of SQL-C Times, etC vs. Index.] This suggests breaking the data across 3 regions: (A) log-log axes, (B) log-linear axes, (C) log-linear axes.
  61. Regression Analysis. [Figure: the same three panels with fitted lines.] (A) $y_A \sim x^{-0.4632}$, power law decay. (B) $y_B \sim e^{-0.0074x}$, exponential decay. (C) $y_C \sim e^{-0.0028x}$, exponential decay. But this is still not enough.
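One way to decide between the two functional forms for a window is to ask which set of axes makes the data straighter. The sketch below compares the R² of a straight-line fit on log-log axes against log-linear axes. It is a rough heuristic under assumed inputs, not the analysis used for the slides: the exponents are the ones quoted above, but the prefactors 500 and 80 and all function names are illustrative assumptions.

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R^2 of a straight-line fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)


def classify_decay(index, times):
    """Label a window by which axes straighten it better (index must be >= 1)."""
    log_t = [math.log(t) for t in times]
    r2_loglog = r_squared([math.log(i) for i in index], log_t)
    r2_semilog = r_squared(index, log_t)
    return "power law" if r2_loglog > r2_semilog else "exponential"


# Synthetic windows using the exponents quoted on the slide; the prefactors
# 500 and 80 are illustrative assumptions, not fitted values.
idx = list(range(1, 101))
window_a = [500.0 * i ** -0.4632 for i in idx]
window_b = [80.0 * math.exp(-0.0074 * i) for i in idx]

print(classify_decay(idx, window_a), classify_decay(idx, window_b))
```

With noisy measurements the two R² values can be close, so the comparison is best treated as a visual aid alongside the plots rather than as a formal test.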
  62. Consistency. [Figure: Log-Log SQL A-Times; etA vs. Index.] Power law slope γ = 0.46, about half the Zipfian slope γ = 1.0. Correlations stronger than Zipf. Hypothesis: (1) Shorter query times (window A) may involve dictionary lookups or other structured data. Structure provides correlations. (2) Longer queries in window B are unstructured (ad hoc?) and randomized. Weak correlations produce exponential decay. (3) Ditto for window C.
  63. 63. The Power of Power Laws. Eleventh Hour Spikes. Example 3: Eleventh Hour Spikes. All Australian businesses were required to register with the Australian Taxation Office (ATO) for an Australian Business Number (ABN) to claim an income tax refund. The ABN was introduced in Y2K. Time series data come from the ABN registrations database. Period covers March 27 to September 19, 2000. Deadline traffic spike on May 31, 2000. Similar to the rush to meet the Obamacare deadline of March 31, 2014. More details in my CMG Australia 2006 paper.
  65. 65. The Power of Power Laws. Eleventh Hour Spikes. Complete Time Series. [Figure: ORAConnections vs. date, 11 March to 29 August 2000; linear y-axis from 0 to $1 \times 10^6$.] Question: Could the “11th hour” spike have been predicted? Answer: Yes, but it is quite involved. How: Using a power law. What else!?
  66. 66. The Power of Power Laws. Eleventh Hour Spikes. Semi-Log Plot. [Figure: ORAConnections vs. date on semi-log axes, 11 March to 21 May 2000.] The y-axis is the number of Oracle RDBMS connections (log scale). Peak growth preceding the spike looks almost linear on the semi-log plot. Time range: 0–38 days.
  67. 67. The Power of Power Laws. Eleventh Hour Spikes. Statistical Regression on Peaks. [Figure: ORAConnections peaks vs. date on semi-log axes, with fitted line.] Linear growth on semi-log axes implies an exponential function $y = A e^{Bt}$. Fit parameters: Origin: $A = 1.14128 \times 10^5$. Curvature: $B = 0.0175$. Doubling period: $\ln(2)/B \approx 40$ days (about 6 weeks), with $t$ in days.
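The doubling period follows directly from the fitted rate, since $A e^{B(t+T)} = 2 A e^{Bt}$ gives $T = \ln(2)/B$. The short check below evaluates it for the quoted parameters. It assumes that $t$ is measured in days, which matches the 0–38 day fitting range and the day-based axis used for the later power law fit. The function name `connections` is illustrative.

```python
import math

A = 1.14128e5   # fitted origin quoted on the slide
B = 0.0175      # fitted rate quoted on the slide, assumed here to be per day

def connections(t_days):
    """Exponential trend y = A * exp(B * t) evaluated at t days."""
    return A * math.exp(B * t_days)

doubling_days = math.log(2) / B
print(f"doubling period: {doubling_days:.1f} days")
print(f"trend value at day 38: {connections(38):.0f}")
```

A doubling time close to the length of the fitting window means the trend roughly doubles across the 38 days of peaks used in the regression.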
  68. 68. The Power of Power Laws. Eleventh Hour Spikes. Trend on Linear Axes. [Figure: ORAConnections vs. date on linear axes, with exponential forecast.] The exponential forecast looks valid, up to the crosshairs. It significantly underestimates the onset of the “11th hour” peak, and the rapid drop-off after the peak. Faster-than-exponential growth suggests a power law.
  69. 69. The Power of Power Laws. Eleventh Hour Spikes. Power Law Fit. [Figure: exponential growth and power law curves over ORAConnections vs. date, linear axes.] Log axes are connects ($y$) and time in days ($x$): $\log(y) = -0.6421 \, \log(|x - x_c|)$, that is, $y = 1 / |x - x_c|^{0.6421}$, where the peak occurs at $x_c = 61$ days.
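For forecasting, the critical day $x_c$ would not be known in advance. One possible approach, offered here as an assumption and not necessarily how the fit on the slide was obtained, is to scan candidate values of $x_c$ and keep the one for which $\log(y)$ against $\log(x_c - x)$ is closest to a straight line. The data below are synthetic, generated from the quoted exponent with a made-up prefactor of $10^6$. The names `fit_line_r2` and `scan_critical_day` are illustrative.

```python
import math

def fit_line_r2(xs, ys):
    """Least-squares line through (xs, ys); returns (intercept, slope, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, (sxy * sxy) / (sxx * syy)

def scan_critical_day(days, ys, candidates):
    """Try each candidate x_c; return (x_c, slope, r_squared) of the straightest fit."""
    log_ys = [math.log(y) for y in ys]
    best = None
    for xc in candidates:
        log_dist = [math.log(xc - d) for d in days]   # requires xc > max(days)
        _, slope, r2 = fit_line_r2(log_dist, log_ys)
        if best is None or r2 > best[2]:
            best = (xc, slope, r2)
    return best

TRUE_XC = 61        # peak day quoted on the slide
EXPONENT = 0.6421   # exponent quoted on the slide

# Synthetic pre-peak observations for days 40..58 (prefactor 1e6 is made up).
days = list(range(40, 59))
connects = [1.0e6 * (TRUE_XC - d) ** -EXPONENT for d in days]

best_xc, best_slope, best_r2 = scan_critical_day(days, connects, range(59, 71))
print(f"estimated x_c = {best_xc}, slope = {best_slope:.4f}")
```

On noise-free synthetic data the scan recovers the generating values. Real traffic is noisy, so the straightest fit may be shared by several neighbouring candidates and the estimate of $x_c$ should be read as a range.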
  72. 72. The Power of Power Laws. Eleventh Hour Spikes. Consistency. Log-log plots are an easy way to test for power law distributions. May have mixed regions of power law and other distributions. Can even predict critical spikes. Power laws signal the presence of strong correlations. Explaining those correlations may be more difficult: Zipf’s law took 40 years. Remember: Aim for consistency. Learn to talk to God (She’s listening).
  73. 73. Performance Dynamics Company, Castro Valley, California. www.perfdynamics.com; perfdynamics.blogspot.com; twitter.com/DrQz; Facebook. Training classes (May 19, 2014). njgunther@perfdynamics.com. OFF: +1-510-537-5758.
