Monitorama 2013 Keynote
1. More Than Monitoring
#monitoring ++
Neil Gunther
Performance Dynamics
Monitorama Keynote
Boston, March 28 2013
© 2013 Performance Dynamics, More Than Monitoring, March 30, 2013
2. Let’s Get Calibrated about Data
Outline
1 Let’s Get Calibrated about Data
2 Potted History of Monitoring
3 Performance Visualization Basics
4 Monitored Data are Time Series
5 Performance Visualization in R
6 Possible Hacks
3. Let’s Get Calibrated about Data
Guerrilla Mantra: All data is wrong by definition
Measurement is a process, not math.
All data contains measurement errors.
How big are they and can you tolerate them?
Treating data as divine is a sin.
5. Let’s Get Calibrated about Data
Guerrilla Mantra: VAMOOS your data doubts
Visualize
Analyze
Modelize
Over and Over until
Satisfied
7. Let’s Get Calibrated about Data
Guerrilla Mantra: There are only 3 performance metrics
1 Time, e.g., cpu_ticks
2 Rate (inverse time), e.g., httpGets/s
3 Number or count, e.g., RSS
9. Let’s Get Calibrated about Data
Watch Out for Patterns
I mean that in a bad way. Your brain can’t help itself.
11. Potted History of Monitoring
Old Adage: “Nothing New in Computer Science”
Mainframes didn’t need real-time monitoring. Batch processing.
12. Potted History of Monitoring
How You Programmed It
13. Potted History of Monitoring
Later ... the interface improved
14. Potted History of Monitoring
CTSS (Compatible Time-Sharing System) developed in 1961 at MIT on IBM 7094.
Compatible meant compatibility with the standard IBM batch processing O/S.
15. Potted History of Monitoring
Multics Instrumentation c.1965
Multics was a multiuser O/S following CTSS time-share.
The Implementation
“a rough measure of response time for a time-sharing console user, an exponential average of the number of users in the highest priority scheduling queue is continuously maintained. An integrator, $L$, initially zero, is updated periodically by the formula
$L \leftarrow L \times m + N_q$
where $N_q$ is the measured length of the scheduling queue at the instant of update, and $m$ is an exponential damping constant”
This equation is an iterative form of an exponentially damped moving average.
In modern terminology, it’s a data smoother.
The Lesson
“experience with Multics, and earlier with CTSS, shows that building permanent instrumentation into key supervisor modules is well worth the effort, since the cost of maintaining well-organized instrumentation is low, and the payoff is very high.”
16. Potted History of Monitoring
You know this better as ...
Linux load average
58 extern unsigned long avenrun[]; /* Load averages */
59
60 #define FSHIFT 11 /* nr of bits of precision */
61 #define FIXED_1 (1<<FSHIFT) /* 1.0 as fixed-point */
62 #define LOAD_FREQ (5*HZ) /* 5 sec intervals */
63 #define EXP_1 1884 /* 1/exp(5sec/1min) as fixed-pt */
64 #define EXP_5 2014 /* 1/exp(5sec/5min) */
65 #define EXP_15 2037 /* 1/exp(5sec/15min) */
66
67 #define CALC_LOAD(load,exp,n) \
68 load *= exp; \
69 load += n*(FIXED_1-exp); \
70 load >>= FSHIFT;
Lines 67–70 are identical to the 1965 Multics formula.
See Chap. 4 of my Perl::PDQ book for the details.
UNIX load average
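A minimal Python sketch may help in comparing the two update rules. It assumes a constant run-queue length and reuses the constants from the header above; the function names (`damping_constant`, `multics_update`, `calc_load`, `simulate_load`) are illustrative and do not come from Multics or the kernel. One difference worth noticing: the Multics integrator as quoted settles at N_q / (1 - m), whereas CALC_LOAD weights each new sample by (FIXED_1 - exp) and so settles near n itself.

```python
import math

FSHIFT = 11                # bits of fixed-point precision, as in the header above
FIXED_1 = 1 << FSHIFT      # 1.0 in fixed point (2048)
LOAD_FREQ_SEC = 5          # sampling interval in seconds


def damping_constant(window_sec):
    """Fixed-point value of exp(-5 s / window), e.g. EXP_1 for a 60 s window."""
    return round(math.exp(-LOAD_FREQ_SEC / window_sec) * FIXED_1)


def multics_update(L, n_q, m):
    """One step of the quoted Multics integrator: L <- L*m + N_q."""
    return L * m + n_q


def calc_load(load, exp, n):
    """One step of CALC_LOAD; load and n are fixed-point (scaled by FIXED_1)."""
    load *= exp
    load += n * (FIXED_1 - exp)
    return load >> FSHIFT


def simulate_load(queue_length, steps, window_sec=60):
    """Feed a constant queue length into CALC_LOAD; return the load as a float."""
    exp = damping_constant(window_sec)
    load = 0
    for _ in range(steps):
        load = calc_load(load, exp, queue_length * FIXED_1)
    return load / FIXED_1


if __name__ == "__main__":
    # Two runnable processes for 10 minutes (120 samples at 5 s each).
    print(simulate_load(queue_length=2, steps=120))
```

Because of the right shift, the fixed-point result truncates and ends up slightly below the true queue length rather than exactly on it.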
22. Potted History of Monitoring
Unix at Bell Labs c.1970
CTSS begat Multics begat Unics begat Unix
Get it?
23. Potted History of Monitoring
Then Came Screens 9:40
Note the mouse in her right hand.
24. Potted History of Monitoring
Unix top: A Legacy App
Green ASCII characters on black background
25. Potted History of Monitoring
Desktop GUI c.1995
Lots of colored spaghetti
26. Potted History of Monitoring
Static Charts on the Web c.2000
Load average over a 24-hour period, with the 1-, 5-, and 15-minute load averages as green, blue, and red time series
(which is completely redundant, BTW)
As informative as watching a ticker chart on Wall Street
27. Potted History of Monitoring
Browser-based Dashboards
Interminable strip charts are not good for your brain.
29. Performance Visualization Basics
The Central Challenge
Find the best cognitive impedance match
between the digital computer and the neural computer
30. Performance Visualization Basics
Cognitive Circuitry is Largely Unknown
PerfViz is an N-dimensional problem
Brain is trapped in (3 + 1)-dimensions
No 5-fold rotational symmetry
Physicists have all the fun with SciViz
Time dimension becomes animation sequence
31. Performance Visualization Basics
Your Brain is Easily Fooled
All cognition is computation
Your brain is a differential analyzer
Difference errors produce perceptual illusions
33. Monitored Data are Time Series
Gothic graphs can hurt your brain (Bad Z value)
34. Monitored Data are Time Series
There’s a Whole Science of Color
35. Monitored Data are Time Series
Pastel Colors on White
[Figure: "Sandy Bridge 16 VPU Throughput", LIO/s versus t-Index, for the series test1.HTT.Turb, test2.Turbo, test3.HTT, and test4.AllOff]
36. Monitored Data are Time Series
Pastel Colors on Black
[Figure: "Sandy Bridge 16 VPU Throughput", LIO/s versus t-Index, for the series test1.HTT.Turb, test2.Turbo, test3.HTT, and test4.AllOff]
37. Monitored Data are Time Series
Pastel Colors on Neutral Gray
[Figure: "Sandy Bridge 16 VPU Throughput", LIO/s versus t-Index, for the series test1.HTT.Turb, test2.Turbo, test3.HTT, and test4.AllOff]
38. Monitored Data are Time Series
Coordinated Colors on Neutral Gray
[Figure: "Sandy Bridge 16 VPU Throughput", LIO/s versus t-Index, for the series test1.HTT.Turb, test2.Turbo, test3.HTT, and test4.AllOff]
39. Monitored Data are Time Series
Time Series Can Reveal Data Correlations 9:50
[Figure: server.p.65, 2012-05-03 to 2012-05-04; stacked time series of CPU%, Mem%, ioWait%, and LdAvg-1 versus Time]
40. Monitored Data are Time Series
But Data Doesn’t Tell All: Monitored Server Consumption
[Figure: "Monitored Server Consumption", Capacity (U%) versus Time (m:s), showing Uavg data, Umax data, and a server saturation line]
41. Monitored Data are Time Series
Beyond Data: Effective Server Consumption
[Figure: "Lookahead Server Consumption", Capacity (U%) versus Time (m:s), showing Uavg data, Umax data, Ueff predicted, a server saturation line, and the effective max consumption level]
43. Performance Visualization in R
Choose Your Cognitive Z in R
[Figure: scatterplot matrix of mpg, disp, drat, and wt, alongside a "3D Scatterplot" of mpg, disp, and wt]
44. Performance Visualization in R
Enhanced Plots in R
[Figure: four panels of Xp versus p, titled "Raw bench data", "Data smoother", "USL fit", and "USL fit + CI bands"]
45. Performance Visualization in R
Chernoff Faces in R
Example (using R)
library(TeachingDemos)
faces2(matrix(runif(18*10), nrow=12), main='Random Faces')
47. Performance Visualization in R
Treemaps in R
[Figure: two treemaps titled "GDAT: Top 100 Websites", with sites grouped by category (e.g., Search/portal, Social network, Media/news, Retail, Software)]
Example (using R)
library(portfolio)
bbc <- read.csv("nielsen100-2010.csv")
map.market(id=seq(1:100), area=bbc$uniqueAudience, group=bbc$categoryBBC,
color=bbc$totalVisits, main="GDAT: Top 100 Websites")
There is another treemap pkg on CRAN
48. Performance Visualization in R
Heatmap of Multiple Servers in Time
49. Performance Visualization in R
Barry in 2D
[Figure: triangle plots with vertices p1, p2, and p3, including the points p1 = p2 = p3 = 1/3 and p1 = 0.6]
Barycentric coordinate system for %CPU = %user + %sys + %idle
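The mapping behind a barycentric plot for %CPU can be sketched in a few lines of Python. The vertex positions are an assumption chosen for illustration (an equilateral triangle with a unit base, p1 at the apex), and `barycentric_to_xy` is a hypothetical helper name; the slides specify neither.

```python
import math

# Assumed layout, not taken from the slides: an equilateral triangle
# with p1 at the apex and p2, p3 at the base corners.
VERTICES = (
    (0.5, math.sqrt(3) / 2),  # p1, e.g. %user
    (0.0, 0.0),               # p2, e.g. %sys
    (1.0, 0.0),               # p3, e.g. %idle
)


def barycentric_to_xy(p1, p2, p3):
    """Map three non-negative shares to a 2D point inside the triangle.

    The shares are normalized by their sum, so percentages (60, 10, 30)
    and fractions (0.6, 0.1, 0.3) give the same point.
    """
    total = p1 + p2 + p3
    if total <= 0 or min(p1, p2, p3) < 0:
        raise ValueError("shares must be non-negative with a positive sum")
    weights = (p1 / total, p2 / total, p3 / total)
    x = sum(w * vx for w, (vx, _) in zip(weights, VERTICES))
    y = sum(w * vy for w, (_, vy) in zip(weights, VERTICES))
    return x, y


if __name__ == "__main__":
    print(barycentric_to_xy(60, 10, 30))   # a mostly-user workload
    print(barycentric_to_xy(1, 1, 1))      # equal shares land on the centroid
```

A sample that is all one component lands on that component's vertex, and equal shares land on the centroid, so a drifting workload traces a path inside the triangle.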
50. Performance Visualization in R
Barry in 3D: Tukey-like Rotations
Tukey trumps Tufte
Barycentric coordinate system for %BW = %unicast + %multicast + %broadcast + %idle
52. Possible Hacks
Interactive and Streaming in R
R derives from S at Bell Labs (home of Unix) c.1975, 1980, 1988
R scripting language
console interface > (x^(k-1)*exp(-x/s))/(gamma(k)*s^k)
cf. Mathematica document paradigm: $\frac{x^{k-1} e^{-x/\theta}}{\Gamma(k)\,\theta^{k}}$
No fonts, no symbolic computation
More recent focus is on enabling:
Better IDE integration, e.g., RStudio
Browser-based interaction, e.g., Shiny
Streaming data acquisition, e.g., R plus Hadoop, but ...
R interpreter is single-threaded
Needs a full app stack between the data and the R engine
Revolution Analytics is in this space
Plenty of room for innovative development
53. Possible Hacks
Some Ideas for Tomorrow
1 Lots of opportunities
2 Couple simple statistical analysis to monitored data
3 Display the errors in monitored data
4 Replace the black background in Graphite
5 Apply ColorBrewer to Graphite
6 Apply effective capacity consumption to your monitored data
7 Replace strip charts with animation
WARNING
Common sense is the pitfall of all performance analysis
54. Possible Hacks
Modelizing GitHub Growth
Since I didn’t discuss the modeling part of VAMOOS ...
Donnie Berkholz of redmonk.com wrote on his Jan 21, 2013 blog that GitHub will reach:
4 million users near Aug 2013
5 million users near Dec 2013
That’s based on a log-linear model. I claim it’s a log-log model and therefore:
4 million users around Oct 2013
5 million users around Apr 2014
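To make the distinction concrete, here is a small Python sketch of both fits using ordinary least squares. A log-linear model assumes ln(users) is linear in time (exponential growth); a log-log model assumes ln(users) is linear in ln(time) (power-law growth). The data below are synthetic, generated from a made-up power law purely for illustration; they are not GitHub's figures, and all helper names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def fit_log_linear(ts, ys):
    """Fit ln(y) = a + b*t (exponential growth); returns a predictor t -> y."""
    a, b = fit_line(ts, [math.log(y) for y in ys])
    return lambda t: math.exp(a + b * t)


def fit_log_log(ts, ys):
    """Fit ln(y) = a + b*ln(t) (power-law growth); returns a predictor t -> y."""
    a, b = fit_line([math.log(t) for t in ts], [math.log(y) for y in ys])
    return lambda t: math.exp(a + b * math.log(t))


def squared_log_error(model, ts, ys):
    """Sum of squared residuals on the log scale, for comparing the two fits."""
    return sum((math.log(model(t)) - math.log(y)) ** 2 for t, y in zip(ts, ys))


# Synthetic data only: y = 1000 * t^2.5 for months t = 1..12.
months = list(range(1, 13))
users = [1000.0 * t ** 2.5 for t in months]

exponential = fit_log_linear(months, users)
power_law = fit_log_log(months, users)

if __name__ == "__main__":
    print("log-linear error:", squared_log_error(exponential, months, users))
    print("log-log error:   ", squared_log_error(power_law, months, users))
    print("month-24 forecasts:", exponential(24), power_law(24))
```

On these synthetic data the log-linear fit extrapolates to larger values than the log-log fit, which is the same direction as the earlier milestone dates it produces in the comparison above.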
55. Possible Hacks
Performance Dynamics Company
Castro Valley, California
www.perfdynamics.com
perfdynamics.blogspot.com
twitter.com/DrQz
Facebook
njgunther@perfdynamics.com
OFF: +1-510-537-5758