ICST 2012, Zaman
1. A Large-Scale Empirical Study on User-Centric Performance Analysis
Shahed Zaman and Ahmed E. Hassan, Software Analysis and Intelligence Lab (SAIL), Queen's University
Bram Adams, MCIS, École Polytechnique de Montréal, Canada
2. What is this study about?
[Figure: users each send 10 requests to a software system; of the 1,000 request instances overall, 10 have a bad response time]
4. User-Centric View
[Figure: the same 1,000 request instances, 10 with bad response time. System's perspective: 1% bad request instances overall. User's perspective: most users see 0% bad request instances, while the users who received the bad responses see 50% bad instances]
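The gap between the two perspectives on slide 4 is simple arithmetic, but easy to miss. A minimal sketch with synthetic data (not the study's own; the 100-user, 10-requests-per-user layout mirrors the slide, and the choice of concentrating all bad responses on two users is illustrative):

```python
# Illustrative sketch: 100 users x 10 requests = 1,000 request instances,
# of which 10 are bad. System-wide that is 1% bad, but the two users who
# received 5 bad responses each experience 50% bad instances.

users = {u: [False] * 10 for u in range(100)}   # True = bad response time
for u in (0, 1):                                # all 10 bad instances hit 2 users
    for i in range(5):
        users[u][i] = True

def pct_bad(flags):
    """Percentage of bad request instances in a list of booleans."""
    return 100.0 * sum(flags) / len(flags)

all_instances = [f for flags in users.values() for f in flags]
print(pct_bad(all_instances))                           # system's perspective: 1.0
print(sorted({pct_bad(f) for f in users.values()}))     # users' perspective: [0.0, 50.0]
```

The same 10 bad requests read as a negligible 1% system-wide, yet as a severe 50% for the affected users, which is the slide's point.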
5. Data used in this study
• 3 systems
• 13 most used scenarios

Factor                  | Enterprise System 1 | Enterprise System 2 | Dell DVD Store
Functionality           | Telecommunications  | E-commerce          | E-commerce
Vendor's Business Model | Commercial          | Commercial          | Open-source
Size                    | Ultra Large         | Large               | Small
Complexity              | Complex             | Complex             | Simple
12. % of bad instances
[Plot: response time per request instance, with the median and the median ± standard deviation band marked]

13. % of bad instances
Bad instances = instances outside the "Median ± Standard Deviation" band = 6/20 = 30%
[Plot: the same response-time data, with the 6 instances outside the band highlighted]
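The bad-instance rule on slides 12-13 is straightforward to reproduce. A sketch of that definition (an instance is bad when its response time falls outside median ± standard deviation); the 14-good/6-outlier data set is made up so that it lands on the slide's 6/20 = 30% example:

```python
import statistics

def pct_bad_instances(times):
    """% of instances whose response time falls outside median +/- sample stdev."""
    med = statistics.median(times)
    sd = statistics.stdev(times)
    bad = [t for t in times if abs(t - med) > sd]
    return 100.0 * len(bad) / len(times)

times = [10.0] * 14 + [100.0] * 6   # 20 instances, 6 far from the median
print(pct_bad_instances(times))     # -> 30.0, matching the 6/20 example
```

Note that `statistics.stdev` is the sample standard deviation; the slide does not say which variant the authors used, so this is an assumption.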
18. Performance Trend Over Time
[Plots: scenario-centric view (ResponseTime vs Running Time) and user-centric view (ResponseTime vs Instance # for a user), each comparing the Old and New versions]
19. Performance Trend Over Time
[Plots: system's perspective (ResponseTime vs Running Time) vs user's perspective (mean ResponseTime vs Instance # for a user)]
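One simple way to make the trend comparison on slides 18-19 concrete is to fit a least-squares slope to response time over execution order; a roughly zero slope in one view can coexist with a clearly rising slope in the other. A sketch with synthetic series (this is my illustration, not the authors' analysis method):

```python
# Least-squares slope of a series over its index, via the closed-form
# formula sum((x-mx)(y-my)) / sum((x-mx)^2).

def slope(ys):
    n = len(ys)
    mx = (n - 1) / 2.0                  # mean of indices 0..n-1
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in enumerate(ys))
    den = sum((x - mx) ** 2 for x in range(n))
    return num / den

flat = [6000.0] * 10                        # system view: roughly constant
rising = [6000.0 + 10 * i for i in range(10)]   # a user's view: degrading
print(slope(flat), slope(rising))           # -> 0.0 10.0
```

A flat series yields slope 0, while the degrading series yields the per-instance growth rate, so the two perspectives can disagree on whether performance is getting worse.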
20. Our Study Dimensions
Scenario-centric vs user-centric view:
• Overall: 10 out of 13 use-cases showed a different view
• Trend: 8 out of 13 use-cases showed a different view
21. Performance Consistency
[Plots: scenario-centric view (ResponseTime vs Running Time) and user-centric view (ResponseTime vs Instance # for user), each comparing the Old and New versions]
22. Our Study Dimensions
Scenario-centric vs user-centric view:
• Overall: 10 out of 13 use-cases showed a different view
• Trend: 8 out of 13 use-cases showed a different view
• Consistency: all 13 use-cases showed a different view
24. Consistency vs Overall Performance
[Scatter plot: variance vs % of bad instances for the Old and New versions, divided into four quadrants by consistency (consistent vs inconsistent) and overall performance experience (good vs bad)]
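The quadrant reading of slide 24 can be sketched as a small classifier: place each version by its variance (consistency) and its % of bad instances (overall experience). The threshold values and labels below are mine, chosen only to illustrate the idea, not taken from the study:

```python
# Hypothetical quadrant classification for slide 24. The thresholds are
# illustrative assumptions, not values from the paper.

def quadrant(variance, pct_bad, var_thresh=10000.0, bad_thresh=35.0):
    """Classify a version by consistency (variance) and overall experience."""
    consistency = "consistent" if variance <= var_thresh else "inconsistent"
    overall = "good" if pct_bad <= bad_thresh else "bad"
    return consistency, overall

print(quadrant(2000.0, 5.0))     # -> ('consistent', 'good')
print(quadrant(25000.0, 60.0))   # -> ('inconsistent', 'bad')
```

A version in the ('consistent', 'good') quadrant is the desirable case; the plot lets the two versions be compared along both axes at once.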
Performance tests are done to ensure that:
• the optimal configuration is chosen
• no change to hardware or software in the system has degraded the system performance
Collect data:
• Response time
• Resource utilization (CPU, Disk I/O, Memory)
Why is this a problem?
Is it better?
How does the performance evolve over the course of execution?
Is the performance consistent over the course of execution?
Mean = 44.82 vs 42.25
Median = 46 vs 32
% of bad instances = 7.19 vs 0.37
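The three summary statistics above can be computed per version and compared side by side. A sketch with synthetic data (it will not reproduce the slide's 44.82 vs 42.25 numbers; it only shows how a similar mean can hide a very different median and % of bad instances, using the median ± standard deviation rule from slides 12-13 as an assumption about the bad-instance definition):

```python
import statistics

def summary(times):
    """(mean, median, % of bad instances) for one version's response times.
    "Bad" is assumed to mean outside median +/- sample stdev."""
    med = statistics.median(times)
    sd = statistics.stdev(times)
    bad = sum(1 for t in times if abs(t - med) > sd)
    return statistics.mean(times), med, 100.0 * bad / len(times)

old = [46.0, 44.0, 48.0, 45.0, 90.0]   # one outlier inflates mean and % bad
new = [32.0, 32.0, 32.0, 32.0, 32.0]   # perfectly consistent
print(summary(old))   # -> (54.6, 46.0, 20.0)
print(summary(new))   # -> (32.0, 32.0, 0.0)
```

Comparing only the means would understate the difference; the median and the % of bad instances expose it, which is the comparison the slide is drawing.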