Msr2011 zaman
1. Security versus Performance Bugs:
A Case Study on
Shahed Zaman, Bram Adams, Ahmed E. Hassan
Software Analysis and Intelligence Lab (SAIL), Queen’s University
2. Costly
Bugs have a high impact on companies
Affect reputation
482 bugs/week
Firefox
5. Our Study Dimensions
Time: Are security bugs fixed faster?
People: Are security bugs fixed by more experienced developers?
Fix: Are security fixes more complex?
12. Security bugs are triaged faster
[Figure: ratio of bugs vs. log(1 + triage time); marked triage times 46629 and 179870; annotation: x 3.8]
13. The lifetime of a Bug
NEW -> ASSIGNED -> FIXED -> CLOSED
Annotations: TRIAGED FASTER, FIXING
14. Security Bugs are fixed faster
[Figure: ratio of bugs vs. log(1 + time between assignment and fix)]
15. Rework in the lifetime of a Bug
NEW -> ASSIGNED -> FIXED -> CLOSED, with TOSSING and REOPENED as rework steps
Annotations: TRIAGED FASTER, FIXED FASTER
16. Security Bugs: tossed & re-opened more often
[Figure: ratio of bugs vs. # of times a bug is tossed; annotation: tossed more!]
[Figure: ratio of bugs vs. # of times a bug is reopened; annotation: reopened more!]
17. Our Study Dimensions
Time: Are security bugs fixed faster? YES!
People: Are security bugs fixed by more experienced developers?
Fix: Are security fixes more complex?
18. Security bugs are fixed by more experienced developers
[Figure: ratio of bugs vs. experience in # of days; annotation: more experienced]
19. Our Study Dimensions
Time: Are security bugs fixed faster? YES!
People: Are security bugs fixed by more experienced developers? YES!
Fix: Are security fixes more complex?
20. Entropy as a measure of Complexity
[Figure: two bar charts of # of changed lines (y-axis, 0 to 6) per file (x-axis). Fix 1 touches files A to E; Fix 2 touches files V to Z. One fix is annotated "More Complex".]
22. Our Study Dimensions
Time: Are security bugs fixed faster? YES!
People: Are security bugs fixed by more experienced developers? YES!
Fix: Are security fixes more complex? YES!
23. Security Perf. Security Perf.
Fix time +
Triage time + ? ?
# of reopening +
# of tossing +
# of developer
assigned
+ = =
Experience +
# of files changed + = =
Entropy +
more(+) no difference (=) studying (?)
Chrome
+
+
+
+
+
25. Threats to Validity
• Focused on one domain
• Use of heuristics in bug type identification
• Bug disclosure policies
[Figure annotation: non-disclosed security bugs]
Describe:
Triage time
Use of # of tossing to evaluate triage time
Fix time
Use of # of reopening to evaluate fix time
Bugzilla is the bug tracking system used by Mozilla, and CVS is the code repository. We had to use both and merge their data.
Bug reports in Bugzilla are not linked to the bug fixes in CVS, so we had to link the two ourselves. To do so, we used the revision comments that developers write in CVS.
Developers optionally mention the corresponding bug id # … in these comments, which we used for linking.
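The linking step described above can be sketched as follows. This is only a minimal illustration, not the study's actual tooling: the regular expression, the function names, and the `(revision, comment)` input shape are all assumptions, since the notes do not say how bug ids were written in the comments.

```python
import re

# Hypothetical pattern (an assumption): matches forms such as
# "bug 289940", "Bug #289940", or "b=289940" in a revision comment.
BUG_ID_PATTERN = re.compile(r"\b(?:bug\s*#?\s*|b=)(\d{4,7})\b", re.IGNORECASE)


def extract_bug_ids(revision_comment):
    """Return the distinct bug ids mentioned in a comment, in order of appearance."""
    seen = []
    for match in BUG_ID_PATTERN.finditer(revision_comment):
        bug_id = int(match.group(1))
        if bug_id not in seen:
            seen.append(bug_id)
    return seen


def link_commits_to_bugs(commits, known_bug_ids):
    """Map each known bug id to the revisions whose comment cites it.

    commits: iterable of (revision, comment) pairs from the code repository.
    known_bug_ids: set of ids that exist in the bug tracker; ids not in this
    set are ignored to reduce false links.
    """
    links = {}
    for revision, comment in commits:
        for bug_id in extract_bug_ids(comment):
            if bug_id in known_bug_ids:
                links.setdefault(bug_id, []).append(revision)
    return links
```

Because mentioning the bug id is optional, any approach of this kind only recovers the subset of fixes whose comments cite an id.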
Bugzilla has a keyword field. For performance bugs, the word “perf” was occasionally used in this field. Performance bugs also usually contain the words “perf”, “hang”, or “slow” in the bug title or short description.
We used this heuristic to identify the perf bugs.
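A rough sketch of such a keyword heuristic is shown below. The function name and the word-prefix matching rule are assumptions; the notes only name the three terms and the fields that were inspected.

```python
import re

# The three terms come from the notes; matching them as word prefixes is an
# assumption, chosen so that "hang" does not match inside "change".
PERF_TERM_PATTERN = re.compile(r"\b(?:perf|hang|slow)", re.IGNORECASE)
PERF_KEYWORD_PATTERN = re.compile(r"\bperf", re.IGNORECASE)


def is_performance_bug(keywords, title):
    """Flag a bug as performance-related from its keyword field or title.

    keywords: content of the tracker's keyword field (may be empty).
    title: the bug title or short description.
    """
    if PERF_KEYWORD_PATTERN.search(keywords):
        return True
    return PERF_TERM_PATTERN.search(title) is not None
```

Prefix matching still produces false positives (for example, a title containing "perfect"), which is one reason the use of heuristics appears under the threats to validity.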
For security bugs, we used MFSA (Mozilla Foundation Security Advisories) data. MFSA lists the security advisories for end users. Every advisory has a reference field that links to the corresponding bug in Bugzilla.
We used this information to identify security bugs.
The drop in the number of security bugs reflects Firefox's bug disclosure policy.
Newer security bugs are kept secret, with restricted access, until they are completely fixed and no longer a security threat.
These curves look very close to each other.
However, the axis is on a log scale, so the actual difference is large.
We also ran a t-test, which showed that the difference is statistically significant.
Log(1+46629) = 10.75
Log(1+179870) = 12.1
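The two values above can be checked with a few lines of Python. This sketch assumes the natural logarithm, which is what reproduces 10.75 and 12.1; the variable names are illustrative.

```python
import math

# Triage times marked on the slide (units are not stated in the notes).
shorter = 46629
longer = 179870

# log(1 + x), using the natural logarithm (an assumption that matches the
# values quoted in the notes).
log_shorter = math.log1p(shorter)
log_longer = math.log1p(longer)

# A gap of about 1.35 on the log axis corresponds to a factor of about 3.86
# on the original scale.
log_gap = log_longer - log_shorter
ratio = longer / shorter

print(f"{log_shorter:.2f} {log_longer:.2f} {log_gap:.2f} {ratio:.2f}")
```

This is why curves that look close on the log axis can still differ by a large factor in raw triage time.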
There may be two reasons for more reopening:
1. Developers hurried to fix the bug and fixed it incompletely.
2. Security bug fixes are harder to test, so the bug could not be completely fixed the first time.
We also found that security bugs are assigned faster. But fast assignment does not necessarily mean correct assignment.
We found that security bugs are tossed more, too.
We used 2 metrics for developer experience:
1. Number of bugs previously fixed by the developer.
2. Experience in days, i.e., the number of days from the developer's first bug fix to the current bug's fix date.
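Both metrics can be computed from a list of earlier fixes, as in the sketch below. The function name and the `(developer, fix_date)` record shape are assumptions made for illustration, and treating a developer with no earlier fix as having zero days of experience is likewise a choice made here, not something the notes specify.

```python
from datetime import date


def developer_experience(fix_history, developer, current_fix_date):
    """Return (previous_fixes, experience_days) for one developer.

    fix_history: list of (developer, fix_date) pairs for all recorded fixes.
    previous_fixes: number of fixes by this developer before current_fix_date.
    experience_days: days between the developer's first fix and
    current_fix_date, or 0 if there is no earlier fix (an assumption).
    """
    earlier = [
        fixed_on
        for name, fixed_on in fix_history
        if name == developer and fixed_on < current_fix_date
    ]
    if not earlier:
        return 0, 0
    return len(earlier), (current_fix_date - min(earlier)).days


# Illustrative data only; names and dates are made up.
history = [
    ("alice", date(2005, 1, 1)),
    ("alice", date(2005, 3, 1)),
    ("bob", date(2005, 2, 1)),
]
alice_experience = developer_experience(history, "alice", date(2005, 4, 1))
```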
Entropy takes into account both the number of lines and the number of files changed.
Both fixes change the same number of files.
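One common way to turn this idea into a number is Shannon entropy over the share of changed lines that each file receives. The notes do not give the exact formula used, so the sketch below is an assumption, and the line counts are made up rather than read from the slide.

```python
import math


def change_entropy(lines_changed_per_file):
    """Shannon entropy (in bits) of how a fix's changed lines spread over files.

    A fix concentrated in one file has entropy 0; a fix spread evenly over
    n files has the maximum entropy, log2(n).
    """
    total = sum(lines_changed_per_file)
    if total == 0:
        return 0.0
    entropy = 0.0
    for lines in lines_changed_per_file:
        if lines > 0:
            share = lines / total
            entropy -= share * math.log2(share)
    return entropy


# Two hypothetical fixes that each touch five files and change ten lines.
concentrated_fix = [6, 1, 1, 1, 1]
scattered_fix = [2, 2, 2, 2, 2]

concentrated_entropy = change_entropy(concentrated_fix)
scattered_entropy = change_entropy(scattered_fix)
```

Under this definition the evenly scattered fix scores higher, even though both fixes touch the same number of files and lines.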
There is a huge difference for security bugs. Why?
On further investigation, we found that some security bugs revealed security flaws that were extremely invasive.
For example, for one bug (id # 289940), we found 296 changes in the code repository.