2. Estimated Costs 2012
as reported by Britton et al. [2013]
3. Agenda
1. Automated Fault Diagnosis
2. State of the Art
3. Case Study: AspectJ
4. Evaluation
5. Conclusions
5. Fault Diagnosis
what is the current practice?
Goal: Pinpoint one or more faults
Commonly used techniques:
• System.out.println()
• Symbolic Debugging
• Static Slicing / Dynamic Slicing
There is room for improvement!
6. Automated Fault Diagnosis
is it possible?
        B1  B2  B3  B4  B5  Error
Test1    1   0   0   0   0    0
Test2    1   1   0   0   0    0
Test3    1   1   1   1   1    0
Test4    1   1   1   1   1    0
Test5    1   1   1   1   1    1
Test6    1   1   1   0   1    0
By intuition, a block is more suspicious if:
- It is involved in failing test cases
- It is not involved in passing test cases (a minimal scoring sketch follows below)
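One way to turn this intuition into a ranking is a spectrum-based suspiciousness formula computed from the coverage matrix above. The sketch below uses the Ochiai metric as one common choice; the metric and the class name are assumptions for illustration, not a formula given on the slides. The matrix and the single failing test (Test5) are taken from the slide.

// Minimal sketch: rank blocks by Ochiai suspiciousness over the matrix above
public class SuspiciousnessSketch {
    public static void main(String[] args) {
        // Rows = Test1..Test6, columns = blocks B1..B5 (1 = block executed by the test)
        int[][] coverage = {
            {1, 0, 0, 0, 0},
            {1, 1, 0, 0, 0},
            {1, 1, 1, 1, 1},
            {1, 1, 1, 1, 1},
            {1, 1, 1, 1, 1},
            {1, 1, 1, 0, 1},
        };
        // 1 = the test failed (the "Error" column of the matrix)
        int[] failed = {0, 0, 0, 0, 1, 0};

        for (int b = 0; b < coverage[0].length; b++) {
            int ef = 0, ep = 0, nf = 0; // executed & failed, executed & passed, not executed & failed
            for (int t = 0; t < coverage.length; t++) {
                if (coverage[t][b] == 1) {
                    if (failed[t] == 1) ef++; else ep++;
                } else if (failed[t] == 1) {
                    nf++;
                }
            }
            double denominator = Math.sqrt((double) (ef + nf) * (ef + ep));
            double suspiciousness = denominator == 0 ? 0.0 : ef / denominator;
            System.out.printf("B%d: %.2f%n", b + 1, suspiciousness);
        }
    }
}

For the matrix above this ranks B4 highest: it is executed by the failing test but by the fewest passing tests, matching the intuition.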
8. Agenda
1. Automated Fault Diagnosis
2. State of the Art
3. Case Study: AspectJ
4. Evaluation
5. Conclusions
9. Commonly Used Data
and its limiting factors
Software-artifact Infrastructure Repository
• Siemens set
• space program
Program         Faulty versions   LOC    Test cases   Description
print_tokens    7                 478    4130         Lexical analyzer
print_tokens2   10                399    4115         Lexical analyzer
replace         32                512    5542         Pattern recognition
schedule        9                 292    2650         Priority scheduler
schedule2       10                301    2710         Priority scheduler
tcas            41                141    1608         Altitude separation
tot_info        23                440    1052         Information measure
space           38                6218   13585        Array definition language
10. Performance Metrics
how can fault localization performance be evaluated?
• Wasted Effort (WE):
Number of non-faulty elements inspected before reaching the fault (see the sketch after this list)
Ranking: L4, L3, L2, L7, L6, L1, L5, L9, L10, L8
Wasted Effort (prominent bug): 2 (or 20%)
• Proportion of Bugs Localized (PBL)
Percentage of bugs localized with WE < p%
• Hit@X
Number of bugs localized after inspecting X elements
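The metrics can be made concrete with a small sketch. The class and helper names, the assumption that L2 is the faulty line in the example ranking, and the assumption that ties have already been broken are illustrative only; the study's exact tie and multi-fault handling may differ.

import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Illustrative sketch of Wasted Effort and Hit@X on a ranked list of elements
public class MetricsSketch {
    // Wasted Effort: non-faulty elements inspected before reaching a fault
    static int wastedEffort(List<String> ranking, Set<String> faultyElements) {
        int wasted = 0;
        for (String element : ranking) {
            if (faultyElements.contains(element)) {
                return wasted;
            }
            wasted++;
        }
        return ranking.size(); // no fault contained in the ranking
    }

    // Hit@X: is a fault found within the first X inspected elements?
    static boolean hitAtX(List<String> ranking, Set<String> faultyElements, int x) {
        return wastedEffort(ranking, faultyElements) < x;
    }

    public static void main(String[] args) {
        List<String> ranking =
            Arrays.asList("L4", "L3", "L2", "L7", "L6", "L1", "L5", "L9", "L10", "L8");
        Set<String> faulty = Set.of("L2"); // assumption: the faulty line of the example
        System.out.println(wastedEffort(ranking, faulty)); // 2, i.e. 20% of 10 elements
        System.out.println(hitAtX(ranking, faulty, 5));    // true
    }
}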
11. Agenda
1. Automated Fault Diagnosis
2. State of the Art
3. Case Study: AspectJ
4. Evaluation
5. Conclusions
12. AspectJ – Lines of Code
nearly doubled in the examined time span
13. AspectJ – Commits
active development, typically with more than 50 commits per month
14. AspectJ – Bugs
nearly 2500 bugs reported in the examined time span
15. AspectJ – Data
less than 40% of the investigated bugs are applicable for SBFL
                  AspectJ   AJDT   Sum
All bugs             1544    886   2430
Bugs in iBugs         285     65    350
Classified Bugs        99     11    110
Applicable Bugs        41      1     42
Involved Bugs          20      1     21
What happened?
16. Bug 36234
workarounds cannot be used as an evaluation oracle
Bug report: "Getting an out of memory error when compiling with Ajc 1.1 RC1 […]"
Pre-fix vs. post-fix code comparison
17. Bug 61411
platform-specific bugs are mostly not present in test suites
Bug report: "[…] highlights a problem that I've seen using ajdoc.bat on Windows […]"
Pre-fix vs. post-fix code comparison
18. Bug 151182
synchronization bugs are mostly not present in test suites
Bug report: "[…] recompiled the aspect using 1.5.2 and tried to run it […], but it fails with a NullPointerException. […]"
Pre-fix vs. post-fix code comparison
19. Agenda
1. Automated Fault Diagnosis
2. State of the Art
3. Case Study: AspectJ
4. Evaluation
5. Conclusions
20. Research Questions
• RQ1: How does the program size influence fault localization
performance?
• RQ2: How many bugs can be found when examining a fixed
number of ranked elements?
• RQ3: How does the program size influence suspiciousness
scores produced by different ranking metrics?
• RQ4: Are the fault localization performance metrics
currently used by the research community valid?
21. RQ1: Program Size vs. SBFL Performance?
multiple ranked elements are mapped to the same suspiciousness score
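Why ties hurt: when many elements share the same score, the fault's effective rank is usually taken as the middle of its tie group, so even a fault in the top-scoring group costs a large wasted effort. The mid-rank convention and the numbers below are assumptions for illustration, not results from the study.

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Sketch: expected wasted effort when the faulty element is tied with many others
// (assumption: mid-rank tie handling; the slides do not state which convention is used)
public class TieEffectSketch {
    static double expectedWastedEffort(Collection<Double> scores, double faultScore) {
        long higher = scores.stream().filter(s -> s > faultScore).count();
        long tied = scores.stream().filter(s -> s == faultScore).count();
        // all strictly higher-ranked elements, plus on average half of the other tied ones
        return higher + (tied - 1) / 2.0;
    }

    public static void main(String[] args) {
        List<Double> scores = new ArrayList<>();
        for (int i = 0; i < 501; i++) {
            scores.add(0.9); // 501 statements (including the fault) share the top score
        }
        System.out.println(expectedWastedEffort(scores, 0.9)); // 250.0 on average
    }
}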
23. RQ4: Are the Performance Metrics Valid?
on average, no bugs can be found in the first 100 lines
24. RQ4: Are the Performance Metrics Valid?
with luck, 33% of all bugs can be found in the first 1000 lines
25. Agenda
1. Automated Fault Diagnosis
2. State of the Art
3. Case Study: AspectJ
4. Evaluation
5. Conclusions
26. Conclusions
there is still some work to be done
• Bugs need more context to be fully understood
• Current metrics cannot be applied to large projects
• SBFL is not feasible for large projects
• New metrics are a starting point for future work
27. Thank you for your attention!
Questions?
28. RQ2: Examining a Fixed Number of Elements
more than 100 files must be inspected to find 50% of all bugs
29. RQ3: Program Size vs. Suspiciousness
mean suspiciousness drops for larger programs