2. Estimated Costs 2012
as reported by Britton et al. [2013]
3. Agenda
1. Automated Fault Diagnosis
2. State of the Art
3. Case Study: AspectJ
4. Evaluation
5. Conclusions
5. Fault Diagnosis
what is the current practice?
Goal: Pinpoint one or more faults
Commonly used techniques:
• System.out.println()
• Symbolic Debugging
• Static Slicing / Dynamic Slicing
There is room for improvement!
6. Automated Fault Diagnosis
is it possible?
        B1  B2  B3  B4  B5  Error
Test1    1   0   0   0   0    0
Test2    1   1   0   0   0    0
Test3    1   1   1   1   1    0
Test4    1   1   1   1   1    0
Test5    1   1   1   1   1    1
Test6    1   1   1   0   1    0
By intuition, a block is more suspicious if:
- It is involved in failing test cases
- It is not involved in passing test cases (a minimal scoring sketch follows below)
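One way to turn this intuition into a ranking is a spectrum-based suspiciousness formula computed from the coverage matrix above. The sketch below uses the Ochiai metric as one common choice; the metric and the class name are assumptions for illustration, not a formula given on the slides. The matrix and the single failing test (Test5) are taken from the slide.

// Minimal sketch: rank blocks by Ochiai suspiciousness over the matrix above
public class SuspiciousnessSketch {
    public static void main(String[] args) {
        // Rows = Test1..Test6, columns = blocks B1..B5 (1 = block executed by the test)
        int[][] coverage = {
            {1, 0, 0, 0, 0},
            {1, 1, 0, 0, 0},
            {1, 1, 1, 1, 1},
            {1, 1, 1, 1, 1},
            {1, 1, 1, 1, 1},
            {1, 1, 1, 0, 1},
        };
        // 1 = the test failed (the "Error" column of the matrix)
        int[] failed = {0, 0, 0, 0, 1, 0};

        for (int b = 0; b < coverage[0].length; b++) {
            int ef = 0, ep = 0, nf = 0; // executed & failed, executed & passed, not executed & failed
            for (int t = 0; t < coverage.length; t++) {
                if (coverage[t][b] == 1) {
                    if (failed[t] == 1) ef++; else ep++;
                } else if (failed[t] == 1) {
                    nf++;
                }
            }
            double denominator = Math.sqrt((double) (ef + nf) * (ef + ep));
            double suspiciousness = denominator == 0 ? 0.0 : ef / denominator;
            System.out.printf("B%d: %.2f%n", b + 1, suspiciousness);
        }
    }
}

For the matrix above this ranks B4 highest: it is executed by the failing test but by the fewest passing tests, matching the intuition.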
8. Agenda
1. Automated Fault Diagnosis
2. State of the Art
3. Case Study: AspectJ
4. Evaluation
5. Conclusions
9. Commonly Used Data
and its limiting factors
Software-artifact Infrastructure Repository
• Siemens set
• space program
Program         Faulty versions   LOC    Test cases   Description
print_tokens    7                 478    4130         Lexical analyzer
print_tokens2   10                399    4115         Lexical analyzer
replace         32                512    5542         Pattern recognition
schedule        9                 292    2650         Priority scheduler
schedule2       10                301    2710         Priority scheduler
tcas            41                141    1608         Altitude separation
tot_info        23                440    1052         Information measure
space           38                6218   13585        Array definition language
10. Performance Metrics
how can fault localization performance be evaluated?
• Wasted Effort (WE):
Number of non-faulty elements inspected before reaching the fault (see the sketch after this list)
Ranking: L4, L3, L2, L7, L6, L1, L5, L9, L10, L8
Wasted Effort (prominent bug): 2 (or 20%)
• Proportion of Bugs Localized (PBL)
Percentage of bugs localized with WE < p%
• Hit@X
Number of bugs localized after inspecting X elements
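The metrics can be made concrete with a small sketch. The class and helper names, the assumption that L2 is the faulty line in the example ranking, and the assumption that ties have already been broken are illustrative only; the study's exact tie and multi-fault handling may differ.

import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Illustrative sketch of Wasted Effort and Hit@X on a ranked list of elements
public class MetricsSketch {
    // Wasted Effort: non-faulty elements inspected before reaching a fault
    static int wastedEffort(List<String> ranking, Set<String> faultyElements) {
        int wasted = 0;
        for (String element : ranking) {
            if (faultyElements.contains(element)) {
                return wasted;
            }
            wasted++;
        }
        return ranking.size(); // no fault contained in the ranking
    }

    // Hit@X: is a fault found within the first X inspected elements?
    static boolean hitAtX(List<String> ranking, Set<String> faultyElements, int x) {
        return wastedEffort(ranking, faultyElements) < x;
    }

    public static void main(String[] args) {
        List<String> ranking =
            Arrays.asList("L4", "L3", "L2", "L7", "L6", "L1", "L5", "L9", "L10", "L8");
        Set<String> faulty = Set.of("L2"); // assumption: the faulty line of the example
        System.out.println(wastedEffort(ranking, faulty)); // 2, i.e. 20% of 10 elements
        System.out.println(hitAtX(ranking, faulty, 5));    // true
    }
}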
11. Agenda
1. Automated Fault Diagnosis
2. State of the Art
3. Case Study: AspectJ
4. Evaluation
5. Conclusions
12. AspectJ – Lines of Code
nearly doubled in the examined time span
13. AspectJ – Commits
active development, typically with more than 50 commits per month
14. AspectJ – Bugs
nearly 2500 bugs reported in the examined time span
15. AspectJ – Data
less than 40% of the investigated bugs are applicable for SBFL
                  AspectJ   AJDT   Sum
All bugs             1544    886   2430
Bugs in iBugs         285     65    350
Classified Bugs        99     11    110
Applicable Bugs        41      1     42
Involved Bugs          20      1     21
What happened?
16. Bug 36234
workarounds cannot be used as an evaluation oracle
Bug report: "Getting an out of memory error when compiling with Ajc 1.1 RC1 […]"
Pre-fix vs. post-fix code comparison
17. Bug 61411
platform-specific bugs are mostly not present in test suites
Bug report: "[…] highlights a problem that I've seen using ajdoc.bat on Windows […]"
Pre-fix vs. post-fix code comparison
18. Bug 151182
synchronization bugs are mostly not present in test suites
Bug report: "[…] recompiled the aspect using 1.5.2 and tried to run it […], but it fails with a NullPointerException. […]"
Pre-fix vs. post-fix code comparison
19. Agenda
1. Automated Fault Diagnosis
2. State of the Art
3. Case Study: AspectJ
4. Evaluation
5. Conclusions
20. Research Questions
• RQ1: How does the program size influence fault localization
performance?
• RQ2: How many bugs can be found when examining a fixed
number of ranked elements?
• RQ3: How does the program size influence suspiciousness
scores produced by different ranking metrics?
• RQ4: Are the fault localization performance metrics
currently used by the research community valid?
21. RQ1: Program Size vs. SBFL Performance?
multiple ranked elements are mapped to the same suspiciousness score
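Why ties hurt: when many elements share the same score, the fault's effective rank is usually taken as the middle of its tie group, so even a fault in the top-scoring group costs a large wasted effort. The mid-rank convention and the numbers below are assumptions for illustration, not results from the study.

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Sketch: expected wasted effort when the faulty element is tied with many others
// (assumption: mid-rank tie handling; the slides do not state which convention is used)
public class TieEffectSketch {
    static double expectedWastedEffort(Collection<Double> scores, double faultScore) {
        long higher = scores.stream().filter(s -> s > faultScore).count();
        long tied = scores.stream().filter(s -> s == faultScore).count();
        // all strictly higher-ranked elements, plus on average half of the other tied ones
        return higher + (tied - 1) / 2.0;
    }

    public static void main(String[] args) {
        List<Double> scores = new ArrayList<>();
        for (int i = 0; i < 501; i++) {
            scores.add(0.9); // 501 statements (including the fault) share the top score
        }
        System.out.println(expectedWastedEffort(scores, 0.9)); // 250.0 on average
    }
}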
23. RQ4: Are the Performance Metrics Valid?
on average, no bugs can be found in the first 100 lines
24. RQ4: Are the Performance Metrics Valid?
with luck, 33% of all bugs can be found in the first 1000 lines
25. Agenda
1. Automated Fault Diagnosis
2. State of the Art
3. Case Study: AspectJ
4. Evaluation
5. Conclusions
26. Conclusions
there is still some work to be done
• Bugs need more context to be fully understood
• Current metrics cannot be applied to large projects
• SBFL is not feasible for large projects
• New metrics are a starting point for future work
27. Thank you for your attention!
Questions?
28. RQ2: Examining a Fixed Number of Elements
more than 100 files must be inspected to find 50% of all bugs
29. RQ3: Program Size vs. Suspiciousness
mean suspiciousness drops for larger programs