# Muffler: A Tool Using Mutation to Facilitate Fault Localization 2.0



1. **Muffler: An Approach Using Mutation to Facilitate Fault Localization**. Tao He (elfinhe@gmail.com), Software Engineering Laboratory, Department of Computer Science, Sun Yat-Sen University; Department of Computer Science and Engineering, HKUST. November 2011, Sun Yat-Sen University, Guangzhou, China.
2. **Key hypothesis**
    - Mutating the faulty statement tends to maintain the outcome of passed test cases: only one statement remains faulty.
    - By contrast, mutating a correct statement tends to toggle the outcome of passed test cases from passed to failed: two statements are now faulty.
    - Intuition: two faulty statements can trigger more failures than one faulty statement.
3. **Intuition, in branches**: diagram with fault point F upstream of mutant point M (M: mutant point; F: fault point).
4. **Intuition, in branches**: diagram with mutant point M upstream of fault point F.
5. **Intuition, in branches**: diagram with fault and mutant at the same point (F + M).
6. **Intuition, in series**: diagram with mutant point M upstream of fault point F.
7. **Intuition, in series**: diagram with fault and mutant at the same point (F + M).
8. **A motivating example**: Siemens `tcas`, version v27.
9. **A motivating example**: Siemens `tcas` v27, line 118.
    - Golden version: `enabled = High_Confidence && (Own_Tracked_Alt_Rate <= OLEV) && (Cur_Vertical_Sep > MAXALTDIFF);`
    - Faulty version: `enabled = High_Confidence && (Own_Tracked_Alt_Rate <= OLEV);` (the conjunct `(Cur_Vertical_Sep > MAXALTDIFF)` is missing)
10. **A motivating example**: suspiciousness by Ochiai, showing the faulty line's best and worst rank.
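The slide only names Ochiai; the formula itself is the standard coverage-based suspiciousness metric, Failed(s) divided by the square root of TotalFailed times (Failed(s) + Passed(s)). A minimal sketch with illustrative counts:

```python
import math

def ochiai(failed_s, passed_s, total_failed):
    """Standard Ochiai suspiciousness for a statement s.

    failed_s:     number of failed test cases covering s
    passed_s:     number of passed test cases covering s
    total_failed: total number of failed test cases in the suite
    """
    denom = math.sqrt(total_failed * (failed_s + passed_s))
    return failed_s / denom if denom else 0.0

# A statement covered by every failure and no passed test is maximally suspicious.
print(ochiai(3, 0, 3))             # 1.0
print(round(ochiai(3, 17, 3), 3))  # 0.387
```

Statements covered by many passed tests are diluted, which is exactly where the best rank and worst rank of the faulty line can diverge.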
11. **A motivating example**: result changes by Muffler. Two violin plots from `tcas` v27; the y-axes show the number and the proportion of changes from passed to failed, and the red line marks the faulty line's change.
12. **A motivating example**: the formula `Susp(s) = Failed(s) * TotalPassed - Passed(s) - PassedToFailed(s)` improves the fault's worst rank.
13. **Formula**: `Susp(s) = Failed(s) * TotalPassed - Passed(s) - PassedToFailed(s)`
    - `Failed(s)`: primary key (imprecise when multiple faults occur).
    - `Passed(s)`: secondary key (invalid when the coincidental-correctness rate is high).
    - `PassedToFailed(s)`: additional key (inclined to handle coincidental correctness).
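Read as a single number, the formula encodes a lexicographic ranking: `Failed(s)` dominates because it is multiplied by `TotalPassed`, `Passed(s)` breaks ties, and `PassedToFailed(s)` demotes statements whose mutants flip passed tests (per the key hypothesis, those are likely correct). A minimal sketch with hypothetical counts, not Muffler's actual code:

```python
def susp(failed_s, passed_s, passed_to_failed_s, total_passed):
    # Failed(s) * TotalPassed dominates; Passed(s) breaks ties; the
    # mutation term demotes statements whose mutants flip passed tests
    # (per the key hypothesis, those statements are likely correct).
    return failed_s * total_passed - passed_s - passed_to_failed_s

# Hypothetical per-statement counts: (Failed, Passed, PassedToFailed)
stmts = {"s1": (3, 10, 2), "s2": (3, 10, 9), "s3": (1, 2, 0)}
total_passed = 20
ranked = sorted(stmts, key=lambda s: susp(*stmts[s], total_passed), reverse=True)
print(ranked)  # ['s1', 's2', 's3']: s1 and s2 tie on coverage, but s2's
               # mutants flip more passed tests, so s2 is demoted
```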
14. **Formula**: coincidental correctness. Diagram partitioning the total tests into passed and failed, marking PtoF(f) and PtoF(c); after mutating each executable statement, PtoF(c) is empirically greater than PtoF(f).
15. **Model of our approach (Muffler)**
    - A simple, clear target for each input and step.
    - List all possible research questions; we first focus on the key issue and leave the others as future work.
16. **Model of our approach (Muffler)**: overview diagram.
17. **Input variables**: program, test suite, and mutant operators (no more).
18. **Program**
    - Number of faults.
    - Failure-triggering ability when the statement is covered (is coincidental correctness easy to cause?).
    - Fault type and the statement type of the fault.
    - Fault position, program structure, program size, etc.
19. **Test suite**
    - Number of total test cases and of passed/failed test cases.
    - Number of passed test cases covering each statement to be mutated (the test cases whose results mutation may alter).
    - Number of statements covered by each failed test case (helps select suspicious statements).
    - Can we locate faults without a test oracle?
20. **Mutant operators**
    - Types of mutation operators and the applicable condition for each operator.
    - The probability of triggering a failure by covering the mutant point (the mutant's ability to kill test cases).
21. **Steps**: the steps are listed for separation of concerns; this helps us locate where our hypothesis holds.
22. **Step I: execute test cases**
    - Get the testing results.
    - Get the coverage of failed test cases (for selecting suspicious statements) and of passed test cases (for selecting the test cases to re-run against mutants).
23. **Step II: select statements to mutate**
    - Input: coverage of the failed test cases.
    - Output: statements to be mutated, namely those covered by the failed test case with the least coverage.
    - Discussion: in multi-fault scenarios our approach depends more on passed test cases; in practice most faults are reported with a single failure (e.g., in gcc bug reports); the precision of failure clustering is …
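The selection rule of Step II, taking the statements covered by the failed test case with the least coverage, can be sketched as follows (the coverage data is hypothetical):

```python
# coverage[t]: set of statement ids executed by test t (hypothetical data)
coverage = {"t1": {1, 2, 3, 5}, "t2": {1, 2, 4}, "t3": {1, 3, 5}}
results = {"t1": "fail", "t2": "fail", "t3": "pass"}

failed = [t for t in results if results[t] == "fail"]
# The failed test with the smallest coverage bounds the suspicious set
# most tightly: the fault must lie somewhere on its execution path.
least = min(failed, key=lambda t: len(coverage[t]))
to_mutate = sorted(coverage[least])
print(least, to_mutate)  # t2 [1, 2, 4]
```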
24. **Step III: mutate selected statements**
    - Input: the program, the statements to be mutated, and the mutant operators.
    - Output: mutants obtained by mutating each suspicious statement.
    - Discussion: how many mutants should be taken from each suspicious statement? What if no mutant operator applies to a statement (i.e., no mutant is generated)? What if none of a statement's mutants changes a passed result to failed (i.e., its mutation impact is 0.0)? Two possible reasons: equivalent mutants, or coincidental correctness.
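A toy, text-level version of Step III applied to the motivating example's line 118; the swap table is illustrative only, and real mutation tools operate on the AST rather than raw text:

```python
def mutants_of(stmt):
    """Yield one mutant per applicable operator swap (toy ROR/LOR-style)."""
    swaps = [("<=", ">"), (">=", "<"), ("&&", "||"), ("+", "-")]
    for old, new in swaps:
        idx = stmt.find(old)
        if idx != -1:
            # Replace only the first occurrence: one small change per mutant.
            yield stmt[:idx] + new + stmt[idx + len(old):]

line118 = "enabled = High_Confidence && (Own_Tracked_Alt_Rate <= OLEV);"
for m in mutants_of(line118):
    print(m)
# If no swap applies, no mutant is generated: the "no applicable
# operator" case raised in the discussion above.
```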
25. **Step IV: select passed test cases**
    - Select passed test cases that cover the mutant point.
    - Prefer passed test cases that cover fewer statements.
    - Reduce the number of passed test cases by discarding similar ones.
26. **Step V: run mutants against passed test cases that cover the mutated statement**
    - Input: the mutants of each suspicious statement, and the passed test cases selected on the original program.
    - Output: the number of test cases that fail when run against each mutant.
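Step V's output, the count of previously-passing tests that fail on a statement's mutants, can be sketched as below; `run_test` is a stand-in for re-executing a mutant, and the max-over-mutants aggregation is one illustrative choice, not necessarily Muffler's:

```python
def passed_to_failed(mutants, passed_tests, run_test):
    """Max, over one statement's mutants, of passed tests that flip to failed."""
    best = 0
    for m in mutants:
        flips = sum(1 for t in passed_tests if run_test(m, t) == "fail")
        best = max(best, flips)
    return best

# Stubbed runner: mutant m2 breaks both selected passed tests, m1 breaks none.
outcomes = {("m1", "t1"): "pass", ("m1", "t3"): "pass",
            ("m2", "t1"): "fail", ("m2", "t3"): "fail"}
print(passed_to_failed(["m1", "m2"], ["t1", "t3"],
                       lambda m, t: outcomes[(m, t)]))  # 2
```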
27. **Step VI: weight statements**
    - By dynamic impact.
    - By clustering mutated statements through analysis of the program structure.
28. **Step VII: compute mutation impact and rank the suspicious statements**
    - Discussion: why not use the failed test cases? How should the mutation-impact formula be designed? What if no passed test case covers the mutant point?
29. **Is the approach robust to known issues in CBFL?** Coincidental correctness, multiple faults, and coverage equivalence.
30. **Coincidental correctness**
    - When coincidental correctness occurs frequently, our approach still works.
    - When it occurs rarely, our approach is as good as CBFL; this is why we use `- PassedToFailed(s)` rather than `- PassedToFailed(s)/Passed(s)`.
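The slide's design choice, subtracting the raw count rather than a ratio, can be checked numerically: when mutants almost never flip passed tests, the extra term vanishes and the ranking reduces to a coverage-only, CBFL-style one (counts below are hypothetical):

```python
def susp(f, p, ptof, total_p):
    # Muffler-style score with the raw PassedToFailed count subtracted.
    return f * total_p - p - ptof

def cbfl(f, p, total_p):
    # Coverage-only baseline: same primary and secondary keys, no mutation term.
    return f * total_p - p

# Hypothetical counts (Failed, Passed, PassedToFailed), with zero P->F flips.
stmts = {"a": (2, 5, 0), "b": (2, 8, 0), "c": (1, 1, 0)}
rank_m = sorted(stmts, key=lambda s: susp(*stmts[s], 20), reverse=True)
rank_c = sorted(stmts, key=lambda s: cbfl(stmts[s][0], stmts[s][1], 20), reverse=True)
print(rank_m == rank_c)  # True: with zero flips the two rankings coincide
```

A ratio `PassedToFailed(s)/Passed(s)` would instead amplify noise on statements covered by few passed tests.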
31. **Multiple faults**
    - CBFL techniques rely mainly on the coverage of passed test cases and of failed test cases caused by the same fault (with more weight on the failed ones).
    - In multi-fault scenarios, coverage differences among failed test cases caused by different faults degrade effectiveness.
    - Our approach depends more on passed test cases.
32. **Coverage equivalence**: inside a basic block, different statements have different def-use-pair impact on statements outside the block.
33. **Q&A**
34. **Thank you!** Contact me via elfinhe@gmail.com.
36. **Guideline of our approach**
    - A simple, clear target for each step.
    - List all possible research questions; we first focus on the key issue and leave the others as future work.