The document proposes an approach called Muffler that uses mutation analysis to help with fault localization. Muffler instruments a program and runs it against a test suite to collect coverage and test results. It then selects and mutates statements to generate mutants. These mutants are run against the test suite and any changes in test results are used to calculate a suspiciousness score for statements. Statements are then ranked based on their scores to produce a ranking list to help locate faults. An empirical evaluation on several programs shows that Muffler reduces the average code examination effort needed for fault localization by 50.26% compared to an existing technique called Naish.
Muffler a tool using mutation to facilitate fault localization 2.3
1. Muffler: An Approach Using Mutation
to Facilitate Fault Localization
Tao He
elfinhe@gmail.com
Department of Computer Science, Sun Yat-Sen University
Department of Computer Science and Engineering, HKUST
Group Discussion
February 2012
HKUST, Hong Kong, China
1/34
3. Background
Coverage-Based Fault Localization (CBFL)
Input
Coverage
Testing results (passed or failed)
Output
A ranking list of statements
Ranking functions
Most CBFL techniques are similar to each other,
except that they use different ranking functions to
compute suspiciousness.
3/34
4. What is the limitation of existing
CBFL techniques?
4/34
5. Motivation
One fundamental assumption [YPW08] of CBFL
The observed behaviors from passed runs can precisely
represent the correct behaviors of the program,
and the observed behaviors from failed runs can represent the
incorrect behaviors.
Therefore, differences in the observed behaviors of program
entities between passed runs and failed runs indicate the
fault’s location.
But this does not always hold.
[YPW08] C. Yilmaz, A. Paradkar, and C. Williams. Time will tell: fault localization using time spectra. In Proceedings
of the 30th international conference on Software engineering (ICSE '08). ACM, New York, NY, USA, 81-90. 2008.
5/34
6. Motivation
Coincidental Correctness (CC)
“No failure is detected, even though a fault has been executed.” [RT93]
i.e., the passed runs may cover the fault.
CC weakens the first part of CBFL's assumption:
The observed behaviors from passed runs can precisely represent
the correct behaviors of this program;
Moreover, CC occurs frequently in practice [MAE+09].
[RT93] D.J. Richardson and M.C. Thompson, An analysis of test data selection criteria using the RELAY model of
fault detection, Software Engineering, IEEE Transactions on, vol. 19, (no. 6), pp. 533-553, 1993.
[MAE+09] W. Masri, R. Abou-Assi, M. El-Ghali, and N. Al-Fatairi, An empirical study of the factors that reduce the
effectiveness of coverage-based fault localization, in Proceedings of the 2nd International Workshop on Defects in
Large Software Systems: Held in conjunction with the ACM SIGSOFT International Symposium on Software Testing
and Analysis (ISSTA 2009), pp. 1-5, 2009.
6/34
7. Our goal is to address the CC issue via mutation analysis
What is the idea?
7/34
8. Why does our approach work?
- Key hypothesis
Mutating the faulty statement tends to maintain the
results of passed test cases.
By contrast, mutating a correct statement tends to
change the results of passed test cases (from passed to
failed).
8/34
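The key hypothesis can be made concrete with a small sketch. It counts how many test results flip from passed to failed when a mutant replaces the original program, and averages that count over the mutants of one statement. The function names, the dictionary layout (test name mapped to a pass flag), and the use of a plain average are assumptions made for illustration, not details taken from the slides.

```python
from statistics import mean


def passed_to_failed_changes(original_results, mutant_results):
    """Count tests that pass on the original program but fail on the mutant."""
    return sum(
        1
        for test, passed in original_results.items()
        if passed and not mutant_results[test]
    )


def mutation_impact(original_results, results_per_mutant):
    """Average passed-to-failed changes over all mutants of one statement."""
    return mean(
        passed_to_failed_changes(original_results, mutant_results)
        for mutant_results in results_per_mutant
    )


# Toy data (hypothetical): three passing tests and one failing test.
original = {"t1": True, "t2": True, "t3": True, "t4": False}
mutant_of_correct_statement = {"t1": False, "t2": False, "t3": False, "t4": False}
mutant_of_faulty_statement = {"t1": True, "t2": True, "t3": True, "t4": False}

print(passed_to_failed_changes(original, mutant_of_correct_statement))  # 3
print(passed_to_failed_changes(original, mutant_of_faulty_statement))   # 0
```

Under the hypothesis, a statement whose mutants cause few passed-to-failed changes is more likely to be the faulty one.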
9. Why does our approach work?
- Three comprehensive scenarios (1/3)
- If we mutate a statement M in a different basic block from F
[Diagram: test cases run through the program and yield passed or failed test results. M: mutant point; F: fault point.]
3 test results change from passed to failed
9/34
10. Why does our approach work?
- Three comprehensive scenarios (1/3)
- If we mutate a statement M in a different basic block from F
[Diagram: test cases run through the program and yield passed or failed test results. M: mutant point; F: fault point.]
3 test results change from passed to failed
10/34
11. Why does our approach work?
- Three comprehensive scenarios (1/3)
- If we mutate F
[Diagram: test cases run through the program and yield passed or failed test results. The mutant point coincides with the fault point (F + M).]
0 test results change from passed to failed
11/34
12. Why does our approach work?
- Three comprehensive scenarios (2/3)
- If we mutate a statement M in the same basic block as F
Reason: M affects the output through a different data flow.
[Diagram: test cases run through the program and yield passed or failed test results. M: mutant point; F: fault point. Legend: control flow, data flow.]
3 test results change from passed to failed
12/34
13. Why does our approach work?
- Three comprehensive scenarios (2/3)
- If we mutate F
[Diagram: test cases run through the program and yield passed or failed test results. The mutant point coincides with the fault point (F + M). Legend: control flow, data flow.]
0 test results change from passed to failed
13/34
14. Why does our approach work?
- Three comprehensive scenarios (3/3)
- When CC occurs frequently
- If we mutate F
Reason: F has a weak ability to affect the output, that is, a weak ability to generate an
infectious state or to propagate the infectious state to the output.
[Diagram: test cases run through the program and yield passed or failed test results. The mutant point coincides with the fault point (F + M).]
0 test results change from passed to failed
14/34
16. Why does our approach work?
- A feasibility study
[Figure: eight panels, one per faulty version: tcas v7, tot_info v17, schedule v4, schedule2 v1, print_tokens v7, print_tokens2 v3, replace v24, space v20.]
Figure: Distribution of statements’ result changes
and faulty statement’s testing result changes.
The vertical axis denotes the number of testing result changes (from 'passed' to
'failed'), and the horizontal width denotes the probability density at the corresponding number of
testing result changes.
16/34
17. Why does our approach work?
- Another feasibility study (When CC%≥95%)
[Figure: histogram. Horizontal axis: percentage of code examined; vertical axis: frequency of faulty versions. Series: result changes (avg. 16.33%) and Naish (avg. 47.55%).]
Figure: Frequency distribution of effectiveness
when CC%≥ 95%.
When CC% is greater than or equal to 95%, ranking by result changes reduces the code
examination effort by 65.66% (=100%-16.33%/47.55%) relative to Naish.
With Naish, only 6 faulty versions require examining less than 20% of the statements,
compared with 22 versions when using result changes.
17/34
19. Our Approach – Muffler
[NLR11] L. Naish, H. J. Lee, and K. Ramamohanarao, A model for spectra-based software diagnosis. ACM
Transactions on Software Engineering and Methodology, 20(3):11, 2011.
19/34
20. How do we evaluate our approach?
What is the result?
20/34
21. Empirical Evaluation
Program suite   Number of versions   Lines of executable code   Number of test cases   LOC
tcas            41                   63-67                      1608                   133-137
tot_info        23                   122-123                    1052                   272-273
schedule        9                    149-152                    2650                   290-294
schedule2       10                   127-129                    2710                   261-263
print_tokens    7                    189-190                    4130                   341-343
print_tokens2   10                   199-200                    4115                   350-355
replace         32                   240-245                    5542                   508-515
space           38                   3633-3647                  13585                  5882-5904
21/34
23. Empirical Evaluation
% of code examined   Tarantula   Ochiai   χDebug   Naish   Muffler
1%                   14          18       19       21      35
5%                   38          48       56       58      74
10%                  54          63       68       68      85
15%                  57          65       80       80      94
20%                  60          67       84       84      99
30%                  79          88       91       92      110
40%                  92          98       98       99      117
50%                  98          99       101      102     121
60%                  99          103      105      106     123
70%                  101         107      117      119     123
80%                  114         122      122      123     123
90%                  123         123      122      123     123
100%                 123         123      123      123     123
Table: Number of faults located at different levels of code
examination effort using Naish and Muffler.
When 1% of the statements have been examined, Naish can reach the
fault in 17.07% of faulty versions. At the same time, Muffler can reach
the fault in 28.46% of faulty versions.
23/34
24. Empirical Evaluation
Tarantula Ochiai χDebug Naish Muffler
Min 0.00 0.00 0.00 0.00 0.00
Max 87.89 84.25 93.85 78.46 55.38
Median 20.33 9.52 7.69 7.32 3.25
Mean 27.68 23.62 20.04 19.34 9.62
Stdev 28.29 26.36 24.61 23.86 13.22
Table: Statistics of code examination effort.
Among these five techniques, Muffler scores the best in the rows that correspond to
the median and mean code examination effort, and ties with the others at 0.00 for the minimum.
In addition, Muffler has a much lower standard deviation, which means that its performance
varies less widely than that of the others and is more stable in terms of effectiveness.
Results also show that Muffler reduces the average code examination effort relative to
Naish by 50.26% (=100%-9.62%/19.34%).
24/34
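The reduction figures quoted on these slides follow from one formula: 100% minus the ratio of the new effort to the baseline effort. A minimal sketch, with a function name chosen here for illustration:

```python
def effort_reduction(baseline_effort: float, new_effort: float) -> float:
    """Relative reduction in code examination effort, as a percentage."""
    if baseline_effort <= 0:
        raise ValueError("baseline effort must be positive")
    return 100.0 * (1.0 - new_effort / baseline_effort)


# Mean code examination effort from the statistics table: Naish 19.34%, Muffler 9.62%.
print(f"{effort_reduction(19.34, 9.62):.2f}%")   # 50.26%

# Feasibility study with CC% >= 95%: Naish 47.55%, result changes 16.33%.
print(f"{effort_reduction(47.55, 16.33):.2f}%")  # 65.66%
```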
27. Conclusion and future work
We propose Muffler, a technique using mutation to
help locate program faults.
On 123 faulty versions of seven programs, we compare
the effectiveness and efficiency of Muffler with those of the
Naish technique. Results show that Muffler reduces the
average code examination effort per faulty version
by 50.26%.
For future work, we plan to generalize our approach to
locate faults in multi-fault programs.
27/34
30. # Background
Mutation analysis, first proposed by Hamlet [Ham77] and
DeMillo et al. [DLS78], is a fault-based testing technique
used to measure the effectiveness of a test suite.
In mutation analysis, one introduces syntactic code
changes, one at a time, into a program to generate
various faulty programs (called mutants).
A mutation operator is a change-seeding rule to
generate a mutant from the original program.
[Ham77] R.G. Hamlet, Testing Programs with the Aid of a Compiler, Software Engineering, IEEE Transactions
on, vol. SE-3, (no. 4), pp. 279- 290, 1977.
[DLS78] R.A. DeMillo, R.J. Lipton and F.G. Sayward, Hints on Test Data Selection: Help for the Practicing
Programmer, Computer, vol. 11, (no. 4), pp. 34-41, 1978.
30/34
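As an illustration of a mutation operator, the sketch below swaps one relational operator per mutant in a line of source text, so each mutant differs from the original by a single syntactic change. The operator table and the text-based matching are assumptions made for brevity. A real mutation tool would work on a parsed program, and the slides do not say which operators Muffler uses.

```python
import re

# Hypothetical operator table: each relational operator has one replacement.
RELATIONAL_SWAPS = {"<=": ">", ">=": "<", "==": "!=", "!=": "=="}

_OPERATOR_PATTERN = re.compile("|".join(re.escape(op) for op in RELATIONAL_SWAPS))


def generate_mutants(source: str) -> list[str]:
    """Return one mutant per relational-operator occurrence in `source`."""
    mutants = []
    for match in _OPERATOR_PATTERN.finditer(source):
        replacement = RELATIONAL_SWAPS[match.group()]
        # Change exactly one occurrence; the rest of the text stays untouched.
        mutants.append(source[: match.start()] + replacement + source[match.end():])
    return mutants


for mutant in generate_mutants("n = (int) (count <= ratio);"):
    print(mutant)  # n = (int) (count > ratio);
```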
31. # Ranking functions
Tarantula [JHS02], Ochiai [AZV07], χDebug [WQZ+07], and Naish [NLR11]
Table: Ranking functions
[JHS02] J.A. Jones, M. J. Harrold, and J. Stasko. Visualization of test information to assist fault localization. In Proceedings of the
24th International Conference on Software Engineering (ICSE '02), pp. 467-477, 2002.
[AZV07] R. Abreu, P. Zoeteweij and A.J.C. Van Gemund, On the accuracy of spectrum-based fault localization, in Proceedings of
Testing: Academic and Industrial Conference Practice and Research Techniques (TAIC PART-MUTATION 2007), pp. 89-98, 2007.
[WQZ+07] W.E. Wong, Yu Qi, Lei Zhao, and Kai-Yuan Cai. Effective Fault Localization using Code Coverage. In Proceedings of the
31st Annual International Computer Software and Applications Conference (COMPSAC '07), Vol. 1, pp. 449-456, 2007.
[NLR11] L. Naish, H. J. Lee, and K. Ramamohanarao, A model for spectra-based software diagnosis. ACM Transactions on Software
Engineering and Methodology, 20(3):11, 2011.
31/34
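The table of ranking functions did not survive in this transcript. As a reminder, the sketch below gives the commonly cited forms of three of the four functions (Tarantula, Ochiai, and the Op formula from Naish et al.). χDebug is left out because its formula cannot be recovered from the slide. The parameter names are notation chosen here: `ef` and `ep` are the numbers of failed and passed tests that execute the statement.

```python
import math


def tarantula(ef: int, ep: int, total_failed: int, total_passed: int) -> float:
    """Tarantula: failed-coverage ratio over the sum of failed and passed ratios."""
    fail_ratio = ef / total_failed if total_failed else 0.0
    pass_ratio = ep / total_passed if total_passed else 0.0
    denominator = fail_ratio + pass_ratio
    return fail_ratio / denominator if denominator else 0.0


def ochiai(ef: int, ep: int, total_failed: int) -> float:
    """Ochiai: ef / sqrt(total_failed * (ef + ep))."""
    denominator = math.sqrt(total_failed * (ef + ep))
    return ef / denominator if denominator else 0.0


def naish_op(ef: int, ep: int, total_passed: int) -> float:
    """Naish Op: ef - ep / (total_passed + 1)."""
    return ef - ep / (total_passed + 1)


# A statement executed by both failed tests and by one of four passed tests.
print(round(tarantula(2, 1, 2, 4), 4))  # 0.8
print(round(ochiai(2, 1, 2), 4))        # 0.8165
print(round(naish_op(2, 1, 4), 4))      # 1.8
```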
32. # Our Approach – Muffler
Input: faulty program, test suite
- Instrument the program and execute it against the test suite. Output: coverage and testing results
- Select statements to mutate. Output: candidate statements
- Mutate the selected statements. Output: mutants
- Run the mutants against the test suite. Output: changes of testing results
- Calculate suspiciousness and sort statements. Output: ranking list of all statements
Figure: Dataflow diagram of Muffler.
32/34
33. # Our Approach – Muffler
Primary Key: imprecise when multiple faults occur
Secondary Key: invalid when coincidental correctness% is high
Additional Key: inclined to handle coincidental correctness
33/34
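The formula behind the three keys is not preserved in this transcript, so the following is only one possible reading. It assumes that the primary key is the number of failed tests covering a statement, the secondary key is the number of passed tests covering it, and the additional key is the mutation impact (fewer passed-to-failed changes means more suspicious). All field names and data are hypothetical.

```python
from typing import NamedTuple


class StatementStats(NamedTuple):
    name: str
    failed_cover: int  # failed tests that execute the statement (assumed primary key)
    passed_cover: int  # passed tests that execute the statement (assumed secondary key)
    impact: float      # average passed-to-failed changes over its mutants (additional key)


def rank(statements):
    """Most suspicious first: more failed coverage, then less passed coverage, then lower impact."""
    return sorted(
        statements,
        key=lambda s: (-s.failed_cover, s.passed_cover, s.impact),
    )


stats = [
    StatementStats("S1", failed_cover=3, passed_cover=10, impact=1200.0),
    StatementStats("S2", failed_cover=3, passed_cover=10, impact=800.0),
    StatementStats("S3", failed_cover=2, passed_cover=1, impact=0.0),
]
print([s.name for s in rank(stats)])  # ['S2', 'S1', 'S3']
```

In this reading the additional key only breaks ties left by the coverage-based keys, which is exactly the situation coincidental correctness tends to create.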
37. # An Example
Part III Part IV Muffler
Mutant | Mutated statement | Change p→f (five columns, one per mutant) | Impact | susp | r
M1 if (!block_queue ) { 1644 1798 1101 1101 1644 1457.6 509354.4 8
M2 count = block_queue->mem_count != 1; 249 1097 1097 249 1382 814.8 510413.2 2
M3 n = (int) (count <= ratio) ; 249 1116 1101 494 1101 812.2 510415.8 2
M4 proc = find_nth(block_queue , ratio); 1088 638 1136 744 1382 997.6 510230.4 5
M5 if (!proc) { 1136 1358 1101 1382 1101 1215.6 510012.4 6
M6 block_queue = del_ele(block_queue , proc-1); 1123 349 1358 814 1358 1000.4 510251.6 4
M7 prio /= proc->priority; 1358 1358 1101 1101 1358 1255.2 509996.8 7
M8 prio_queue[prio] = append_ele(prio_queue[__MININT__] , proc); }} 598 598 1138 1358 1101 958.6 510293.4 3
Code examination effort to locate S2 and S3: 25%
Figure: Faulty version v2 of program “schedule”. 37/34
38. # Empirical Evaluation
Versus Tarantula   Versus Ochiai   Versus χDebug   Versus Naish
More effective 102 96 93 89
Same effectiveness 19 23 23 25
Less effective 2 4 7 9
Table: Pair-wise comparison between
Muffler and existing techniques.
Muffler is more effective (examining fewer statements before encountering the faulty
statement) than Naish for 89 out of 123 faulty versions; is as effective (examining the same
number of statements before encountering the faulty statement) as Naish for 25 out of 123
faulty versions; and is less effective (examining more statements before encountering the
faulty statement) than Naish for only 9 out of 123 faulty versions.
38/34
39. # Empirical Evaluation
Experience on real faults
Faulty version   CC%   Code examination effort (Naish)   Code examination effort (Muffler)
v5 1% 0% 0%
v9 7% 1% 0%
v17 31% 12% 7%
v28 49% 11% 5%
v29 99% 25% 9%
Table: Results with real faults in space
Five faulty versions are chosen to represent low, medium, and high occurrence of
coincidental correctness. In this table, the column “CC%” presents the percentage of
coincidentally passed test cases out of all passed test cases. The columns under the heading
“Code examination effort” present the percentage of code to be examined before the fault is
encountered.
39/34
40. # Empirical Evaluation
Efficiency analysis
Program suite CBFL (seconds) Muffler (seconds)
tcas 18.00 868.68
tot_info 11.92 573.12
schedule 34.02 2703.01
schedule2 27.76 1773.14
print_tokens 59.11 2530.17
print_tokens2 62.07 5062.87
replace 69.13 4139.19
Average 40.29 2521.46
Table: Time spent by each technique on subject programs.
We have shown experimentally that, by taking advantage of both coverage and mutation
impact, Muffler outperforms Naish regardless of the occurrence of coincidental correctness.
Unfortunately, Muffler needs to execute a large number of mutants to compute mutation
impact. Executing the mutants against the test suite increases the time cost of fault
localization. The time mainly consists of the cost of instrumentation, execution, and coverage
collection. From this table, we observe that Muffler takes approximately 62.59 times the
average time cost of the Naish technique.
40/34
41. # Empirical Evaluation
Efficiency analysis
Program suite   Mutated statements   Total statements   Mutants   Time per mutant (seconds)
tcas 40.15 65.10 199.90 4.26
tot_info 39.57 122.96 191.87 2.92
schedule 80.60 150.20 351.60 7.59
schedule2 75.33 127.56 327.78 5.32
print_tokens 67.43 189.86 260.29 9.49
print_tokens2 86.67 199.44 398.67 12.54
replace 71.14 242.86 305.93 13.30
Average 56.52 142.79 256.90 7.92
Table: Information about mutants generated.
This table shows the number of mutated and total executable
statements, the number of mutants generated, and the time cost of running each mutant. For
example, for the program tcas, on average 40.15 statements are mutated by
Muffler out of 65.10 executable statements in total; 199.90 mutants are generated, and it takes
4.26 seconds to run each of them. Notice that there is no need to collect coverage
from the mutants' executions, and running a mutant without
instrumentation and coverage collection takes about 1/4 of the time.
41/34
Editor's Notes
I assume that you have already known a lot of these techniques, so I only give a quick review.
Please find another definition, using passed runs to describe CC
Please remember to annotate the CC, e.g., 1382. Please remember to add animation
It is worthwhile to mention that Muffler’s time cost can be greatly reduced with a simple test selection strategy. The strategy can be described as: do not re-run a test case that does not cover the mutated statement. Furthermore, because the executions of mutants do not depend on each other, we can parallelize them with little effort. Nonetheless, we have to admit that Muffler needs more time to offer better effectiveness in fault localization.
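The test selection strategy and the parallel execution mentioned in this note can be sketched as follows. This is an illustration under stated assumptions, not Muffler's implementation: `coverage` maps each test to the set of statements it covers on the original program, `run_test` is a caller-supplied function that runs one test on one mutant and returns whether it passes, and a thread pool stands in for whatever parallel mechanism is actually used.

```python
from concurrent.futures import ThreadPoolExecutor


def select_tests(coverage, mutated_statement):
    """Keep only the tests whose coverage includes the mutated statement."""
    return [test for test, covered in coverage.items() if mutated_statement in covered]


def run_mutants(mutants, coverage, run_test, max_workers=4):
    """Run each mutant against its selected tests; mutants run independently.

    mutants: list of (mutant_id, mutated_statement) pairs.
    run_test: callable (mutant_id, test) -> bool, True if the test passes.
    """
    def run_one(mutant):
        mutant_id, statement = mutant
        tests = select_tests(coverage, statement)
        return mutant_id, {test: run_test(mutant_id, test) for test in tests}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, mutants))


# Toy data (hypothetical): t3 never covers s2, so it is skipped for mutant m1.
coverage = {"t1": {"s1", "s2"}, "t2": {"s2"}, "t3": {"s3"}}
results = run_mutants(
    [("m1", "s2"), ("m2", "s3")],
    coverage,
    run_test=lambda mutant_id, test: test != "t1",
)
print(results)  # {'m1': {'t1': False, 't2': True}, 'm2': {'t3': True}}
```

A test that does not execute the mutated statement cannot change its result, so skipping it saves time without affecting the counted passed-to-failed changes.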