3. Automating the process for building reliable software “Industry wide, 50% of a software budget is used for FAA, level A, software structural testing.” Mark Hall, Lockheed Martin, Test Manager
4. Agenda Who we are; Unit & Integration Test Automation; Traceability; Automated Regression Testing; Software Development Philosophies
6. Manual Unit & Integration Testing: determine which units to stub; build stubs; build test driver for the unit (main); modify harness components for next test case; determine level of source code coverage; debug; compile test harness (test driver + stubs); build test reports for individual test cases; gather test data results into readable format; execute test harness
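The manual steps on this slide can be sketched as a tiny hand-built harness. This is an illustrative example, not code from the slides: compute_fee() is a hypothetical unit under test, and lookup_rate() stands in for a dependency we replace with a stub.

```c
/* --- stub replacing the real dependency --- */
static int stub_rate = 5;          /* stub data, set per test case */
int lookup_rate(int customer_id) {
    (void)customer_id;             /* the stub ignores its argument */
    return stub_rate;
}

/* --- unit under test (normally in its own source file) --- */
int compute_fee(int customer_id, int amount) {
    return amount * lookup_rate(customer_id) / 100;
}

/* --- test driver: set stub data, call the UUT, check the result --- */
int run_test_case(void) {
    stub_rate = 10;                /* "modify harness for next test case" */
    return compute_fee(42, 200);   /* expected value: 20 */
}
```

Every one of these pieces (stub, driver, expected-value check) is written and maintained by hand in the manual process, which is exactly the overhead the rest of the talk targets.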
15. Common testing pitfalls Pitfalls: each developer decides how to test their components; no common practices, so peer review is difficult; work is done manually or with home-grown tools (manual work is time-consuming; home-grown tools have limited capabilities). Results: effort is difficult to coordinate; difficult to know how expensive it is; scripts cannot be reused from phase to phase; unit test scripts are abandoned when integrating; regression testing is difficult
16. Automating the process Automatically build a test framework for the Unit(s) Under Test: Parser determines data/call dependencies between units; Build Test Driver invokes subprograms within the UUT; Stubbing is automatic, based on specifications or real code. Construct Test Cases: Intuitive Interface, a construction utility with type/data understanding; Basis Path Analysis identifies execution paths for 100% code coverage. Record & Report Test Data: Execution Management runs the generated test cases; Test Reporting is standards compliant; Coverage Analysis covers Structural, Branch, and MC/DC
19. Code Analysis, Structural Code Coverage When do we stop testing? If some code has not been tested, do we need more tests? Testing without measuring code coverage typically exercises only about 55% of the code (Robert O’Grady, HP). Why measuring code coverage is important: identify unexecuted program statements; verify that each path is taken at least once. Coverage appears to be “useful”, BUT: what is Structural Code Coverage? (Flow-graph figure: Entry, nodes 1–5, Exit; labeled “Coverage Sample”)
20. Types of Coverage Statement coverage: the executable lines of source code have been executed as the program runs. Statement != line of code. Be careful with the following: int *p = NULL; if (condition) p = &variable; *p = 123; A single test can achieve 100% statement coverage here and still miss the pointer bug
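The slide's fragment can be made runnable to see why one test is enough for 100% statement coverage; the wrapper function and the local `variable` are scaffolding added for this sketch.

```c
#include <stddef.h>

/* The slide's pattern wrapped in a function (wrapper is scaffolding). */
int statement_coverage_trap(int condition) {
    int variable = 0;
    int *p = NULL;
    if (condition)
        p = &variable;
    *p = 123;          /* NULL dereference whenever condition == 0 */
    return variable;
}
/* A single test with condition != 0 executes every statement,
 * reporting 100% statement coverage, yet never reaches the crash. */
```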
21. Types of Coverage (cont.) Branch coverage: both outcomes have occurred for each branch point in the program. Reports whether Boolean expressions tested in control structures evaluated to both true and false. Be careful with the following code: if (condition1 && (condition2 || function1())) statement1; else statement2; A tool could consider this control structure completely exercised without function1() ever being called
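The short-circuit gap can be demonstrated directly; the call counter and return values below are scaffolding added for this sketch.

```c
#include <stdbool.h>

static int function1_calls = 0;   /* counts calls to the helper */

static bool function1(void) {
    ++function1_calls;
    return true;
}

/* The slide's control structure as a function. */
int branch_demo(bool condition1, bool condition2) {
    if (condition1 && (condition2 || function1()))
        return 1;   /* statement1 */
    else
        return 2;   /* statement2 */
}
/* Two tests, (false, false) and (true, true), make the outer `if`
 * take both branches, so branch coverage looks complete even though
 * function1() is never invoked: short-circuit evaluation skips it. */
```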
22. Types of Coverage (cont.) Modified Condition/Decision Coverage (MC/DC): similar to branch and statement coverage, but also reports on the values of all Boolean components of a complex conditional. For example, if (a && b) is a complex conditional with two terms. MC/DC requires enough test cases to show that every condition can independently affect the result of its encompassing decision. Created at Boeing, and required for Level A aviation software under DO-178B certification.
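For the slide's two-term conditional, the minimal MC/DC test set can be written out explicitly:

```c
#include <stdbool.h>

/* The slide's complex conditional as a function. */
bool decision(bool a, bool b) {
    return a && b;
}
/* MC/DC for (a && b) needs three cases, each showing one condition
 * independently flipping the decision:
 *   (true,  true)  -> true    baseline
 *   (false, true)  -> false   changing only a flips the result
 *   (true,  false) -> false   changing only b flips the result
 * Plain branch coverage would accept just two cases, e.g.
 * (true, true) and (false, false). */
```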
23. What does 100% coverage prove? There are classes of errors that certain coverage types cannot detect: int *h = 0; if (x) { h = &(a[1]); } if (y) { return *h; } return 0; We get 100% coverage with (x, y) set to (1, 1) and (0, 0). But (0, 1) => OOPS! NULL pointer dereference. So does 100% coverage imply 100% tested? No! (Venn diagram: the tests needed for 100% coverage only partially overlap the tests needed to find 100% of the bugs)
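Made compilable, the slide's example shows the two "fully covering" tests side by side with the input they never exercise; the array `a` and its contents are scaffolding assumed for this sketch.

```c
#include <stddef.h>

/* The slide's example; the array `a` is supplied as scaffolding. */
int coverage_gap(int x, int y) {
    int a[2] = { 0, 7 };
    int *h = NULL;
    if (x) { h = &(a[1]); }
    if (y) { return *h; }   /* NULL dereference when (x, y) == (0, 1) */
    return 0;
}
/* (1, 1) and (0, 0) yield 100% statement and branch coverage,
 * yet neither test reaches the failing (0, 1) combination. */
```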
24. Basis Path Analysis Decision outcomes within a software function should be tested independently, so errors based on interactions between decision outcomes are more likely to be detected. An extension of the cyclomatic complexity metric: basis path test cases required = cyclomatic complexity. Complexity is correlated with errors, so testing effort is concentrated on error-prone software. The minimum required number of tests is known in advance, so the testing process can be planned and monitored in greater detail than with other testing strategies. Basis paths must be linearly independent
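The test-count arithmetic can be checked on a small example; classify() is a hypothetical function, not code from the slides.

```c
/* A function with two decision points (the two ifs):
 * cyclomatic complexity = 2 + 1 = 3, so basis path testing
 * calls for exactly three linearly independent path tests. */
int classify(int n) {
    if (n < 0)
        return -1;   /* path 1 */
    if (n == 0)
        return 0;    /* path 2 */
    return 1;        /* path 3 */
}
```

One test per basis path (a negative, a zero, and a positive input) is both necessary and sufficient to exercise every independent path.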
26. Traceability Why is traceability important? 100% Code Coverage ≠ 100% Application Complete; 100% Test Cases Passed ≠ 100% Application Complete; 100% Requirements Passed ≠ 100% Application Complete; but 100% Code Coverage + 100% Test Cases Passed + 100% Requirements Passed = 100% Application Complete. There needs to be an efficient way to measure and link in real time: System and Derived Requirements, Code Coverage, Test Case Results
27. Traceability: Requirements Gateway Permit bi-directional flow of data between Requirements Management Tool and Test Case Management Tool Linking Requirements Intuitive and simple while constructing test cases For Each Test Case, the linked requirements should show: Requirement Identifier Requirement Description For Each Requirement, the linked test cases should show: Test Name Test Status {pass | fail | none} Test Coverage {% coverage, type of coverage}
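The fields the slide lists for each direction of the link can be sketched as record types. These are illustrative assumptions for this sketch, not any tool's actual schema; the names REQ-042 and test_clamp are hypothetical.

```c
#include <stdbool.h>

typedef enum { TEST_NONE, TEST_PASS, TEST_FAIL } test_status;

/* What a test case shows about each linked requirement. */
typedef struct {
    const char *req_id;           /* Requirement Identifier */
    const char *req_description;  /* Requirement Description */
} requirement_link;

/* What a requirement shows about each linked test case. */
typedef struct {
    const char *test_name;        /* Test Name */
    test_status status;           /* pass | fail | none */
    double coverage_percent;      /* % coverage */
    const char *coverage_type;    /* e.g. "MC/DC" */
    requirement_link requirement; /* the linked requirement */
} test_case_link;

/* A linked requirement counts as verified only when its test
 * passed and actually exercised some code. */
bool sample_is_verified(void) {
    requirement_link req = { "REQ-042", "Output shall be clamped" };
    test_case_link t = { "test_clamp", TEST_PASS, 100.0, "MC/DC", req };
    return t.status == TEST_PASS && t.coverage_percent > 0.0;
}
```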
28. Traceability Example - Test Tool & Requirements Tool Integration Test Case Linked Requirements System Requirements
29. Automated Regression testing Why is it important? Cost Reduction Streamline the functional testing process Prevent Defect Propagation New defects caught as soon as they are introduced Project Health Total number of project tests, Percentages of classes and methods, and Pass/failure rates
30. Automating the regression testing process Manages the Regression Test Process Execution of test cases Collation of data Across multiple configurations and baselines Key metrics: number of failures, overall code coverage, total number of tests, and percentage of classes and methods with direct tests Database Stores data from all regression test runs in an SQL database Reporting Provides metrics reports at any level from all Unit(s) Under Test. Graphing of test data from prior runs
32. Regression Testing Reports, an example Graphic Report View: Shows a graph based on the results filtered by Report Options
33. Software Development Philosophies There are many approaches to developing systems: Agile, Behavior-Driven, Design-Driven, Lean Software, Rapid Application Development, etc. The later we identify a defect in our system, the costlier it is to fix. A common solution is to introduce code inspections or static code analysis
34. Errors by classification The scope of most errors is fairly limited: 85% of errors can be corrected without modifying more than one routine (Endres 1975). Most construction errors are the programmers’ fault: 95% are caused by programmers, 2% by systems software (i.e. compiler and OS), 2% by some other software, and 1% by the hardware (Code Complete). Therefore we can conclude that the sooner we test the programmers’ software, the sooner we find the majority of the errors in the system
36. Test Driven Development Derived from the test-first programming concepts of Extreme Programming. Develop test cases before writing the application code. Relies on the repetition of a very short development cycle. Can also be used for legacy code projects
37. The 3 Laws of Test Driven Design You may not write production code until you have written a failing unit test You may not write more of a unit test than is sufficient to fail, and not compiling is failing You may not write more production code than is sufficient to pass the currently failing test
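A minimal cycle under the three laws can be sketched as follows. clamp() and test_clamp() are hypothetical examples, not code from the slides: the test was written first (and failed, since clamp() did not yet exist), then just enough production code was written to make it pass.

```c
/* Production code: written only after test_clamp() existed and
 * failed, and only enough of it to make that test pass. */
int clamp(int value, int lo, int hi) {
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

/* The unit test written before the production code;
 * returns 0 when every check passes. */
int test_clamp(void) {
    if (clamp(5, 0, 10) != 5)   return 1;
    if (clamp(-1, 0, 10) != 0)  return 1;
    if (clamp(99, 0, 10) != 10) return 1;
    return 0;
}
```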
39. Benefits of Test Driven Design Catch design flaws earlier Design tested at granular level prior to writing code Easily identify missing requirements Easily identify missing test cases Reduce testing time by a factor of 10 Easy to migrate tests into continuous integration Cheaper to fix a bug
40. Requirements for Automating Test Driven Design Unit test automation tool can be used Unit test automation tool should be able to handle changes to underlying code base as it matures Direct link between test case and resulting code coverage The ability to stub functions in the same module
43. In Summary Typically 50% of the cost of a reliability process is in validation. Validation costs can be reduced by using an automated unit testing tool. Code coverage combined with basis path testing can improve the quality of software. Traceability between test cases, requirements, and code coverage helps ensure a complete application is shipped. Automated regression testing leads to continuous validation. Test Driven Development is a cost-effective way of developing reliable systems
45. References Wikipedia Clean Code, Robert C Martin, Prentice Hall Code Complete, Steve McConnell, Microsoft Press
Editor's Notes
Introduce yourself. Hi, my name is Simon Watson, and I am a field applications engineer for Vector Software in London. My role is to support our customers in Europe in the use of VectorCAST, our software test automation tool. I also work closely with our partners and distributors, helping them to promote VectorCAST, our test automation tool solution. Before joining Vector Software, I was working as a developer and system integrator writing embedded software to provide secure communications products for financial services companies. The presentation today will discuss how we can go about automating, and hence reducing the cost of, developing reliable software. Most of the cost and effort involved in developing reliable software goes into the software process around each line of code. By process we mean traceability back to design and requirements, and also verification that we have completely tested the system to an appropriate level for the required reliability. Looking at these elements of the process, the verification aspect is the most expensive. Hopefully there will be some pointers in this presentation that you can take into your current projects. If you have any questions at any point, please feel free to stop me and ask them. Opening questions: Before we go on, I have some questions, just to give me an idea of the kind of audience we have in the room. How many projects are represented here today, and are any of those safety or mission critical? How many of you are software testers? How many of you are software developers? Out of curiosity, of those of you that develop software, do you also unit test your own software? How do you unit test your software? Are you using a tool, or is it automated? Which languages are you using on these projects: C, C++, Ada? What internal or external standards or processes are used in developing software? How does your company know that your product is ready to ship to the customer?
Vector Software has the opportunity to work with a huge variety of customers and understand how their testing process works. One of the key metrics that stands out to us is how much it costs to build reliable software. Typically, we see customers saying that, using a manual process, the cost of testing their software represents about 50-60% of their budget. In fact, we have this quote from Lockheed Martin in the US, a gentleman by the name of Mark Hall: “Industry wide, 50% of a software budget is used for FAA, level A, software structural testing.” This is the reason our company was formed 20 years ago: to try to find ways to reduce the significant cost of validating and testing high-reliability applications.
This slide shows us the manual unit and integration testing process. We choose a file that contains a function we want to test. Since we can only execute a complete application, to satisfy the linker we have to build stubs to replace all the called/dependent functions that are not in our source file; we have to supply all the global objects and a driver to call the function under test. Then we have to define input data that we are going to provide as parameters to the function under test, initialisation for global objects, and input data returned by stubs that our function calls. Then we have to define at least one expected value that is some kind of output or result from the function under test. That might be a C return value, a value passed by reference to our called function, a global that is modified, or a value passed into a called function that has been stubbed. The expected value is a test that determines whether the function behaved correctly. We have to add lines of instrumentation to record coverage output. We compile, link, and run the test harness program, debug it as necessary, and then record the results. We do that multiple times for a given function under test to obtain 100% coverage, with correct and malicious input values chosen both to prove correct operation and to try to break the function. We repeat that process for other source files. We repeat that process for groups of source files to test a complete class or project.
In the previous slide we talked about writing driver code to call a function under test and stubs to replace called functions. How much code do we typically need? Looking at Microsoft Windows NT4 as a sample project (this was supposed to be Microsoft’s secure server platform): that application had 6 million lines of code, and Microsoft wrote 12 million lines of test code to verify it. So that is twice as many lines of test code as application code. Even so, we all know how many defects NT4 shipped with. Looking at safety-critical applications, typically we need not 2 but 5 lines of test code per line of application code to achieve statement and branch coverage, rising to 10 lines of test code to achieve 100% MC/DC coverage. As you can see, there is a significant amount of test code to manage and maintain, in addition to the application code. Furthermore, that test code is more likely to contain errors than the application code, because the application code was written according to a process but there is no defined process for writing the test code. Our experience working with customers is that, for manual testing, 75% of the testing effort is spent building test harnesses and just 25% of the effort goes into running the test cases. Since 50% of the overall budget is spent on testing, that means more than 30% of the total software budget is spent on preparing to test. You can see that any improvement we can make in the process of building test harnesses will bring significant cost savings to the project.
Let's look at some of the problems that come from the manual test harness process. Since every developer writes their own test framework, two developers will write two different test harnesses for the same test. If they peer-review or share their work, they will waste time understanding or disagreeing with each other's testing work. When a developer leaves the team, part of the test process knowledge will leave with them. When test harnesses are built manually, developers will typically write some kind of in-house scripts to automate the execution, the collecting of results, and the flagging of test cases that failed to produce the expected results. What you will find is that those scripts end up strongly dependent on the project or the execution platform in some way. The more sophisticated those in-house scripts are, the less portable they will probably be, for example from execution under Linux to Windows, or from project to project. Test harnesses will be locked to the syntax of the functions that are being tested and of the stubs of the dependent functions. When the parameter list changes as the application is developed, the test harness will need to be edited to match the changes. When we move from the unit test phase to the integration test phase, the original test harness will need to be updated because the list of stubs will change, and we risk losing the original unit test framework. If system testing encounters a bug that is fixed in the source code, the unit test probably will not be updated, because the system tester didn't write the unit test framework and doesn't know how to make the change. All these factors lead to manual testing becoming unmaintainable in practice.
In this slide we are looking at the test harness construction diagrammatically: Your project is the set of source files on the left-hand side, categorised in those 3 colours: red, blue, and purple. The red files are the files containing functions we want to test. When we talk about a unit we mean a C or C++ source file or an Ada package. The white rectangles are those functions, and the test harness will make those functions visible for test cases. The blue files are source code we have decided not to include in our test. That might be because they have not been written yet or because excluding them, and replacing them by stubs, will allow better control over the testing process. The tool will need to write stubs for all the functions in the blue files that are needed to satisfy the linker. The purple files are code that we want to call when we run the test cases. These usually contain libraries or operating system functionality. The green section is the test driver that will stimulate the function under test. Again, this will be written by the automation tool.
Let's take a look at a concept called basis path analysis. How many people have heard of the cyclomatic code complexity measurement by McCabe? How many people are familiar with basis path analysis, which is the application of this measurement to testing? The cyclomatic code complexity of a piece of code is equal to the number of decision points plus 1. As an example, a straight-line function has a complexity of 1. Every deviation from the straight line for an if statement, while loop, or for loop adds 1. Each case statement beyond the default inside a switch statement adds 1. The theory is that complexity is directly correlated with errors. Many organisations place a limit on the complexity of their software when developing the code but never use the metric any further in the software process. The key concept behind basis path analysis is that decision outcomes should be tested independently. So the number of basis path tests will be equal to the number of basis paths, which will be equal to the cyclomatic code complexity. Basis path analysis will also tell us the minimum number of test cases that are required to verify every independent path in a code segment. Finally, all of the paths in a basis path set should be linearly independent. This means that no path in the basis path set should be definable using other basis paths in the set.
If we return to our example we can now see that the code complexity for this function will be 3. We get this from the fact that there are 2 ‘if’ conditionals + 1. We then also get our 3 linearly independent basis paths, (False, True), (False, False) and (True, True). These paths translate to the values for X and Y, (0,1), (0,0) and (1,1). As can be seen here, the first case will execute the scenario resulting in the NULL pointer dereference and so correctly identify the coding error.A useful measure when validating an existing test set, is to use Basis Path coverage in conjunction with statement and MC/DC coverage. This can quickly generate missing test cases to ensure the code has been covered completely.
Again a further screen shot, showing some graphing of the results. This is to give you an idea of how powerful visualising the data can be in reviewing the health of a system. We can see straight away the test environments and hence source code that needs attention.
There are many approaches to developing systems. If we recall the graph we reviewed earlier, the later we identify a defect in our system, the costlier it is to fix. A common approach to reducing bug-detection cost is to introduce code inspections or static code analysis. However, there are approaches to software development that can produce even faster bug detection. These techniques are known as Extreme Programming or Agile techniques, in particular Test Driven Development.