Introduce the outline of the course. Explain to students that the following module (Module 0) is included in all Rational University methodology courses to give an overview of software development. To complete this course in the time available, it is important that you cover this module briefly; do not get bogged down in the details. The extra information is there to provide context and further information to be read later. You can offer to discuss further during lunch or after class.
Note that there are other courses in the Rational University curriculum that focus on using Rational tools, and that this course does not. Rather, this course focuses on the concepts, process, and practices that will help you be a better software tester.
Introduce the outline of the course. Note the comments in the student notes. These comments are designed to ease the student into the somewhat “non-linear” style of delivery the course uses. Explain to students that the following module (Module 1) is included in a similar format in all Rational University methodology courses to give an overview of software development. The idea is to level-set the class’s understanding of these practices, and discuss them as they apply to the specific focus of this course. To complete this course in the time available, it is important that you cover the first modules briefly. This may be a challenge! This course will introduce various key concepts incrementally, in a “fundamentals first” delivery. Initially, each concept will be discussed somewhat in isolation, and progressively “woven” into a collective framework of concepts as the course progresses.
In this module, we explore a number of software engineering practices and explain why these are considered to be good practices to follow. We will also look at how a software engineering process helps you to implement these and many other engineering practices. The important thing for you to teach your students in this module is how each of these practices relates to software testing. Your goal should be to help your students understand enough about each practice that they can work cohesively as productive members of a software project, especially one that follows RUP. Highlight how each practice will have an effect on the way in which testing is undertaken in terms of planning (i.e. iterations), the role that testing will play (i.e. objective assessment of iteration objectives), and the information that is available to base testing on (e.g. risks, use cases, architecture). Note that for many slides the title offers a good cue to talk to. You can also use the animation callouts to cue your delivery.
It can be effective to present this slide as a group discussion, asking the students to give examples of where they have seen these symptoms. This acts as an “ice breaker”, encouraging early participation and gives the students the opportunity to share some of their experiences. Alternatively, you might pick one or two to talk about from your own experience. Avoid reciting the entire list - it doesn’t add a lot of value. Be careful to limit any discussion to an appropriate amount of time.
In the case of our six software engineering practices, the whole is much greater than the sum of the parts. Each of the practices reinforces and, in some cases, enables the others. The slide shows just one example: how iterative development leverages the other five software engineering practices. However, each of the other five practices also enhances iterative development. For example, iterative development done without adequate requirements management typically fails to converge on a solution: requirements change at will, users can’t agree, and the iterations never reach closure. When requirements are managed, this is less likely to happen. Changes to requirements are visible, and the impact on the development process is assessed before they are accepted. Convergence on a stable set of requirements is assured. Similarly, each pair of practices provides mutual support. Hence, although it is possible to use one practice without the others, additional benefits are realized by combining them. This slide may be confusing unless it is explained properly. We have discussed each Proven Practice individually in this module. This slide is intended to illustrate how these Practices used together provide more benefit than each individually. The slide only illustrates how ONE of the Practices (Develop Iteratively) supports the others. As revision, you might ask your students to share their thoughts on how Quality from the Start relates to the other practices. Many comments are made in the preceding slides that will help both you as instructor and the students understand some of the interrelationships. Be careful not to spend too much time here. If you have already taken enough time on this module, skip the discussion.
See accompanying Word Doc for detailed instructor notes.
The highlighted terms are the key terms and concepts introduced in this module.
Transition Slide. Don’t spend any time here.
Functional testing started with the proposal that we treat a program as a function. To test it, we would feed it inputs and check its outputs. In functional testing, knowledge of the inner workings of the “function” is less important than knowing what the function is supposed to do. For that reason, functional testing is often called Black box testing (testing the program as a “black box” without knowledge of the internals) or Behavioral testing (focusing on the visible behavior of the program in response to inputs, which may or may not be predictable from analysis of the source code). Functional testing was sometimes distinguished from “non-functional” testing, which looked at “qualities of service” characteristics of the program that spanned many functions, such as performance or usability.
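As an illustration for instructors who want a concrete case, here is a minimal Python sketch of the black-box idea. The function shipping_cost and its pricing rule are hypothetical, invented only for this example. The test cases are derived from the stated rule (inputs and expected outputs), not from reading the code.

```python
def shipping_cost(weight_kg: float) -> float:
    """Hypothetical function under test.

    Stated rule (an assumption for this sketch): a flat 5.00 up to 1 kg,
    then 2.00 for each additional kg.
    """
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if weight_kg <= 1:
        return 5.00
    return 5.00 + 2.00 * (weight_kg - 1)


# Black-box cases: (input, expected output) pairs taken from the stated rule.
CASES = [(0.5, 5.00), (1.0, 5.00), (3.0, 9.00)]


def run_black_box_cases():
    """Feed each input to the program and collect any output that disagrees."""
    failures = []
    for given, expected in CASES:
        actual = shipping_cost(given)
        if abs(actual - expected) > 1e-9:
            failures.append((given, expected, actual))
    return failures
```

An empty list from run_black_box_cases() means every observed output matched the expectation; the tester never needed to know how the cost is computed internally.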
Many software developers and QA staff define defects in terms of a failure to conform to a specification. Unfortunately, the product can be defective even if it conforms perfectly to a (defective) specification. The spec is a partial description of the intended product. Quality is not defined in terms of match to a spec. It is defined in terms of match to the needs of the stakeholders. References for these definitions: J. M. Juran & Frank Gryna, Quality Planning & Analysis: From Product Development Through Use, 2nd Edition, 1980. Jack Campanella (Ed.), Principles of Quality Costs: Principles, Implementation and Use, 2nd Edition, 1990. Philip B. Crosby, Quality Without Tears, 1984. Armand Feigenbaum, Total Quality Control, Revised (Fortieth Anniversary Edition), 1991.
Prof. Kaner provides these examples: Satisfiers are the aspects of the program that make you want to buy it and want to keep using it. Feature comparison lists are comparisons of satisfiers. Consumer Reports product reviews primarily focus on satisfiers. Dissatisfiers are the aspects of the program that make you complain about it or decide not to keep using it. A good illustration of the distinction between a focus on satisfiers and a focus on dissatisfiers is the often repeated debate between marketers and testers: one group will shout about the need for features; the other will answer, “First fix what you have…” For Juran’s discussion of these terms, see: J. M. Juran, Juran on Planning for Quality, 1988.
From Gerald M. Weinberg, Quality Software Management: Volume 1, Systems Thinking, 1997.
Early in development, report any defect, even if you believe it will be triaged out. Later in development, testers should exercise more judgment. It costs time to process a change request, up to 8 hours (all people’s time included) in some companies, and problems are less likely to be fixed. Low priority change requests may be a distraction at the end of the project: processing them takes time away from fixing other problems. Some companies set up a second change request database. Testers report late-found low-priority defects into this database. Other companies tag defects for future releases. The project manager or a senior programmer skims these reports for issues that raise red flags. These requests are re-opened and fully evaluated at the start of development of the next major release.
The traditional view of FURPS is: Functional testing verifies that a system executes the required use-case scenarios as intended. Functional tests may include the testing of features, usage scenarios and security. Usability testing evaluates the application from the user’s perspective. Usability tests focus on human factors, aesthetics, consistency in the user interface, online and context-sensitive help, wizards and agents, user documentation, and training materials. Reliability testing verifies that the application performs reliably and is not prone to failures during execution (crashes, hangs, memory leaks). Effective reliability testing requires specialized tools. Reliability tests include integrity, structure, stress, contention and volume tests. Performance testing checks that the target system works functionally and reliably under production load. Performance tests include benchmark tests, load tests, and performance profile tests. Supportability testing verifies that the application can be deployed as intended. Supportability tests include installation and configuration tests. Consider the FURPS model a generator of ideas and summary of categories. There are really many more dimensions of the problem to consider as we see on the next slide.
FURPS is often seen as a conceptually complete system, and it may be. But where would you list “accessibility” in FURPS? (Answer: probably in Usability.) Or printer compatibility? (Answer: probably in Supportability.) Even though many different dimensions fit within the FURPS five, test designers often find it useful to work from a longer list of qualities of service, perhaps generating several test cases of a given feature from each dimension. Note that you cannot test every area of the program fully against a list of quality dimensions. There are (as always with testing) too many possible tests. Somehow, you will have to find a way to cull out the “best ideas” and test using those. Projects vary in risk and objective, so the “best ideas” list will be different for each program. Note that these dimensions can be thought of as qualities of service you wish your application to provide. They are not test techniques. Test techniques tend to evaluate several of these at a time. The value of these dimensions is to help you ask the question: Do I need to check the quality of service of application X along this dimension? Soon we’ll talk about the Test Ideas List as a way of capturing the drill-down questions for the dimensions. ============= Notes on some of the dimensions: Accessibility refers to the usability of the program for someone with a handicap. For example, a program could be accessible to the colorblind or to the deaf or to someone who cannot control a mouse. Supportability refers to the extent to which the program can be supported, e.g. by a tech support representative. A more supportable program might have diagnostics for troubleshooting, or informative error messages. Maintainability refers to the extent to which a program can be safely and easily modified.
There are three concepts to convey in this section: A Test Idea: a single test idea as described in the slide text. A Test-Ideas List: a list of test ideas applicable to a specific target of testing. A Test-Ideas Catalog: a generalized collection of test ideas that can be applied to other testing targets in a similar context.
Facilitating and Recording Suggestions: Exercise patience: the goal is to get lots of ideas. Encourage non-speakers to speak. Use multiple colors when recording. Echo the speaker’s words. Record the speaker’s words. The rule of three 10’s: don’t cut off the brainstorm until there have been three 10-second (or longer) silent periods. Silent times during a brainstorm are useful: people are thinking. Silence is OK. Switch levels of analysis. Some references: S. Kaner, Lind, Toldi, Fisk & Berger, Facilitator’s Guide to Participatory Decision-Making. Freedman & Weinberg, Handbook of Walkthroughs, Inspections & Technical Reviews. Doyle & Straus, How to Make Meetings Work.
Do this as a group brainstorm exercise, according to the brainstorming rules covered on the previous slide.
Here is an answer key. (In a minute, we’ll generalize the exercise.) Murphy’s Law of Zero: If you can enter a zero into a field, someone can divide by it. Test for overflows on all fields. A large percentage of security problems involve buffer overflows.
For each item that you target some testing against, it is useful to create a list of test ideas to be considered against that item. It’s easier to maintain a “task list” of ideas for each item to be tested, rather than maintaining detailed documentation about each specific test. What are some other good sources for test ideas lists? Bug lists (for example, Testing Computer Software’s appendix, and www.bugnet.com). Business domain: for example, walk through the auction web site, the shopping cart, customer service app, and for each one, list a series of related ideas for testing. Technology framework: COM, .NET, J2EE, … Fault models. Representative exemplars (such as the “best” examples of devices to use for compatibility and configuration testing. Testing Computer Software illustrates this in its chapter on printer testing.) A best example might not be the most popular or the most reliable. It is the “best” representative of a class if testing it is likely to yield more information than testing other members of the class. So if a group of printers are allegedly compatible, but one has slightly weaker error handling, you might test with the weaker printer. If the program can pass testing with that one, it can pass with the rest. A good example of this process in a business domain is the paper by Giri Vijayaraghavan & Cem Kaner, “Bugs in your shopping cart: A Taxonomy”, in Proceedings of 15th International Software Quality Week, 2002. Now let’s step back. In the RUP, the Test Ideas List is input to the activities: Implement Test, Define Test Details, Develop Test Guidelines, and Determine Test Results.
Would anyone really conduct tests for all of these ideas? It depends on the perceived importance and risk of the feature, but in general, no. You can’t run all of the interesting tests against all of the variables you would like to. There just isn’t enough time. A generic list of test ideas gives you a collection of good ideas that you can sample from: this is referred to as a test-ideas catalog. A good catalog helps you manage the infinite number of possible tests. For example, you might not test every integer variable with every member of this ideas catalog, but you might make sure that you test every variable, and that any specific member of the catalog is tested against at least a few variables. If you find an error, you might base more tests on the idea that led you to the error. As you refine the brainstorm list, look for variants on each test that might yield more powerful cases. For example, if a student suggests alpha characters (not numeric), point out that numbers are received (inputs) as ASCII characters 48-57 (Decimal). So, the characters whose codes are Decimal 47 (“/”) and 58 (“:”) are interesting boundary cases. Do we use all of these test ideas all the time everywhere? Of course not. How much do you need to do to “check off the idea”? When are we comfortable with having done enough? It depends. For example, if the programmers do extensive unit testing in your company already, you might do these simple boundary tests very lightly, saving your time for complex scenarios.
A test-idea catalog is a collection of test ideas useful in addressing a particular test objective (e.g. boundary test cases for numeric fields). A test project can make use of many catalogs based on test scope. Test-idea catalogs are good reminders for experienced staff, because they capture thinking that doesn’t need to be redone time and again. They are especially handy when adding a new tester near the end of the project. It’s best to hire (or contract with) experienced testers for end-of-project work, but even if they are experienced in general, they still won’t know your product or how best to test it. In either case, the challenge of late additions to staff is that you don’t have time to train them and you need their productivity immediately. Test-idea catalogs can help a new person gain productivity quickly. They act as “training wheels”. It’s worth repeating: Test ideas are not test cases: they are the ideas from which you derive test cases. Test ideas are not test techniques: they provide ideas from which you decide which techniques to apply. Although you might consider all the test ideas in the catalog each time you use it, you usually won’t conduct a test for every idea in the catalog each time you test. Brainstorm exercise (part 1): Where else would you want to have a test-idea catalog? Where would it make sense to develop (harvest) one in your company?
A test matrix is a useful way of working with the ideas from a catalog. Imagine looking through the test ideas in the catalog, selecting a subset that you think will be useful in the context of your current work. Next, imagine walking through all of the dialog boxes in your application and filling in the field names of all the numeric input fields. Now test each field in turn, checking each test idea in the matrix off as you conduct an associated test. Perhaps you would highlight cases that pass by coloring the cells green (use a green highlighter if you’re doing this on a printout of the matrix) and highlight the failing cases pink. A matrix like this is easy to delegate to a new member of the testing team. Many groups bring experienced testers into the project late in development to cope with schedule slippage. The new testers know how to test, but they don’t know what to do on this project. They can figure out the how (what steps to take to conduct a test) but don’t yet know which tests to run or why. A matrix or catalog can be very useful for these testers, helping them to get up to speed quickly. For tests that are routine and well understood, many groups find matrices like these much more useful than full test case descriptions. Many test managers find matrices like this a good assessment tool for evaluating the test work being performed.
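If the class wants to see the digit boundary concretely, a short Python sketch can look up the characters on either side of the ASCII digit range. The function accepts_only_digits is a hypothetical input check, included only so there is something to run the boundary characters against.

```python
def digit_boundary_cases():
    """Return the characters just outside and just inside the ASCII digit range."""
    low, high = ord("0"), ord("9")  # decimal codes 48 and 57
    return {
        "just_below": chr(low - 1),    # decimal code 47
        "lowest_digit": chr(low),      # decimal code 48
        "highest_digit": chr(high),    # decimal code 57
        "just_above": chr(high + 1),   # decimal code 58
    }


def accepts_only_digits(text: str) -> bool:
    """Hypothetical check under test: non-empty and made of ASCII digits only."""
    return len(text) > 0 and all("0" <= ch <= "9" for ch in text)


def boundary_results():
    """Run each boundary character through the check and report the outcome."""
    return {name: accepts_only_digits(ch) for name, ch in digit_boundary_cases().items()}
```

A check that compares character codes with an off-by-one error would accept one of the two outside neighbours, which is why they make stronger test ideas than an arbitrary letter.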
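A test matrix of this kind can also be kept as a small data structure rather than on paper. The following Python sketch is one possible shape, not a prescribed tool; the field names and test ideas in the usage note are hypothetical.

```python
def build_matrix(fields, ideas):
    """One row per field, one cell per test idea; every cell starts as 'not run'."""
    return {field: {idea: "not run" for idea in ideas} for field in fields}


def record(matrix, field, idea, passed):
    """Check off one cell: the paper version would color it green or pink."""
    matrix[field][idea] = "pass" if passed else "FAIL"


def remaining(matrix):
    """List the (field, idea) cells that nobody has tested yet."""
    return [
        (field, idea)
        for field, row in matrix.items()
        for idea, status in row.items()
        if status == "not run"
    ]
```

For example, build_matrix(["Quantity", "Unit price"], ["zero", "negative"]) gives a four-cell matrix; after two calls to record, remaining lists the two cells still open, which is the information a test manager would read off the highlighted printout.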
See accompanying Word Doc for detailed instructor notes. We just looked at test ideas; now we’re going to the other end of the management scale: How do we decide what we are going to do in the project? In the iteration? Our focus in this module will cover: Test plans: what level of detail is appropriate? IEEE Standard 829-type templates. The issues of time, cost, flexibility and maintainability. Requirements for test docs. Formulating a mission statement for test docs.
How good a test group is depends on how well the group satisfies its mission. How good its mission is depends on the test group’s interaction with the rest of the organization. Group 1 is a failing group if the organization’s goal for testing is assessment. Group 2 is failing if the organization’s goal is to maximize bug count. The mission as seen by the test group must match the mission as seen outside of the test group, or whatever the test group achieves will be perceived by others as inadequate. James Bach uses an interesting metaphor here (ref. Lessons Learned in Software Testing, page 1), when he calls the test team the “headlights of the project”. Of the two sensible answers to the mission that we’ve considered so far, which one fits you? Note that this is closely tied to your definition of quality. Are you looking for: Conformance to specs, or Nonconformance to user expectation? Discussion point: What is your mission?
Even if the test group has an overall mission, its objectives will vary over the life of the project. For example, a group whose primary role was defect-hunting through most of the project might be expected to provide quality evaluations as the project gets closer to its planned release date. It is important for the test group to decide its guiding objectives for each iteration, and to reassess these as part of the preparation for each iteration. In addition to thinking about your own mission, think about other companies. What would you guess Boeing’s mission is? How about Microsoft’s? Is the test activity creative or investigative? How independent do you think a test group should be? What happens if you do not negotiate the mission?
A programmer’s public bug rate includes all bugs left in the code when it is given to someone else (such as a tester). Rates of one bug per hundred statements are not unusual, and several programmers’ rates are higher (such as three bugs per hundred). A programmer’s private bug rate includes all the bugs that are produced, including the ones already fixed before passing the program to testing. Estimates of private bug rates have ranged from 15 to 150 bugs per 100 statements. Therefore, programmers must be finding and fixing between 80% and 99.3% of their own bugs before their code goes into test. (Even the sloppy ones find and fix a lot of their own bugs.) What does this tell us about our task? It says that we’re looking into the programmer’s (and tools’) blind spots. Merely repeating the types of tests that the programmers did won’t yield more bugs. That’s one of the reasons that an alternative approach is so valuable. Conclusion: Unless the tester’s methods are different from the programmer’s, the tester will be going over already well-tested ground. Test activity does not happen independent of a development process. Take a look at your development process. How much unit testing do your developers do and how is its coverage tracked? Do testers know what the developers are already doing? Do testers know the risks by development asset? Do they know whether boundary testing is being done in development? Thinking about public vs. private bug rates takes a very different look at the problem.
Diversified. Include a variety of techniques. Each technique is tailored to expose certain types of problems, and is virtually blind to others. Combining them allows you to find problems that would be hard to find if you spent the same resource on a narrower collection of techniques. Risk-focused. Tests give you the opportunity to find defects or attributes of the software that will disappoint, alienate, or harm a stakeholder. You can’t run all possible tests. To be efficient, you should think about the types of problems that are plausibly in this product or that would make a difference if they were in this product, and make sure that you test for them. Product-specific. Generic test approaches don’t work. Your needs and resources will vary across products. The risks vary across products. Therefore the balance of investment in different techniques should vary across products. Practical. There’s no point defining an approach that is beyond your project’s capabilities (including time, budget, equipment, and staff skills). For example, you won’t be likely to succeed if you try to build a fully automated test plan if you have a team full of non-programmers. Defensible. Can you explain and justify the work that you are doing? Does your approach allow you to track and report progress and effectiveness? If you can’t report or justify your work, are you likely to be funded as well as you need? The Mission guides your testing objective in the iteration. Now we’re going to think about the How question: The Test Approach.
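The 80% and 99.3% figures can be reproduced with a few lines of Python. This sketch assumes the two extremes were obtained by pairing the lowest private rate with the highest public rate quoted above, and the highest private rate with the lowest public rate; that pairing is an inference from the numbers, not something the notes state.

```python
def fraction_fixed_before_test(private_rate, public_rate):
    """Share of a programmer's own bugs removed before handoff.

    Both rates are in bugs per 100 statements. The private rate counts every
    bug produced; the public rate counts only those still present at handoff.
    """
    return (private_rate - public_rate) / private_rate


# Assumed pairing of the extremes quoted in the notes.
low = fraction_fixed_before_test(private_rate=15, public_rate=3)     # 12 of 15
high = fraction_fixed_before_test(private_rate=150, public_rate=1)   # 149 of 150

summary = f"{low:.1%} to {high:.1%}"
```

Under that assumption the summary string reads "80.0% to 99.3%", matching the range given to students.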
It would be helpful to crack the book, look at the table pages (257-259), and talk about some of the heuristics. For example, Bach suggests that you maximize diversity in your testing. Is this always practical? Do you risk trading off depth against breadth? Has anyone in the class worked on a project in which significant bugs were missed that might have been more easily found by a technique that wasn’t used? A heuristic is a rule of thumb, a rule that is useful but not always correct. Example of a heuristic: The test approach should focus most effort on areas of potential technical risk, while still putting some effort into low risk areas just in case the risk analysis is wrong. Basis for this heuristic: Complete testing is impossible, so we have to select the tests to run. Ideally, we would run the tests that promise to provide the most useful information. However, no risk analysis is perfect. We have to put some effort into checking out the areas that appear to be low risk, just in case. The full list is included as a table in the course textbook, Lessons Learned in Software Testing, pages 257-259.
IEEE Standard 829 for software test documentation is a standard initially published by the Institute of Electrical and Electronics Engineers (1983) and later approved by the American National Standards Institute. The standard describes a wide range of types of information that can be included in test documentation. For examples, see the next slide. This is an overview slide. We’ll cover the topics listed here in the next slides. The underlying topic can be summarized as follows: What test documentation makes sense for your project? Don’t start with “How?” or “What should it look like?”, but start with, “What is it supposed to do?”
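One way to make the example heuristic tangible in discussion is a small effort-allocation sketch in Python. Everything here is an assumption for illustration: the area names, the risk scores, and the rule itself (a fixed floor of hours for every area, with the remainder split in proportion to risk). It is not a method taken from the textbook.

```python
def allocate_effort(risk_scores, total_hours, floor_hours):
    """Give every area a floor of hours, then split the rest in proportion to risk.

    risk_scores maps an area name to a non-negative risk score (hypothetical scale).
    """
    areas = list(risk_scores)
    spare = total_hours - floor_hours * len(areas)
    if spare < 0:
        raise ValueError("the floor does not fit in the available hours")
    total_risk = sum(risk_scores.values())
    plan = {}
    for area in areas:
        if total_risk > 0:
            share = spare * risk_scores[area] / total_risk
        else:
            share = spare / len(areas)  # no risk information: split evenly
        plan[area] = floor_hours + share
    return plan


# Hypothetical areas and scores for a 40-hour test cycle.
plan = allocate_effort({"payments": 8, "reports": 2, "help text": 0},
                       total_hours=40, floor_hours=2)
```

The floor is what encodes "just in case the risk analysis is wrong": the area scored as zero risk still receives some attention.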
The purpose statement for test documentation should follow from the test mission statement. Get explicit stakeholder agreement on the purpose and nature of documentation you are expected to produce.
See accompanying Word Doc for detailed instructor notes.
Context Slide. Avoid spending too much time here: spend enough time to give the students a high-level understanding. Answer any related questions they have, then move on. The main point is to briefly explain the context and scope of this module relative to the rest of the course. Explain that the Test and Evaluate Workflow Detail will be delivered in two parts: the first focusing on Test Techniques, the second on Evaluating the results of the Tests. Explain briefly that Test & Evaluate is the core RUP Workflow Detail for the Software Tester. Relate it back to the Define Evaluation Mission Workflow Detail in which Mission and Test Approach were discussed. The purpose of this workflow detail is to achieve appropriate breadth and depth of the test effort to enable a sufficient evaluation of the Target Test Items, where sufficient evaluation is governed by the Test Motivators and Evaluation Mission. For each test cycle, this work is focused mainly on achieving suitable breadth and depth in the test and evaluation work. This is the heart of the test cycle: doing the testing itself and analyzing the results.
Context Slide. Avoid spending too much time here: spend enough time to help the students understand the concepts covered in this module in relation to the previous modules. Answer any related questions they have, then move on. Explain that this module covers the use of different test techniques to implement tests. Discuss how different test ideas may be best realized using very different techniques. Explain how the test ideas discussed in the earlier modules are key input to selecting the appropriate techniques. Note that the remaining items in this RUP workflow detail will be covered in the following module. Here are the roles, activities and artifacts RUP focuses on in this work. In earlier modules, we discussed how identifying test ideas is a useful way to reason about tests early in the lifecycle without needing to completely define each specific test. In this module we’ll look at a selection of techniques that can be used to apply those test ideas. In the next module, we’ll talk more about evaluating the output of the tests that have been run. Note that the diagram shows some grayed-out elements: these are additional testing elements that RUP provides guidance for but which are not covered directly in this course. You can find out more about these elements by consulting RUP directly.
In Module 4, we discussed Test Approach and mentioned techniques. Here we’ll drill into the techniques that you might use.
When people talk about test techniques, it’s often hard to tell what they mean. They use different words to mean the same thing, and the same words to mean different things. Class discussion exercise. What’s the distinction among these three? User testing involves testing with people who will be the users of the product. Usability testing looks at how easy the product is to learn and use. You might do this testing with end users but many usability tests (such as performance tests, or counts of the number of steps involved to complete a task) can be done by anyone. User interface testing involves testing the elements of the user interface, such as the menus and other controls. They’re not mutually exclusive. Each focuses on a different dimension: the tester who does the testing; the risk that the testing mitigates; the coverage desired from the testing. Now let’s generalize…
Examples of the dimensions: Testers: User testing is focused on testing by members of your target market, people who would normally use the product. Coverage: User interface testing is focused on the elements of the user interface, such as the menus and other controls. Focusing on this testing involves testing every UI element. Potential problems: Testing for usability errors or other problems that would make people abandon the product or be unhappy with it. Activities: Exploratory testing. Evaluation: Comparison to a result provided by a known good program, a test oracle. Functional testing is roughly synonymous with “behavioral testing” or “black box” testing. The fundamental idea is that your testing is focused on the inputs that you give the program and the responses you get from it. A wide range of techniques fit within this general approach. We just gave examples of the first three. Activity-based testing: examples here would be GUI regression test or exploratory testing. Evaluation is about how you determine whether the test passed or failed; often this is called the oracle (in the Greek sense).
No one uses all of these techniques. Some companies focus primarily on one of them (different ones for different companies). This is too narrow: problems that are easy to find under one technique are much harder to find under some others. We’ll walk through a selection of techniques, trying to get a sense of what it’s like to analyze a system through the eyes of a tester who focuses on one or another of these techniques. You might be tempted to try to add several of these approaches to your company’s repertoire at the same time. That may not be wise. You might be better off adding one technique, getting good at it, and then adding the next. Many highly effective groups focus on a few of these approaches, perhaps four, rather than trying to be excellent with all of them. Discussion points: Applicability of each technique based on phases in an iterative software development lifecycle. Multiple techniques work. Do simple tests first.
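The Evaluation dimension, comparing against a known good program, can be sketched in Python as follows. The function sort_under_test is a hypothetical stand-in for the program being tested; Python’s built-in sorted() plays the part of the oracle. The trial count, value range, and seed are arbitrary choices for the example.

```python
import random


def sort_under_test(values):
    """Hypothetical implementation under test: a simple insertion sort."""
    result = []
    for v in values:
        i = len(result)
        # Walk left past every element larger than v, then insert there.
        while i > 0 and result[i - 1] > v:
            i -= 1
        result.insert(i, v)
    return result


def compare_with_oracle(trials=200, seed=0):
    """Run random inputs through both programs and collect any disagreements."""
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    mismatches = []
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        if sort_under_test(data) != sorted(data):  # sorted() is the oracle
            mismatches.append(data)
    return mismatches
```

Any input returned in the mismatch list is a test that failed by the oracle’s judgment; whether the fault lies in the program or in the oracle still needs a human decision.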
In Module 3, we introduced the concept of RUP phases and iterations within the phases. In considering and planning test techniques for an iteration, it is important to look at the techniques according to several characteristics. The techniques that are appropriate in early iterations may lose their effectiveness in later iterations, when the software under test is more robust. Similarly, techniques that are useful in late iterations may be inefficient if applied too early. This refers back to the introduction to the RUP in Module 3.
Zip through this – where are we in the agenda slide only.
About these grids: They all have the same format, to help you compare and contract techniques. However, for any one of the techniques, many of the characteristics are secondary, so they are in gray. The primary characteristics are in yellow.
Some function testing tasks: Identify the program’s features / commands From specifications or the draft user manual From walking through the user interface From trying commands at the command line From searching the program or resource files for command names Identify variables used by the functions and test their boundaries. Identify environmental variables that may constrain the function under test. Use each function in a mainstream way (positive testing) and push it in as many ways as possible, as hard as possible. Many companies use a function testing approach early in testing, to check whether the basic functionality of the program is present and reasonably stable. Take Home Exercise (~1 Hour) Agree on a familiar part of a familiar program for everyone to use (e.g. the Bullets and Numbering command in MS Word). Break into pairs, with one computer per pair. Go through the function testing tasks above and make notes. Photocopy your notes, share with other teams and discuss. This is a coverage strategy. Depending on your process, this may be done as part of developer unit testing or as part of the test team’s work. There are strong reasons to ensure that developers do this well. The XUnit family of open-source tools have become very popular among developers who follow eXtreme Programming and Agile Methods as a way of doing function testing. Function Testing is a good initial test technique to ensure that you catch simple defects. Using this strategy, you can say, “I don’t know if this product is any good, but none of the components is obviously broken.” The weakness is that, by itself, function testing can miss inconsistencies, broken interactions, poor performance, poor user experience, etc.
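If students have not seen an xUnit tool, a short sketch may help. The example below uses Python's unittest module (one member of the xUnit family) against a hypothetical feature, format_bullet; the function, its 1 to 9 level limit, and the expected strings are assumptions made up for this illustration, not part of any real product.

```python
import unittest


def format_bullet(text, level=1):
    """Hypothetical feature under test: render one bulleted line."""
    if level < 1 or level > 9:
        raise ValueError("level must be between 1 and 9")
    return "  " * (level - 1) + "* " + text


class FormatBulletTest(unittest.TestCase):
    def test_mainstream_use(self):
        # Positive test: use the function the way most users would.
        self.assertEqual(format_bullet("item"), "* item")

    def test_nested_level(self):
        self.assertEqual(format_bullet("item", level=3), "    * item")

    def test_level_boundaries(self):
        # Push the function: largest accepted level, then values just outside.
        self.assertEqual(format_bullet("x", level=9), "  " * 8 + "* x")
        for bad_level in (0, 10):
            with self.assertRaises(ValueError):
                format_bullet("x", level=bad_level)


def run_suite():
    """Run the tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FormatBulletTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Note how the tests mirror the tasks above: one mainstream use, then the variable's boundaries. Nothing here checks interactions with other features, which is the weakness of function testing used by itself.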
Glenford J. Myers described equivalence analysis in The Art of Software Testing (1979). It is an essential technique in the arsenal of virtually every professional tester. To quote from RUP: Equivalence partitioning is a technique for reducing the required number of tests. For every operation, you should identify the equivalence classes of the arguments and the object states. An equivalence class is a set of values for which an object is supposed to behave similarly. For example, a Set has three equivalence classes: empty, some element, and full.
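The Set example from RUP can be sketched in a few lines. The BoundedSet class below is hypothetical and exists only to give the three object states something to act on; the capacity of 3 and the choice of one element for the "some elements" class are assumptions.

```python
class BoundedSet:
    """Hypothetical set with a fixed capacity, used only for illustration."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = set()

    def add(self, item):
        if item not in self._items and len(self._items) >= self.capacity:
            raise OverflowError("set is full")
        self._items.add(item)

    def __len__(self):
        return len(self._items)


def make_set(state, capacity=3):
    """Build one representative object for each equivalence class of object state."""
    sizes = {"empty": 0, "some elements": 1, "full": capacity}
    s = BoundedSet(capacity)
    for i in range(sizes[state]):
        s.add(i)
    return s


def add_outcome(state):
    """Add a new element to a set in the given state and report what happened."""
    s = make_set(state)
    try:
        s.add("new")
        return "accepted"
    except OverflowError:
        return "rejected"


outcomes = {state: add_outcome(state) for state in ("empty", "some elements", "full")}
```

Three tests, one per class, stand in for every possible set size. A tester would normally also add the boundary case of a set one element short of full.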
Prof. Kaner draws this comparison: Public opinion polls like Gallup apply the method of stratified sampling. Pollsters can call up 2000 people across the US and predict with some accuracy the results of the election. It’s not a random sample. They subdivide the population into equivalence classes. It’s not just people who make lots of money, people who make a fair amount of money, people who don’t make quite as much, and people who really should make a lot more. That’s one dimension, but we also have where people live, what their gender is, what their age is, what their race is, and what kind of car they drive as other variables. But we end up picking somebody who is a point on many different places – this kind of car, that age, and so forth, and we say they represent a bunch of other people who have this kind of car or this kind of income group, and so forth. What you want as a representative -- the best representative from the point of view of pollsters -- is the most typical representative, the one who would vote the way most of them would vote. They’re dividing the world 3 or 4 or 5 dimensionally, but they still end up with equivalence classes. And then they call up their list of 2000 great representatives and weight them according to how often that subgroup fits into the population and then predict on what these folks say what the whole subgroup would do. They actually take more than one representative from each subgroup just in case. That’s called stratified sampling. You divide your population into different strata, into different layers, and you make sure you sample from each one. We’re doing stratified sampling when we do equivalence class analysis. These strata are just equivalence classes. The core difference between testing and Gallup-poll-type sampling is that, when we pick somebody in this case, we’re not looking for the test case that is most like everybody else, we’re looking for the one most likely to show a failure.
Some of the Key Tasks If you wanted to practice your domain testing skills, here are things that you would practice: Partitioning into equivalence classes Discovering best representatives of the sub-classes Combining tests of several fields Create boundary charts Find fields / variables / environmental conditions Identify constraints (non-independence) in the relationships among variables. Ideas for Exercises Find the biggest / smallest accepted value in a field Find the biggest / smallest value that fits in a field Partition fields Read specifications to determine the actual boundaries Create boundary charts for several variables Create standard domain testing charts for different types of variables For finding variables, see notes on function testing Further reading The classic issue with Equivalence Analysis is combinatorial explosion – you get too many test cases. One technique worth learning for reducing the combinations is All Pairs. See Lessons Learned, pp. 52-58. This is a stratified sampling technique.
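For instructors who want to show what All Pairs buys you, here is a rough sketch. It is a simple greedy selection, not the procedure described in Lessons Learned, and it does not promise the smallest possible suite. The parameter names and values are hypothetical.

```python
from itertools import combinations, product


def all_pairs(parameters):
    """Greedily choose test cases until every pair of values from two different
    parameters appears together in at least one chosen test case."""
    names = list(parameters)
    candidates = [dict(zip(names, values)) for values in product(*parameters.values())]

    def pairs_of(case):
        return {((a, case[a]), (b, case[b])) for a, b in combinations(names, 2)}

    uncovered = set().union(*(pairs_of(c) for c in candidates))
    chosen = []
    while uncovered:
        # Pick the full combination that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        chosen.append(best)
        uncovered -= pairs_of(best)
    return chosen


# Hypothetical configuration variables, three values each (27 full combinations).
parameters = {
    "os": ["Windows", "Linux", "macOS"],
    "browser": ["Firefox", "Chrome", "Safari"],
    "database": ["PostgreSQL", "MySQL", "SQLite"],
}
suite = all_pairs(parameters)
```

Every pair of values still appears in some test, but the suite is smaller than the full cross product. The sketch enumerates all combinations up front, so it is only practical for small examples like this one.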
Common Tasks in Spec-Driven Testing Review specifications for Ambiguity Adequacy (it covers the issues) Correctness (it describes the program) Content (not a source of design errors) Testability support Create traceability matrices Document management (spec versions, file comparison utilities for comparing two spec versions, etc.) Participate in review meetings Ideas for Mixing Techniques Medical device and software makers provide an interesting example of a mixed strategy involving specification-based testing. The Food and Drug Administration requires that there be tests for every claim made about the product. Those tests are normally documented in full detail, and often automated. However, this is a minimum set, not the level of testing most companies use. Even if the product meets FDA standards, it may be unsafe. The company will therefore run many additional tests, often exploratory. These don’t have to be reported to the FDA unless they expose defects. (In which case, the tests are probably added to the regression test suite.) If you specify what your product is, then you need spec-driven tests. You’d be crazy not to test that the claims you make are true. (And you’d be creating a business problem for your company.) This is not all the testing you do, but it is not testing that you can skip.
Some of the Skills Involved in Spec-Based Testing Understand the level of generality called for when testing a spec item. For example, imagine a field X: We could test a single use of X Or we could partition possible values of X and test boundary values Or we could test X in various scenarios Which is the right one? Ambiguity analysis Richard Bender teaches this well. If you can’t take his course, you can find notes based on his work in Rodney Wilson’s Software RX: Secrets of Engineering Quality Software Another book provides an excellent introduction to the ways in which statements can be ambiguous and provides lots of sample exercises: Cecile Cyrul Spector, Saying One Thing, Meaning Another : Activities for Clarifying Ambiguous Language
The traceability matrix is a useful chart for showing what variables (or functions or specification items) are covered by what tests. The columns can show any type of test item, such as a function, a variable, an assertion in a specification or requirements document, a device that must be tested, any item that must be shown to have been tested. The rows are test cases. The cells show which test case tests which items. If a feature changes, you can quickly see which tests must be reanalyzed, probably rewritten. In general, you can trace back from a given item of interest to the tests that cover it. This doesn’t specify the tests, it merely maps their coverage. The variables might be requirements, documentation claims, contract items, whatever.
Optional Exercise Think about standards or expert documents as sources. Imagine you’re testing a website. Consider the difference between saying “I can’t navigate…” and saying “This site violates these principles of Jakob Nielsen’s Designing Web Usability …” Generally respected texts or standards may not have been written for your project, but they are useful. For example, if you criticize some aspect of the user interface, your criticism might be dismissed as “just your opinion.” But if you make the same criticism and then show that this aspect of the UI doesn’t conform to a published UI design guidelines document for your platform (there are several books available), the criticism will be taken more seriously. Even if the programmers and marketers don’t fix the problem that you identified, they will evaluate your report of the problem as credible and knowledgeable.
No specification??? Companies vary in the ways they develop software. Even companies that follow the Rational Unified Process will adapt RUP to their needs, and they may not do everything that you might expect them to do. Some companies write very concise specifications, or very incomplete ones, or they don’t update their specs as the project evolves. Testers have to know how to deal with the project as it is. Sometimes you will be able to influence the fundamental development style of the project, but often, you will have limited influence. In those cases, you still have to know how to do an effective job of testing. Sources of information for spec-based testing Whatever specs exist Software change memos that come with new builds of the program User manual draft (and previous version’s manual) Product literature Published style guide and UI standards Published standards (such as C-language) 3rd party product compatibility test suites Published regulations Internal memos (e.g. project mgr. to engineers, describing the feature definitions) Marketing presentations, selling the concept of the product to management Bug reports (responses to them) Reverse engineer the program. Interview people, such as development lead, tech writer, customer service, subject matter experts, project manager Look at header files, source code, database table definitions Specs and bug lists for all 3rd party tools that you use Prototypes, and lab notes on the prototypes Interview development staff from the last version. Look at customer call records from the previous version. What bugs were found in the field? Usability test results Beta test results Ziff-Davis SOS CD and other tech support CD’s (These are answerbooks sold to help desks), for bugs in your product and common bugs in your niche or on your platform BugNet magazine / web site for common bugs, and other bug reporting websites. Localization guide (probably one that is published, for localizing products on your platform.) 
Get lists of compatible equipment and environments from Marketing (in theory, at least.) Look at compatible products, to find their failures (then look for these in your product), how they designed features that you don’t understand, and how they explain their design. See listserv’s, NEWS, BugNet, etc. Exact comparisons with products you emulate Content reference materials (e.g. an atlas to check your on-line geography program)
Here’s an everyday analogy for thinking about risk-based testing. Hazard: A dangerous condition (something that could trigger an accident) Risk: Possibility of suffering loss or harm (probability of an accident caused by a given hazard). Accident: A hazard is encountered, resulting in loss or harm. A term that is sometimes used for this is FMEA – Failure Mode and Effects Analysis. In FMEA, you start with a list of the ways that a product could fail. These are the failure modes. Next you ask what the effects of the failure could be. Based on that analysis, you decide how to focus your testing and what problems to look for. Many of us who think about testing in terms of risk draw an analogy between the testing of software and the testing of theories. Karl Popper, in his famous essay Conjectures and Refutations, lays out the proposition that a scientific theory gains credibility by being subjected to (and passing) harsh tests that are intended to refute the theory. We can gain confidence in a program by testing it harshly. (We gain confidence if it passes our best tests). Subjecting a program to easy tests doesn’t tell us much about what will happen to the program in the field. In risk-based testing, we create harsh tests for vulnerable areas of the program. This is a different notion of risk than the project manager’s view of risk. Project Managers think in terms of what’s the risk that we’ll be over budget, miss the deadlines, etc. Those are real considerations, but are not what we mean here. Here we’re talking about: What kind of defects are likely to be hidden in the software under test and what is their impact? Everyday analogy: Hazard – ice on the sidewalk Risk – someone could fall Accident – someone falls and breaks a hip
Examples of Risk-Based Testing Tasks Identify risk factors (hazards: ways in which the program could go wrong) For each risk factor, create tests that have power against it. Assess coverage of the testing effort program, given a set of risk-based tests. Find holes in the testing effort. Build lists of bug histories, configuration problems, tech support requests and obvious customer confusions. Evaluate a series of tests to determine what risk they are testing for and whether more powerful variants can be created. Here’s one way: Risk-Based Equivalence Class Analysis Our working definition of equivalence: Two test cases are equivalent if you expect the same result from each. This is fundamentally subjective. It depends on what you expect. And what you expect depends on what errors you can anticipate: Two test cases can only be equivalent by reference to a specifiable risk. Two different testers will have different theories about how programs can fail, and therefore they will come up with different classes. A boundary case in this system is a “best representative.” A best representative of an equivalence class is a test that is at least as likely to expose a fault as every other member of the class. Risk-based testing is usually not the first testing technique that you apply. By the time we get to risk-based testing, we’ll have used other techniques (like function testing and spec-based testing). We’ll have plenty of evidence that the software performs as it is supposed to in theory. Confirming it further adds no new information. Risk-based testing should be an important part of what you do, but you need to combine it with systematic, coverage based approaches (function testing, spec testing).
Take-Home Exercises The intent of this list of exercises is to illustrate the thinking that risk-based testers use. You can do these at work, after the course either alone or, preferably, in pairs with another tester. List ways that the program could fail. For each case: Describe two ways to test for that possible failure Explain how to make your tests more powerful against that type of possible failure Explain why your test is powerful against that hazard. Given a list of test cases Identify a hazard that the test case might have power against Explain why this test is powerful against that hazard. Collect or create some test cases for the software under test. Make a variety of tests: Mainstream tests that use the program in “safe” ways Boundary tests Scenario tests Wandering walks through the program If possible, use tests the students have suggested previously. For each test, ask: How will this test find a defect? What kind of defect did the test author probably have in mind? What power does this test have against that kind of defect? Is there a more powerful test? A more powerful suite of tests? These exercises were not intended for in-class use. The setup requires you to have a program under test, that the class knows and is thinking about how to test. They are here for practice when the students go home, to illustrate some of the ways that risk-focused testers do their analyses. If you decide to do the first (and simplest) of the exercises in-class: Before class starts, choose your product and get a list of ways that products like this fail. If you’re teaching at one company, get examples of bugs found in testing (for example, bugs found in previous versions if the current version is in testing) or bugs missed in testing but found by customers. Divide the class into small groups. This works well with pairs or triples. Give each group a list of 3 to 5 ways the program could fail and let them pick 2 of these to analyze.
Here are some more risk heuristics to consider: Tired programmers: long overtime over several weeks or months yields inefficiencies and errors Other staff issues: alcoholic, mother died, two programmers who won’t talk to each other (neither will their code)… Just slipping it in: pet feature not on plan may interact badly with other code. N.I.H.: (Not invented here) external components can cause problems. N.I.B.: (Not in budget) Unbudgeted tasks may be done shoddily. Ambiguity: ambiguous descriptions (in specs or other docs) can lead to incorrect or conflicting implementations. Conflicting requirements: ambiguity often hides conflict, result is loss of value for some person. Unknown requirements: requirements surface throughout development. Failure to meet a legitimate requirement is a failure of quality for that stakeholder. These heuristics are adapted from a course developed by James Bach, and reprinted in Lessons Learned, pp. 61-62.
More risk heuristics (continued): Evolving requirements: people realize what they want as the product develops. Adhering to a start-of-the-project requirements list may meet contract but fail product. (check out http://www.agilealliance.org/) Weak testing tools: if tools don’t exist to help identify / isolate a class of error (e.g. wild pointers), the error is more likely to survive to testing and beyond. Unfixability: risk of not being able to fix a bug. Language-typical errors: such as wild pointers in C. See Bruce Webster, Pitfalls of Object-Oriented Development; Michael Daconta et al., Java Pitfalls. Criticality: severity of failure of very important features. Popularity: likelihood or consequence if much used features fail. Market: severity of failure of key differentiating features. Bad publicity: a bug may appear in PC Week. Liability: being sued.
Prof. Kaner, senior author of Testing Computer Software , says: Too many people start and end with the TCS bug list. It is outdated. It was outdated the day it was published. And it doesn’t cover the issues in your system. Building a bug list is an ongoing process that constantly pays for itself. Here’s an example and further discussion from Hung Nguyen (co-author of Testing Computer Software ): This problem came up in a client/server system. The system sends the client a list of names, to allow verification that a name the client enters is not new. Client 1 and 2 both want to enter a name and client 1 and 2 both use the same new name. Both instances of the name are new relative to their local compare list and therefore, they are accepted, and we now have two instances of the same name. As we see these, we develop a library of issues. The discovery method is exploratory, requires sophistication with the underlying technology. Capture winning themes for testing in charts or in scripts-on-their-way to being automated. There are plenty of sources to check for common failures in the common platforms, such as www.bugnet.com and www.cnet.com
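The duplicate-name problem Hung Nguyen describes can be reproduced in a small simulation. The Server and Client classes below are hypothetical stand-ins, not the actual system; the sketch assumes the server accepts whatever a client submits without rechecking, which is the flaw being illustrated.

```python
class Server:
    """Hypothetical server that trusts the client-side uniqueness check."""

    def __init__(self):
        self.names = ["alice", "bob"]

    def snapshot(self):
        return list(self.names)        # compare list sent to a client

    def accept(self, name):
        self.names.append(name)        # no server-side recheck: this is the flaw


class Client:
    def __init__(self, server):
        self.server = server
        self.local_names = server.snapshot()

    def submit(self, name):
        if name in self.local_names:   # checked against a possibly stale local list
            return False
        self.server.accept(name)
        return True


server = Server()
client1, client2 = Client(server), Client(server)

# Both clients enter the same new name; each checks only its own local list.
accepted = [client1.submit("carol"), client2.submit("carol")]
duplicates = server.names.count("carol")
```

Both submissions are accepted and the name is stored twice. A test like this, once captured, is the kind of entry that belongs in the team's growing library of issues.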
Common Tasks List all areas of the program that could require testing On a scale of 1-5, assign a probability-of-failure estimate to each On a scale of 1-5, assign a severity-of-failure estimate to each For each area, identify the specific ways that the program might fail and assign probability-of-failure and severity-of-failure estimates for those Prioritize based on estimated risk Develop a stop-loss strategy for testing untested or lightly-tested areas, to check whether there is easy-to-find evidence that the areas estimated as low risk are not actually low risk.
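The first steps of this task list can be tried on a whiteboard or in a few lines of code. In the sketch below the area names and the 1 to 5 estimates are invented, and multiplying probability by severity is only one common convention for combining them; the notes above do not prescribe a formula.

```python
areas = [
    # (area, probability of failure 1-5, severity of failure 1-5); values are invented
    ("shopping cart", 3, 4),
    ("credit card processing", 2, 5),
    ("customer reviews", 4, 2),
    ("search", 3, 3),
]


def risk_score(probability, severity):
    """Assumed convention: estimated risk = probability x severity."""
    for value in (probability, severity):
        if not 1 <= value <= 5:
            raise ValueError("estimates must be on the 1-5 scale")
    return probability * severity


def prioritize(areas):
    """Sort areas from highest to lowest estimated risk."""
    return sorted(areas, key=lambda a: risk_score(a[1], a[2]), reverse=True)


ranked = [name for name, _, _ in prioritize(areas)]
```

The ranking is only as good as the estimates behind it, which is why the last task in the list matters: spend some time on the areas at the bottom to check that they really are low risk.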
Optional Exercise: Suppose you were testing the Amazon.com Web application. First, break down the functional areas of the application. Try this as a brainstorm, but if the class gets stuck, here are some examples of the functions: Shopping cart Credit card processing Shipping Tracking of shipment history Tracking of customer purchase history Creation and retention of customer search pages Friends and family list Special discounts Search (for books) Used vs new books Advance ordering Publisher and customer reviews Ordering of used books that are not yet in stock Now work through the list. What are some of the ways that each of these could fail? How likely do you think they are to fail? Why? How serious would each of the failure types be? Then collect the ideas and evaluate each area in terms of probability and probable severity of failure.
There are a few different definitions of stress testing. This one is focused on doing things that are so difficult for the program to handle that it will eventually fail. How does it fail? Does the program handle the failure gracefully? Is that how and when it should fail? Are there follow-up consequences of this failure? If we kept using the program, what would happen? This is a specialist’s approach. For example, some security testing experts use this to discover what holes are created in the system when part of the system is taken out of commission. Giving the program extremely large numbers is a form of stress testing. Crashes that result from these failures are often dismissed by programmers, but many break-ins start by exploiting a buffer overrun. For more on this approach, see James Whittaker, How to Break Software (2002). Some people use load testing tools to discover functional weaknesses in the program. Logic errors sometimes surface as the program gets less stable (because of the high load and the odd patterns of data that the program has to deal with during high load.) This is an extreme form of risk-based testing.
This is what hackers do when they pummel your site with denial of service attacks. A good vision for stress testing is that the nastiest and most skilled hacker should be a tester on your team, who uses the technique to find functional problems.
Regression testing refers to the automated testing of the SUT after changes. The name implies that its primary function is to prevent regression, i.e. the reappearance of a defect previously fixed, but in practice, the term is widely used to refer to any test automation that repeats the same tests over and over. Regression testing is most effective when combined with other testing techniques, which we’ll discuss at the end of this module. Where should you use regression testing? Where efficiency of executing the tests time and time again is a primary concern. For example: Build Verification Tests (BVTs or “smoke tests”) are a form of regression testing used to determine whether to accept a build into further testing and are covered in Module 8 of the course. Configuration Tests , where you check that an application functions identically with different operating systems, database servers, web servers, web browsers, etc., are another example where you need highly efficient execution. Pay careful attention to the stability of the interfaces that you use to drive the SUT. Testing through an API is generally a better strategy than testing through the GUI, for two reasons. GUIs change much more frequently than APIs, as usability issues are discovered and improvements are made. It’s usually much easier to achieve high coverage of the underlying program logic by using the API. The majority of the code in any modern system deals with error conditions that may be hard to trigger through the GUI alone. If you have a highly stateful application, you may want to combine tests where you stimulate through the API and observe at the GUI, or vice-versa.
Lessons Learned , Chapter 5, has useful guidance for regression testing. In planning regression testing, be sure that you understand the extent to which you can vary the tests effectively for coverage and track the variance in the test results. Use different sequences (see scenario testing) Apply data for different equivalence class analyses Vary options and program settings, and Vary configurations. Carefully plan the testability of the software under test to match the capabilities of any test tool you apply. Do testing that essentially focuses on similar risks from build to build but not necessarily with the identical test each time. There are a few cases (such as BVTs) where you may want to limit the variation.
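One way to show "similar risks from build to build, but not the identical test" is to draw the test data from a seeded random generator. The sketch below assumes a hypothetical function under test, parse_quantity, with an invented valid range of 1 to 999, and uses the build number as the seed so that any failure can be replayed.

```python
import random


def parse_quantity(text):
    """Hypothetical function under test: parse an order quantity from 1 to 999."""
    value = int(text)
    if not 1 <= value <= 999:
        raise ValueError("quantity out of range")
    return value


def varied_inputs(seed, count=5):
    """Draw valid quantities for one build; the same seed always gives the same data."""
    rng = random.Random(seed)          # log the seed so a failure can be replayed
    return [rng.randint(1, 999) for _ in range(count)]


def run_regression(build_number):
    """Same risk every build (valid quantities must parse), different data each time."""
    failures = []
    for value in varied_inputs(seed=build_number):
        if parse_quantity(str(value)) != value:
            failures.append(value)
    return failures
```

For build verification tests, where limited variation is usually preferred, the same code can be run with a fixed seed.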
With exploratory testing you simultaneously: Learn about the product Learn about the market Learn about the ways the product could fail Learn about the weaknesses of the product Learn about how to test the product Test the product Report the problems Advocate for repairs Develop new tests based on what you have learned so far. Everyone does some exploratory testing. For example, whenever you do follow-up testing to try to narrow the conditions underlying a failure or to try to find a more serious variation of a failure, you are doing exploratory testing. Most people do exploratory testing while they design tests. If you test the program while you design tests, trying out some of your approaches and gathering more detail about the program as you go, you are exploring. If you do testing early in the process – during elaboration or in the first few iterations of implementation – the product is still in an embryonic state. Many artifacts that would be desirable for testing are just not available yet, and so the testers either have to not do the testing (this would be very bad) or learn as they go. Acknowledgement: Many of these slides are derived from material given to us by James Bach (www.satisfice.com) and many of the ideas in these notes were reviewed and extended at the 7th Los Altos Workshop on Software Testing. We appreciate the assistance of the LAWST 7 attendees: Brian Lawrence, III, Jack Falk, Drew Pritsker, Jim Bampos, Bob Johnson, Doug Hoffman, Cem Kaner, Chris Agruss, Dave Gelperin, Melora Svoboda, Jeff Payne, James Tierney, Hung Nguyen, Harry Robinson, Elisabeth Hendrickson, Noel Nyman, Bret Pettichord, & Rodney Wilson. Every tester does Exploratory Testing on every project, although only some say they do. As soon as you start investigating a bug, you’re doing ET. Exploratory Testing is a great way to determine whether X is an area of the software to worry about.
Some programmers are notoriously bad at identifying where the most risky areas are in their own work. NOTE: Some people characterize exploratory testing as random hacking by unskilled people. And some test groups have several unskilled people who do random hacking and call it testing. They don’t do a particularly good job. Exploratory testing involves constant learning and careful thinking about the best things to do next. It is testing with your brain engaged, not with your brain in neutral while your fingers do the walking on the keyboard.
Doing Exploratory Testing Keep your mission clearly in mind. Distinguish between testing and observation. While testing, be aware of the limits of your ability to detect problems. Keep notes that help you report what you did, why you did it, and support your assessment of product quality. Keep track of questions and issues raised in your exploration. Problems to Be Aware Of Habituation may cause you to miss problems. Lack of information may impair exploration. Expensive or difficult product setup may increase the cost of exploring. Exploratory feedback loop may be too slow. Old problems may pop up again and again. High MTBF may not be achievable without well defined test cases and procedures, in addition to exploratory approach. The question is not whether testers should do exploratory testing (that’s like asking whether people should breathe). Instead, we should ask: How systematically should people explore? How visible should exploratory testing practices be in the testing process? How much exploratory training should testers have? How do you tell if someone is a good explorer? Watch the person troubleshoot bugs. Look for curiosity and a willingness to run with it. Look for intuition and a good understanding of the customer.
Beta testing is normally defined as testing by people who are outside of your company. These are often typical members of your market, but they may be selected in other ways. Beta tests have different objectives. It’s important to time and structure your test(s) in ways that help you meet your goals: Expert advice—the expert evaluates the program design. It is important to do this as early as possible, when basic changes are still possible. Configuration testing—the beta tester runs the software on her equipment, and tells you the results. Compatibility testing—the beta tester (possibly the manufacturer of the other software) runs the software in conjunction with other software, to see whether they are compatible. Bug hunting—the beta tester runs the software and reports software errors. Usability testing—the beta tester runs the software and reports difficulties she had using the product. Pre-release acceptance tests—the beta tester runs the product to discover whether it behaves well on her system or network. The goal is convincing the customer that the software is OK, so that she’ll buy it as soon as it ships. News media reviews—some reporters want early software. They are gratified by corporate responsiveness to their suggestions for change. Others expect finished software and are intolerant of pre-release bugs. For more discussion of the diversity of beta tests, see Kaner, Falk & Nguyen, Testing Computer Software, pp. 291-294. User Testing is many more things than beta testing. (We touched on this in our earlier exercise.) The primary element in/goal of user testing is bringing in an expert from the user community to find design flaws.
Prof. Kaner comments: There is a very simple example of something that we did at Electronic Arts. We made many programs that printed in very fancy ways on color printers. We gave you the files to print as part of the beta; you made printouts and wrote on the back of each page what your printer was and what your name was. If you were confused about the settings, we called you up when we got your page back. We had a large population of people with a large population of strange and expensive printers that we couldn't possibly afford to bring in-house. That way we could tell whether it passed or failed. We also did things like sending people parts of the product and a script to walk through, and we would be on the phone with them asking, what do you see on the screen? We wanted to do video compatibility testing where they’re across the continent. So you are relying on their eyes to be your eyes. But when you’re on the phone, you don't ask them if it looks okay; you ask them, what is in this corner? And you structure what you're going to look at. If you think you are at risk on configuration, you should have some sense of how configuration failures will show up. Write tests to expose those, get them to your customers, and then find out whether those tests passed or failed by checking directly on these specific tests.
Scenarios are great ways to capture realism in testing. They are much more complex than most other techniques and they focus on end-to-end experiences that users really have.
Scenario tests are expensive . So it’s important to get them right. Realism is important for credibility. Don’t use scenarios to find simple bugs efficiently. Scenario tests are too complex and tied to too many features. Start your testing effort with simpler tests to find the simple defects. If you start with scenario tests, you will be blocked by simple bugs. There’s a risk of missing coverage by relying too heavily on scenario testing alone. A mitigation strategy for that risk is to use a traceability matrix for assessing coverage, as we’ve shown before.
Use Cases may be a good source of test scenarios. Usually, you will want to string several use cases together as a test scenario. Use-Case Contents 1. Brief Description 2. Flow of Events Basic Flow of Events Alternative Flows of Events 3. Special Requirements 4. Pre-Conditions 5. Post-Conditions 6. Extension Points 7. Relationships 8. Candidate Scenarios 9. Use-Case Diagrams 10. Other Diagrams/Enclosures The Flow of Events of a use case contains the most important information derived from use-case modeling work. It should describe the use case's flow of events clearly enough for an outsider to easily understand it. Remember the flow of events should present what the system does, not how the system is designed to perform the required behavior. Use cases are a great source of test scenarios. What is the difference between a use case specification and a test scenario? Good test scenarios are typically broader and string together several granular use cases into an end-to-end experience. In UML jargon, the test use cases include or extend the requirements use cases. Good test scenarios are built to confirm or refute test hypotheses. Examples of the hypotheses would be faults of omission (e.g. unforeseen interactions, incomplete interface contracts), environmental faults, third-party component misbehavior, developer tunnel vision, etc. Good test scenarios tend to rely on much richer data examples than are available with written requirements.
Why develop test scenarios independently of use cases? Some development teams don’t do a thorough job of use case analysis. Certainly, use cases play an important role in RUP. But users of RUP may not adopt all of the RUP recommendations. The testing group has to be prepared to derive test cases from whatever information is available. Even if a development team creates a strong collection of use cases, an analysis from outside of the developers’ design thinking may expose problems that are not obvious from analysis of the use cases. The tester, collecting data for the scenario test, may well rely on different people’s inputs than the development team did when it developed use cases. Some ways to trigger thinking about scenarios: Benefits-driven: People want to achieve X. How will they do it, for the following X’s? Sequence-driven: People (or the system) typically do task X in an order. What are the most common orders (sequences) of subtasks in achieving X? Transaction-driven: We are trying to complete a specific transaction, such as opening a bank account or sending a message. What are the steps, data items, outputs, displays, etc.? Get use ideas from competing products: Their docs, advertisements, help, etc., all suggest best or most interesting uses of their products. How would our product do these things? Competitor-driven: Hey, look at these cool documents they can create. Look at how they display things (e.g. Netscape’s superb handling of malformed HTML code). How do we handle this? Customer’s forms driven: Here are the forms the customer produces. How can we work with (read, fill out, display, verify, whatever) them? What makes a good scenario? You know people do it. You can tell quickly whether it passed. People would do these things as a real sequence, not the first day, but after a few months of experience. You know who cares. There’s a person you can go back to when you discover the failure who will champion the fix.
These are example soap opera scenarios from: Hans Buwalda, Soap Opera Testing, Software Testing Analysis & Review conference, Orlando, FL, May 2000. Pension Fund: William starts as a metal worker for Industrial Entropy Incorporated in 1955. During his career he becomes ill, works part time, marries, divorces, marries again, gets 3 children, one of which dies, then his wife dies and he marries again and gets 2 more children…. World Wide Transaction System for an international Bank: A fish trade company in Japan makes a payment to a vendor on Iceland. It should have been a payment in Icelandic Kronur, but it was done in Yen instead. The error is discovered after 9 days and the payment is revised and corrected, however, the interest calculation (value dating)… You can skip this slide if you’re not comfortable with it.
The essence of this technique is that, while the strategy is designed by a human, the individual test cases are generated by machine. Kaner’s Architectures of Test Automation, in your student kit, discusses this in more detail. Noel Nyman of Microsoft coined the term “monkey testing” and has developed some of the best material on this subject. The name was inspired by the teaser: “If 12 monkeys pound on keyboards at random, how long will it take before they re-create the works of Shakespeare?” Nyman’s description and source code for “Freddy”, a monkey tester used for compatibility testing at Microsoft, can be found in the appendix to Tom Arnold’s VT 6 Bible. For experience reports, see Noel Nyman, “Using Monkey Test Tools,” Software Test and Quality Engineering Magazine, January/February 2000, available at www.stickyminds.com. Harry Robinson, also of Microsoft, has published a few papers on this style of test generation at his site www.model-based-testing.org. In Robinson’s terminology, the “model” is the combinatorial space and set of algorithms used to generate tests. “Monkey testing should not be your only testing. Monkeys don’t understand your application, and in their ignorance they miss many bugs.” -- Noel Nyman, “Using Monkey Test Tools,” STQE, Jan/Feb 2000. This material is just too complex to teach in detail in an introductory course. I suggest that you point to Kaner’s Architectures of Test Automation, which you can find in the student kit.
What do we mean by random and stochastic? A variable that is random has a value that is basically unpredictable. If you're talking about a set of values that are random, then the set is basically unpredictable. If the random value depends upon the sequence, then you're not just dealing with something that is random, you're dealing with something that is randomly changing over time -- that is a stochastic variable. For example, if you go to Las Vegas and play Blackjack, how much you will win or lose on a given hand is a random variable, but how much is left in your pocket is a stochastic variable. It depends not just on how much you won or lost this time but rather on what's been going on time after time. The Dow Jones Index is a stochastic variable. How much it changes today is the random variable. In high-volume random testing, where you go next depends on where you are now and the next random variable -- it is a stochastic process. An important theorem is that a stochastic process that depends only on current position and one random variable to move to the next place will reach every state that can theoretically be reached in that system, if you run the process for a long enough time. You can prove that over a long enough period you will have 100% state coverage, as long as you can show that the states could ever be reached. This material is just too complex to teach in detail in an introductory course. I suggest that you point to Kaner’s Architectures of Test Automation, which you can find in the student kit.
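If a student asks what the state-coverage claim looks like in practice, a small sketch may help. The one below runs a seeded random walk over a made-up state graph and counts the steps until every reachable state has been visited. The state names, the transitions, and the step cap are all hypothetical; this is an illustration of the idea, not a tool from the course materials.

```python
import random

# Hypothetical state-transition graph for an imaginary application.
# Each key is a state; each value lists the states reachable in one step.
TRANSITIONS = {
    "start": ["menu"],
    "menu": ["editor", "settings", "start"],
    "editor": ["menu", "print_dialog"],
    "settings": ["menu"],
    "print_dialog": ["editor"],
    "orphan": ["start"],  # nothing leads here, so a walk from "start" cannot reach it
}


def reachable_states(graph, start):
    """Return every state that can theoretically be reached from start."""
    seen, frontier = {start}, [start]
    while frontier:
        state = frontier.pop()
        for nxt in graph[state]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def random_walk_until_covered(graph, start, seed=0, max_steps=100_000):
    """Walk at random until all reachable states are visited (or the cap is hit).

    The next state depends only on the current state and one random draw,
    which is what makes the walk a stochastic process.
    Returns (visited_states, steps_taken).
    """
    rng = random.Random(seed)  # seeded so a failing run can be replayed
    target = reachable_states(graph, start)
    visited, current, steps = {start}, start, 0
    while visited != target and steps < max_steps:
        current = rng.choice(graph[current])
        visited.add(current)
        steps += 1
    return visited, steps


if __name__ == "__main__":
    visited, steps = random_walk_until_covered(TRANSITIONS, "start")
    print(f"visited {sorted(visited)} in {steps} steps")
```

Note that the unreachable "orphan" state is never visited, which matches the condition in the notes: coverage is only promised for states that could ever be reached.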
Earlier in this module, the concept of complementary techniques was introduced. Now that you have visited the techniques in detail, it’s useful to think about two valuable ways of combining them: using opposite techniques independently, and using complementary techniques together. The next two slides cover examples of each.
Regression Testing and Exploratory Testing are perhaps the easiest techniques to contrast. Consider the two as processes with inputs and outputs. The regression tester starts with test cases that he will reuse and the motivations for those test cases. The regression tester executes those tests, discovers some are out of date, some can be stricken, and generates two different types of documents. 1) bug reports and 2) improved tests. The regression tester is focused on creating materials for reuse. The exploratory tester, on the other hand, comes in with whatever information is available, but not with defined test cases. The exploratory tester does testing and makes notes in a private notebook. From those scribbles the exploratory tester also writes bug reports. But the scribbles in the book are not going anywhere outside this book. There’s nothing available for reuse – just the bug reports. Neither technique would be safe as the only approach to testing. Applying them both, however, significantly improves the diversification of your test approach. The Explorer: Isn’t facing old test cases (except to see what not to do) Looks at use cases Builds (throwaway) models Rapidly generates hypotheses Produces personal notes Works very fast An Explorer’s models are transient, unarchived, whiteboard sketches to understand the system. There’s no archival material (other than defect reports). The Explorer’s notes don’t support long-term reuse, except perhaps for cross-training.
Another way of combining techniques is to use one technique to extend another. For example, regression testing is much more effective when extended with other testing techniques than when used in isolation. Examples of combination include… Equivalence analysis: There are many techniques available for extending test automation with variable data, and all regression tools support variable data. If you have done good risk-based equivalence analysis, and can extend function regression testing with good test data, you can achieve the combined benefits of those techniques. Function testing: XP (eXtreme Programming) advocates that developers produce exhaustive automated unit tests that are run after every coding task to facilitate refactoring (changing code). Because the XP test suites are sufficiently comprehensive and are run continuously, they provide immediate feedback of any unforeseen breakage caused by a change. JUnit is a popular open source tool for this. Specification-based testing: An important extension to spec-based testing is the practice of Test-first Design (covered in RUP as a developer practice and also advocated by XP). With Test-first Design, you use tests as a primary form of requirements specification and rerun the tests on every build to provide immediate feedback on any breakage. Scenario testing: Some teams have success automating simple scenarios and interactions. This works when you can easily maintain the tests and are conscientious about discarding tests that no longer add useful information. A good heuristic is to make sure that test maintenance cost is kept low to avoid blocking any test development.
Principles of Software Testing, Part I. Satzinger, Jackson, and Burd
Testing: Testing is a process of identifying defects. Develop test cases and test data. A test case is a formal description of • A starting state • One or more events to which the software must respond • The expected response or ending state. Test data is a set of starting states and events used to test a module, group of modules, or entire system.
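The three parts of a test case named on this slide map naturally onto a small data structure. The sketch below is one possible representation, assuming a hypothetical shopping cart held in a dictionary as the software under test; the class name `FormalTestCase` and the cart functions are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FormalTestCase:
    """A test case: a starting state, events, and the expected ending state."""
    name: str
    starting_state: dict
    events: List[Callable[[dict], None]]
    expected_state: dict

    def run(self) -> bool:
        state = dict(self.starting_state)  # copy so the case can be rerun
        for event in self.events:
            event(state)
        return state == self.expected_state


# Hypothetical software under test: a shopping cart held in a dict.
def add_item(state: dict) -> None:
    state["items"] = state["items"] + 1


def clear_cart(state: dict) -> None:
    state["items"] = 0


# Test data: a set of starting states and events, with expected results.
TEST_DATA = [
    FormalTestCase("add twice", {"items": 0}, [add_item, add_item], {"items": 2}),
    FormalTestCase("add then clear", {"items": 3}, [add_item, clear_cart], {"items": 0}),
]

if __name__ == "__main__":
    for case in TEST_DATA:
        print(case.name, "PASS" if case.run() else "FAIL")
```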
Unit Testing: The process of testing individual methods, classes, or components before they are integrated with other software. Two methods for isolated testing of units: Driver • Simulates the behavior of a method that sends a message to the method being tested. Stub • Simulates the behavior of a method that has not yet been written.
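A short sketch can make the driver and stub roles concrete. Everything here is hypothetical: `order_total` is an invented unit under test, the tax rates are made-up canned values, and the driver is deliberately minimal.

```python
# Unit under test (hypothetical): computes an order total using a tax-rate lookup.
def order_total(subtotal, region, tax_rate_lookup):
    rate = tax_rate_lookup(region)
    return round(subtotal * (1 + rate), 2)


# Stub: stands in for a tax service that has not been written yet.
# The rates are canned, made-up values.
def tax_rate_stub(region):
    canned = {"CA": 0.10, "OR": 0.0}
    return canned[region]


# Driver: stands in for the caller. It sends messages to the unit
# and compares each reply with the expected value.
def run_driver():
    cases = [(100.0, "CA", 110.0), (59.99, "OR", 59.99)]
    results = []
    for subtotal, region, expected in cases:
        actual = order_total(subtotal, region, tax_rate_stub)
        results.append((subtotal, region, actual == expected))
    return results


if __name__ == "__main__":
    for subtotal, region, ok in run_driver():
        print(subtotal, region, "PASS" if ok else "FAIL")
```

Passing the lookup in as a parameter is a design choice made for this sketch; it lets the stub replace the missing service without changing the unit.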
Integration Testing: Evaluates the behavior of a group of methods or classes. Identifies interface compatibility, unexpected parameter values or state interaction, and run-time exceptions. System test: Integration test of the behavior of an entire system or independent subsystem. Build and smoke test: System test performed daily or several times a week.
Usability Testing: Determines whether a method, class, subsystem, or system meets user requirements. Performance test: Determines whether a system or subsystem can meet time-based performance criteria • Response time specifies the desired or maximum allowable time limit for software responses to queries and updates • Throughput specifies the desired or minimum number of queries and transactions that must be processed per minute or hour.
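The two performance criteria on this slide reduce to simple checks once timings have been recorded. The sketch below assumes the durations were measured elsewhere; the sample numbers and the limits are hypothetical.

```python
def meets_response_time(durations_s, max_allowed_s):
    """True if every recorded response finished within the allowed limit."""
    return max(durations_s) <= max_allowed_s


def throughput_per_minute(completed, elapsed_s):
    """Transactions processed per minute over the measurement window."""
    return completed * 60.0 / elapsed_s


def meets_throughput(completed, elapsed_s, min_per_minute):
    """True if the measured rate reaches the required minimum."""
    return throughput_per_minute(completed, elapsed_s) >= min_per_minute


if __name__ == "__main__":
    # Hypothetical measurements from one test run.
    durations = [0.12, 0.30, 0.25, 0.41]  # seconds per query
    elapsed = 2.0                          # seconds for the whole run
    print("response time ok:", meets_response_time(durations, 0.5))
    print("throughput per minute:", throughput_per_minute(len(durations), elapsed))
```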
User Acceptance Testing: Determines whether the system fulfills user requirements. Involves the end users. Acceptance testing is a very formal activity in most development projects.
Who Tests Software? Programmers: Unit testing. Testing buddies can test other programmers’ code. Users: Usability and acceptance testing. Volunteers are frequently used to test beta versions. Quality assurance personnel: All testing types except unit and acceptance. Develop test plans and identify needed changes.
Part II: Principles of Software Testing for Testers. Module 0: About This Course
Course Objectives: After completing this course, you will be a more knowledgeable software tester. You will be able to better: Understand and describe the basic concepts of functional (black box) software testing. Identify a number of test styles and techniques and assess their usefulness in your context. Understand the basic application of techniques used to identify useful ideas for tests. Help determine the mission and communicate the status of your testing with the rest of your project team. Characterize a good bug report, peer-review the reports of your colleagues, and improve your own report writing. Understand where key testing concepts apply within the context of the Rational Unified Process.
Course Outline
0 – About This Course
1 – Software Engineering Practices
2 – Core Concepts of Software Testing
3 – The RUP Testing Discipline
4 – Define Evaluation Mission
5 – Test and Evaluate
6 – Analyze Test Failure
7 – Achieve Acceptable Mission
8 – The RUP Workflow As Context
Principles of Software Testing for Testers. Module 1: Software Engineering Practices (Some things Testers should know about them)
Objectives: Identify some common software development problems. Identify six software engineering practices for addressing common software development problems. Discuss how a software engineering process provides supporting context for software engineering practices.
Symptoms of Software Development Problems User or business needs not met Requirements churn Modules don’t integrate Hard to maintain Late discovery of flaws Poor quality or poor user experience Poor performance under load No coordinated team effort Build-and-release issues
Software Engineering Practices Reinforce Each Other. Software engineering practices: Develop Iteratively; Manage Requirements; Use Component Architectures; Model Visually (UML); Continuously Verify Quality; Manage Change. Callouts: Validates architectural decisions early on. Addresses complexity of design/implementation incrementally. Measures quality early and often. Evolves baselines incrementally. Ensures users involved as requirements evolve.
Principles of Software Testing for Testers. Module 2: Core Concepts of Software Testing
Objectives: Introduce foundation topics of functional testing. Provide stakeholder-centric visions of quality and defect. Explain test ideas. Introduce test matrices.
Module 2 Content Outline. Definitions: Defining functional testing; Definitions of quality; A pragmatic definition of defect; Dimensions of quality. Test ideas: Test idea catalogs; Test matrices.
Functional Testing: In this course, we adopt a common, broad current meaning for functional testing. It is black box, and interested in any externally visible or measurable attributes of the software other than performance. In functional testing, we think of the program as a collection of functions. We test it in terms of its inputs and outputs.
How Some Experts Have Defined Quality: Fitness for use (Dr. Joseph M. Juran). The totality of features and characteristics of a product that bear on its ability to satisfy a given need (American Society for Quality). Conformance with requirements (Philip Crosby). The total composite product and service characteristics of marketing, engineering, manufacturing and maintenance through which the product and service in use will meet expectations of the customer (Armand V. Feigenbaum). Note absence of “conforms to specifications.”
Quality As Satisfiers and Dissatisfiers: Joseph Juran distinguishes between Customer Satisfiers and Dissatisfiers as key dimensions of quality. Customer Satisfiers • the right features • adequate instruction. Dissatisfiers • unreliable • hard to use • too slow • incompatible with the customer’s equipment
A Working Definition of Quality: “Quality is value to some person.” -- Gerald M. Weinberg
Change Requests and Quality: A “defect” – in the eyes of a project stakeholder – can include anything about the program that causes the program to have lower value. It’s appropriate to report any aspect of the software that, in your opinion (or in the opinion of a stakeholder whose interests you advocate), causes the program to have lower value.
Dimensions of Quality: FURPS
Functionality: e.g., Test the accurate workings of each usage scenario.
Usability: e.g., Test application from the perspective of convenience to end-user.
Reliability: e.g., Test the application behaves consistently and predictably.
Performance: e.g., Test online response under average and peak loading.
Supportability: e.g., Test the ability to maintain and support application under production use.
A Broader Definition of Dimensions of Quality: Accessibility, Capability, Compatibility, Concurrency, Conformance to standards, Efficiency, Installability and uninstallability, Localizability, Maintainability, Performance, Portability, Reliability, Scalability, Security, Supportability, Testability, Usability. Collectively, these are often called Qualities of Service, Nonfunctional Requirements, Attributes, or simply the -ilities.
Test Ideas: A test idea is a brief statement that identifies a test that might be useful. A test idea differs from a test case, in that the test idea contains no specification of the test workings, only the essence of the idea behind the test. Test ideas are generators for test cases: potential test cases are derived from a test ideas list. A key question for the tester or test analyst is which ones are the ones worth trying.
Exercise 2.3: Brainstorm Test Ideas (1/2). We’re about to brainstorm, so let’s review… Ground Rules for Brainstorming: The goal is to get lots of ideas. You brainstorm together to discover categories of possible tests, good ideas that you can refine later. There are more great ideas out there than you think. Don’t criticize others’ contributions. Jokes are OK, and are often valuable. Work later, alone or in a much smaller group, to eliminate redundancy, cut bad ideas, and refine and optimize the specific tests. Often, these meetings have a facilitator (who runs the meeting) and a recorder (who writes good stuff onto flipcharts). These two keep their opinions to themselves.
Exercise 2.3: Brainstorm Test Ideas (2/2). A field can accept integer values between 20 and 50. What tests should you try?
A Test Ideas List for Integer-Input Tests
Common answers to the exercise would include:
Test          Why it’s interesting          Expected result
20            Smallest valid value          Accepts it
19            Smallest -1                   Reject, error msg
0             0 is always interesting       Reject, error msg
Blank         Empty field, what’s it do?    Reject? Ignore?
49            Valid value                   Accepts it
50            Largest valid value           Accepts it
51            Largest +1                    Reject, error msg
-1            Negative number               Reject, error msg
4294967296    2^32, overflow integer?       Reject, error msg
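The ideas list above can be run directly as a table of inputs and expected outcomes. The sketch below assumes a hypothetical validator, `accept_field`, for the 20 to 50 field, and it assumes a blank entry should be rejected; the slide itself leaves that expectation open ("Reject? Ignore?").

```python
def accept_field(text):
    """Hypothetical validator for a field that accepts integers 20 to 50."""
    try:
        value = int(text)
    except ValueError:
        return False  # blank or non-numeric input (assumed to be rejected)
    return 20 <= value <= 50


# (input as typed, should the field accept it?)
TEST_IDEAS = [
    ("20", True),           # smallest valid value
    ("19", False),          # smallest - 1
    ("0", False),           # 0 is always interesting
    ("", False),            # blank field; rejection is an assumption
    ("49", True),           # valid value
    ("50", True),           # largest valid value
    ("51", False),          # largest + 1
    ("-1", False),          # negative number
    ("4294967296", False),  # 2^32
]


def run_ideas(validator, ideas):
    """Return the ideas whose actual outcome differs from the expected one."""
    return [(text, expected) for text, expected in ideas
            if validator(text) != expected]


if __name__ == "__main__":
    failures = run_ideas(accept_field, TEST_IDEAS)
    print("failures:", failures)
```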
Discussion 2.4: Where Do Test Ideas Come From? Where would you derive Test Ideas Lists? Models Specifications Customer complaints Brainstorm sessions among colleagues
A Catalog of Test Ideas for Integer-Input Tests
Nothing
Valid value
At LB of value
At UB of value
At LB of value - 1
At UB of value + 1
Outside of LB of value
Outside of UB of value
0
Negative
At LB number of digits or chars
At UB number of digits or chars
Empty field (clear the default value)
Outside of UB number of digits or chars
Non-digits
Wrong data type (e.g. decimal into integer)
Expressions
Space
Non-printing char (e.g., Ctrl+char)
DOS filename reserved chars (e.g., " * . :")
Upper ASCII (128-254)
Upper case chars
Lower case chars
Modifiers (e.g., Ctrl, Alt, Shift-Ctrl, etc.)
Function key (F2, F3, F4, etc.)
The Test-Ideas Catalog: A test-ideas catalog is a list of related test ideas that are usable under many circumstances. For example, the test ideas for numeric input fields can be catalogued together and used for any numeric input field. In many situations, these catalogs are sufficient test documentation. That is, an experienced tester can often proceed with testing directly from these without creating documented test cases.
Apply a Test Ideas Catalog Using a Test Matrix (Figure: a test matrix with columns labeled Field name, Field name, Field name)
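One way to picture the matrix is as a grid with one column per field and one row per catalog entry, where each cell records the outcome once that idea has been tried on that field. The sketch below assumes that layout; the field names and the shortened catalog are hypothetical.

```python
# A shortened catalog, taken from the integer-input ideas.
CATALOG = [
    "Nothing", "Valid value", "At LB of value", "At UB of value",
    "At LB of value - 1", "At UB of value + 1", "0", "Negative", "Non-digits",
]
FIELDS = ["Copies", "From page", "To page"]  # hypothetical field names


def build_matrix(fields, catalog):
    """One row per test idea, one empty cell per field."""
    return {idea: {field: "" for field in fields} for idea in catalog}


def record(matrix, idea, field, outcome):
    """Note the outcome of applying one idea to one field."""
    matrix[idea][field] = outcome


def untested(matrix):
    """List the (idea, field) pairs that have no outcome yet."""
    return [(idea, field) for idea, row in matrix.items()
            for field, outcome in row.items() if not outcome]


def render(matrix, fields):
    """Format the matrix as plain text for a status report."""
    width = max(len(idea) for idea in matrix) + 2
    header = "Test idea".ljust(width) + "".join(f.ljust(12) for f in fields)
    rows = [idea.ljust(width) + "".join(row[f].ljust(12) for f in fields)
            for idea, row in matrix.items()]
    return "\n".join([header] + rows)


if __name__ == "__main__":
    matrix = build_matrix(FIELDS, CATALOG)
    record(matrix, "0", "Copies", "pass")
    print(render(matrix, FIELDS))
    print(len(untested(matrix)), "cells still untested")
```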
Review: Core Concepts of Software Testing. What is Quality? Who are the Stakeholders? What is a Defect? What are Dimensions of Quality? What are Test Ideas? Where are Test Ideas useful? Give some examples of Test Ideas. Explain how a catalog of Test Ideas could be applied to a Test Matrix.
Principles of Software Testing for Testers. Module 4: Define Evaluation Mission
So? Purpose of Testing? The typical testing group has two key priorities: Find the bugs (preferably in priority order). Assess the condition of the whole product (as a user will see it). Sometimes, these conflict. The mission of assessment is the underlying reason for testing, from management’s viewpoint. But if you aren’t hammering hard on the program, you can miss key risks.
Missions of Test Groups Can Vary: Find defects. Maximize bug count. Block premature product releases. Help managers make ship / no-ship decisions. Assess quality. Minimize technical support costs. Conform to regulations. Minimize safety-related lawsuit risk. Assess conformance to specification. Find safe scenarios for use of the product (find ways to get it to work, in spite of the bugs). Verify correctness of the product. Assure quality.
A Different Take on Mission: Public vs. Private Bugs. A programmer’s public bug rate includes all bugs left in the code at check-in. A programmer’s private bug rate includes all the bugs that are produced, including the ones fixed before check-in. Estimates of private bug rates have ranged from 15 to 150 bugs per 100 statements. What does this tell us about our task?
Defining the Test Approach: The test approach (or “testing strategy”) specifies the techniques that will be used to accomplish the test mission. The test approach also specifies how the techniques will be used. A good test approach is: Diversified, Risk-focused, Product-specific, Practical, Defensible.
Heuristics for Evaluating Testing Approach: James Bach collected a series of heuristics for evaluating your test approach. For example, he says: Testing should be optimized to find important problems fast, rather than attempting to find all problems with equal urgency. Please note that these are heuristics – they won’t always be the best choice for your context. But in different contexts, you’ll find different ones very useful.
What Test Documentation Should You Use? Test planning standards and templates: Examples. Some benefits and costs of using IEEE-829 standard based templates. When are these appropriate? Thinking about your requirements for test documentation: Requirements considerations. Questions to elicit information about test documentation requirements for your project.
Write a Purpose Statement for Test Documentation: Try to describe your core documentation requirements in one sentence that doesn’t have more than three components. Examples: The test documentation set will primarily support our efforts to find bugs in this version, to delegate work, and to track status. The test documentation set will support ongoing product and test maintenance over at least 10 years, will provide training material for new group members, and will create archives suitable for regulatory or litigation use.
Review: Define Evaluation Mission. What is a Test Mission? What is your Test Mission? What makes a good Test Approach (Test Strategy)? What is a Test Documentation Mission? What is your Test Documentation Goal?
Principles of Software Testing for Testers. Module 5: Test & Evaluate
Test and Evaluate – Part One: Test. In this module, we drill into Test and Evaluate. This addresses the “How?” question: How will you test those things?
Test and Evaluate – Part One: Test. This module focuses on the activity Implement Test. Earlier, we covered Test-Idea Lists, which are input here. In the next module, we’ll cover Analyze Test Failures, the second half of Test and Evaluate.
Review: Defining the Test Approach. In Module 4, we covered Test Approach. A good test approach is: Diversified, Risk-focused, Product-specific, Practical, Defensible. The techniques you apply should follow your test approach.
Discussion Exercise 5.1: Test Techniques. There are as many as 200 published testing techniques. Many of the ideas are overlapping, but there are common themes. Similar sounding terms often mean different things, e.g.: User testing, Usability testing, User interface testing. What are the differences among these techniques?
Dimensions of Test Techniques: Think of the testing you do in terms of five dimensions. Testers: who does the testing. Coverage: what gets tested. Potential problems: why you’re testing (what risk you’re testing for). Activities: how you test. Evaluation: how to tell whether the test passed or failed. Test techniques often focus on one or two of these, leaving the rest to the skill and imagination of the tester.
Test Techniques: Dominant Test Approaches. Of the 200+ published Functional Testing techniques, there are ten basic themes. They capture the techniques in actual practice. In this course, we call them: Function testing, Equivalence analysis, Specification-based testing, Risk-based testing, Stress testing, Regression testing, Exploratory testing, User testing, Scenario testing, Stochastic or Random testing.
“So Which Technique Is the Best?” Each has strengths and weaknesses. Think in terms of complement. There is no “one true way”. Mixing techniques can improve coverage. (Figure: Techniques A through H shown against the five dimensions: Testers, Coverage, Potential problems, Activities, Evaluation)
Apply Techniques According to the Lifecycle (Inception, Elaboration, Construction, Transition). Test Approach changes over the project. Some techniques work well in early phases; others in later ones. Align the techniques to iteration objectives. Early phases: a limited set of focused tests; a few components of software under test; simple test environment; focus on architectural & requirement risks. Later phases: many varied tests; large system under test; complex test environment; focus on deployment risks.
Module 5 Agenda Overview of the workflow: Test and Evaluate Defining test techniques Individual techniques Function testing Equivalence analysis Specification-based testing Risk-based testing Stress testing Regression testing Exploratory testing User testing Scenario testing Stochastic or Random testing Using techniques together
At a Glance: Function Testing
Tag line: Black box unit testing
Objective: Test each function thoroughly, one at a time.
Testers: Any
Coverage: Each function and user-visible variable
Potential problems: A function does not work in isolation
Activities: Whatever works
Evaluation: Whatever works
Complexity: Simple
Harshness: Varies
SUT readiness: Any stage
Strengths & Weaknesses: Function Testing. Representative cases: Spreadsheet, test each item in isolation. Database, test each report in isolation. Strengths: Thorough analysis of each item tested. Easy to do as each function is implemented. Blind spots: Misses interactions. Misses exploration of the benefits offered by the program.
At a Glance: Equivalence Analysis (1/2)
Tag line: Partitioning, boundary analysis, domain testing
Objective: There are too many test cases to run. Use stratified sampling strategy to select a few test cases from a huge population.
Testers: Any
Coverage: All data fields, and simple combinations of data fields. Data fields include input, output, and (to the extent they can be made visible to the tester) internal and configuration variables
Potential problems: Data, configuration, error handling
At a Glance: Equivalence Analysis (2/2)
Activities: Divide the set of possible values of a field into subsets, pick values to represent each subset. Typical values will be at boundaries. More generally, the goal is to find a “best representative” for each subset, and to run tests with these representatives. Advanced approach: combine tests of several “best representatives”. Several approaches to choosing optimal small set of combinations.
Evaluation: Determined by the data
Complexity: Simple
Harshness: Designed to discover harsh single-variable tests and harsh combinations of a few variables
SUT readiness: Any stage
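The partition-and-represent activity can be sketched for the simplest case, an integer range. The helper below picks boundary values as representatives of three subsets, and `combine` forms the full cross-product of representatives for several fields. The cross-product is the naive way to combine, shown only for illustration; the slide's "optimal small set of combinations" would need a smarter selection that this sketch does not attempt. Function names and the second field's range are hypothetical.

```python
from itertools import product


def boundary_representatives(lower, upper):
    """Partition the integers around [lower, upper] and pick representatives.

    Boundary values are chosen because errors tend to cluster there.
    """
    return {
        "below range (invalid)": [lower - 1],
        "in range (valid)": [lower, (lower + upper) // 2, upper],
        "above range (invalid)": [upper + 1],
    }


def combine(*fields):
    """Naive combination: every representative of every field with every other."""
    flat = [[value for values in field.values() for value in values]
            for field in fields]
    return list(product(*flat))


if __name__ == "__main__":
    quantity = boundary_representatives(20, 50)
    month = boundary_representatives(1, 12)  # hypothetical second field
    print(quantity)
    print(len(combine(quantity, month)), "combined tests")
```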
Strengths & Weaknesses: Equivalence Analysis. Representative cases: Equivalence analysis of a simple numeric field. Printer compatibility testing (multidimensional variable, doesn’t map to a simple numeric field, but stratified sampling is essential). Strengths: Find highest probability errors with a relatively small set of tests. Intuitively clear approach, generalizes well. Blind spots: Errors that are not at boundaries or in obvious special cases. The actual sets of possible values are often unknowable.
Optional Exercise 5.2: GUI Equivalence Analysis. Pick an app that you know and some dialogs (MS Word and its Print, Page setup, Font format dialogs). Select a dialog. Identify each field, and for each field • What is the type of the field (integer, real, string, ...)? • List the range of entries that are “valid” for the field • Partition the field and identify boundary conditions • List the entries that are almost too extreme and too extreme for the field • List a few test cases for the field and explain why the values you chose are the most powerful representatives of their sets (for showing a bug) • Identify any constraints imposed on this field by other fields
At a Glance: Specification-Based Testing
Tag line: Verify every claim
Objective: Check conformance with every statement in every spec, requirements document, etc.
Testers: Any
Coverage: Documented reqts, features, etc.
Potential problems: Mismatch of implementation to spec
Activities: Write & execute tests based on the specs. Review and manage docs & traceability
Evaluation: Does behavior match the spec?
Complexity: Depends on the spec
Harshness: Depends on the spec
SUT readiness: As soon as modules are available
Strengths & Weaknesses: Spec-Based Testing. Representative cases: Traceability matrix, tracks test cases associated with each specification item. User documentation testing. Strengths: Critical defense against warranty claims, fraud charges, loss of credibility with customers. Effective for managing scope / expectations of regulatory-driven testing. Reduces support costs / customer complaints by ensuring that no false or misleading representations are made to customers. Blind spots: Any issues not in the specs or treated badly in the specs / documentation.
Traceability Tool for Specification-Based Testing: The Traceability Matrix
Columns: Stmt 1, Stmt 2, Stmt 3, Stmt 4, Stmt 5
Test 1: X X X
Test 2: X X
Test 3: X X X
Test 4: X X
Test 5: X X
Test 6: X X
Optional Exercise 5.5: What “Specs” Can You Use? Challenge: Getting information in the absence of a spec. What substitutes are available? Example: The user manual – think of this as a commercial warranty for what your product does. What other “specs” can you/should you be using to test?
Exercise 5.5: Specification-Based Testing. Here are some ideas for sources that you can consult when specifications are incomplete or incorrect. Software change memos that come with new builds of the program. User manual draft (and previous version’s manual). Product literature. Published style guide and UI standards.
Definitions: Risk-Based Testing
Three key meanings:
1. Find errors (risk-based approach to the technical tasks of testing)
2. Manage the process of finding errors (risk-based test management)
3. Manage the testing project and the risk posed by (and to) testing in its relationship to the overall project (risk-based project management)
We’ll look primarily at risk-based testing (#1), proceeding later to risk-based test management.
The project management risks are very important, but out of scope for this class.

At a Glance: Risk-Based Testing
Tag line: Find big bugs first
Objective: Define, prioritize, refine tests in terms of the relative risk of issues we could test for
Testers: Any
Coverage: By identified risk
Potential problems: Identifiable risks
Activities: Use qualities of service, risk heuristics and bug patterns to identify risks
Evaluation: Varies
Complexity: Any
Harshness: Harsh
SUT readiness: Any stage

Strengths & Weaknesses: Risk-Based Testing
Representative cases:
• Equivalence class analysis, reformulated.
• Test in order of frequency of use.
• Stress tests, error handling tests, security tests.
• Sample from predicted-bugs list.
Strengths:
• Optimal prioritization (if we get the risk list right)
• High power tests
Blind spots:
• Risks not identified or that are surprisingly more likely.
• Some “risk-driven” testers seem to operate subjectively.
  • How will I know what coverage I’ve reached?
  • Do I know that I haven’t missed something critical?

Optional Exercise 5.6: Risk-Based Testing
You are testing Amazon.com (or pick another familiar application).
First brainstorm:
• What are the functional areas of the app?
Then evaluate risks:
• What are some of the ways that each of these could fail?
• How likely do you think they are to fail? Why?
• How serious would each of the failure types be?
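
One way to turn the answers to these questions into a test order is to score each risk and sort. The sketch below is an assumption, not the course's method: it multiplies a subjective likelihood by a subjective severity (a common heuristic), and the functional areas, failure descriptions, and scores are all invented for a generic online store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    area: str        # functional area of the application
    failure: str     # one way that area could fail
    likelihood: int  # 1 (unlikely) to 5 (very likely), a subjective estimate
    severity: int    # 1 (minor) to 5 (critical), a subjective estimate

    @property
    def exposure(self):
        # Likelihood times severity is one common heuristic, assumed here.
        return self.likelihood * self.severity


def prioritize(risks):
    """Order risks from highest to lowest exposure; ties keep input order."""
    return sorted(risks, key=lambda r: r.exposure, reverse=True)


# Hypothetical brainstorm results for an online store; all values are invented.
RISKS = [
    Risk("Search", "Relevant items missing from results", 3, 2),
    Risk("Checkout", "Customer charged twice", 2, 5),
    Risk("Cart", "Items lost when session expires", 4, 3),
    Risk("Reviews", "Formatting broken in long reviews", 4, 1),
]

if __name__ == "__main__":
    for risk in prioritize(RISKS):
        print(f"{risk.exposure:>2}  {risk.area}: {risk.failure}")
```

The scores are only as good as the judgment behind them, so the ranking is a starting point for discussion rather than a measurement.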

At a Glance: Stress Testing
Tag line: Overwhelm the product
Objective: Learn what failure at extremes tells about changes needed in the program’s handling of normal cases
Testers: Specialists
Coverage: Limited
Potential problems: Error handling weaknesses
Activities: Specialized
Evaluation: Varies
Complexity: Varies
Harshness: Extreme
SUT readiness: Late stage

Strengths & Weaknesses: Stress Testing
Representative cases:
• Buffer overflow bugs
• High volumes of data, device connections, long transaction chains
• Low memory conditions, device failures, viruses, other crises
• Extreme load
Strengths:
• Expose weaknesses that will arise in the field.
• Expose security risks.
Blind spots:
• Weaknesses that are not made more visible by stress.

At a Glance: Regression Testing
Tag line: Automated testing after changes
Objective: Detect unforeseen consequences of change
Testers: Varies
Coverage: Varies
Potential problems: Side effects of changes. Unsuccessful bug fixes
Activities: Create automated test suites and run against every (major) build
Complexity: Varies
Evaluation: Varies
Harshness: Varies
SUT readiness: For unit, early; for GUI, late

Strengths & Weaknesses: Regression Testing
Representative cases:
• Bug regression, old fix regression, general functional regression
• Automated GUI regression test suites
Strengths:
• Cheap to execute
• Configuration testing
• Regulator friendly
Blind spots:
• “Immunization curve”
• Anything not covered in the regression suite
• Cost of maintaining the regression suite

At a Glance: Exploratory Testing
Tag line: Simultaneous learning, planning, and testing
Objective: Simultaneously learn about the product and about the test strategies to reveal the product and its defects
Testers: Explorers
Coverage: Hard to assess
Potential problems: Everything unforeseen by planned testing techniques
Activities: Learn, plan, and test at the same time
Evaluation: Varies
Complexity: Varies
Harshness: Varies
SUT readiness: Medium to late: use cases must work

Strengths & Weaknesses: Exploratory Testing
Representative cases:
• Skilled exploratory testing of the full product
• Rapid testing & emergency testing (including thrown-over-the-wall test-it-today)
• Troubleshooting / follow-up testing of defects.
Strengths:
• Customer-focused, risk-focused
• Responsive to changing circumstances
• Finds bugs that are otherwise missed
Blind spots:
• The less we know, the more we risk missing.
• Limited by each tester’s weaknesses (can mitigate this with careful management)
• This is skilled work; juniors aren’t very good at it.

At a Glance: User Testing
Tag line: Strive for realism. Let’s try real humans (for a change)
Objective: Identify failures in the overall human/machine/software system.
Testers: Users
Coverage: Very hard to measure
Potential problems: Items that will be missed by anyone other than an actual user
Activities: Directed by user
Evaluation: User’s assessment, with guidance
Complexity: Varies
Harshness: Limited
SUT readiness: Late; has to be fully operable

Strengths & Weaknesses: User Testing
Representative cases:
• Beta testing
• In-house lab using a stratified sample of target market
• Usability testing
Strengths:
• Expose design issues
• Find areas with high error rates
• Can be monitored with flight recorders
• Can use in-house tests to focus on controversial areas
Blind spots:
• Coverage not assured
• Weak test cases
• Beta test technical results are mixed
• Must distinguish marketing betas from technical betas

At a Glance: Scenario Testing
Tag line: Instantiation of a use case. Do something useful, interesting, and complex
Objective: Challenging cases to reflect real use
Testers: Any
Coverage: Whatever stories touch
Potential problems: Complex interactions that happen in real use by experienced users
Activities: Interview stakeholders & write screenplays, then implement tests
Evaluation: Any
Complexity: High
Harshness: Varies
SUT readiness: Late. Requires stable, integrated functionality.

Strengths & Weaknesses: Scenario Testing
Representative cases:
• Use cases, or sequences involving combinations of use cases.
• Appraise product against business rules, customer data, competitors’ output
• Hans Buwalda’s “soap opera testing.”
Strengths:
• Complex, realistic events. Can handle (help with) situations that are too complex to model.
• Exposes failures that occur (develop) over time
Blind spots:
• Single function failures can make this test inefficient.
• Must think carefully to achieve good coverage.

At a Glance: Stochastic or Random Testing (1/2)
Tag line: Monkey testing. High-volume testing with new cases all the time
Objective: Have the computer create, execute, and evaluate huge numbers of tests.
• The individual tests are not all that powerful, nor all that compelling.
• The power of the approach lies in the large number of tests.
• These broaden the sample, and they may test the program over a long period of time, giving us insight into longer term issues.

At a Glance: Stochastic or Random Testing (2/2)
Testers: Machines
Coverage: Broad but shallow. Problems with stateful apps.
Potential problems: Crashes and exceptions
Activities: Focus on test generation
Evaluation: Generic, state-based
Complexity: Complex to generate, but individual tests are simple
Harshness: Weak individual tests, but huge numbers of them
SUT readiness: Any
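
The idea of many weak tests with a generic oracle can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a tool from the course: `percent_of` is a hypothetical function under test with a deliberate weakness, `random_test` is an invented helper, and the only evaluation is "did the call raise an exception".

```python
import random


def percent_of(part, whole):
    """Hypothetical function under test: it fails when whole is zero."""
    return 100 * part / whole


def random_test(func, runs, seed):
    """Call func with random integer pairs and collect every input that raises."""
    rng = random.Random(seed)  # seeded so a failing run can be replayed
    failures = []
    for _ in range(runs):
        args = (rng.randint(-5, 5), rng.randint(-5, 5))
        try:
            func(*args)
        except Exception as exc:  # generic oracle: any exception is a failure
            failures.append((args, type(exc).__name__))
    return failures


if __name__ == "__main__":
    failures = random_test(percent_of, runs=10_000, seed=0)
    print(len(failures), "failing inputs; first:", failures[:1])
```

No single generated case is clever, but across thousands of runs the zero divisor turns up without anyone having planned for it. Seeding the generator keeps the run reproducible, which matters when a failure has to be reported and retested.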

Combining Techniques (Revisited)
• A test approach should be diversified
• Applying opposite techniques can improve coverage
• Often one technique can extend another
[Figure labels: Testers, Coverage, Potential problems, Activities, Evaluation; Technique A through Technique H]

Applying Opposite Techniques to Boost Coverage
Contrast these two techniques:
Regression
• Inputs: Old test cases and analyses leading to new test cases
• Outputs: Archival test cases, preferably well documented, and bug reports
• Better for: Reuse across multi-version products
Exploration
• Inputs: Models or other analyses that yield new tests
• Outputs: Scribbles and bug reports
• Better for: Find new bugs, scout new areas, risks, or ideas

Applying Complementary Techniques Together
• Regression testing alone suffers fatigue
  • The bugs get fixed and new runs add little info
  • Symptom of weak coverage
• Combine automation w/ suitable variance
  • E.g. Risk-based equivalence analysis
• Coverage of the combination can beat sum of the parts
[Figure labels: Equivalence, Risk-based, Regression]
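
One possible reading of "automation with suitable variance" is a regression suite that draws a fresh representative from each equivalence class on every build, sampling riskier classes more often. The Python sketch below assumes that reading; the quantity field, its classes, the risk weights, and the function names are all hypothetical.

```python
import random

# Hypothetical equivalence classes for a quantity field that accepts 1 to 99.
# The second tuple element is an invented risk weight: the out-of-range
# classes are sampled twice per run, the valid class once.
CLASSES = {
    "below_range": (range(-100, 1), 2),
    "valid": (range(1, 100), 1),
    "above_range": (range(100, 1000), 2),
}


def is_valid_quantity(value):
    """Hypothetical function under test."""
    return 1 <= value <= 99


def cases_for_build(build_number):
    """Pick fresh representatives from each class, seeded by the build number."""
    rng = random.Random(build_number)
    cases = []
    for name, (values, weight) in CLASSES.items():
        for _ in range(weight):
            cases.append((name, rng.choice(values)))
    return cases


def run_suite(build_number):
    """Return the cases whose result disagrees with the class expectation."""
    return [
        (name, value)
        for name, value in cases_for_build(build_number)
        if is_valid_quantity(value) != (name == "valid")
    ]


if __name__ == "__main__":
    for build in (1, 2, 3):
        print(build, cases_for_build(build), run_suite(build))
```

Seeding with the build number means each build exercises different values, yet any one build's cases can be regenerated exactly when a failure needs follow-up.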

How To Adopt New Techniques
1. Answer these questions:
   • What techniques do you use in your test approach now?
   • What is its greatest shortcoming?
   • What one technique could you add to make the greatest improvement, consistent with a good test approach:
     • Risk-focused?
     • Product-specific?
     • Practical?
     • Defensible?
2. Apply that additional technique until proficient
3. Iterate