Testing a "big-bang" integration is about as hopeless as doing one. ADTs or object classes are ideal subsystems to accumulate. Input sequences throw the software into various "modes" in which its behavior may differ radically. Interface checking is difficult and requires detailed knowledge of the system; it isn't usually attempted.
It might as well be done by third parties -- the needed understanding stayed with the developers (who may be gone). Fault-based tests are like looking for a needle in a haystack. Coverage measurement may be impossible; only the simplest (statement coverage?) is feasible. The problem is only partly tool limitations: people don't have the analysis time.
The UNIX `profile' command does module coverage. The current buzzword for "mode-covering sequence" is "use case." Using an environment simulation coupled to the system under test is dangerous, because the two may "accommodate" each other in a way the real world will not accommodate the system. Test scripts ("automated testing") provide the bookkeeping.
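To make "the bookkeeping" concrete, here is a minimal test-driver sketch (my own illustration, not from the notes): run each case against the system under test, compare with the expected result, and tally passes and failures.

    def run_suite(system_under_test, cases):
        """cases: list of (name, test_input, expected_output) triples.
        The bookkeeping: run every case, record pass/fail, summarize."""
        results = []
        for name, test_input, expected in cases:
            ok = system_under_test(test_input) == expected
            results.append((name, ok))
            print(f"{name}: {'PASS' if ok else 'FAIL'}")
        print(f"{sum(ok for _, ok in results)}/{len(cases)} cases passed")
        return results

    # e.g., with a toy system under test:
    run_suite(abs, [("zero", 0, 0), ("negative", -3, 3), ("positive", 7, 7)])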
Special values testing is perhaps the best at finding failures, but is of little use in gaining confidence. The user weighting may be different for each user!
A practical histogram may have about 100 input classes.
In practice, nothing like a real profile is ever available. But the concept of a "usage spike" explains why failures go unobserved during test, then appear magically after release.
Users can supply rough histogram information, which may be inaccurate, and may differ for different users. The "novice" vs. "expert" user distinction is particularly important; their different profiles (and the fact that the expert profile is the one used for test) explain why novices so easily `break' software.
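A rough profile can be used directly to generate tests, as in this sketch (the input classes, weights, and generators are invented for illustration; a practical histogram would have on the order of 100 classes):

    import random

    profile = {            # input class -> estimated fraction of real use
        "query_by_name": 0.60,
        "query_by_id":   0.25,
        "update_record": 0.10,
        "bulk_import":   0.05,
    }
    generators = {         # one hypothetical input generator per class
        "query_by_name": lambda: ("name", random.choice(["ada", "bob", "eve"])),
        "query_by_id":   lambda: ("id", random.randrange(10000)),
        "update_record": lambda: ("update", random.randrange(10000), "new value"),
        "bulk_import":   lambda: ("import", [random.randrange(10000) for _ in range(5)]),
    }

    def draw_test():
        """Pick an input class with probability equal to its profile weight,
        then generate a random input within that class."""
        cls = random.choices(list(profile), weights=list(profile.values()))[0]
        return cls, generators[cls]()

    print([draw_test()[0] for _ in range(10)])   # mostly queries, rarely imports

A novice profile and an expert profile would simply be two different weight tables over the same classes.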
"Systematic" in measurement means errors that do not cancel on average; random errors do cancel. Pseudorandom number generators are often really lousy, particularly when only a few bits are extracted (e.g., to model a coin toss). Don't just use the one in your programming language library if you care. What, for example, would constitute a "random C program"?
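The coin-toss complaint is easy to demonstrate. In a power-of-two-modulus linear congruential generator (the multiplier and increment below are the ones in the sample rand() of the C standard), the lowest bit strictly alternates, while the high bits behave much better:

    def lcg(seed, n, a=1103515245, c=12345, m=2**31):
        """Generate n values from a power-of-two-modulus LCG."""
        xs, x = [], seed
        for _ in range(n):
            x = (a * x + c) % m
            xs.append(x)
        return xs

    xs = lcg(seed=42, n=16)
    print([x & 1 for x in xs])     # low bit as a "coin toss": 0,1,0,1,... forever
    print([x >> 30 for x in xs])   # the top bit looks far more random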
The example is unrealistic in many ways: there is a user profile; input is numeric; there is an effective oracle. With this test, the confidence in an MTTF of 100000 runs is 1%; the confidence in an MTTF of 1000 runs is 63%.
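The percentages come from the usual failure-free-testing calculation; a sketch, assuming the example test consists of N = 1000 random runs with no failure observed (that N is my inference, but it is what makes 1% and 63% come out):

    def confidence(mttf_runs, n_tests):
        """Confidence that the failure rate is no worse than 1/mttf_runs
        (i.e., MTTF at least mttf_runs runs), given n_tests failure-free
        runs: 1 - (1 - 1/M)**N."""
        return 1 - (1 - 1 / mttf_runs) ** n_tests

    print(confidence(100000, 1000))   # ~0.010 -> about 1%
    print(confidence(1000, 1000))     # ~0.632 -> about 63%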
The work on comparison of systematic and random testing is almost all theoretical. How could an experiment be done? The stopping rule is simply that the reliability goal has been reached with high enough confidence.
The regression testing problem is much easier in principle than the general testing problem, if it is assumed that the existing testset is adequate. It usually isn't. A regression testset (selected from the existing one) is called safe if it includes every test whose outcome might differ. Finding new tests could use any of the usual methods.
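A minimal sketch of safe selection, assuming per-test coverage data mapping each existing test to the modules it executes (the test and module names are invented for illustration):

    def select_regression_tests(coverage, changed_modules):
        """Keep every existing test that executes at least one changed module.
        If the coverage data is complete, no test whose outcome could change
        is dropped; the selection is safe."""
        changed = set(changed_modules)
        return [t for t, mods in coverage.items() if changed & set(mods)]

    coverage = {
        "test_login":  ["auth", "session"],
        "test_report": ["report", "format"],
        "test_quota":  ["auth", "quota"],
    }
    print(select_regression_tests(coverage, ["auth"]))   # -> ['test_login', 'test_quota']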
Dependency analysis has other uses (compiler optimization, parallelization, etc.) that have provided some of the technology. Dynamic methods are more precise than static ones, but also more expensive.
Example: Buying a coverage analyzer. The closest one gets to a specification may be a user's manual. Examples of standards with acceptance tests: protocols; the POSIX standard.
Object libraries are one of the main features of O-O languages, and have given hope for actual reuse. COTS is still little more than a buzzword. ("Reuse" is another.) Specification seems to require some kind of formal language. Component quality is mostly process-defined. More about quality (in terms of reliability) later.