Week6 testing-intro

COMP23420 Sem 2 week 6
Software testing concepts
John Sargeant
johns@cs.man.ac.uk

REMINDER: PLEASE ENSURE THAT YOUR PHONE
IS SWITCHED OFF

Combining the strengths of UMIST and
The Victoria University of Manchester

Overview

• Testing strategy
• Practical issues
• Safety critical systems
• Basic testing techniques
• Kinds of testing


This software has bugs in…
Embury’s law: “This software has bugs in, we just don’t
know what they are yet”.
Applies to any SW system of any size, e.g.
• Eurofighter fuel control system 80K lines
• Modern airliner ~10M lines
• ABC ~150K (our code) + ~300K (Java library code)

Also, unlike e.g. rostering, most real systems are
concurrent and reactive, not simple functions from
inputs to outputs.


Testing to fail
The fundamental rule of testing: a successful test is one
which causes the software to fail.
Difficult for programmers to do with their own code – not
the same as debugging.
• Traditional solution: have a separate testing team
who are as nasty to the software as possible (see
Software testing, Ron Patton, Sams 2006 for a guide to this approach).

• Agile solution – write the tests before the code – also
helps to clarify requirements.


How many bugs?
Traditional estimate is 3-5 per hundred lines of C code.
That’s 30,000 – 50,000 for a million line program!

• Of those ~90% will probably be found by routine
debugging
• And ~90% of the rest by sensible testing
• And ~90% of the rest by really rigorous testing

But that still leaves 30-50, and getting to 99.9% is very
expensive.
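
The arithmetic behind these figures can be sketched as below. This is only an illustration of the slide's rule of thumb; the class and method names (BugEstimate, remaining) are made up for the example, and the 90% figure is the rough estimate quoted above, not a measured value.

```java
// Hypothetical helper, not part of the lecture material.
class BugEstimate {
    /**
     * Bugs still present after `stages` rounds of bug-finding,
     * each of which removes `fractionFound` of whatever is left.
     */
    static double remaining(long linesOfCode, double bugsPerHundredLines,
                            double fractionFound, int stages) {
        double bugs = linesOfCode / 100.0 * bugsPerHundredLines;
        for (int i = 0; i < stages; i++) {
            bugs *= (1.0 - fractionFound);
        }
        return bugs;
    }

    public static void main(String[] args) {
        // Three stages: routine debugging, sensible testing, rigorous testing.
        for (double density : new double[] {3.0, 5.0}) {
            System.out.printf("%.0f bugs per 100 lines: about %.0f left%n",
                    density, remaining(1_000_000, density, 0.9, 3));
        }
    }
}
```

Each stage only removes a fraction of what remains, so a fourth 90% stage would still leave 3-5 bugs in the million-line program.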

Factors affecting bug density
• Design – good designs lead to fewer bugs
• Type of application: concurrent, reactive systems are
much more difficult than sequential transformation
systems.
• Programming language: Java < C < Perl
• Programmer competence and experience
• In general expect 1-10 bugs per hundred lines
• (Probably a lot less for pair programming, but at a
factor of 2 cost).


When you can’t afford 30-50 bugs
Major issue for safety critical systems, e.g. fly-by-wire.
• Conventional aircraft: pilot input directly controls flight
surfaces (via hydraulics in large aircraft).
• FBW aircraft: a computer interprets the pilot’s inputs,
and relays these electronically to the flight surfaces
• First used in the F16, 1974. Allows military aircraft to
be inherently unstable, hence more manoeuvrable
(also helps stealth).
• First civilian application Airbus A320, early 1980s.
Provides protection and reduces pilot workload.

Triple redundancy (1)
• Airbus claimed that the A320 FBW software was
designed to fail no more than once in 10^9 flight hours.
How could they possibly claim that?
• Once in 10^4.5 (roughly 32K) hours might be plausible – but
nowhere near enough.
• But add a second computer, with different software
written by a different team. Now (in theory) both will
fail at the same time once in 10^9 hours.
• But if there is a discrepancy you don’t know which
one’s wrong – so you need a third computer and take
a majority vote: triple redundancy.
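
A toy sketch of the majority vote is shown below. It assumes each computer produces a single integer command and that "agreement" means exact equality; real flight control systems compare continuous signals within tolerances and are far more elaborate. The names MajorityVoter, vote and dissenter are invented for this example.

```java
// Hypothetical illustration of 2-out-of-3 voting; not the A320 design.
class MajorityVoter {
    /** Returns the value at least two channels agree on; fails if all three differ. */
    static int vote(int a, int b, int c) {
        if (a == b || a == c) {
            return a;
        }
        if (b == c) {
            return b;
        }
        throw new IllegalStateException("no majority: all three channels disagree");
    }

    /** Index (0, 1 or 2) of the channel that was outvoted, or -1 if all three agree. */
    static int dissenter(int a, int b, int c) {
        if (a == b && b == c) {
            return -1;
        }
        if (a == b) {
            return 2;
        }
        if (a == c) {
            return 1;
        }
        if (b == c) {
            return 0;
        }
        throw new IllegalStateException("no majority: all three channels disagree");
    }
}
```

With two computers a disagreement only tells you that something is wrong; with three, the outvoted channel identifies which one to distrust, as long as no more than one fails at a time.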


Triple redundancy (2)
Other advantages:
• Provides graceful degradation: don’t have to revert to
manual control immediately with one computer out.
• Gives continuous testing for free – each discrepancy
reveals a bug! So eventually the system should be
extremely reliable – recent safety record of the A320
is outstanding.
• Redundancy is essential to providing reasonable
levels of safety in complex safety-critical systems.
• Note: the actual A320 system has a lot more
redundancy than described above

Quiz(1)

Suggest at least three reasons why the theoretical calculation
10^4.5 x 10^4.5 = 10^9 may not reflect what happens in
practice. Hints:
• Is there still a single point of failure in the system?
• Remember: not all bugs are in the actual software
• When is it true that P(A and B) = P(A) x P(B)?


Exhaustive testing is impossible
public static double divide(double a, double b){
return a / b;
}
A Java double is 64 bits so there are 2^128 possibilities –
intractable.
Similarly a reactive system such as FBW has a huge
number of possible (state, input, time) combinations.
So we have to find a large number of bugs within a
huge search space – we have to focus effort on the
most “interesting” parts of that space.
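
To get a feel for the size of that space, the sketch below counts the input pairs for divide and estimates how long it would take to try them all. The rate of one billion tests per second is an assumption chosen purely for illustration, and the class and method names are invented.

```java
import java.math.BigInteger;

// Hypothetical back-of-envelope calculation, not part of the lecture material.
class InputSpace {
    /** Number of (a, b) bit-pattern pairs for two 64-bit doubles: 2^128. */
    static BigInteger pairsOfDoubles() {
        return BigInteger.ONE.shiftLeft(2 * Double.SIZE);
    }

    /** Whole years needed to run one test per input pair at the given rate. */
    static BigInteger yearsToTestAll(long testsPerSecond) {
        BigInteger secondsPerYear = BigInteger.valueOf(365L * 24 * 60 * 60);
        return pairsOfDoubles()
                .divide(BigInteger.valueOf(testsPerSecond))
                .divide(secondsPerYear);
    }

    public static void main(String[] args) {
        System.out.println("input pairs: " + pairsOfDoubles());
        // Assumed rate: 10^9 tests per second.
        System.out.println("years at 1e9 tests/s: " + yearsToTestAll(1_000_000_000L));
    }
}
```

Even at that assumed rate the answer is on the order of 10^22 years, which is why the effort has to go into choosing tests rather than running all of them.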


Testing as a search problem
• Equivalence partitioning: split up the space into areas
where similar tests are likely to lead to similar results
• e.g. if 2/3 works then 3/5 probably works too (but not
necessarily 3/3)
• Boundary value analysis: concentrate on boundaries
between different parts of the space.
• e.g. b == 0, b very close to 0, a and/or b close to
Double.MAX_VALUE etc.
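
A minimal sketch of how these two ideas might translate into checks on the divide method from the earlier slide. It uses Java's Double.MAX_VALUE and Double.MIN_VALUE as the boundary constants (standing in for MAX_DOUBLE) and a hand-written check helper instead of a test framework; the class name and the choice of cases are illustrative, not a complete test plan.

```java
// Hypothetical boundary-value checks for the divide method shown earlier.
class DivideBoundaries {
    // The method under test, as on the slide.
    public static double divide(double a, double b) {
        return a / b;
    }

    static void check(boolean condition, String description) {
        if (!condition) {
            throw new AssertionError("failed: " + description);
        }
    }

    static void runAll() {
        // Equivalence partition: one ordinary case stands in for many like it.
        check(Math.abs(divide(2.0, 3.0) - 2.0 / 3.0) < 1e-15, "ordinary values");
        // A different partition: numerator equals denominator.
        check(divide(3.0, 3.0) == 1.0, "a == b");
        // Boundary: b == 0. Java doubles do not throw here.
        check(divide(1.0, 0.0) == Double.POSITIVE_INFINITY, "positive / zero");
        check(divide(-1.0, 0.0) == Double.NEGATIVE_INFINITY, "negative / zero");
        check(Double.isNaN(divide(0.0, 0.0)), "zero / zero");
        // Boundary: b very close to 0 makes the result overflow.
        check(Double.isInfinite(divide(1.0, Double.MIN_VALUE)), "tiny divisor");
        // Boundary: a close to the largest representable value.
        check(divide(Double.MAX_VALUE, 1.0) == Double.MAX_VALUE, "max / 1");
        check(Double.isInfinite(divide(Double.MAX_VALUE, 0.5)), "max / 0.5 overflows");
    }

    public static void main(String[] args) {
        runAll();
        System.out.println("all boundary checks passed");
    }
}
```

Whether Infinity or NaN counts as a failure depends on the spec: the checks above record what Java does, and a spec that demands an exception for b == 0 would turn the same inputs into failing tests.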


Black box or white box?
Black box: testing the software against its spec without
access to the code:
• Means tests will be written without preconceptions
about how the code works.
• If the code is changed, the same tests are still valid.
White box: testing with access to the code:
• Allows more tests to be done
• Allows tester to apply pressure to those places which
look most likely to break.


Quiz(2)

2. You are testing an algorithm which sorts strings into
alphanumeric order for a dictionary program.
Suggest some of the most important tests you’ll
need to do.
3. You are asked to thoroughly test floating point
division software which will be burnt into a processor
chip. Would you go about this primarily through black
box or white box testing, and why?


Kinds of testing
• Unit testing – test one unit (in OO one class) at a
time.
• Integration testing – test that the components of a
system (or subsystem) work together correctly
• Regression/smoke testing – check that you haven’t
broken it.
• System testing – test that the system works in the
context in which it will be required to work.
• Alpha and beta testing – test with real users
• Acceptance testing – get the customer to come up
with the dosh.

Unit testing
• Testing one unit (one class) at a time.
• Relatively simple, but the class you’re testing will
usually rely on other classes.
• In general almost all software relies on other software
(e.g. Java library classes).
• The search space is generally well defined so
techniques like EP and BVA are most useful here.
• Often possible to be systematic and reasonably
confident that a single class is bug-free.
• In Java, often done with the JUnit testing framework.
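
One common way to test a class in isolation when it relies on another class is to replace the collaborator with a stub that returns fixed answers. The sketch below shows the idea in plain Java so that it runs without any library; in JUnit the same checks would normally be written as test methods using assertions such as assertEquals. All the names here (RateSource, PriceConverter, PriceConverterTest) are invented for the example.

```java
// Hypothetical collaborator that the class under test depends on.
interface RateSource {
    double rateFor(String currency);
}

// Hypothetical class under test.
class PriceConverter {
    private final RateSource rates;

    PriceConverter(RateSource rates) {
        this.rates = rates;
    }

    double convert(double amount, String currency) {
        if (amount < 0) {
            throw new IllegalArgumentException("negative amount");
        }
        return amount * rates.rateFor(currency);
    }
}

class PriceConverterTest {
    static void check(boolean condition, String description) {
        if (!condition) {
            throw new AssertionError("failed: " + description);
        }
    }

    static void run() {
        // Stub: always answers 2.0, so only PriceConverter's own logic is tested.
        RateSource stub = currency -> 2.0;
        PriceConverter converter = new PriceConverter(stub);

        check(converter.convert(10.0, "XYZ") == 20.0, "ordinary amount");
        check(converter.convert(0.0, "XYZ") == 0.0, "boundary: zero amount");

        boolean rejected = false;
        try {
            converter.convert(-0.01, "XYZ"); // just past the boundary
        } catch (IllegalArgumentException expected) {
            rejected = true;
        }
        check(rejected, "negative amount is rejected");
    }

    public static void main(String[] args) {
        run();
        System.out.println("unit checks passed");
    }
}
```

The stub keeps the test about one unit: if a check fails, the fault is in PriceConverter and not in whatever real rate lookup it will eventually use.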

Integration testing
• Testing that the components of a system work
together.
• Harder to define than unit testing; shape of testing
space is less obvious.
• Concentrate on important mission-critical features
• Check that the use cases can be performed without
problems.
• Don’t get upset when your code causes a problem;
don’t get annoyed when somebody else’s does.


Regression/smoke testing

• Regression testing: repeat the tests you did before, to
make sure you haven’t broken anything.
• Especially important after significant changes but the
more often the better.
• Smoke testing: repeat the most critical tests as often
as possible – check that it’s not going up in smoke.
• Integration and Regression/smoke testing are often
done in the form of a daily (or nightly) “build” of the
system – requires that at least some tests are
automated.

System testing
• Testing of the system in the context(s) in which it will
operate.
• This will generally be a lot more varied than the
context in which it was developed.
• May involve different hardware, operating systems,
performance issues etc.
• Need to check the documentation and procedures as
well as the code.
• Many systems which (seem to) work perfectly in a
development environment fail in a customer
environment.

Alpha and beta testing
• System testing with real users
• Important because they don’t use the software the
way you assume.
• Alpha testing: a small group of users, done with SW
developers present.
• Beta testing: a wider group, remote from the
development team, asked to submit bug reports.
• Better not to start Beta testing until the software will
work for most of the users most of the time!


Acceptance testing
Where a SW product has a small number of large
customers (e.g. Campus Solutions) the customer(s)
may specify a set of tests which the SW must pass
before they will accept it and pay the dosh. There are
some serious issues with this:
• In general, users don’t really know in advance what
they want (the “waterfall fallacy”).
• Who within the customer organisation defines the
spec? e.g. Managers and end users will have
different views.
• Fixating on passing the acceptance test could result
in serious problems being missed.
