2.
TesT-Driven DevelopmenT the desired functionality. Writing tests first forces you to
think clearly about your program’s desired behavior.
In test-driven development, then, test writing is an
I n Microsoft researcher Simon Peyton Jones’ presentation
about writing research papers,1 he provocatively reverses
the commonsense notion that you should perform the
integral part of the code construction process rather than
an activity performed later and intended solely to find bugs.
Test-driven development can take some getting used to, but
research first and then write the paper. He proposes in-
it can also significantly improve code quality and maintain-
stead that you write the paper first—after all, writing forces
ability. Once you become familiar with the basic mechanics
you to think clearly about your ideas and also clarifies the
of writing and running tests, you might want to give it a try.
research needed to support them.
Many software developers similarly reverse the pattern references
of coding and then testing. In test-driven development, 2 1. S.P. Jones, “How to Write a Great Research Paper,” Microsoft
developers first write the tests. Naturally, the tests fail un- Research, 2004; http://research.microsoft.com/en-us/um/people/
til the necessary code is written. This process is repeated simonpj/papers/giving-a-talk/writing-a-paper-slides.pdf.
in small, incremental steps until they’ve implemented all 2. K. Beck, Test Driven Development: By Example, Addison-Wesley, 2002.
test files
As lines 1-2 of Figure 1 show, the first function in 1 function suite = test_upslope_area
initTestSuite;
a test file always has the same form. Line 1 shows 2
the function name, which begins with the prefix 3
test, and the output argument name, which is al- 4 function test_normalCase
5 E = [ ...
ways suite. The next line calls the helper script 6 2 2 2 2 2 3 NaN
initTestSuite, which examines the rest of the 7 2 1 2 2 2 3 NaN
test file, identifies all test cases in it, and bundles 8 2 1 2 2 2 3 NaN
9
them up into a test suite, which is returned as the 2 2 2 2 2 3 NaN];
output argument. Other functions in the file that 10
11 % Expected result hand-computed.
begin with test are individual test cases. Their 12 exp_A = [ ...
names make it easy for the test code reader to un- 13 1, 1, 31/9, 8/3, 2, 1, 0
derstand the test case’s purpose. 14 1, 12, 41/9, 10/3, 2, 1, 0
15 1, 12, 41/9, 10/3, 2, 1, 0
16 1, 1, 31/9, 8/3, 2, 1, 0];
test Cases
17
Test-case functions follow a simple pattern
18 R = dem_flow(E);
(see Figure 1, lines 4–22). First, you initial- 19 T = flow_matrix(E, R);
ize a set of input values to pass to the func- 20 A = upslope_area(E, T);
tion you’re testing. Next, you call the function, 21
capturing the output results. Finally, you 22 assertElementsAlmostEqual(A, exp_A);
use an assertion utility function—such as 23
24 function test_emptyInput
assertElementsAlmostEqual on line 22—to
25 E = [];
state what you expect to be true about the re- 26 R = dem_flow(E);
sults. If A equals B within some tolerance, then 27 T = flow_matrix(E, R);
assertElementsAlmostEqual(A, B) returns 28 A = upslope_area(E, T);
29
immediately (and quietly) and the test case pass- 30 assertEqual(A, []);
es. Otherwise, it throws an exception; the test 31
framework then catches the exception and noti- 32 function test_constantInput
fies you that the test case has failed. 33 E = ones(5, 10);
34 R = dem_flow(E);
35 T = flow_matrix(E, R);
running tests 36 A = upslope_area(E, T);
To make test running easy, Matlab xUnit pro- 37
vides a main driver function, runtests. This 38 assertEqual(A, ones(5, 10));
function finds all the test cases in all the test files
in your current working directory, gathers them Figure 1. Sample test file. The top function,
into a test suite, runs the test suite, and displays test_upslope_area, initializes the test suite.
The functions test_normalCase, test_
the results. Figure 2 shows the output of run emptyInput, and test_constantInput each
tests for Matlab xUnit’s own test suite. As the form one test case. (Used with permission from The
figure shows, runtests displays the number of MathWorks, Inc. All rights reserved.)
november/deCember 2009 49
3.
and determine the expected outputs. For an ex-
ample, see Figure 1 lines 4–22. Next, consider
inputs at and near the end of the allowed data
range (that is, conduct a boundary analysis 9). These
test cases help identify off-by-one errors, such as
writing N instead of N1 or vice versa.
With more experience, you can gain insight
into coding-error patterns and add test cases
that are likely to expose such errors. Experienced
Figure 2. Running tests. The function runtests Matlab quality engineers, for example, routinely
automatically runs all tests and determines whether
they pass or fail. (Used with permission from The
write test cases with empty input matrices or NaN
MathWorks, Inc. All rights reserved.) and Inf values. From my own experience with
image processing software, I anticipate possible
coding errors for constant-valued images or those
that have a single row or column. Lines 24–38 of
Figure 1 show a couple of error-guessing9 test
cases for the upslope_area function.
Logic test Cases
In addition to considering input data variations,
you should also think about the various logic paths
through your code. Structured basis testing9 is a
straightforward way to count the minimum num-
ber of test cases you need to fully exercise a par-
ticular function’s logic. You start by counting one
for the straight-line path through the code; that
is, for the path without loops or branches. You
then count one for each appearance of the follow-
ing Matlab keywords and operators: if, elseif,
Figure 3. A failing test. When a test case fails, while, for, case, otherwise, try, &&, and ||.
runtests shows the failing test case’s location Figure 4 shows a code fragment of the run
and name and provides clickable links that open the method of the Matlab xUnit TestSuite class.
code in the editor at the point of failure. (Used with
permission from The MathWorks, Inc. All rights
Minimally exercising this code’s logic paths re-
reserved.) quires at least four test cases corresponding to the
highlighted code fragments. The test cases might
include
test cases found, how long the test suite ran, and
whether the suite passed or failed. • two input arguments (nargin == 2) and no test
When test cases fail, runtests displays components (numel(self.testComponents)
additional information about each failure (see == 0), a combination that tests the straight-line
Figure 3). The display includes the name and lo- path;
cation of each failing test case, as well as clickable • one input argument (nargin == 1);
links that load the code directly into the Matlab • multiple test components that all pass
editor at the point of failure. (numel(self.testComponents) > 0); and
• some failing and some passing test com-
Constructing test Cases ponents.
Gathering test cases for your programs’ functions
and methods is something of an art. You want to There are more sophisticated techniques for
find a minimal set of cases that covers the mean- selecting test cases,9 but this straightforward
ingful variations in input data and that also covers method will get you started and ensure basic test
the code logic. coverage of all the code paths.
Data test Cases Determining the expected Values
To gather your test cases, start with the nominal Determining expected values can be difficult,
case—that is, choose a typical set of input values especially for complex scientific and engineering
50 Computing in SCienCe & engineering
4.
research simulations. Full verification and valida- 1 function did_pass = run(self, monitor)
tion requires that we consider factors such as the 2 if nargin < 2
3 monitor = CommandWindowTestRunDisplay();
appropriateness of physical models and approxi-
4 end
mations, error estimates, and numerical solution 5
stability.7,8 6 monitor.testComponentStarted(self);
As a practical matter, to deal with testing 7 did_pass = true;
8 for k = 1:numel(self.TestComponents)
complex models, I recommend that you start 9 this_component_passed = ...
by focusing on the “unit” in “unit test.” Even 10 self.TestComponents{k}.run(monitor);
complex simulations comprise simpler compu- 11 did_pass = did_pass && this_component_passed;
tational units that you can individually verify. 12 end
13
Verifying individual computational units’ be- 14 monitor.testComponentFinished(self, did_pass);
havior is a helpful step toward verifying overall
system behavior. Focusing on the testability of Figure 4. Structured basis testing. We count one each for the straight-
small computational units also encourages the line path, the if statement on line 2, the for loop on line 8, and the
development of modular, more maintainable && operator on line 11. The total test-case count is four. (Used with
code. permission from The MathWorks, Inc. All rights reserved.)
When it’s infeasible to hand compute expected
results—even for small computational units—
there are several strategies you can try. 1 function test_iccWhitePoint
2 exp_ICCwhite = ...
3 [hex2dec('7b6b')/hex2dec('8000') 1 ...
compare with alternate computation method. The 4 hex2dec('6996')/hex2dec('8000')];
Matlab fft function computes the discrete 5 assertEqual(whitepoint('ICC'), exp_ICCwhite);
Fourier transform (DFT) using a fast algorithm. 6
Some fft test cases compare the function’s out- 7 function test_d50WhitePoint
8 exp_d50 = [0.96419865576090, 1.0, ...
put against the output computed directly using 9 0.82511648322104];
the standard DFT mathematical equation. The 10 assertEqual(whitepoint('d50'), exp_d50);
direct method’s relative slowness doesn’t matter 11
12 function test_d65WhitePoint
for testing purposes. 13 exp_d65 = [.9504 1.0 1.0889];
14 assertEqual(whitepoint('d65'), exp_d65);
check the output’s expected properties. Sometimes, 15
you can check program output against the solu- 16 function test_caseInsensitivity
17 assertEqual(whitepoint('ICC'), whitepoint('icc'));
tion’s expected mathematical or physical proper- 18
ties. For example, the Matlab backslash operator 19 function test_default
() solves linear systems of equations. You can 20 assertEqual(whitepoint, whitepoint('ICC'));
check its output, x = Ab, by ensuring that its re-
sidual is sufficiently close to zero—that is, res = Figure 5. Test file for the Image Processing Toolbox’s whitepoint
A*x – b should be very small. function. Each of the five test cases verifies just one aspect of program
behavior and is easy to understand at a glance. (Used with permission
from The MathWorks, Inc. All rights reserved.)
compare with baseline results. You might be able to
independently verify your program’s output. If so,
you can save the verified output to a data file that understand test failures. Here are two suggested
the test-case code can load as the expected value. ways to do this.
Baseline tests can have some value even when in-
dependent verification isn’t available. In particular, avoid conditional logic. Using conditional logic
baseline tests help you catch unintended changes makes it hard to quickly understand the code.
in program behavior. Writing baseline tests is a Also, when different test code executes during
good way to get started when you face an existing different runs, tracking down test failures can be
body of untested code. frustrating. Strive therefore to write test code that
has no conditional logic.
writing Clear test Cases
Test code is subject to the same problems as any verify one condition per test case. Figure 5 shows a
other type of code. But, when test code becomes portion of a test file for the Matlab Image Pro-
hard to debug and maintain, it tends to fall into cessing Toolbox’s whitepoint function. As the
disuse. Therefore, you should write your test figure shows, there are five different test cases
code so that it’s trivially easy to track down and and each verifies a single, specific aspect of the
november/deCember 2009 51
5.
program behavior. The first three test cases verify The actual difference between the two expres-
the correct output for three different inputs (ICC, sions is very small, but not zero:
d50, and d65). The fourth test case (lines 16-17)
verifies the inputs’ case insensitivity, and the fifth >> (3/14 + (3/14 + 15/14)) ...
test case (lines 19-20) verifies the correct default – ((3/14 + 3/14) + 15/14)
behavior. The five tests cases are easy to under- ans =
stand at a glance and have specific names that 2.2204e016
clearly communicate the programmer’s intent.
Because the order of operations matters, two
Using random Values appropriately mathematically equivalent computational pro-
Novice test writers often use randomly generated cedures can produce different answers when im-
input data for their test cases. As Gerard Meszaros plemented using floating-point arithmetic. Even
notes, this kind of nondeterministic test case can identical code can produce different answers on
be difficult to debug because it can be hard to re- different computers, or when compiled using dif-
produce the failure.10 In the worst case, a particular ferent compilers, because of optimization varia-
test failure might never appear again despite nu- tions or differing versions of underlying runtime
merous test runs. To avoid frustration when using libraries, such as the basic linear algebra subpro-
randomly generated input data, write your test to grams (BLAS).
use the same set of randomly generated input data For testing, then, we should usually avoid exact
for every test run by using a fixed “seed” for the equality tests when checking floating-point results
random number generator. Alternatively, if you and instead use some sort of tolerance. There are
want to use a different seed for every test run, two types of floating-point tolerance tests com-
display or save the seed state so that if you have a monly used: absolute and relative. Absolute toler-
test failure, you can reproduce and debug it. ance tests for two values a and b:
Comparing floating-Point Values
| a − b| ≤ T ,
For numerical code, checking that the values in
your test cases are correct can be surprisingly
difficult. The problem is illustrated by an oft- where T is the tolerance. The relative tolerance
heard complaint on the Usenet newsgroup comp. test is
soft-sys.matlab: “In Matlab, why is 0.1 + 0.1 +
0.1 not equal to 0.3?!” That is, | a − b|
≤ T.
max (| a | , | b |)
>> 0.1 + 0.1 + 0.1 == 0.3
ans =
0 To understand the difference between these
tests, consider the values 10.1 and 11.2 and the tol-
The explanation is that Matlab, like most other erance 10 −3. The absolute difference between
numeric computation languages today, computes these values is 1.1; given the specified tolerance,
using IEEE floating-point numbers and arithme- they clearly fail the absolute tolerance test. Now,
tic.11,12 The decimal fractions above, 0.1 and 0.3, consider the values 134,712.5 and 134,713.6. Their
cannot be represented exactly in IEEE floating absolute difference is also 1.1, so they also fail the
point because 1/10 and 3/10 don’t have a finite absolute tolerance test. However, they pass the
representation as a binary fraction. Also, floating- relative tolerance test:
point arithmetic operations are approximate, not
exact, and are thus subject to round-off error. As a |134712.5 −134713.6 |
= 8.1655 ⋅ 10−6 <= 10−3.
surprising consequence, floating-point arithmetic max (|134712.5 | , | 134713.6 |)
systems don’t strictly obey associativity for mul-
tiplication and addition. You can see this readily
in Matlab: Specifying a relative tolerance of 10 −n is roughly
equivalent to saying you expect two values to dif-
>> 3/14 + (3/14 + 15/14) ... fer by no more than one unit in the nth significant
== (3/14 + 3/14) + 15/14 digit.
ans = When you compare two vectors, as opposed to
0 two scalars, you should consider whether to apply
52 Computing in SCienCe & engineering
6.
the tolerance test to each vector element indepen- • The framework should support the develop-
dently or whether to apply it using a vector norm. ment of other programs—such as graphical
For example, suppose you compared the two vec- user interfaces—that can control test execution
tors [1 100,000] and [2 100,000] using a relative and reporting.
tolerance of 10 −3. If you compare the two vectors • The framework should be scalable, handling
elementwise, they clearly fail the tolerance test several tests on a simple, short program as well
because the relative difference between 1 and 2 is as extensive test suites for large applications.
much higher than the tolerance value. However, • The framework should borrow familiar, proven
another way to compare vectors is using a vector design and usage concepts from test frameworks
norm, such as L2: created for other languages.
|| v1 − v2 || To help satisfy these goals, we modeled the frame-
≤ T. work on xUnit, a class of testing frameworks.10
max (|| v1 || , || v2 || )
xUnit
The two vectors [1 100,000] and [2 100,000] The xUnit family of testing frameworks was pop-
pass this form of relative tolerance test because ularized both by the JUnit Java test framework
they differ in the L2-norm sense by only about 1 and the article “Test Infected—Programmers
part in 100,000. Love Writing Tests” by Erich Gamma and Kent
To compare floating-point values, Matlab Beck.13 You can find xUnit-style test frameworks
xUnit provides the functions assertElements for most widely used programming languages.
AlmostEqual and assertVectorsAlmost Although different xUnit frameworks vary in
Equal, either of which can use a relative or an the details, most share several common design
absolute tolerance. elements. I’ll now describe the xUnit design ele-
When choosing a tolerance, it helps to consider ments that are most relevant to the workings of
the machine precision, or ε. This quantity is the Matlab xUnit.
spacing of floating-point numbers between suc- First, Matlab xUnit uses test-case objects and
cessive powers of two, relative to the lower pow- methods. Each test case is instantiated as a full ob-
er of two. For example, the next floating-point ject. The test framework instantiates one object
number higher than 1 is 1 + ε, while the next of the test subclass for each of the test methods it
floating-point number higher than 2 is 2 + 2ε. contains. The corresponding Matlab xUnit base
The Matlab function eps returns the machine class is TestCase.
precision. For double-precision floating point, ε is Second, running a particular test case proceeds
approximately 2.2 ∙ 10 −16, and for single precision in four phases:
it’s approximately 1.2 ∙ 10 −7.
For computations involving relatively little 1. Perform setup steps, such as opening a file or
floating-point arithmetic, a small multiple of ε connecting to a database.
might be appropriate, such as 100ε. For computa- 2. Call the function being tested.
tions involving coarse approximations, we might 3. Verify that the expected outcome has
use much higher tolerances. The default rela- occurred.
tive tolerance for the Matlab xUnit assertion 4. Perform any required teardown steps, such
functions is approximately 10 −8 , correspond- as closing a file or database connection.
ing to eight significant digits. In general, your
choice of tolerance depends on the particular Third, test suites are hierarchical. A Matlab
domain and the nature of the your models and xUnit test suite is a collection of individual test
approximations. cases as well as other test suites. This hierarchi-
cal nature allows test suites to scale from small to
Matlab xUnit Design large applications.
and architecture Four, most xUnit frameworks support auto-
Matlab xUnit was designed to achieve several matic test-case discovery so that the programmer
objectives: doesn’t have to create and update an explicit list of
test cases. The Matlab xUnit function runtests
• Typical M atlab users should be able to write discovers test cases automatically by scanning a
and run tests easily. directory for test files and then it automatically
• Tests should run automatically. runs all the discovered test cases.
november/deCember 2009 53
7.
adapting xUnit for Procedural Programming Chang and Roth, Science 293 (5536) 1793–1800,”
Most Matlab code today is procedural, and most Science, vol. 314, no. 5807, 2006, p. 1875; www.
scientists and engineers who use Matlab prefer sciencemag.org/cgi/content/full/314/5807/1875b.
procedural programming. Although implemented 4. B.G. Hall and S.J. Salipante, “Retraction: Measures of
in a fully object-oriented xUnit fashion, Matlab Clade Confidence Do Not Correlate with Accuracy
xUnit was designed from the start to be used by of Phylogenetic Trees,” PLoS Computational Biology,
procedural programmers and its documentation vol. 3, no. 3, 2007; www.ploscompbiol.org/article/
focuses on procedural test writing. info:doi%2F10.1371%2Fjournal.pcbi.0030051.
The FunctionHandleTestCase—which is 5. G. Miller, “A Scientist’s Nightmare: Software Problem
a TestCase subclass—supports the procedural Leads to Five Retractions,” Science, vol. 314, no. 5807,
testing style. The helper script initTestSuite 2006, pp. 1856–1857.
(see Figure 1 line 2) automatically turns each test 6. G. Wilson, “What Should Computer Scientists
function in the file into a FunctionHandleTest Teach to Physical Scientists and Engineers?” IEEE
Case object and returns the collection of them as Computational Science & Eng., vol. 3, no. 2, 1996,
a TestSuite object. pp. 46–65.
7. D.E. Stevenson, “A Critical Look at Quality in Large-
I
Scale Simulations,” Computing in Science & Eng.,
f you’re a scientist or engineering research- vol. 1, no. 3, 1999, pp. 53–63.
er, you probably don’t regard yourself as a 8. D. Kelly and R. Sanders, “The Challenge of Testing
professional software developer. Still, you Scientific Software,” Proc. Conf. Assoc. Software Testing:
probably write software to get your work Beyond the Boundaries (CAST 2008), Assoc. Software
done. That software’s quality significantly affects Testing, 2008; www.associationforsoftwaretesting.
the quality of your research. Matlab is widely org/documents/cast08/DianeKellyRebeccaSanders_
used for research in many science and engineer- TheChallengeOfTestingScientificSoftware_paper.pdf.
ing disciplines; the availability of an industry- 9. S. McConnell, Code Complete: A Practical Handbook of
standard testing approach for software created Software Construction, 2nd ed., Microsoft Press, 2004.
with Matlab can help you incorporate automated 10. G. Meszaros, xUnit Test Patterns: Refactoring Test
software testing into your regular practice. By do- Code, Pearson Education, 2007.
ing so, you can increase your confidence in both 11. C. Moler, “Floating Points: IEEE Standard Unifies
your software and in the accuracy of the research Arithmetic Model,” Cleve’s Corner, The MathWorks,
results based upon it. 1996; www.mathworks.com/company/newsletters/
news_notes/pdf/Fall96Cleve.pdf.
acknowledgments 12. D. Goldberg, “What Every Computer Scientist
Special thanks to Greg Wilson and Gerard Meszaros Should Know About Floating-Point Arithmetic,”
for their helpful comments on Matlab xUnit’s design Computing Surveys, vol. 23, no. 1, 1991, pp. 5–48.
and architecture, as well as to Mara Yale, Rob Comer, 13. E. Gamma and K. Beck, “JUnit Test Infected: Pro-
and Bill McKeeman for their suggestions on this ar- grammers Love Writing Tests,” Source Forge, 1998;
ticle. I also thank the anonymous reviewers for their http://junit.sourceforge.net/doc/testinfected/testing.
helpful and constructive comments. Matlab is a reg- htm.
istered trademark of The MathWorks, Inc.
steven l. eddins manages the image processing and
references geospatial computing software development team at
1. J.E. Hannay et al., “How Do Scientists Develop and The MathWorks, Inc. He is coauthor of Digital Im-
Use Scientific Software?” Proc. 2nd Int’l Workshop age Processing Using Matlab (Gatesmark Publish-
Software Eng. Computational Science and Eng., IEEE CS ing, 2009). Eddins has a PhD in electrical engineering
Press, 2009, pp. 1–8. from the Georgia Institute of Technology. He’s a
2. L. Hatton, “The T Experiments: Errors in Scientific senior member of IEEE and a member of SPIE. Con-
Software,” IEEE Computational Science & Eng., vol. 4, tact him at steve.eddins@mathworks.com.
no. 2, 1997, pp. 27–38.
3. G. Chang et al., “Retraction of Pornillos et al., Sci- Selected articles and columns from IEEE Computer
ence 310 (5756) 1950–1953. Retraction of Reyes and Society publications are also available for free at
Chang, Science 308 (5724) 1028–1031. Retraction of http://ComputingNow.computer.org.
54 Computing in SCienCe & engineering
Clipping is a handy way to collect and organize the most important slides from a presentation. You can keep your great finds in clipboards organized around topics.
Be the first to comment