  1. 1. Integrating a Behavior-Driven Development Tool into Perl’s Testing Ecosystem Peter Sergeant Kellogg College University of Oxford Trinity Term 2016 A dissertation submitted for the MSc in Software Engineering
  2. 2. Abstract Cucumber is a suite of software written in Ruby for facilitating Behavior-Driven Development by allowing testing fixtures and assertions to be organized into Features and Scenarios, written in a subset of natural language. Cucumber has been ported to many languages including Perl and Python. This dissertation starts by examining and contrasting the architectures of the testing ecosystems in Perl, Python, and Ruby – from creating basic test assertions to producing parse-able test-run summaries – and in particular the differences in approach for facilitating interoperability between testing libraries. Cucumber-style testing is investigated through this lens – individual features such as tags, step definitions and command-line tooling are explained and linked back to a more general hierarchy suggested in the first section. Finally, the design and implementation of Test::BDD::Cucumber — which shares primary authorship with this dissertation — is detailed, along with reflection on that design and implementation.
  3. 3. Acknowledgements Thank you to my tutor — Professor Jeremy Gibbons — for keeping the faith for two years, and for the occasional 24-hour turn-around of drafts. Additionally, to my long-suffering wife, who has many times found herself left alone to explore during our holidays while I sat in hotel rooms finishing this document, and whose own academic achievements inspired me to undertake this MSc. Finally, to the other contributors to the Test::BDD::Cucumber project, especially Erik Huelsmann, who has contributed both code and gentle pressure to improve the code base.
  4. 4. Contents
     1 Introduction
       1.1 Motivation of this dissertation
       1.2 Cucumber and the Platform
       1.3 Objectives and Expected Contribution
     2 A Model for Testing in Perl, Python, and Ruby
       2.1 The Anatomy of a Simple Test Assertion
         2.1.1 Perl
         2.1.2 Python
         2.1.3 Ruby
         2.1.4 Summary
       2.2 Creating Extended Assertions
         2.2.1 Perl
         2.2.2 Python
         2.2.3 Ruby
         2.2.4 Summary
       2.3 A Model For Test Suites and Test Harnesses
         2.3.1 Predicates and Test Assertions
         2.3.2 Sequencing Test-Assertions and Control Flow
         2.3.3 Modeling Meta Test-Assertion Control Flow
       2.4 Decisions for Test Library Implementors
     3 The Cucumber Model
       3.1 A very High-Level Overview of Cucumber
       3.2 Organization of Assertions
         3.2.1 Steps
         3.2.2 Scenarios
         3.2.3 Features
         3.2.4 Tags
       3.3 Test Data and Fixtures
         3.3.1 Parameterizable Step Definitions
         3.3.2 Step Data
         3.3.3 Outlines
         3.3.4 Background Sections as Transformers
       3.4 Reporting
       3.5 Implementation Details
         3.5.1 Integrating with Test Assertion Providers
         3.5.2 World
  5. 5. 3.5.3 Running the Test Suite
     4 Implementing Perl’s Test::BDD::Cucumber
       4.1 An Exculpatory Note on the Code Ahead
       4.2 Step Definitions with Test::Builder
         4.2.1 Why Integrate with Test::Builder?
         4.2.2 A Meta Test-Assertion for Step Definitions
       4.3 Handling Results
         4.3.1 What’s Needed
         4.3.2 Foldable Results
         4.3.3 Control Flow
       4.4 Data Provision, Fixtures, and the Environment
         4.4.1 Test::BDD::Cucumber::StepContext
         4.4.2 The Stash
         4.4.3 Easy Access
       4.5 Output Harnesses
         4.5.1 Test::BDD::Cucumber::Harness::TestBuilder
     5 Reflection
       5.1 Comparing Perl, Python and Ruby
       5.2 The Model
         5.2.1 The Choice of Haskell
         5.2.2 Generating Formatted Reports
         5.2.3 The Extension Model
         5.2.4 Further Work
       5.3 Test::BDD::Cucumber
         5.3.1 A Brief History
         5.3.2 Further Work Planned
         5.3.3 Reflections on the Development Process
       5.4 Summary of Work Complete
     A A Simple Haskell Testing Library
       A.1 Completing the Model
         A.1.1 A Monadic ResultEnv
         A.1.2 The Test Harness
       A.2 Adding Assertions
       A.3 Outputting TAP
     Bibliography
  6. 6. 1 Introduction This dissertation will explore the Behavior-Driven Development tool called Cucumber, and examine the challenges and considerations experienced when writing an entirely new implementation of it in Perl, tightly integrated with Perl’s extensive testing ecosystem. The implementation whose development is explored (Test::BDD::Cucumber) was developed during attendance of the MSc in Software Engineering, and is currently being used in a range of applications, from testing warehouse automation robots for NET-A-PORTER, to coordinating hard-drive tests for Seagate, to providing a basis for a major open-source ERP’s acceptance tests. 1.1 Motivation of this dissertation Cucumber describes a suite of Ruby software which allows software tests to be defined in a subset of natural language, and then allows those natural-language tests to be executed. While Cucumber refers to the specific Ruby implementation for running tests, and Gherkin refers to the natural language subset used, common usage favors the word Cucumber or the phrase “Cucumber-style testing” to describe any testing performed in this style. This dissertation will use the term Cucumber to describe the general style of testing suggested by Cucumber, RCucumber when talking about the specific Ruby implementation of Cucumber, and Test::BDD::Cucumber when talking about the Perl implementation whose design and implementation decisions form a part of this dissertation. Cucumber promises that it will “encourage closer collaboration, helping teams keep the business goal in mind at all times”1. It achieves this by defining an extensible natural language subset to organize software tests. An example of a test case used later in this dissertation is:
     Scenario: Combining Key Presses on the Display
       When I press 1
       And I press 2
       Then the display should show 12
     Much of the value of tests written using Cucumber comes from the act of collaboratively creating descriptions of expected behavior (Wynne and Hellesøy 2012). Producing a natural language description of a feature which a product manager agrees is a faithful description, a developer believes has enough information to be used as the starting point for development, and that a tester believes forms a good basis for a test helps to make 1 https://cucumber.io
  7. 7. sure the test strikes the right balance between describing what is being tested, and how it’s being tested. 1.2 Cucumber and the Platform Perl already has an especially well-developed testing ecosystem. Perl has literally thousands of test-related software packages on the Comprehensive Perl Archive Network (CPAN)2, the vast majority of which inter-operate nicely through the Test Anything Protocol (TAP). These packages cover almost every conceivable testing paradigm, from unit-testing to automated specification-based testing which mirrors Haskell’s QuickCheck package. The original Cucumber suite targeted Ruby, and Ruby’s testing ecosystem has some interesting and significant differences from Perl’s. In order to create a well-integrated implementation of Cucumber for Perl, these differences and their implications should be considered. 1.3 Objectives and Expected Contribution In Chapter 2, differences between Perl and Ruby’s testing ecosystems and architecture are considered, as is the ecosystem and architecture of a similar popular language, Python. A general model for describing the differences — specifically in composition and collation of test assertions — is shown, and a set of considerations for implementers of testing libraries is suggested. Chapter 3 then examines RCucumber through that model and the set of considerations from Chapter 2. The lessons from this are used to illustrate the reasoning behind design decisions taken during the development of Test::BDD::Cucumber in Chapter 4, and some of the more interesting aspects of the implementation of it are illustrated. Finally, the general applicability and utility of the model, and the use of Test::BDD::Cucumber since its development and release is considered and reflected upon in Chapter 5. 2 https://metacpan.org
  8. 8. 2 A Model for Testing in Perl, Python, and Ruby A programmer moving between Perl, Python, and Ruby is unlikely to run into too many conceptual challenges. There’s new syntax to learn, and there are a few wrinkles: someone new to Perl will have to get used to adding strange characters to the beginning of their variable names, someone new to Python will need to study the scoping rules (Ascher and Lutz 1999), and someone new to Ruby will probably spend some time trying to understand monkey patching and Eigen classes (Flanagan and Matsumoto 2008), but the similarities vastly outweigh the differences. One interesting variation between the three languages comes from their automated software-testing ecosystems. None has built-in primitives for software testing in the language itself1 , but each provides at least one library in the core distribution for performing automated software testing, and each has a rich set of externally provided libraries. However, the approaches taken vary not just in implementation but the philosophy of how testing should be approached. There’s variance inside each language’s suite of approaches too. Comparisons of the approaches taken are hard to find online (an early draft of this article published on a blog leads Google’s results), so this chapter dives into the differences in some detail. This is achieved by examining the approach taken at various levels of testing: • The anatomy of a simple test assertion in each language • Extending assertions to provide improved diagnostics • Reporting the status of a test script to users or computers, and a general model for assertions The lessons are then summarized, and a range of considerations for implementing software testing libraries are described, for use in the subsequent chapter on RCucumber. 2.1 The Anatomy of a Simple Test Assertion The term assertion in a computing context has a pedigree stretching back to Turing (Turing 1949), Goldstine and von Neumann (Goldstine and Von Neumann 1948). An assertion consists at least of “Boolean-valued expressions … to characterize invalid pro- gram execution states” and a “runtime response that is invoked if the logical expression is violated upon evaluation” (Clarke and Rosenblum 2006). 1 although Python has built-in assertions 3
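In Perl terms, and before any testing library is involved, such an assertion is just a Boolean-valued expression paired with a runtime response. A minimal sketch (the invariant and message are illustrative only, not taken from the dissertation):

```perl
# A plain runtime assertion, not yet a *test* assertion: a Boolean-valued
# expression guarding a runtime response (here, raising an exception).
use strict;
use warnings;

my $balance = -10;    # illustrative value
die "Invariant violated: balance must be non-negative (got $balance)\n"
    unless $balance >= 0;
```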
  9. 9. Finding a good definition for a test assertion is a little more challenging. Wikipedia2 has a plausible definition of a test assertion as being “an assertion, along with some form of identifier” albeit with uninspiring references to back it up. Kent Beck’s seminal “Simple Smalltalk Testing: With Patterns” (Beck 1994) which begat SUnit3, which begat xUnit4, doesn’t mention assertions at all. Instead, he talks about Test Case methods on a testing object which raise catchable exceptions via the testing object. “Unit Test Frameworks” (Hamill 2004), which covers xUnit-style testing but with a focus on Java and JUnit, uses the term “test assert” to describe this: Test conditions are checked with test assert methods. Test asserts result in test success or failure. If the assert fails, the test method returns immediately. If it succeeds, the execution of the test method continues. In both cases, the result of assertion failure is the raising of an exception, although this need not be the case: some assertion capabilities abort immediately, some report the violation and then continue execution, and some either abort or continue based on the type of the assertion (Clarke and Rosenblum 2006). Essentially then, failed assertions can be treated as reportable events or exceptional events. One might speculate that these differences in approaches to assertions themselves — reportable vs exceptional — might be expressed in differences between language approaches to test assertion, and start to consider how this might in turn affect suggested practices for test design. 2.1.1 Perl By convention, test assertions written in Perl are reportable. On success or failure they emit the string ok # or not ok # (where # is a test number) to STDOUT and this output is intended to be easily aggregated and parsed by the software running the test cases. These basic strings form the basis of TAP — the Test Anything Protocol. TAP is “a simple text-based interface between testing modules in a test harness”5, and both consumers and emitters of it exist for many programming languages (including Python and Ruby). The simplest possible valid Perl test assertion then is:
     printf( "%s %d - %s\n",
         ( $test_condition ? 'ok' : 'not ok' ),
         $test_count++,
         "Test description"
     );
     Test::More — a very popular testing library that introduces basic test assertions and is used by about 80% of all released Perl modules6 — describes its purpose as: 2 https://en.wikipedia.org/wiki/Test_assertion 3 https://en.wikipedia.org/wiki/SUnit 4 https://en.wikipedia.org/wiki/XUnit 5 https://testanything.org/ 6 https://en.wikipedia.org/wiki/Test::More
  10. 10. 5 to print out either “ok #” or “not ok #” depending on if a given [test assertion] succeeded or failed” Test::More provides ok() as its basic unit of assertion: ok( $test_condition, ”Test description” ); and while one could use ok() to test string equality: ok( $actual eq $expected, ”String matches $expected” ); Test::More builds on ok() to provide tests which “produce better diagnostics on failure … [and that] know what the test was and why it failed”. is() can be used to rewrite the assertion: is( $actual, $expected, ”String matches $expected” ); and returns detailed diagnostic output, also in TAP format: not ok 1 - String matches Bar # Failed test ’String matches Bar’ # at -e line 1. # got: ’Foo’ # expected: ’Bar’ Test::More ships with a variety of these extended equality assertions. However, the Test::More documentation has not been entirely honest with us. A quick look at its source code7 shows nary a print statement, but instead a reliance on what appears to be a singleton object of the class Test::Builder, to which ok() and other test assertions delegate. Test::Builder is described as8 : a single, unified backend for any test library to use. This means two test libraries which both use Test::Builder can be used together in the same pro- gram Which is an intriguing claim to be re-examined when looking at how assertions are ex- tended. The following code: ok( 0, ”Should be 1” ); ok( 1, ”Should also be 1” ); will correctly flag that the first assertion failed, but then will continue to test the second one. 2.1.2 Python Python provides a built-in assert function whose arguments closely resemble the afore- mentioned “assertion, along with some form of identifier”: assert test_condition, ”Test description” When an assertion is false, an exception of type AssertionError is raised associated with the string Test description. Uncaught, this will cause the Python test to exit with a non-zero status and a stack-trace printed to STDERR, like any other exception. 7 https://metacpan.org/source/EXODIST/Test-Simple-1.302047/lib/Test/More.pm 8 https://metacpan.org/pod/Test::Builder
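As a concrete sketch of how these reportable assertions reach a harness, the following is a hypothetical t/basic.t containing the two assertions above, with the TAP it emits shown in comments (exact diagnostic lines vary between Test::More versions):

```perl
use strict;
use warnings;
use Test::More tests => 2;    # the declared plan is emitted as "1..2"

ok( 0, "Should be 1" );       # emits: not ok 1 - Should be 1
                              #        #   Failed test 'Should be 1'
ok( 1, "Should also be 1" );  # emits: ok 2 - Should also be 1

# A TAP consumer such as prove(1) reads this stream, sees one failure
# against a plan of two, and reports the script as having failed overall.
```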
  11. 11. 6 AssertionError has taken on a special signifying value for Python testing tools — vir- tually all of them will provide test assertions which raise AssertionErrors, and will attempt to catch AssertionErrors having evaluated an assertion, transmuting them to failures rather than exceptions. The unittest documentation provides9 a clear description of this distinction: If the test fails, an exception will be raised, and unittest will identify the test case as a failure. Any other exceptions will be treated as errors. This helps you identify where the problem is: failures are caused by incorrect results — a 5 where you expected a 6. Errors are caused by incorrect code — e.g., a TypeError caused by an incorrect function call. Which is foreshadowed by Kent Beck’s (Beck 1994) explanation: A failure is an anticipated problem. When you write tests, you check for expected results. If you get a different answer, that is a failure. An error is more catastrophic, a error condition you didn’t check for. Like Test::More’s ok(), there’s no default useful debugging information provided by assert other than what you provide. One can explicitly add it to the assert statement’s name, which is evaluated at run-time: assert expected == actual, ”%s != %s” % (repr(expected), repr(actual)) However, evidence that assert was not really meant for software testing starts to emerge as one digs deeper. Calls to the function are stripped out when running the code in production mode, and there’s no built-in mechanism for seeing how many times assert was called — code runs that are devoid of any assert statement are indistinguishable from code in which every assert statement passed. This makes use of it by itself an unsuitable building block for testing tools, despite its initial promise — an assertion borne out by the approaches taken by Python’s testing libraries: PyTest — a “mature full-featured Python testing tool”10 — deals with this by essentially turning assert into a macro. Testing libraries imported by code running under PyTest will have their assert statements rewritten on the fly to produce testing code capable of useful diagnostics and instrumentation. Running: assert expected == actual, ”Test description” using python directly gives us a simple error message: Traceback (most recent call last): File ”assert_test.py”, line 4, in <module> assert expected == actual, ”Test description” AssertionError: Test description where running the same code under py.test gives us proper diagnostic output: ======================= ERRORS ======================== ___________ ERROR collecting assert_test.py ___________ assert_test.py:4: in <module> assert expected == actual, ”Test description” E AssertionError: Test description E assert ’Foo’ == ’Bar’ 9 https://docs.python.org/2/library/unittest.html 10 http://www.pytest.org
  12. 12. 7 =============== 1 error in 0.01 seconds =============== unittest, an SUnit descendant that’s bundled as part of Python’s core library, provides its own functions that directly raise AssertionErrors. The basic unit is assertTrue: self.assertTrue( testCondition ) unittest’s test assertions are meant to be run inside xUnit-style Test Cases which are run inside try/except blocks which will catch AssertionErrors. Amongst others, unittest provides an assertEqual: self.assertEqual( expected, actual ) However, the test name has had to be removed in order to get a useful diagnostic message (this default functionality can be changed): Traceback (most recent call last): File ”test_fizzbuzz.py”, line 11, in test_basics self.assertEqual( expected, actual ) AssertionError: ’Foo’ != ’Bar’ Much as Perl unifies around TAP and Test::Builder, Python’s test tools unify around the raising of AssertionErrors. This means they raise exceptional test assertions, rather than reportable ones. A practical difference between these approaches is that a single test assertion failure in Python will cause other test assertions in the same try/except scope not to be run — and in fact, not even acknowledged. Running: assert 0, ”Should be 1” assert 1, ”Should also be 1” gives us no information at all about the second assertion. This is consistent with Hamill (Hamill 2004): Since a test method should only test a single behavior, in most cases, each test method should only contain one test assert. 2.1.3 Ruby Ruby’s testing libraries have no unification point, other than that each uses exceptional rather than reporting test assertions. This can perhaps best be illustrated by examining a library like wrong11 , which aims to provide test assertions to be used inside tests targeting the three most common testing libraries: RSpec12 , minitest13 and Test::Unit14 . wrong provides only one main test assertion, assert, which accepts a block expected to evaluate to true; if it doesn’t, a Wrong::Assert::AssertionFailedError exception is raised. This is a subclass of RuntimeError, which is “raised when an invalid operation is attempted”15 , which in turn is a subclass of StandardError, which in turn is a subclass of Exception, the base exception class. 11 https://github.com/sconover/wrong — wrong is not a particularly popular testing library, but will serve a purpose in showing how the popular ones work 12 http://rspec.info/ 13 https://github.com/seattlerb/minitest 14 https://github.com/test-unit/test-unit 15 http://ruby-doc.org/core-2.1.5/RuntimeError.html
  13. 13. 8 Exception has some useful methods for an exception class — a way of getting a stack trace (backtrace), a descriptive message (message), and a way of contextually re-raising excep- tions (cause). RuntimeError and StandardError don’t specialize the class in anything but name, and thus simply provide meta-descriptions of exceptions. Figure 2.1: Ruby testing library exception class hierarchy Ruby requires one to raise exceptions that are descended from Exception16 . Exceptions that inherit from StandardError are intended to be ones that are to some degree expected, and thus caught and acted upon. As a result, rescue — Ruby’s catch mechanism — will catch these exceptions by default17 . Those that don’t inherit from StandardError have to be explicitly matched, either by naming them directly in the rescue statement, or by specifying that you want to catch absolutely any type of exception raised. Thus wrong considers its exceptions to be expected errors that should be caught. How- ever, wrong also knows how to raise exceptions for the three major/most-popular testing libraries. When being used with RSpec — “a behavior driven development framework”18 — wrong’s 16 https://robots.thoughtbot.com/rescue-standarderror-not-exception 17 http://ruby-doc.org/core-2.1.5/StandardError.html 18 http://rspec.info/
  14. 14. 9 RSpec adapter will raise exceptions of type RSpec::Expectations::ExpectationNotMetError, which inherit directly from Exception. No structured diagnostic data is included in the exception — the diagnostics have been serialized to a string by the time they’re raised and are used as the Exception’s msg attribute. No additional helper methods are included directly in the exception class, either. When wrong is used with minitest — “a small and incredibly fast unit testing frame- work”19 — it uses an adapter to raise instances of MiniTest::Assertion. Like RSpec’s exception class, this also inherits directly from Exception, meaning it won’t get caught by a default use of rescue. While it does include a couple of helper methods, these are simply convenience methods for formatting the stack trace, and any diagnostic data is only available serialized into a string. Finally, wrong will raise Test::Unit::AssertionFailedError exceptions when used with Test::Unit — a “unit testing framework … based on xUnit principles”20 . These inherit from StandardError, so are philosophically “expected” exceptions. Of more interest is the availability of public attributes attached to these exceptions — expected and actual values are retained, as is the programmer’s description of the exception. This allows for some more exciting possibilities for those who might wish to extend it — for example by adding diagnostic information showing a line-by-line diff, language localization, or type-sensitive presentations. In any case, as per Python’s use of raised exceptions to signal test failure, some form of catcher is needed around any block containing these assertions. As raising an Exception will terminate execution of the current scope until it’s caught or rescued, only the first test assertion in any given try/catch scope will actually run. Ruby’s testing libraries’ assertions lack any form of built-in unification — as wrong shows, to write software integrating with several of them requires you to write explicit adapters for them. As there’s no built-in assertion exception class (like there is in Python), different testing libraries have chosen different base classes to inherit from, and thus there’s no general way of distinguishing a test failure raised via a test assertion from any other runtime exception, short of having a hard-coded list of names of exception-class names from various libraries. 2.1.4 Summary The effects of these differences will become more apparent as the ways of extending the test assertions provided are examined. In summary: • Perl communicates the result of test assertions by pushing formatted text out to a file-handle — perhaps appropriately for the Practical Extraction and Reporta- tion Language21 . However, in practice this is managed by a singleton instance of Test::Builder which the vast majority of Perl testing libraries make use of. Perl’s test assertions report their results, rather than raising exceptions. 19 http://docs.seattlerb.org/minitest/ 20 https://github.com/test-unit/test-unit 21 Even if the acronym has been retro-fitted
  15. 15. 10 • Python’s testing infrastructure appears to have evolved from overloading the use of a more generic assert function which provides runtime assertions and raises exceptions of the class AssertionError on failure. The use of a shared class be- tween testing libraries for signaling test failure would seem to allow a fair amount of interoperability. Python’s test assertions are then all exceptional, rather than reporting. • Ruby has several mutually incompatible libraries for performing testing, each of which essentially raises a subclass of Exception on test failure, but each of which has its own ideas about how that exception should look, leading to an ecosystem where auxiliary testing modules need to have adapters to work with each specific test system. Again, Ruby’s test assertion are exceptional, rather than reporting. 2.2 Creating Extended Assertions The illustration of the design and architecture of each language’s testing tools continues with examining how to add customized and extended test assertions for each language. This section will start with the relatively straightforward task of checking a hash (Ruby and Perl’s name for an associative array22 ) for the presence of a given key — with diag- nostics returned to the user on failure — but will then complicate that task by examining how one would go about generalizing the extended assertion to work across different testing libraries. 2.2.1 Perl Perl — like Python and Ruby — has a built-in feature for checking if a key exists in a hash, which returns a true or false value: exists $hash{$key}. Perl’s basic test assertion Test::More::ok(), accepts a boolean input, so the two can be simply combined: ok( exists $hash{$key}, ”Key $key exists in hash” ); Ideally one would make life easier for developers writing tests by showing more diagnostic information to whomever is investigating the failed test assertion — perhaps a list of available keys. Perl’s test assertions report, rather than raise exceptions, which means they can addi- tionally return a value to the caller. Test::More::ok() helpfully returns the value of the predicate under test, and that can be used to decide whether to print diagnostics using Test::More::diag(): unless ( ok( exists $hash{$key}, ”Key $key exists in hash” ) ) { diag(”Hash contained keys: ” . join ’, ’, sort keys %hash ) } This prints the more helpful TAP output: not ok 1 - Key Waldo exists in hash # Failed test ’Key Waldo exists in hash’ # at waldo.t line 16. # Hash contained keys: Bar, Baz, Foo 22 From the term hash table, which is evocative of an obvious implementation detail
  16. 16. 11 where diagnostics are separated from test results using a # — reflecting Perl’s commenting style. This is easily packaged up into a reusable function (which also returns the predicate value): sub test_hash_has_key { my ( $class, $hash, $key ) = @_; if ( ok( exists $hash->{$key}, ”Key $key exists in hash” ) ) { return 1; } else { diag(”Hash contained keys: ” . join ’, ’, sort keys %$hash ); return 0; } } The use of the Test::More function ok() adds a layer of unneeded indirection; it’s possible instead to talk directly to the Test::Builder singleton that Test::More and virtually every other Perl testing library uses under the hood: use base ’Test::Builder::Module’; sub test_hash_has_key { my ( $class, $hash, $key ) = @_; # Get the Test::Builder singleton my $builder = $class->builder; # Run the test, and save its pass/fail state in $status my $status = $builder->ok( exists $hash->{$key}, ”Key $key exists in hash” ); # Print diagnostics if it failed unless ( $status ) { $builder->diag( ”Hash contained keys: ” . join ’, ’, sort keys %$hash ); } # Pass back the test/fail status to the caller return $status; } The result is very simple code, but also very flexible code: it can be used almost anywhere in the Perl testing ecosystem. It reports the status of the test assertion on both success and failure, and adds diagnostics on failure, in a way that will integrate with no additional work with test suites built on: • Test::Class, Perl’s xUnit work-alike
  17. 17. • Test::Expectation, Perl’s RSpec clone • Test::WWW::Mechanize, a testing library that drives a web browser • Test::BDD::Cucumber, Perl’s Cucumber port (which will be examined in Chapter 4) • And almost without exception, every other one of Perl’s many testing libraries 2.2.2 Python Python has a simple and particularly readable structure for testing for key membership of a dict (Python-esque for a hash). One can signal failure of a test assertion in Python in a portable way that will be caught and understood by the majority of testing libraries; simply raise an AssertionError: if key not in d: raise AssertionError(”Key %s does not exist in dict” % repr(key) ) Extension with diagnostic information is slightly more complicated, as there’s no standard way to do it across different libraries (or even with most libraries). One can manually write to STDERR, and hope for the best: if key not in d: keydesc = ”, ”.join(d.keys()) sys.stderr.write(”Dict contained keys: %s” % keydesc) raise AssertionError(”Key %s does not exist in dict” % repr(key) ) Text to STDERR is explicitly summarized when the test is run using PyTest, but otherwise has nothing to distinguish it as relating to the test rather than any other expected diagnostic output. One could instead add the diagnostics to the message in the raised AssertionError. At the point where the AssertionError is being raised, there’s already a problem, so diagnostics are likely appropriate: if key not in d: keydesc = ”, ”.join(d.keys()) raise AssertionError( ”Key %s does not exist in dict; extant keys: %s”% (repr(key), keydesc) ) There are two problems with this approach: Firstly, explicitly raising an AssertionError means that no actual test assertion is being exercised — there is no positive assertion path. The parent library is unable to detect that a test assertion has run, and so can’t keep statistics (such as counting the number of assertions run or printing descriptions of successful assertions), can’t make use of any special behavior encoded in its test assertions (such as coverage tracing), nor can it assign benchmarking results to given assertions. This first problem can be solved by moving the predicate itself (key not in d) into a library-provided test assertion. But this has to be done separately and differently for every testing library to be integrated with, much as wrong in the Ruby section did: # unittest self.assertTrue( key in d, ”Key %s exists in dict” % repr(key) )
  18. 18. 13 or # PyTest assert key in d, ”Key %s exists in dict” % repr(key) This degrades the apparent unity of Python testing libraries — code can only be re-used between them if the positive assertion path is ignored. More on this momentarily. The second problem concerns having conflated diagnostic information and the assertion identifier. The earlier Wikipedia-derived definition for a test assertion — “an assertion, along with some form of identifier” — hints that it would be useful to identify and track assertions. Overloading the identifier to include diagnostics isn’t just philosophically ugly on account of its mixing of concerns, it hampers the ability to reliable identify an assertion. No longer can a continuous deployment tool running the tests keeps statistics on assertions that frequently fail, or accurately benchmark the amount of time taken to get to an assertion over time. One could attempt to solve both problems by relying on the knowledge that test assertions from all libraries will raise the same kind of catchable exception. For example, by running the appropriate test assertion, and intercepting the raised exception to add diagnostics: try: msg = ”Key %s exists in dict” % repr(key) if ( ’unittest_object’ in vars() ): unittest_object.assertTrue( key in d, msg ) else: assert key in d, msg except AssertionError: keydesc = ”, ”.join(d.keys()) sys.stderr.write(”Dict contained keys: %s” % keydesc) raise except: raise The lack of a defined method for communicating diagnostics means there will always be an unpalatable choice between pushing diagnostics as unstructured text to the potentially noisy STDERR, or overloading the test assertion identifier. xUnit-based testing libraries (like unittest) work around both issues by making the smallest-reportable unit the Test Case (Hamill 2004) — a named block that can po- tentially contain several test assertions — rather than the actual test assertions they provide. The test assertions would be collated into one named meta test-assertion: def test_waldo_found(self): d = self.fixture_dict # A real unittest assertion self.assertTrue( len(d) > 0, ”dict has some items” ) # Manually raising a failure if ”Waldo” not in d: keydesc = ”, ”.join(d.keys()) raise AssertionError(
  19. 19. 14 ”Key ’Waldo’ does not exist in dict; extant keys: %s” % keydesc ) If the method representing the Test Case is treated as a meta test-assertion for reporting — rather than recording the status of the test assertions it’s made up of — then a positive test assertion path is regained (the Test Case was run and did not fail), as is a test assertion identity separate from diagnostics information (the method name of the Test Case vs the message in the the raised AssertionError). Essentially, if: • one trusts that the developer using the test assertion is treating it as a small part of a bigger meta test-assertion; and • the bigger meta test-assertion has a stable and high-quality identifier; and • the bigger meta test-assertion not failing is recorded and treated as an assertive success then one can fall back to the already-seen solution of explicitly raising a AssertionError and overloading its test name with diagnostics: if key not in d: keydesc = ”, ”.join(d.keys()) raise AssertionError( ”Key %s does not exist in dict; extant keys: %s”% (repr(key), keydesc) ) Python presents a unified mechanism for representing test assertion failure23 , but there is no unified mechanism for representing assertion success, and thus no mechanism for specifying more generally that a test assertion took place. Lack of a specific diagnostic channel for assertions to use mean the author of an extended diagnostic test assertion will need to think carefully about how to provide this information. In practice though, the majority of Python testing is done using unittest (or tools based on it) — which uses the Test Case pattern above — or via PyTest24 — whose default is to look for testing classes with test_ methods, and thus also implement the Test Case pattern. Assuming one sticks to tools using this pattern, the pitfalls and distinctions above are — literally — academic only, and only of interest to those trying to understand the implementation details. 2.2.3 Ruby Unlike either Perl or Python, Ruby’s testing tools have not coalesced around any shared approach. The approach of wrong has been examined in Section 2.1.3 — a library with adapters that allow a given test assertion function to raise an exception of the appropriate type for the testing library been used.25 To then compare Ruby to Perl and Python by developing a specific extended diagnostic assertion seems unrewarding. However, both wrong and Test::Unit have interesting takes 23 or more accurately, Python’s testing tools have unified around treating the built-in AssertionError exception as such 24 or both together 25 Entirely anecdotally — while researching this topic — there seems to be a prevalent sentiment that people only use RSpec or minitest, or derived tools, and that once one had settled on one, one was expected to work inside the ecosystem of that particular tool only
  20. 20. 15 on how their assertion diagnostics are raised, so this section will look at them in more detail. Specifically not mentioned here are minitest, which takes the same approach as Python’s unittest, and RSpec, a Behavior Driven Development tool which will be looked at in a little more detail at the same time as Cucumber. wrong Every other test assertion library looked at so far provides a method for asserting truth, a method for asserting equality with some diagnostic capabilities, and a set of other extended diagnostic test assertions. wrong provides only a single method — assert {block} — which accepts a block of code expected to be a predicate expression. When the block evaluates to true, the code moves on. When the block to evaluates false, a more in-depth process is kicked off. The assert method determines which file on the file-system it’s in, and what line number, and that file is then opened and the block is located and statically parsed!26 The boolean- returning expression in the block is then split into sub-expressions (if they exist), and the boolean value of each is shown. For example, and from wrong’s documentation: x = 7; y = 10; assert { x == 7 && y == 11 } ==> Expected ((x == 7) and (y == 11)), but (x == 7) is true x is 7 (y == 11) is false y is 10 wrong’s documentation explicitly discourages adding identifier names to test assertions created with assert on the basis that the predicate itself should be sufficient documen- tation: if your assertion code isn’t self-explanatory, then that’s a hint that you might need to do some refactoring until it is. In the example above, x == 7 && y == 11 is expected to act both as the identifier and the assertion. On failure, and in raising its own exception class, wrong merges the stringified predi- cate that acts as an identifier into its diagnostics for the failure. This approach extends to its design of adapters for other exception classes too. While Test::Unit’s exception class (examined next) supports a distinction between these, wrong assumes that all ex- ception classes it has been adapted to raise exceptions using will also simply use a string containing both diagnostics and assertion identifier. Test::Unit Test::Unit is an occasionally-bundled-with-Ruby27 xUnit-derivative, which provides an assert_equal() test assertion. Like the other xUnit descendants (such as unittest above), 26 https://github.com/sconover/wrong 27 http://www.slideshare.net/kou/rubykaigi-2015
  21. 21. 16 it requires test assertions to be used in named Test Case blocks, which it uses to identify tests. By default: def test_simple actual = ”Bar” assert_equal(”Foo”, actual, ”It’s a Foo” ) will die, but will interestingly not conflate diagnostics and identifiers: Failure: test_simple(TUSimple) TU_Simple.rb:8:in ‘test_simple’ 5: 6: def test_simple 7: actual = ”Bar” => 8: assert_equal(”Foo”, actual, ”It’s a Foo” ) 9: end 10: 11: end It’s a Foo <”Foo”> expected but was <”Bar”> The enclosing test case is what’s marked as failed (Failure: test_simple(TUSimple)), and the test assertion’s name is presented separately (It’s a Foo) to the diagnostic message (<”Foo”> expected but was <”Bar”>) and stack trace. Indeed, Test::Unit raises exceptions of the class Test::Unit::AssertionFailedError, which has explicit attributes supporting an expected value, an actual value, and a mes- sage, separately28 . This seems like a best of both worlds approach for test assertions that result in exceptions — passing the diagnostic information back to the test harness distinct from both the test assertion name, and distinct from the wider containing test name. Test::Unit — via plugins — is able to support output from its tests that make use of this distinction, including a TAP output module. 2.2.4 Summary The differences between reporting test assertions and exceptional test assertions have started to become more clear as the examination continues, and the concept of a meta test-assertion (such as an xUnit-style Test Case) has been introduced. Both reporting and exceptional test assertions seem to have advantages and disadvan- tages. Perl’s reporting-assertion-based approach means that a failed test assertion doesn’t derail sibling assertions from being run — one can run a lengthy series of assertions in series, and a single failing one near the beginning won’t stop further potentially useful diagnostics on other facets of the code from being generated. 28 https://github.com/test-unit/test-unit/blob/master/lib/test/unit/assertion-failed-error.rb
  22. 22. 17 A desire to see the outcome of several facets of the code under test may incentivize users of exceptional test assertions to organize their tests into smaller units that individually test these facets. This starts to resemble the xUnit ideal that “a test method should only test one behavior … when there is more than one condition to test, then a test fixture should be set up, and each condition placed in a separate test method” (Hamill 2004). This gentle pressure from the tooling to design tests around small units is not there in Perl (and presumably other languages using reporting-assertion-based approaches), and — anecdotally — this can often lead to tests in Perl being written in a long, meandering style that mixes test assertions directly into fixture code with no clear separation. In Ruby and Python, the combination of this pressure and the lack of a defined diagnostics channel has apparently led to the xUnit style being the default — named blocks that enclose a small number of assertions are considered to be the tests that are run, not the individual assertions. In that context, these blocks are the smallest unit identified by test harnesses, and the messages contained in raised exceptions are seen solely as diagnostic information. These blocks have been referred to as meta test-assertions so far in this chapter. 2.3 A Model For Test Suites and Test Harnesses 2.3.1 Predicates and Test Assertions This chapter has so far described test assertions as operating on predicates – expressions that will evaluate to true or false. In the examples seen so far, a False result can also include diagnostic information. data Result = Pass | Fail Diagnostics deriving (Show) type Predicate e = e -> Result Given this dissertation deals with dynamic languages, side-effects such as exceptions or mutation of the environment may occur: • Data set up to be operated upon by tests — fixtures — may be altered as part of the evaluation of the predicate, as might other values in the environment • The evaluation of the predicate itself may be unable to be completed, and a runtime exception must be raised The model then needs to be able to describe a result that passes or fails (Result), the new environment that exists after evaluation, and whether or not that evaluation caused an exception (Left e or Right e1): data ResultEnv e e1 = ResultEnv Result (Either e e1) deriving (Show) The assertion itself can be modelled as a function that maps from one environment to another, with a result: type Assertion e e1 = e -> ResultEnv e e1 and this made into a test assertion with the addition of an identifier:
  23. 23. 18 data TestAssertion e e1 = TestAssertion Identifier (Assertion e e1) 2.3.2 Sequencing Test-Assertions and Control Flow A developer, a tester, or a continuous delivery tool is ultimately interested in test asser- tions to signal whether a piece of software performs its tasks correctly, and perhaps what remedial actions need to be taken. Any indication that the software does not (Fail Diagnostics) or performs in a way that was unexpected (a Left value being returned) is likely lead the test interpretor to conclude that the end state of the whole sequence of assertions is that of failure, and take appropriate action. The result of previous evaluations needs to be considered in subsequent ones, and thus a way of combining results is needed: defResult = Pass addResult (Fail d) (Fail d’) = Fail $ mappend d d’ addResult (Fail d) _ = Fail d addResult _ x = x instance Monoid Result where mempty = defResult mappend = addResult More generally, a way is also needed of providing the environment left by the last eval- uation to the next one, and for stopping on failure — essentially a way of binding them together: instance Monad (ResultEnv e) where return e = ResultEnv Pass (Right e) (ResultEnv r (Left x)) >>= _ = ResultEnv r (Left x) (ResultEnv _ (Right x)) >>= f = ResultEnv r o where (ResultEnv r o) = f x A failure is expected to be the zero value for combining results — any sequence of results with a failure in it will be a failure. This behavior is implemented in all implementations covered so far. However other behaviors related to collections of test assertions and meta test-assertions differ both between testing libraries and indeed inside the libraries themselves. The following will be used to illustrate combination choices: eg = ( idTestGroupX, [ ( idTestGroupX1, [tX1a, tX1b]), ( idTestGroupX2, [tX2a, tX2b]), ( idTestGroupX3, []) ]) Continuation after a failure Two classes of test assertion have been seen so far — reportable test-assertions and excep- tional test-assertions. Using reportable test-assertions like those built with Test::Builder’s ok() method, a test assertion failing will not prevent subsequent sister test assertions from running:
  24. 24. 19 ok( 0, ”This fails” ); ok( 1, ”This is evaluated anyway” ); A test assertion built with unittest’s assertTrue will — through virtue of raising a ex- ception (albeit of a special type) — prevent sibling test assertions from running: self.assertTrue( False, ”This fails” ) self.assertTrue( True, ”This is never evaluated” ) If tx1a fails in the illustrative example, one might reasonably expect tx1b to only be run if the test assertions are reportable. However, when test assertions are collated into meta test-assertions provided by the testing library, behavior may change in this regard. If Test Group X1 as a whole is marked as a failure (due to failure of tx1a, whether or not Test Group X2 is evaluated depends on the type of the meta test-assertion. While the basic test assertions for unittest are exceptional, the Test Methods that they’re collected into are reportable, as are the Test Classes that those are collected into. This logic could be placed in assertions themselves, but this limits flexibility. Instead, the model has a function that accepts a description of desired behavior (Exceptional or Reportable) and adjusts results appropriately: transmute :: AssertionType -> ResultEnv e e -> ResultEnv e e -- For all AssertionTypes -- -- Passes are passed through transmute _ r@(ResultEnv Pass (Right x)) = r -- Exceptions that are passes become failures transmute _ r@(ResultEnv Pass (Left x)) = ResultEnv (Fail failedToCompile) (Left x) -- Exceptional -- -- A failure in Exception mode becomes an exception transmute Exceptional (ResultEnv (Fail d) (Right x)) = ResultEnv (Fail d) (Left x) -- A exception in Exceptional mode is passed through transmute Exceptional r@(ResultEnv (Fail d) (Left x)) = r -- Reportable -- -- A failure in Reportable mode is passed through transmute Reportable r@(ResultEnv (Fail d) (Right x)) = r -- An exception in Reportable mode is passed through transmute Reportable r@(ResultEnv (Fail d) (Left x)) = r Catching Exceptions Related to whether test assertions are reportable or exceptional — how should unexpected exceptions be handled? Should an exception cause sibling test assertions to be skipped and control handed back to the enclosing meta test-assertion? While the overall Result
  25. 25. 20 shape of the enclosing meta test-assertion will not be changed, the environment may be mutated further, and further diagnostics from failures and stack traces may be added. For all libraries examined so far, exceptions at the level of test assertions will cause siblings to be skipped. However, each library has at least one meta test-assertion construct that will catch exceptions, and continue to run sibling meta test-assertions. These meta test-assertions essentially treat exceptions as failures, a behavior which can be added to the transmute function: -- Catchable -- -- A failure in Catchable mode is passed through transmute Catchable r@(ResultEnv (Fail d) (Right x)) = r -- An exception in Catchable mode is changed to a fail transmute Catchable r@(ResultEnv (Fail d) (Left x)) = ResultEnv (Fail $ mappend d recovering) (Right x) Assertions types are then: data AssertionType = Exceptional | Reportable | Catchable Empty Sequences Another aspect to consider is the behavior of Test Group X3, an empty sequence of assertions. Does Test Group X3 pass because no test assertion failures were recorded, or does it fail as no test assertion successes were recorded? Test::Builder’s reportable test-assertions require positive proof that tests passed, or the tests are marked as failing: # Subtest: Test Group X3 1..0 # No tests run! not ok 1 - No tests run for subtest ”Test Group X3” Where unittest’s exceptional test-assertions consider empty Test Methods and Test Classes to be passing, due to the absence of failure. This property of meta test-assertions will also need to be recorded in the model — a function that accepts the desired behavior and returns a predicate yielding an appropriate ResultEnv: data EmptyBehavior = Succeeds | Fails emptyAssertion :: EmptyBehavior -> e -> ResultEnv e e emptyAssertion Succeeds e = ResultEnv Pass (Right e) emptyAssertion Fails e = ResultEnv (Fail dEmpty) (Right e) 2.3.3 Modeling Meta Test-Assertion Control Flow A generalized meta test-assertion model must allow the attributes in the previous section to be recorded alongside a sequence of test assertions or meta test-assertions that comprise it.
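The Test::Builder behavior quoted above is straightforward to reproduce with Test::More's subtest; a minimal sketch (the file and group names are illustrative):

```perl
use strict;
use warnings;
use Test::More;

# An intentionally empty reportable meta test-assertion: Test::Builder
# requires positive proof that assertions ran, so this subtest fails.
subtest 'Test Group X3' => sub {
    # no assertions here
};

done_testing;

# TAP (abridged):
#     # Subtest: Test Group X3
#         1..0
#         # No tests run!
#     not ok 1 - No tests run for subtest "Test Group X3"
```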
  26. 26. 21 In the model the attributes are stored as fields in a Configuration data type: data Configuration = Configuration { assertionType :: AssertionType, emptyBehavior :: EmptyBehavior } Collections of test assertions can then be described as meta test-assertions, with an identifier and with the desired sequencing configuration: data MetaTestAssertion e e1 = Single (TestAssertion e e1) | Sequence Configuration Identifier [MetaTestAssertion e e1] MetaTestAssertion allows all test groupings seen so far to be modeled and their behavior described — consider for example the xUnit groups as implemented by unittest: A test method is a Python block containing a sequence of assertX test assertions; a failure raises an exception, stopping further execution of test assertions in that block. An empty block is a pass: testMethod = Sequence Configuration { assertionType = Exceptional, emptyBehavior = Succeeds } A test class contains many test methods, but if any fail then the test class should continue, as it should in the case of an exception, making them Catchable: testClass = Sequence Configuration { assertionType = Catchable, emptyBehavior = Succeeds } A test suite container for test classes which required that at least one test class exists completes the xUnit hierarchy: testSuite = Sequence Configuration { assertionType = Catchable, emptyBehavior = Fails } and the illustrative example — now with information about sequencing embedded – can be written as: s = testSuite idTestSuite [ testClass idTestClass [ testMethod idTestMethodX1 [Single tX1a, Single tX1b], testMethod idTestMethodX2 [Single tX2a, Single tX2b], testMethod idTestMethodX3 [] ] ] 2.4 Decisions for Test Library Implementors This chapter has considered a number of different facets of the Perl, Python and Ruby testing infrastructures. These considerations are particularly relevant to those looking to implement a testing library targeting one of those languages, and searching for lessons from an existent one.
  27. 27. 22 Integration with Existing Testing Infrastructure and Ecosystem What level of integration with the existing testing infrastructure and ecosystem should be aimed for, and what language-specific considerations are there? A developer targeting Perl will need to be mindful of Perl’s reporting-based test assertions, and will wish to make fully exploit the existing Test::Builder infrastructure. A developer targeting Python would want to make sure their library understood and reified exceptions inheriting from AssertionError, and a Ruby developer might well wish to simply choose a single existing Ruby library to build upon. Meta Test-Assertions and Other Platform Norms Consideration should also be given to the target platform’s existing ideas of meta test- assertions — are there organizational structures and norms (such as Test::Builder’s sub- tests) that should be respected and utilized so as to be optimally familiar to experienced developers for that platform? Are tests found and run according to certain conventions? For example, Ruby developers may well expect their test suites to be runnable via a rake task, where Perl developers would expect tests to be findable and runnable via an entry point in the ./t directory of their project. Creation and Organization of Test Data and Fixtures Are there conventions for setting up and providing test data and fixtures to blocks con- taining test assertions? Most xUnit descendants will have a Test Fixture class with setUp and tearDown methods, and access to those via a Test Caller class (Hamill 2004). Is there a testing context where this (and other test run contextual) data is held, and how do assertions access it? Reporting and Collation How should success and failure of test assertions be captured and reported upon? For example, a Perl developer would expect any form of test run against their code-base (regardless of what language the test itself was implemented in) to output TAP, and would either expect a hosting continuous delivery tool to understand TAP or a format to which TAP could easily be converted.
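As an illustration of that last consideration, the following is a minimal sketch of the collation side of the Perl convention: TAP::Harness (the library behind the standard prove tool) runs the scripts found under ./t and aggregates the TAP they emit into a single pass/fail answer. The directory layout and exit-code handling here are conventional rather than prescribed:

```perl
use strict;
use warnings;
use TAP::Harness;

# Run every test script under ./t and collate the TAP they emit.
my $harness    = TAP::Harness->new( { verbosity => 0 } );
my $aggregator = $harness->runtests( glob 't/*.t' );

# The aggregator gives a hosting continuous-delivery tool a single answer.
exit( $aggregator->has_errors ? 1 : 0 );
```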
  28. 28. 3 The Cucumber Model This chapter will examine both the implementation of RCucumber, and the structure of Gherkin, the language in which Cucumber tests are defined, and link both back to the model constructed in Chapter 2. That examination and the lessons from it will be used to explain choices made in the implementation of Test::BDD::Cucumber in Chapter 4. In the last chapter, four major questions for designers of testing libraries were raised: • What level of integration with the existing testing infrastructure and ecosystem of the host language is provided? • What meta test-assertions are provided or suggested, and how are the component (single or meta) test assertions inside those selected and composed? • What features are provided or suggested for creation and organization of test data and fixtures? • How are results reported upon for human and software audiences? This chapter will examine those questions for the Cucumber implementation in Ruby (RCucumber), the original and reference implementation. 3.1 A very High-Level Overview of Cucumber Cucumber’s promise is that it has been designed specifically to ensure the acceptance tests can easily be read—and written—by anyone on the team. (Wynne and Hellesøy 2012). Software features are described in a way that must satisfy the customer that the correct behavior is being described, but also written sufficiently tersely and specifically that a developer implementing test code believes they can literally organize their test assertions around that description. Consider the following Cucumber feature description of a simple hand-held calculator: 1 Feature: Basic Functionality 2 3 Background: 4 Given a fresh calculator 5 Then the display should be blank 6 7 Scenario: First Key Press on the Display 8 When I press 1 9 Then the display should show 1 10 23
  29. 29. 24 11 Scenario: Combining Key Presses on the Display 12 When I press 1 13 And I press 2 14 Then the display should show 12 15 16 @addition 17 Scenario: Basic Addition 18 When I key 19 | 1 | 20 | + | 21 | 2 | 22 | = | 23 Then the display should show 3 24 25 @addition 26 Scenario Outline: Addition examples 27 When I press <left> 28 And I press + 29 Then the display should show <left> 30 When I press <right> 31 But I press = 32 Then the display should show <result> 33 Examples: 34 | left | right | result | 35 | 2 | 3 | 5 | 36 | 3 | 5 | 8 | 37 | 5 | 8 | 13 | The description of the feature should serve as documentation of the intended behavior of the software, and also as a meta test-assertion which can be run, and its results reported upon and analyzed to verify the software behaves as expected. The mechanism by which this is achieved – through the lens of the meta assertion model — forms the basis of this chapter. 3.2 Organization of Assertions 3.2.1 Steps A clue to where the test assertions live in the presented feature description is given by the presence of the word “should”. The lines beginning with Given, When, and Then1 are called Steps and are mapped by RCucumber to tester-defined code blocks called step definitions. A mapping for Then the display should be blank might be: Then(’the display should be blank’) do expect( @calc.display ).to eql(’’) end 1 but also And and But, which are stand-ins for the conjunction starting the previous line
30. 30. 25 which uses RSpec’s test assertion expect to build a test assertion. A step definition is a block of code that can include 0 or more test assertions created with a host testing library, with the addition of a parameterizable (see Section 3.3.1) lookup key. A step is a line of text — or descriptive identifier — that can be mapped to that step definition. A step, together with its accompanying step definition, then forms the most basic meta test-assertion in a Cucumber test-suite. The conjunction used (Given/When/Then) is only used as a key to the lookup process itself — no semantic value is conveyed by it. Given that the block referenced is written in Ruby, an imperative language with side-effects, test assertions in the block are generally run in the order they’re written in, and may mutate global state — this mutative property is examined in more detail in Section 2.3.1. Ruby has exceptional test assertions, and RCucumber makes no effort to check the hosting test library for evidence of positively run test assertions. This implies that (and is implemented such that) an empty step definition, or one containing code but no assertions, is considered a pass. In fact, if the above mapping is rewritten to omit the test assertion: Then('the display should be blank') do end then Cucumber’s output remains identical, and the test results are summarized as: 6 scenarios (6 undefined) 37 steps (25 undefined, 12 passed) 0m0.029s This is subtly different from the case where a step definition is simply undefined. On encountering a step that can’t be mapped to a step definition, Cucumber registers a named type of failure, TODO, distinguished from regular failures only via presentation (and thus forming part of the Diagnostics in the model). One final point: as Ruby’s test assertions are exceptional rather than reporting-based, any subsequent test assertions inside a step definition after a failing one aren’t run, and are ignored. This is also true of test assertions which raise runtime exceptions. The failure and exception behavior that a step definition exhibits as a meta-assertion can then be modeled as: step = Sequence Configuration { assertionType = Exceptional, emptyBehavior = Succeeds } Mapping a step to a step definition is slightly more involved, and is examined in Section 3.3. 3.2.2 Scenarios The next level up in the meta test-assertion hierarchy are Scenarios, such as those defined on lines 7, 11, and 17. Scenarios have names, and are a meta test-assertion over steps. If a step inside a scenario fails, sibling steps will not be executed — they are marked by Cucumber as skipped, another type of failure distinguished only by presentation. A
31. 31. 26 scenario with no steps is considered to be passing. Therefore the definition is the same as for step: scenario = Sequence Configuration { assertionType = Exceptional, emptyBehavior = Succeeds } Scenarios can be templated and parameterized, and the Background on line 3 of the example above closely resembles a Scenario — these issues are all addressed in Section 3.3. 3.2.3 Features Scenarios are combined into a file that defines a single feature. When a Feature contains either a failing or an exception-raising Scenario it continues to run its other Scenarios. An empty feature is counted and reported-upon as a pass, so: feature = Sequence Configuration { assertionType = Catchable, emptyBehavior = Succeeds } Features exist individually as single .feature files on the file-system, generally in a hierarchy under a single directory. Running cucumber on an entirely empty directory will complain that certain helper directories it is expecting are missing, but as long as those are there, then a directory simply without any .feature files is considered to pass: directory = Sequence Configuration { assertionType = Catchable, emptyBehavior = Succeeds } 3.2.4 Tags The example included also has a tag on lines 16 and 25: @addition. Tags are annotations on scenarios and features that allow them to be filtered. For example, a developer may have a directory full of features to implement, but only be interested in running and reading reports on the pass or fail state of those she knows to be under active development. Those features and scenarios can be annotated as — for example — work in progress (@wip)2 , and RCucumber asked to run just those. In the model detailed so far, these tags form part of the test assertion and meta test-assertion identifiers. These tags are used to transform one meta test-assertion into one with fewer enclosed meta test-assertions3 . Performing this transformation based on a selection of desirable or undesirable tags requires a predicate that, given a meta test-assertion’s identifier, returns a value describing whether it should be kept: type TagSpec = Identifier -> Bool which can be fed into a filtering function: 2 @wip itself has no special meaning to RCucumber, but has widespread conventional use in the Cucumber community as the primary tag for toggling whether a test should be run 3 Data.Witherable would seem to be the closest Haskell description of this operation generally: https://hackage.haskell.org/package/witherable
32. 32. 27 mtaIdentifier :: MetaTestAssertion e e1 -> Identifier mtaIdentifier (Single (TestAssertion i _)) = i mtaIdentifier (Sequence _ i _) = i select :: TagSpec -> MetaTestAssertion e e1 -> MetaTestAssertion e e1 select _ s@(Single t) = s select _ s@(Sequence _ _ []) = s select t (Sequence c i xs) = Sequence c i $ filter (t . mtaIdentifier) xs 3.3 Test Data and Fixtures RCucumber supports and explicitly encourages (Wynne and Hellesøy 2012) re-use of step definitions: the steps on lines 8, 12, and 13 all map to the same step definition. They are not, however, necessarily the same step, as their identifier may be altered via annotations such as tags, and other fixtures described in this section. These data-providing annotations are explicitly part of the identifier for a test, and thus the identifiers described so far also constitute a type of fixture, of which the descriptive name for a test forms part. An assertion with provided fixture data can thus be: type AssertionWithFixture e e1 = Identifier -> e -> ResultEnv e e1 which can be initialized with the descriptive name and the rest of the fixture-constituting identity: initialize :: AssertionWithFixture e e1 -> Description -> Identifier -> TestAssertion e e1 initialize a n f = TestAssertion i (a f) where i = addDescription f n 3.3.1 Parameterizable Step Definitions The most fundamental mechanism for providing data to test assertions run by Cucumber is in steps that target parameterizable step definitions via regular expressions. The sample feature (in Section 3.1) has steps: Then the display should show 1 and Then the display should show 12 which both target the same step definition: Then(/^the display should show (\d+)$/) do |number| expect( @calc.display ).to eql(number) end Although the matching and dispatch of the step definition occurs at run time, the definition of the step has occurred in the static feature description, so the data is compile-time, making it part of the fixture-encapsulating identifier.
33. 33. 28 In the model, step definitions that receive this data are then of type AssertionWithFixture, and are turned into TestAssertions by the code that performs the lookup and matching: match :: Regex -> AssertionWithFixture e e1 -> TestAssertion e e1 lookup :: (Monoid e1) => Regex -> [AssertionWithFixture e e1] -> TestAssertion e e1 lookup r as = first $ fmap (match r) as where first [] = notFoundAssertion first ms = head ms 3.3.2 Step Data Steps can also have structured data associated with them, as per line 18. This data can either be an array, a hash, or a block of multi-line text — the example on line 18 shows the array form. Associated step data is passed as the last argument to a step definition being run: When ("I key") do |data_table| # ... do something with the data_table object end This fixture data is fixed at compile time (although the copy passed to the step definition is mutable inside the step definition only), and it’s the responsibility of the feature parser to ensure it is placed into the identifier. 3.3.3 Outlines Line 26 contains a Scenario Outline, which bears a strong resemblance to a normal scenario, only with arrow-bracket placeholders and a list of Examples at the end. The pipe-delimited table in the Examples section is parsed, and the scenario is then repeated for each data row in the table — the scenario shown becomes three scenarios, run sequentially, and with the placeholders replaced with the data in the appropriate column. The data provided becomes both part of the step identifier, and its name — it produces scenarios equivalent to having simply repeated the Scenario Outline three times with the data from each row. 3.3.4 Background Sections as Transformers The final mechanism to cover in terms of creating and organizing test data is the Background section, as per line 3. A single Background section is allowed per Feature, and it describes steps that must be run at the beginning of every scenario, similar to xUnit’s concept of a setup method.
  34. 34. 29 3.4 Reporting RCucumber’s output uses color very effectively to illustrate test progression and result statuses. However, it also has other formatters for a number of other formats, such as JSON. Figure 3.1: Cucumber Colorized Output An output format in RCucumber is a string, but in the model can be any type. To model the generation of the output, a Report wraps the current state of output and the ResultEnv: data Report t e e1 = Report t (ResultEnv e e1) Simple functor-like helpers can be implemented to make dealing with the enclosed ResultEnv easier: reportResultMap :: (ResultEnv e e1 -> ResultEnv e2 e3) -> Report t e e1 -> Report t e2 e3 reportResultMap g (Report t f) = Report t (g f) runReportAssertion :: (a -> ResultEnv e e1) -> Report t e a -> Report t e e1 runReportAssertion a = reportResultMap (>>= a) RCucumber’s reporting capabilities can be extended using a built-in formatting extension mechanism, built around an event stream4 . Developers can provide an instantiated object to Cucumber to use as the formatter, which should implement methods that receive information about the meta test-assertions currently being run. 4 https://github.com/cucumber/cucumber/wiki/Custom-Formatters
35. 35. 30 Consider a simplified5 selection of those run for the Scenario meta test-assertion: before_scenario tag_name scenario_name before_steps ... after_steps after_scenario The events split into two types — those run before and after evaluation of the feature. The further separation into individual events beyond that is simply a convenience mechanism: specific parts of a formatter can be extended from a default implementation — changing how tags are rendered, for example — without the subclass also needing to re-implement the other before events. But a more general model that simply had before and after events would be able to implement the events given if desired. An even more general model that offloaded meta test-assertion discrimination to the formatter itself would also work. The model generalizes this formatting extension concept even further to an Extension. Extensions receive a Start or End Event, a MetaTestAssertion, and an incoming Report. data Event = Start | End type Extension t e = Event -> MetaTestAssertion e e -> Report t e e -> Report t e e Extensions can mutate the report on ingress or egress. Sequences of extensions are composed, and applied to the Report: applyExtensions :: Event -> [Extension t e] -> MetaTestAssertion e e -> Report t e e -> Report t e e applyExtensions event extensions metaTA = combined where order Start = reverse extensions order End = extensions combined = foldr (.) id $ map (\x -> x event metaTA) (order event) and the sequence of extensions passed to whichever function is coordinating the evaluation of meta test-assertions. A very simple extension that simply shows a meta test-assertion is being started or ending might be: simple :: Extension String e simple event mta (Report t e) = Report (t ++ "[" ++ verb event ++ ": " ++ name mta ++ "]") e where verb Start = "Starting" verb End = "Ending" Producing more complicated output and an implementation of a test runner that makes use of extensions is discussed in Appendix A. 5 before_scenario is in fact before_feature_element, and some extra tag-related events have been removed
  36. 36. 31 3.5 Implementation Details 3.5.1 Integrating with Test Assertion Providers By default, step definitions in RCucumber are written to use RSpec. RSpec provides a rich set of exception-raising test assertions. RCucumber makes no attempt to distinguish between exceptions raised by test assertions, and those raised more generally. The step definition evaluation code is approximately:6 def execute(*args) # Run the step definition code with any arguments @block.call(*args) # Instantiate a new Core::Test::Result::Passed object passed # Catch any exceptions that occurred in this scope rescue Exception => exception # Instantiate a new Core::Test::Result::Failed object failed(exception) end All Ruby testing libraries (covered) create exceptional test assertions without any com- mon basis, which would make it very hard without a per-library adapter to inspect and meaningfully reason about those exceptions. At the same time, the approach of conflating exceptions raised by test assertions into the same category as all exceptions allows for a great deal of flexibility — there is no restriction on which testing libraries you can use with RCucumber, as long as the test assertions are exceptional. 3.5.2 World The recommended approach for integrating with non-RSpec test assertion providers is to consume their methods into an instance of World.78 World is a blank class, an instance of which is instantiated before every scenario. The step definition is then executed as if it were a method of the World class, using Ruby’s built-in method on the base class instance_exec. Tests that create test data at runtime are able to assign these to instance variables of the scenario’s encapsulating World class, and thus share data between different steps in a scenario. 6 Comments have been added. See https://goo.gl/jHjadr for the actual code on GitHub at time of writing 7 https://github.com/cucumber/cucumber/wiki/Using-MiniTest 8 https://github.com/cucumber/cucumber/wiki/Using-Test::Unit
  37. 37. 32 3.5.3 Running the Test Suite RCucumber ships with an executable, cucumber, which by default will search for a features directory with a step_definitions sub-folder, and run the .feature files con- tained inside. The default output is highly colorized (as per the figure in Section 3.1), and provides a very user-friendly output format, which is the recommended way of running the test suite during development9 . For developers using Rake — “a make-like build utility for Ruby”10 — to test and build their projects, an integration is provided. This allows for easy running of the RCucumber- based tests along side tests written with other tools. 9 https://github.com/cucumber/cucumber/wiki/Using-Rake 10 https://github.com/ruby/rake
38. 38. 4 Implementing Perl’s Test::BDD::Cucumber The previous chapter examined RCucumber through the lens of the test-assertion model, adding to the model as new concepts were encountered. RCucumber’s method for integrating into Ruby’s testing libraries and more generally into Ruby projects was shown, the meta test-assertions of steps, scenarios, and features were detailed, creation and organization of test data and fixtures were explained in terms of the model, and collation and reporting of results was described. This chapter looks at the implementation details of the Perl implementation of Cucumber, Test::BDD::Cucumber, describes the reasoning behind its development, and relates it back to the behavior of RCucumber and the test-assertion model. 4.1 An Exculpatory Note on the Code Ahead Test::BDD::Cucumber is a relatively large code-base, with many parts written in a style that Perl developers would describe as “high magic” — code that makes use of some of Perl’s more powerful (and unusual) features. It exists primarily as a tool for professional developers to use in their day-to-day work, and as such many aspects of the code-base have been written with practicality as their primary concern. It also represents a work that has changed over time, and vestigial stubs of previously implemented behaviors can still be observed. The code included in this chapter consists of simplified and re-ordered snippets, and an effort has been made to make it as accessible as possible to non-Perl programmers, while still illustrating the most important concepts. The entire code-base in its original form is available on GitHub1 and on the CPAN2 . 4.2 Step Definitions with Test::Builder 4.2.1 Why Integrate with Test::Builder? As with virtually every Perl testing library, Test::BDD::Cucumber builds on top of Test::Builder. Test::Builder provides a singleton that gives testing libraries a unified interface for reporting the results of individual test assertions and for creating meta test-assertions, as well as a standard test harness interface which outputs TAP by default. A developer experienced in writing tests with Perl will probably have a good feel for how Test::Builder-based tests will work and some favorite associated testing libraries that 1 https://github.com/pjlsergeant/test-bdd-cucumber-perl 2 https://metacpan.org/pod/Test::BDD::Cucumber 33
39. 39. 34 they like using — for example, Test::WWW::Mechanize for tests which interact with web servers. To leverage both developers’ existing experience and knowledge, and also the very wide range of testing libraries available on CPAN, integrating as tightly as practicable with Test::Builder was desirable. However, Cucumber also has its own conventions to be followed: an existing set of meta test-assertions, syntax-highlighted output for developers to quickly see the status of their test implementations, and a JSON output understood by several tools. Choices needed to be made concerning how much to compromise between providing a familiar and easily integrated environment for Perl developers and how closely to hew to the original RCucumber. 4.2.2 A Meta Test-Assertion for Step Definitions Subtests Test::Builder has a generic subtest meta test-assertion which supports arbitrarily deep trees of meta test-assertions. Subtests are introduced by passing a code-reference to Test::Builder->subtest: $Test->subtest( "Parent", sub { ok( 1, "Passing assertion" ) } ); which by itself will produce: # Subtest: Parent ok 1 - Passing assertion 1..1 ok 1 - Parent 1..1 This may appear initially to be a good candidate on which to have built the Cucumber step-definition meta test-assertion. However, subtests have a fixed configuration, which matches that of a test script written in Perl containing a list of test assertions: subtest = Sequence Configuration { assertionType = Reportable, emptyBehavior = Fails } This differs from Cucumber’s step definitions by treating empty blocks as failures, and by the enclosed test assertions being reportable and not exceptional. Treating empty blocks as failures is simply not compatible with the Cucumber model. Code in step definitions is used to set up fixtures and test data with no requirement for test assertions to be run. Some way of changing this behavior was required. Test::Builder also outputs the result of reportable test assertions to STDOUT — as well as any diagnostics — as it encounters them, by default. Should a colorized output similar to RCucumber’s be required, this needs instead to be captured and passed to whatever code is making formatting and reporting decisions. Subtests provide no inbuilt mechanism for doing this. Finally, subtests don’t provide any mechanism either for catching exceptions or for converting captured exceptions into failures. Subtests, then, were considered not to
40. 40. 35 be the solution without significant changes to their behavior in the development of Test::BDD::Cucumber. An Entirely New Testing Context The Test::Builder documentation says: you only run one test per program [which] shares such global information as the test counter and where test output is going This is suggestive that the Test::Builder singleton is itself a meta test-assertion. It has predefined behavior for no tests having been run (failure), and operates over test assertions implemented using its test methods. Running each step definition against its own Test::Builder ‘singleton’ seemed to have some benefits. “Where test output is going” is a function of the Test::Builder object, and thus configurable. Test::Builder can be put in a passing state even when empty by preceding any other test assertions with a call to its pass() method. The Test::Builder singleton stores information about how many tests passed, failed, or were skipped, and so the result of a single step definition can be queried and reported on programmatically. This didn’t solve the issue with test assertions being reportable and not exceptional. But that didn’t appear to be a key feature of Cucumber, and so the trade-off was made to keep that in the Perl style of being reportable — this also meant that no extra magic was required to change the behavior of Test::Builder-based libraries, which helped keep the implementation simple. All that was required was finding a way to spoof the Test::Builder singleton to a clean instance before each step definition was run. Spoofing the Test::Builder singleton Creating a new clean Test::Builder object to use as the singleton was simple — Test::Builder provides a method for just this: 1 my $proxy_object = Test::Builder->create(); Test::Builder also provides reports in TAP of test assertion executions. By default these are written to STDOUT, but can be intercepted: 2 my $output; 3 $proxy_object->output( $output ); # Equivalent to 'Report' 4 $proxy_object->failure_output( $output ); # Equivalent to 'Diagnostics' This Test::Builder instance also needs to pass by default, so a test assertion that’s guaranteed to pass is run first: 5 $proxy_object->ok( 1, 6 "Starting to execute step: " . $step_text ); Well-implemented libraries that integrate with Test::Builder will retrieve the singleton when needed by calling Test::Builder->new — a subroutine called new in the Test::Builder name-space. Perl provides a keyword local that essentially performs variable substitution inside the enclosing scope, and in any code called from within it.
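As a brief, self-contained illustration of that keyword (unrelated to Test::Builder, with names invented purely for this example), overriding a subroutine via its typeglob lasts only until the enclosing block exits:

use strict;
use warnings;

sub greet { return "hello" }

{
    # Override greet() for the rest of this block, and for anything called from it
    no warnings 'redefine';
    local *main::greet = sub { return "goodbye" };
    print greet(), "\n";    # prints "goodbye"
}

print greet(), "\n";        # prints "hello"; the original definition is restored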
41. 41. 36 By pointing Test::Builder::new3 at a subroutine that instead returns the proxy object, the aim of diverting calls to the Test::Builder singleton to the one used to collect results from the step definition was achieved: 7 local *Test::Builder::new = sub { return $proxy_object }; The step definition could now be run: 8 # 'eval' will catch any exceptions, and place their value in $@ 9 eval { $step_definition->($context) }; and any exceptions caught were turned into failures against the Test::Builder object with a diagnostic description: 10 if ($@) { 11 # Fail a test assertion called "Test compiled" 12 $proxy_object->ok( 0, "Test compiled" ); 13 14 # Add the exception details as diagnostics 15 $proxy_object->diag($@); 16 } The result of the step definition can then be examined and encapsulated in order to pass it on to the scenario meta test-assertion: 17 my $test_builder_status = $proxy_object->history; 18 my $cucumber_step_status = 19 $test_builder_status->test_was_successful ? 20 ( $test_builder_status->todo_count ? 'pending' : 'passing' ) : 21 'failing'; 22 23 $result = Test::BDD::Cucumber::Model::Result->new({ 24 result => $cucumber_step_status, 25 output => $output 26 }); 4.3 Handling Results 4.3.1 What’s Needed As foreshadowed by the appearance of a Result Model class in the preceding code section, passing the results of steps to enclosing meta test-assertions required some consideration. For every meta test-assertion considered in this dissertation (and in the model: see Chapter 3), the failure of a constituent test assertion or meta test-assertion causes the meta test-assertion to be marked as a failure. However, the status of preceding meta test-assertions can also determine whether or not a meta test-assertion gets evaluated in the first place. Test::BDD::Cucumber needed a way of emulating the foldable results from meta test-assertions described in the model, and allowing results from one meta test-assertion to affect whether or not other meta test-assertions are run at all. 3 The -> is a Perl-ism that passes the name-space to the left of it into the function (in that name-space) on the right of it, and is one of the quirks of Perl’s “bolted-on” object orientation implementation
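To make that footnote concrete for non-Perl readers, the two calls below are equivalent (ignoring inheritance): the invocant to the left of the arrow is simply passed as the first argument. The package and subroutine names are invented for this example:

use strict;
use warnings;

package My::Class;
sub new { my ( $class, %args ) = @_; print "new() called on $class with foo=$args{foo}\n" }

package main;
My::Class->new( foo => 1 );              # arrow-style call
My::Class::new( 'My::Class', foo => 1 ); # the equivalent plain subroutine call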
42. 42. 37 4.3.2 Foldable Results In the model, a Result is a set of Pass and Fail Diagnostics. Assuming that Diagnostics describes a set with both an identity element and the ability to append items from that set, then Result can also be a set with an identity element Pass (defResult) and an appending function (addResult). More simply, assuming Diagnostics is a monoid, then so is Result (these definitions are covered in Section 2.3.2). Test::BDD::Cucumber::Model::Result attempts to model Result in Perl. It begins with a list of possible states that a result can hold: enum 'StepStatus', [qw( passing failing pending undefined )]; This contains two new statuses not seen yet: pending (equivalent to skipped) and undefined (equivalent to TODO). These are types of failure with extra meta-information added to them, which a sufficiently advanced Diagnostics data type could handle. Diagnostics themselves are collated in an output string attribute. The class contains a constructor to allow it to be instantiated from a sequence of existing result objects: 1 sub from_children { 2 my $class = shift; # Perl OO boilerplate 3 my ( @children ) = @_; # The results from which to build this one A list of statuses is built up in %results, which starts with a default passing state, and a string in which to concatenate the diagnostics is declared: 4 my %results = ( passing => 1 ); 5 my $output; The result of each child is added to %results, and any diagnostics it contains are appended: 6 for my $child (@children) { 7 $results{ $child->result }++; 8 $output .= $child->output . "\n"; 9 } 10 $output .= "\n"; Finally, the result types are evaluated in order of precedence — the presence of a result of a given type causes the new object to be instantiated with that result: 11 for my $status (qw( failing undefined pending passing )) { 12 if ( $results{$status} ) { 13 return $class->new( 14 { 15 result => $status, 16 output => $output 17 } 18 ); 19 } 20 }
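A brief usage sketch of that constructor (the two child result objects are assumed to have come from running individual step definitions; the variable names are illustrative):

# Fold two step results into a single result for the enclosing scenario;
# 'failing' takes precedence over 'undefined', 'pending' and 'passing'
my $scenario_result = Test::BDD::Cucumber::Model::Result->from_children(
    $step_one_result, $step_two_result
);

print $scenario_result->result, "\n";   # e.g. 'failing' if either child failed
print $scenario_result->output;         # the children's concatenated diagnostics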
43. 43. 38 4.3.3 Control Flow The results of test assertions and meta test-assertions don’t simply affect the overall result; they also affect whether or not other test assertions and meta test-assertions are run. Given a result with an environment Result e e1 and a new assertion which also produces a result (e -> Result e e1), whether or not to continue depends on whether the environment encapsulated in that result is in a normal state (Right e1) or an exceptional state (Left e). The monad that implements this has already been covered in Section 2.3.2. Feature directories and features both share the same configuration for sequentially evaluating meta test-assertions: c = Configuration { assertionType = Catchable, emptyBehavior = Succeeds } and this configuration is used to ensure that the result of evaluating a feature or scenario can’t leave the result in an exceptional state (Left e) via transmute (see: Section 2.3.2). As a result, Test::BDD::Cucumber doesn’t need to implement any special control flow logic for these, simply the foldable result model already covered. However, step definition failures are treated as exceptional, so when running a scenario, step definitions occurring after a failing one must not be run. A real Perl exception would be tricky to handle, as reporting events should still be generated; only the assertion itself should not be run. While running a scenario, Test::BDD::Cucumber keeps track of whether the scenario has failed using a boolean attribute on the scenario runtime called short_circuit. This is set having examined the result: # If it didn't pass, short-circuit the rest unless ( $result->result eq 'passing' ) { $outline_stash->{'short_circuit'} = 1; } and is used to set the result of a step definition without running it in the step dispatch code: # Short-circuit rather than running if needs be return $self->skip_step( $context, 'pending', "Short-circuited from previous tests", 0 ) if $short_circuit; 4.4 Data Provision, Fixtures, and the Environment Cucumber uses a World object to allow step definitions in a scenario to share data and certain fixtures. “Compile-time” fixture data previously described as residing in the identifier is passed in as arguments to the step definition. No method for introspecting the step definition itself (or any of the enclosing meta test-assertion hierarchy) is provided.
  44. 44. 39 4.4.1 Test::BDD::Cucumber::StepContext Test::BDD::Cucumber takes a slightly different approach: “data made available to step definitions”(Wynne and Hellesøy 2012) is provided via a Test::BDD::Cucumber::StepContext object, passed as the first argument to a step definition. The step context firstly contains references to allow for introspection. A link to the ob- ject that defines the step definition is available via step(), to the object that defined the enclosing scenario as scenario(), and to the object that defined the feature as feature() — this also allows access to the tags defined for the scenario and feature. The conjunc- tion used to declare the step (Given/When/Then) is available via the misleadingly named verb(), and the rest of the step line as text(). Access to identity information is available via data() for data tables, and matches for data that was extracted from the step text as part of the step-definition lookup process. 4.4.2 The Stash The context also provides access to a hash via stash(). This hash is meant to serve a similar purpose to Cucumber’s World object — a place to store data created during the test run. The stash itself is a hash containing two hashes — feature and scenario: $stash = { feature => {}, scenario => {}, }; The scenario is reset at the start of every scenario, making it appropriate for storing data created during a scenario run, where feature is reset at the beginning of every new feature — should a feature require data to be persisted between scenarios (such as a computationally expensive fixture) it can be stored here. 4.4.3 Easy Access To prevent a developer from needing to write the relatively verbose: sub { # Read the context from the first argument my $context = $_[0]; my $stash = $context->stash; my $value = $stash->{’scenario’}->{’foo’}; to access stash variables, step definitions are able to access two globally defined methods in the Test::BDD::Cucumber::StepFile class: S and C. Before the step definition is executed, the definitions of these methods are changed to provide access to the stash and the step context respectively: local *Test::BDD::Cucumber::StepFile::S = sub { return $context->stash->{’scenario’};
45. 45. 40 }; local *Test::BDD::Cucumber::StepFile::C = sub { return $context; }; allowing for a far more concise access method: sub { my $value = S->{'foo'}; Finally, Perl developers are used to accessing regular expression matches using the Perl special variables $1 to $n. Rather than access matches via: my $first_match = C->matches->[0] the regular expression is re-matched against the step text immediately before the step definition is executed so that a more natural style of: my $first_match = $1; can be used. 4.5 Output Harnesses Test::BDD::Cucumber calls its formatting mechanisms harnesses, on the basis that the output will be consumed by a testing harness. An abstract implementation is provided which names methods for each meta test-assertion (feature, feature_done, etc). While the model provides a very generic formatter, Test::BDD::Cucumber takes a similar approach to RCucumber itself by naming each of the formatting events separately. Again, this allows a developer extending a formatter to avoid re-implementing formatting for every event simply in order to change the formatting of one. Additionally, whereas the model has a single data type to model the meta test-assertion identifier and associated fixture data, Test::BDD::Cucumber provides slightly different objects to model each. Test::BDD::Cucumber::Model::Feature, for example, contains information about the document that contains the feature, but no place to hold step data, and Test::BDD::Cucumber::Model::Step is unable to hold tag-related data, as steps don’t support tags in the Cucumber model. 4.5.1 Test::BDD::Cucumber::Harness::TestBuilder The most interesting, and most important, harness is the one for exercising a Test::Builder singleton. Beyond executing the step definitions, the Test::Builder harness is the only part of the code-base that knows about Test::Builder — the code between the step definitions and this harness is completely agnostic about Test::Builder. There’s no linkage at all between the step definitions’ usage of the class to evaluate the status of a step definition and the use of it to communicate with the outside world: as explored in Section 4.2.2, a new instance of Test::Builder is instantiated for every step definition, its state after execution of the step definition examined, and then it’s simply discarded. The harness talks to the “real” Test::Builder singleton — if there is one — to communicate its results.
46. 46. 41 The description of TAP as outputting solely to STDOUT (see: Section 2.1.1) was a simplification. It supports diagnostics both via note(), which outputs #-prefixed diagnostics to STDOUT, and diag(), which outputs #-prefixed diagnostics to STDERR. note() is intended for diagnostic information useful to a developer performing debugging, whereas diag() is intended for diagnostic information relating to unexpected or undesired test execution — for example, further information on a test failure. prove, Test::Builder’s test runner, outputs only the latter by default, but will output both in verbose mode. Scenario and feature names are recorded using note() — #-prefixed diagnostics to STDOUT — and so their names are only printed in verbose mode: # Scenario: MD5 longer data A passing step is marked as such to Test::Builder using its pass() method: pass($step_name);. This causes Test::Builder to record that the test assertion has passed. Descriptions of passing tests are also suppressed by prove unless in verbose mode — in verbose mode, Test::BDD::Cucumber presents them with their step text: ok 142 - Given a usable "Digest" class ok 143 - Given a Digest MD5 object ok 144 - When I've added "foo bar baz" to the object A failing step is recorded as such using the fail() method, fail($step_name);. Failing test assertions are always output by prove, along with the location of the assertion: not ok 145 - Then the hex output is "75ad9f578e43b863590fae52d5d19ce6" # Failed test ' Then the hex output is "75ad9f578e43b863590fae52d5d19ce6"' # at TestBuilder.pm line 87. # in step at examples/tagged-digest/features/basic.feature line 39. However, as the step was evaluated using the per-step Test::Builder object, its output is also available, and this is then marked as a diagnostic to Test::Builder: diag( $result->output );. This causes its transcript to always be output by prove. The transcript starts with the always-passing test assertion that allows for correct behavior on step definitions without assertions: # ok 1 - Starting to execute step: the hex output is "75ad9f578e43b863590fae52d5d19ce6" before continuing to show the failing test assertion, along with its own failure diagnostics: # not ok 2 # # # Failed test at digest/features/step_definitions/basic_steps.pl line 34. # # got: '75ad9f578e43b863590fae52d5d19ce6ZZZ' # # expected: '75ad9f578e43b863590fae52d5d19ce6' # 1..2 The TAP output of the step definition’s Test::Builder object is thus folded into the output of the “real” Test::Builder object. This folding allows the whole test suite written with Cucumber to be executed and evaluated by prove or other TAP-compatible test runners, recapturing the benefits of communicating test status with TAP (see: Section 2.1.1), such as easy conversion to the jUnit XML format.
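To show how this is wired up in practice, the following is a sketch of the kind of .t wrapper script through which a Test::BDD::Cucumber suite can be run under prove. The Loader and harness class names follow the distribution's documented usage, but the exact constructor arguments and directory layout shown here are assumptions rather than quoted code:

use strict;
use warnings;

use Test::More;
use Test::BDD::Cucumber::Loader;
use Test::BDD::Cucumber::Harness::TestBuilder;

# Parse the .feature files and step definitions under t/features/
my ( $executor, @features ) = Test::BDD::Cucumber::Loader->load('t/features/');

# Report every step through the process-wide Test::Builder singleton,
# so that prove sees a single TAP stream for the whole Cucumber suite
my $harness = Test::BDD::Cucumber::Harness::TestBuilder->new( {} );
$executor->execute( $_, $harness ) for @features;

done_testing;

Run via prove (or prove -v for the full transcripts shown above), the Cucumber-derived assertions are then counted and collated alongside any other TAP-emitting tests in the project.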
