Assessing Unit Test Quality
Assessing Unit Test Quality Presentation Transcript

  • 1. Welcome!
  • 2. Assessing Unit Test Quality Matt Harrah Southeast Virginia Java Users Group 20 Nov 2007
  • 3. About this presentation
    • This presentation is more like a case study than a lecture
    • Several different technologies will be discussed
    • In particular, we will discuss the suitability of various technologies and approaches to solve a computing problem
    • We will also discuss the computing problem itself
    • Suggestions and questions are welcomed throughout
  • 4. Before we begin: Some definitions
    • Unit testing – testing a single class to ensure that it performs according to its API specs (i.e., its Javadoc)
    • Integration testing – testing that the units interact appropriately
    • Functional testing – testing that the integrated units meet the system requirements
    • Regression testing – testing that changes to code have not (re-)introduced unexpected changes in performance, inputs, or outputs
  • 5. Part I: The Problem
  • 6. Restating the obvious
    • The Brass Ring: We generally want our code to be as free of bugs as is economically feasible
    • Testing is the only way to know how bug-free your code is
    • All four kinds of testing mentioned a minute ago can be automated with repeatable suites of tests
  • 7. Restating the obvious
    • The better your test suite, the more confidence you can have in your code’s correctness
    • JUnit is the most commonly used way to automate unit tests for Java code
      • Suites of repeatable tests are commonly built up over time
      • QA involves running these suites of tests on a regular basis
  • 8. Brief Digression
    • JUnit can also be used to do integration, functional, and regression testing
      • Integration tests theoretically should create two objects and test their interactions
      • Functional tests can simulate the user interacting with the system and verify its outcomes
      • Regression testing is typically making sure that changes do not introduce test failures in the growing suite of automated tests
  • 9. So… if the better your test suite, the better your code, the real question is: how good are your tests?
  • 10. Mock Objects
    • Are you isolating your objects under test?
      • If a test uses two objects and the objects interact, a test failure can be attributed to either of the two objects, or because they were not meant to interact
      • Mock objects are a common solution
        • One and only one real code object is tested – the other objects are “mock objects” which simulate the real objects for test purposes
        • Allows test writer to simulate conditions that might be otherwise difficult to create
      • This problem is well-known and amply addressed by several products (e.g., EasyMock)
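The isolation idea above can be sketched with a hand-rolled mock. All names here (Gateway, PaymentService) are hypothetical; a library like EasyMock would generate the mock instead of writing it by hand:

```java
// Hypothetical example: a PaymentService depending on a Gateway interface.
interface Gateway {
    boolean charge(String account, int cents);
}

class PaymentService {
    private final Gateway gateway;
    PaymentService(Gateway gateway) { this.gateway = gateway; }
    boolean pay(String account, int cents) {
        if (cents <= 0) return false;        // behavior we want to test in isolation
        return gateway.charge(account, cents);
    }
}

class MockGatewayDemo {
    static boolean runTest() {
        // The mock stands in for the real Gateway, so a failure here can only
        // be blamed on PaymentService, and we can simulate a declined charge
        // that might be hard to produce with the real object.
        Gateway decliningMock = (account, cents) -> false;
        PaymentService service = new PaymentService(decliningMock);
        return !service.pay("acct-1", 500);  // expect false: the mock declined
    }

    public static void main(String[] args) {
        System.out.println(runTest() ? "PASS" : "FAIL");
    }
}
```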
  • 11. Code Coverage
    • Do you have enough tests? What’s tested and what isn’t?
      • Well-known problem with numerous tools to help, such as Emma, JCoverage, Cobertura, and Clover. These tools monitor which pieces of code under test get executed during the test suite.
      • All the code that executed during the test is considered covered, and the other code is considered uncovered.
      • This provides a numeric measurement of test coverage (e.g., “Package x has 49% class coverage”)
  • 12. JUnit Fallacy #1
    • “The code is just fine – all our tests pass”
    • Test success does not mean the code is fine
    • Consider the following test:
    • public void testMethod() {
    •     // Do absolutely nothing
    • }
    • This test will pass every time.
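By contrast, a rigorous test asserts the behaviour it claims to verify. A minimal illustrative sketch, using the JDK's own ArrayDeque as the class under test:

```java
import java.util.ArrayDeque;
import java.util.Deque;

class DequeContractDemo {
    // Unlike the empty test above, this test actually asserts the LIFO
    // contract it claims to verify; it cannot pass vacuously.
    static void testPushThenPop() {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("first");
        stack.push("second");
        if (!"second".equals(stack.pop())) throw new AssertionError("LIFO order violated");
        if (!"first".equals(stack.pop()))  throw new AssertionError("LIFO order violated");
        if (!stack.isEmpty())              throw new AssertionError("stack should be empty");
    }

    public static void main(String[] args) {
        testPushThenPop();
        System.out.println("PASS");
    }
}
```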
  • 13. What the world really needs
    • Some way of measuring how rigorous each test is
      • A test that makes more assertions about the behaviour of the class under test is presumably more rigorous than one that makes fewer assertions
      • If only we had some sort of measure of how many assertions are made per something-or-other
  • 14. “Assertion Density”
    • Assertion density for a test is defined as A = a / m, where
      • A is the assertion density
      • a is the number of assertions made during the execution of the test
      • m is the number of method calls made during the execution of the test
    • Yep, I just made this up
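A minimal sketch of that ratio in code, assuming the two counts have already been gathered by some instrumentation:

```java
class AssertionDensity {
    // a = assertions executed, m = method calls executed during the test.
    // A test that calls 48 methods but asserts only 3 times scores 0.0625,
    // hinting that it exercises much more than it verifies.
    static double density(long assertions, long methodCalls) {
        if (methodCalls == 0) return 0.0;   // an empty test asserts nothing
        return (double) assertions / methodCalls;
    }

    public static void main(String[] args) {
        System.out.println(density(3, 48));   // prints 0.0625
    }
}
```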
  • 15. JUnit Fallacy #2
    • “Our code is thoroughly tested – Cobertura says we have 95% code coverage”
    • Covered is not the same as tested
    • Many modules call other modules which call other modules.
  • 16. Indirect Testing (diagram: Test A tests Class A, which calls Class B, which calls Class C)
    • Class A, Class B, and Class C all execute as Test A runs
    • Code coverage tools will register Class A, Class B, and class C as all covered, even though there was no test specifically written for Class B or Class C
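The chain in that diagram can be sketched in code (class names hypothetical). A coverage tool marks all three classes covered, although only A is asserted on directly:

```java
class C { int value() { return 42; } }
class B { int doubled() { return new C().value() * 2; } }
class A { int answer() { return new B().doubled() + 1; } }

class IndirectCoverageDemo {
    public static void main(String[] args) {
        // "Test A": asserts only on A, yet executes B and C as a side effect.
        // B.doubled() and C.value() were never tested directly; their
        // distance from the test is 1 and 2 calls respectively.
        int result = new A().answer();
        if (result != 85) throw new AssertionError("expected 85, got " + result);
        System.out.println("PASS");
    }
}
```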
  • 17. What the world really needs
    • Some way of measuring how directly a class is tested
      • A class that is tested directly and explicitly by a test designed for that class is better-tested than one that only gets run when some other class is tested
      • If only we had some sort of “test directness” measure…
      • Perhaps a reduced quality rating the more indirectly a class is tested?
  • 18. “Testedness”
    • Testedness is defined by the formula shown, where
      • t is the testedness
      • d is the test distance
      • n_d is the number of calls at test distance d
    • Yep, I made this one up too
  • 19. Part II: Solving the Problem (or, at least attempting to…)
  • 20. Project PEA
    • Project PEA
      • Named after The Princess and the Pea
    • Primary Goals:
      • Collect and report test directness / testedness of code
      • Collect and report assertion density of tests
    • Start with test directness
    • Add assertion density later
  • 21. Project PEA
    • Requirements:
      • No modifications to source code or tests required
      • Test results not affected by data gathering
      • XML result set
    • Ideals:
      • Fast
      • Few restrictions on how the tests can be run
  • 22. Approach #1: Static Code Analysis
    • From looking at a test’s imports, determine which classes are referenced directly by tests
    • From each class, look what calls what
    • Assemble a call network graph
    • From each node in graph, count steps to a test case
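The steps above amount to a breadth-first search from the test nodes over the call graph. A sketch with a hand-built graph (a real tool would extract the edges from source or class files):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CallGraphDistance {
    // caller -> callees; built by hand here for illustration.
    static Map<String, List<String>> graph = new HashMap<>();

    static Map<String, Integer> distancesFromTests(Set<String> tests) {
        Map<String, Integer> dist = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        for (String t : tests) { dist.put(t, 0); queue.add(t); }
        while (!queue.isEmpty()) {
            String node = queue.remove();
            for (String callee : graph.getOrDefault(node, List.of())) {
                if (!dist.containsKey(callee)) {        // first visit = shortest path
                    dist.put(callee, dist.get(node) + 1);
                    queue.add(callee);
                }
            }
        }
        return dist;   // classes absent from the map are never reached by any test
    }

    public static void main(String[] args) {
        graph.put("TestA", List.of("ClassA"));
        graph.put("ClassA", List.of("ClassB"));
        graph.put("ClassB", List.of("ClassC"));
        // ClassA is 1 step from a test, ClassB 2, ClassC 3
        System.out.println(distancesFromTests(Set.of("TestA")));
    }
}
```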
  • 23. Approach #1: Static Code Analysis
    • Doesn’t work
    • Reflective calls defeat static detection of what is being called
    • Polymorphism defeats static detection of what is being called
  • 24. Approach #1: Static Code Analysis
    • Consider:
    • class TestMyMap extends AbstractMapTestCase {
    •     public void testFoo() {
    •         m.put("bar", "baz"); // Map m defined in superclass
    •     }
    • }
    • What concrete class’s put() method is being called?
    • Java’s late binding makes this static code analysis unsuitable
  • 25. Approach #2: Byte Code Instrumentation
    • Modify the compiled .class files to call a routine on each method’s entry
    • This routine gets a dump of the stack and looks through it until it finds a class that is a subclass of TestCase
    • Very similar to what other code coverage tools like Emma do (except those tools don’t examine the stack)
  • 26. Approach #2: Byte Code Instrumentation
    • There are several libraries out there for modifying .class files
      • BCEL from Apache
      • ASM from ObjectWeb
    • ASM was much easier to use than BCEL
    • Wrote an Ant task to go through class files and add call to a tabulating routine that examined the stack
  • 27. Approach #2: Byte Code Instrumentation
    • It did work, but it was unbearably slow – typically 1000x slower, sometimes worse
    • This is because getting a stack dump is inherently slow
      • Stack dumps are on threads
      • Threads are implemented natively
      • To get a dump of the stack, the thread needs to be stopped and the JVM has to pause
    • To be viable, PEA cannot use thread stack dumps
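The cost can be felt with a quick measurement using the JDK's own stack-walking call. Absolute timings vary by machine; the point is that this per-call overhead is paid on every instrumented method entry:

```java
class StackDumpCost {
    // Each getStackTrace() call forces the JVM to walk the thread's entire
    // call stack; doing that on every method entry multiplies test runtime.
    static long timeStackDumps(int calls) {
        long start = System.nanoTime();
        int depth = 0;
        for (int i = 0; i < calls; i++) {
            depth += Thread.currentThread().getStackTrace().length;
        }
        long elapsed = System.nanoTime() - start;
        return elapsed / calls;   // average nanoseconds per stack dump
    }

    public static void main(String[] args) {
        System.out.println("avg ns per stack dump: " + timeStackDumps(10_000));
    }
}
```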
  • 28. Approach #3: Aspect-Oriented Programming
    • Use AOP to accomplish similar tasks as the byte code instrumentation
      • Track method entry/exit and maintain a mirror of the stack in the app
      • Calculate and record distance from a test at every method entry
    • Avoids the overhead of getting stack dumps from the thread
  • 29. Approach #3: Aspect-Oriented Programming
    • Unsatisfactory – in fact, a complete failure
    • Method exits are all over the place in a method
      • Method exits in the byte code do not always correspond to the source structure, particularly where exceptions are concerned
      • Introducing an aspect behavior at each exit point can increase the size of the byte code by up to 30%
  • 30. Approach #3: Aspect-Oriented Programming
    • Expanding methods by 30% can (and did) cause them to bump into Java’s 64K limit on method bytecode
      • Instrumented classes would not load
    • In addition, AspectJ required you to either:
      • Recompile the source and tests using AspectJ’s compiler; or
      • Create and use your own aspecting-on-the-fly classloader
      • Either way you still hit the 64K barrier
  • 31. Approach #4: Debugger
    • The idea is to write a debugger that monitors the tests in one JVM as they run in another
    • The debugger can track method entries and exits as they happen and keep the stack straight
    • The code being tested, and the tests themselves, do not need to be aware that the debugger is watching them
  • 32. Approach #4: Debugger
    • Java includes in the SE JDK an architecture called JPDA
      • Java Platform Debugger Architecture
    • This architecture allows one JVM to debug another JVM over sockets, shared files, etc.
    • It provides an Object-Oriented API for the debugging JVM
      • Models the debugged JVM as a POJO
      • Provides call-backs for events as they occur in the debugged JVM
  • 33. Approach #4: Debugger
    • JPDA allows you to
      • Specify which events you want to be notified about
      • Specify which packages, etc. should be monitored
      • Pause and restart the other process
      • Inspect the variables, call stacks, etc. of the other process (as long as it’s paused)
    • No additional libraries required!
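In outline, a JPDA monitor along these lines might look like the sketch below. The class name PeaDebuggerSketch and the port argument are hypothetical; the connector and request APIs are the standard JDI ones shipped with the JDK (com.sun.jdi):

```java
import com.sun.jdi.Bootstrap;
import com.sun.jdi.VirtualMachine;
import com.sun.jdi.connect.AttachingConnector;
import com.sun.jdi.connect.Connector;
import com.sun.jdi.event.Event;
import com.sun.jdi.event.EventSet;
import com.sun.jdi.event.MethodEntryEvent;
import com.sun.jdi.event.VMDeathEvent;
import com.sun.jdi.request.MethodEntryRequest;
import java.util.Map;

class PeaDebuggerSketch {
    // Attach over a socket to a JVM started with
    // -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000
    static VirtualMachine attach(String port) throws Exception {
        for (AttachingConnector c :
                Bootstrap.virtualMachineManager().attachingConnectors()) {
            if (c.transport().name().equals("dt_socket")) {
                Map<String, Connector.Argument> args = c.defaultArguments();
                args.get("port").setValue(port);
                return c.attach(args);
            }
        }
        throw new IllegalStateException("no socket connector available");
    }

    static void monitor(VirtualMachine vm) throws Exception {
        MethodEntryRequest req = vm.eventRequestManager().createMethodEntryRequest();
        req.addClassExclusionFilter("java.*");   // ignore the JDK itself
        req.addClassExclusionFilter("sun.*");
        req.enable();
        vm.resume();
        while (true) {                            // event loop
            EventSet events = vm.eventQueue().remove();
            for (Event e : events) {
                if (e instanceof MethodEntryEvent) {
                    // push onto the mirrored stack, record test distance, etc.
                } else if (e instanceof VMDeathEvent) {
                    return;                        // write results, shut down
                }
            }
            events.resume();
        }
    }
}
```

Running it for real requires a second, debuggee JVM, so this is compile-level illustration only.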
  • 34. Putting JPDA to work
    • First I wrote a debugger process to attach to an already running JVM
      • By using the <parallel> task in Ant, I can simultaneously launch the debugger and the JUnit tests
      • The debugger will attach to and monitor the other JVM
      • Register interest in callbacks on method entry and exit, exceptions, and JVM death
  • 35. Putting JPDA to work
    • As methods enter and exit, push and pop entries onto a stack maintained in the debugger
      • This effectively mirrors the real stack in the tests
      • Ignore certain packages beyond developer control (such as the JDK itself)
  • 36. Putting JPDA to work
    • As each method is entered, calculate and record its distance in the stack from a test
    • Shut down when the other JVM dies
    • Just before shutdown, write all the recorded data to a file
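The mirrored stack and the distance calculation from the last two slides can be sketched in plain Java. The test-class check here is a heuristic stand-in for PEA's actual subclass-of-TestCase check:

```java
import java.util.ArrayDeque;
import java.util.Deque;

class MirroredStack {
    private final Deque<String> frames = new ArrayDeque<>();

    void enter(String className) { frames.push(className); }   // method entry event
    void exit() { frames.pop(); }                              // method exit event

    // Distance from the top of the stack to the nearest test frame;
    // -1 means the current call did not originate in a test at all.
    int testDistance() {
        int d = 0;
        for (String cls : frames) {           // ArrayDeque iterates top -> bottom
            if (isTestClass(cls)) return d;
            d++;
        }
        return -1;
    }

    // Name-based heuristic stand-in for "is a subclass of TestCase".
    static boolean isTestClass(String cls) {
        return cls.endsWith("Test") || cls.endsWith("TestCase");
    }

    public static void main(String[] args) {
        MirroredStack s = new MirroredStack();
        s.enter("FooTest");   // the test frame
        s.enter("ClassA");    // distance 1
        s.enter("ClassB");    // distance 2
        System.out.println(s.testDistance());   // prints 2
    }
}
```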
  • 37. Putting JPDA to work
    • Remember – Ant’s JUnit task can fork multiple JVMs (one per test if you want), so we need to monitor each one
    • Multiple JVMs mean multiple files of recorded data that need to be accumulated after all the tests are complete
    • Produce XML file of accumulated results
  • 38. Results of using JPDA
    • Performance is way better than using byte code instrumentation
      • Running with monitoring on slows execution by 100x or less, depending on the code
    • Ant script is kind of complicated
      • JUnit tests and PEA must be run with forked JVMs
      • Special JVM parameters for the debugged process are required
      • JUnit and PEA must be started simultaneously using the <parallel> task (which many people don’t know about)
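The wiring described above might look roughly like this Ant fragment. The target name and the pea.PeaMonitor class are hypothetical; the jdwp agent string is the standard JPDA socket transport:

```xml
<target name="test-with-pea">
  <parallel>
    <!-- Debuggee: forked JUnit JVM, suspended until the debugger attaches -->
    <junit fork="true" forkmode="once">
      <jvmarg value="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"/>
      <classpath refid="test.classpath"/>
      <batchtest todir="reports">
        <fileset dir="test" includes="**/*Test.java"/>
      </batchtest>
    </junit>
    <!-- Debugger: PEA attaches to port 8000 and records test distances -->
    <java classname="pea.PeaMonitor" fork="true">
      <arg value="--port=8000"/>
      <classpath refid="pea.classpath"/>
    </java>
  </parallel>
</target>
```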
  • 39. Results of using JPDA
    • The byte code being monitored is completely unchanged
      • No special instrumentation or preparatory build step is required
    • XML file comes out with details about how many method calls were made at what test distance
  • 40. Results file example
  • 41. Report Sample
  • 42. Code Review: Let’s roll that beautiful Java footage!
  • 43. Future Plans
    • Implement assertion density tracking
    • Tweak performance
    • Make it easier to run the tests with PEA running
      • Perhaps subclass Ant’s <junit> task to run with PEA?
    • Documentation (ick)
    • Eat my own dog food and use the tool to measure my own JUnit tests
    • Sell for $2.5 billion to Scott McNealy and Jonathan Schwartz and retire to Jamaica
  • 44. Lessons Learned (so far)
    • What seems impossible sometimes isn’t
    • Creativity is absolutely crucial to solving problems (as opposed to just implementing solutions)
    • JPDA is cool – and I had never heard of it
      • I never thought I’d be able to write a debugger, but JPDA made it easy
    • ASM library is also cool – much nicer than BCEL
  • 45. Wanna help?
    • I’d welcome anyone who wants to participate
    • Contributing to an open-source project looks good on your résumé (hint, hint…)
  • 46. Resources
    • Java Platform Debugger Architecture
    • ASM – Byte code processing library
    • BCEL – Byte code processing library
    • AspectJ – Aspect-oriented Java extension
    • JUnit – unit testing framework
  • 47. Resources
    • Emma – code coverage tool
    • Cobertura – code coverage tool
    • JCoverage – code coverage tool
    • Clover – code coverage tool
    Thanks for listening!