Assessing Unit Test Quality


  • 1. Welcome!
  • 2. Assessing Unit Test Quality
    • Matt Harrah
    • Southeast Virginia Java Users Group
    • 20 Nov 2007
  • 3. About this presentation
    • This presentation is more like a case study than a lecture
    • Several different technologies will be discussed
    • In particular, we will discuss the suitability of various technologies and approaches to solve a computing problem
    • We will also discuss the computing problem itself
    • Suggestions and questions are welcomed throughout
  • 4. Before we begin: Some definitions
    • Unit testing – testing a single class to ensure that it performs according to its API specs (i.e., its Javadoc)
    • Integration testing – testing that the units interact appropriately
    • Functional testing – testing that the integrated units meet the system requirements
    • Regression testing – testing that changes to code have not (re-)introduced unexpected changes in performance, inputs, or outputs
  • 5. Part I: The Problem
  • 6. Restating the obvious
    • The Brass Ring: We generally want our code to be as free of bugs as is economically feasible
    • Testing is the only way to know how bug-free your code is
    • All four kinds of testing mentioned a minute ago can be automated with repeatable suites of tests
  • 7. Restating the obvious
    • The better your test suite, the more confidence you can have in your code’s correctness
    • JUnit is the most commonly used way to automate unit tests for Java code
      • Suites of repeatable tests are commonly built up over time
      • QA involves running these suites of tests on a regular basis
  • 8. Brief Digression
    • JUnit can also be used to do integration, functional, and regression testing
      • Integration tests theoretically should create two objects, and test their interactions
      • Functional tests can simulate the user interacting with the system and verifying its outcomes
      • Regression testing is typically making sure that changes do not introduce test failures in the growing suite of automated tests
  • 9. So… if a better test suite means better code, the real question is: How good are your tests?
  • 10. Mock Objects
    • Are you isolating your objects under test?
      • If a test uses two objects and the objects interact, a test failure could be caused by either of the two objects, or by the fact that they were never meant to interact
      • Mock objects are a common solution
        • One and only one real code object is tested – the other objects are “mock objects” which simulate the real objects for test purposes
        • Allows test writer to simulate conditions that might be otherwise difficult to create
      • This problem is well-known and amply addressed by several products (e.g., EasyMock)
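As a rough illustration of the isolation idea above, here is a hand-written mock in plain Java. It is only a sketch: `MailSender`, `Greeter`, and `MockMailSender` are hypothetical names invented for this example, and a library such as EasyMock would generate the mock rather than have you write it by hand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collaborator interface (not from the slides).
interface MailSender {
    void send(String to, String body);
}

// The one real class under test; it only knows the interface.
class Greeter {
    private final MailSender sender;

    Greeter(MailSender sender) {
        this.sender = sender;
    }

    void greet(String user) {
        if (user == null || user.isEmpty()) {
            throw new IllegalArgumentException("user must not be empty");
        }
        sender.send(user, "Hello, " + user + "!");
    }
}

// Hand-written mock: records each call instead of sending real mail.
class MockMailSender implements MailSender {
    final List<String> sent = new ArrayList<>();

    @Override
    public void send(String to, String body) {
        sent.add(to + ": " + body);
    }
}

class MockDemo {
    public static void main(String[] args) {
        MockMailSender mock = new MockMailSender();
        new Greeter(mock).greet("ann");
        System.out.println(mock.sent); // [ann: Hello, ann!]
    }
}
```

Because the mock is the only collaborator, a failure here can only point at `Greeter`.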
  • 11. Code Coverage
    • Do you have enough tests? What’s tested and what isn’t?
      • Well-known problem with numerous tools to help, such as Emma, JCoverage, Cobertura, and Clover. These tools monitor which pieces of code under test get executed during the test suite.
      • All the code that executed during the test is considered covered, and the other code is considered uncovered.
      • This provides a numeric measurement of test coverage (e.g., “Package x has 49% class coverage”)
  • 12. JUnit Fallacy #1
    • “The code is just fine – all our tests pass”
    • Test success does not mean the code is fine
    • Consider the following test:
      public void testMethod() {
          // Do absolutely nothing
      }
    • This test will pass every time.
  • 13. What the world really needs
    • Some way of measuring how rigorous each test is
      • A test that makes more assertions about the behaviour of the class under test is presumably more rigorous than one that makes fewer assertions
      • If only we had some sort of measure of how many assertions are made per something-or-other
  • 14. “Assertion Density”
    • Assertion Density for a test is defined by the equation shown, where
      • A is the assertion density
      • a is the number of assertions made during the execution of the test
      • m is the number of method calls made during the execution of the test
    • Yep, I just made this up
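The equation itself is not reproduced in this transcript. A minimal sketch, assuming the density is the plain ratio A = a / m of the two quantities defined above; the class and method names are hypothetical, and returning 0 for a test that makes no method calls is a choice made for this example only.

```java
// Bookkeeping for a single test run (hypothetical helper, not part of PEA).
class AssertionDensity {
    private int assertions;   // a: assertions made during the test
    private int methodCalls;  // m: method calls made during the test

    void recordAssertion() {
        assertions++;
    }

    void recordMethodCall() {
        methodCalls++;
    }

    // Assumed definition: A = a / m.
    double density() {
        return of(assertions, methodCalls);
    }

    // Returns 0.0 when m is 0, so an empty test scores zero rather than dividing by zero.
    static double of(int a, int m) {
        return m == 0 ? 0.0 : (double) a / m;
    }
}
```

Under this assumption, the do-nothing test from Fallacy #1 has a = 0 and therefore a density of 0, however often it passes.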
  • 15. JUnit Fallacy #2
    • “Our code is thoroughly tested – Cobertura says we have 95% code coverage”
    • Covered is not the same as tested
    • Many modules call other modules which call other modules.
  • 16. Indirect Testing
    • Diagram: Test A tests Class A, which calls Class B, which calls Class C; all three classes are marked “Covered”
    • Class A, Class B, and Class C all execute as Test A runs
    • Code coverage tools will register Class A, Class B, and Class C as all covered, even though there was no test specifically written for Class B or Class C
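The effect can be reproduced with a toy "coverage tool". This is a sketch only: `CoverageLog` is a hypothetical stand-in for what a real tool such as Cobertura records, and the explicit `hit` calls take the place of instrumentation.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for a coverage tool: remembers which classes executed.
class CoverageLog {
    static final Set<String> executed = new LinkedHashSet<>();

    static void hit(String className) {
        executed.add(className);
    }
}

class ClassC {
    int value() {
        CoverageLog.hit("ClassC");
        return 42;
    }
}

class ClassB {
    int value() {
        CoverageLog.hit("ClassB");
        return new ClassC().value();
    }
}

class ClassA {
    int value() {
        CoverageLog.hit("ClassA");
        return new ClassB().value();
    }
}

class IndirectCoverageDemo {
    // Stand-in for "Test A": it only knows about ClassA.
    static Set<String> runTestA() {
        CoverageLog.executed.clear();
        int result = new ClassA().value();
        if (result != 42) {
            throw new AssertionError("unexpected result " + result);
        }
        return new LinkedHashSet<>(CoverageLog.executed);
    }

    public static void main(String[] args) {
        System.out.println(runTestA()); // [ClassA, ClassB, ClassC]
    }
}
```

All three classes show up as executed even though the test makes no claim about `ClassB` or `ClassC` on their own.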
  • 17. What the world really needs
    • Some way of measuring how directly a class is tested
      • A class that is tested directly and explicitly by a test designed for that class is better-tested than one that only gets run when some other class is tested
      • If only we had some sort of “test directness” measure…
      • Perhaps a reduced quality rating the more indirectly a class is tested?
  • 18. “Testedness”
    • Testedness is defined by the formula shown, where
      • t is the testedness
      • d is the test distance
      • n_d is the number of calls at test distance d
    • Yep, I made this one up too
  • 19. Part II: Solving the Problem (or, at least attempting to…)
  • 20. Project PEA
    • Project PEA
      • Named after The Princess and the Pea
    • Primary Goals:
      • Collect and report test directness / testedness of code
      • Collect and report assertion density of tests
    • Start with test directness
    • Add assertion density later
  • 21. Project PEA
    • Requirements:
      • No modifications to source code or tests required
      • Test results not affected by data gathering
      • XML result set
    • Ideals:
      • Fast
      • Few restrictions on how the tests can be run
  • 22. Approach #1: Static Code Analysis
    • From looking at a test’s imports, determine which classes are referenced directly by tests
    • From each class, look what calls what
    • Assemble a call network graph
    • From each node in graph, count steps to a test case
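The last two steps amount to a breadth-first search over the call graph. A minimal sketch, assuming the graph has already been extracted as a map from each class to the classes it calls; the names are hypothetical, and counting a class referenced directly by the test as distance 1 is an assumption of this example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CallGraphDistance {
    // calls.get(x) lists the classes that x calls directly.
    static Map<String, Integer> distancesFrom(String test, Map<String, List<String>> calls) {
        Map<String, Integer> distance = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        distance.put(test, 0);
        queue.add(test);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            List<String> callees = calls.getOrDefault(current, Collections.emptyList());
            for (String callee : callees) {
                // First visit wins: breadth-first order guarantees it is the shortest path.
                if (!distance.containsKey(callee)) {
                    distance.put(callee, distance.get(current) + 1);
                    queue.add(callee);
                }
            }
        }
        return distance;
    }

    // Hypothetical graph matching the Test A / Class A / Class B / Class C chain.
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> calls = new HashMap<>();
        calls.put("TestA", Arrays.asList("ClassA"));
        calls.put("ClassA", Arrays.asList("ClassB"));
        calls.put("ClassB", Arrays.asList("ClassC", "ClassA")); // includes a cycle back to ClassA
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(distancesFrom("TestA", sampleGraph()));
    }
}
```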
  • 23. Approach #1: Static Code Analysis
    • Doesn’t work
    • Reflective calls defeat static detection of what is being called
    • Polymorphism defeats static detection of what is being called
  • 24. Approach #1: Static Code Analysis
    • Consider:
      class TestMyMap extends AbstractMapTestCase {
          public void testFoo() {
              // Map m defined in superclass
              m.put("bar", "baz");
          }
      }
    • Which concrete class’s put() method is being called?
    • Java’s late binding makes static code analysis unsuitable for this problem
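A small, self-contained demonstration of the same point, using two standard `Map` implementations (the class and method names are made up for this example): the call site is identical, and only the runtime object decides which `put()` runs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class LateBindingDemo {
    // The static type is Map; nothing at this call site names the concrete class.
    static String putAndReport(Map<String, String> m) {
        m.put("bar", "baz");
        return m.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(putAndReport(new HashMap<>())); // HashMap
        System.out.println(putAndReport(new TreeMap<>())); // TreeMap
    }
}
```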
  • 25. Approach #2: Byte Code Instrumentation
    • Modify the compiled .class files to call a routine on each method’s entry
    • This routine gets a dump of the stack and looks through it until it finds a class that is a subclass of TestCase
    • Very similar to what other code coverage tools like Emma do (except those tools don’t examine the stack)
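The tabulating routine might look roughly like the sketch below. It is not PEA's code: `FakeTestCase` is a hypothetical stand-in for JUnit's `TestCase` so that the example runs without JUnit, and the explicit calls to `distanceFromTest()` stand in for the calls that instrumentation would insert at each method entry.

```java
// Hypothetical stand-in for junit.framework.TestCase.
class FakeTestCase { }

class StackProbe {
    // Returns how many frames separate the calling method from the nearest
    // frame whose class extends FakeTestCase, or -1 if no test is on the stack.
    static int distanceFromTest() {
        StackTraceElement[] frames = new Throwable().getStackTrace();
        // frames[0] is this method; frames[1] is the method being measured.
        for (int i = 1; i < frames.length; i++) {
            if (isTestClass(frames[i].getClassName())) {
                return i - 1;
            }
        }
        return -1;
    }

    static boolean isTestClass(String className) {
        try {
            Class<?> c = Class.forName(className, false, StackProbe.class.getClassLoader());
            return FakeTestCase.class.isAssignableFrom(c);
        } catch (ClassNotFoundException | LinkageError e) {
            return false; // frames we cannot resolve are treated as non-test code
        }
    }
}

class Inner {
    static int probe() {
        return StackProbe.distanceFromTest();
    }
}

class Outer {
    static int probe() {
        return Inner.probe();
    }
}

class SampleTest extends FakeTestCase {
    int direct() {
        return Inner.probe();   // Inner is one call away from the test
    }

    int indirect() {
        return Outer.probe();   // Inner is two calls away from the test
    }
}
```

Every call captures and scans a full stack trace, which gives a feel for why this approach becomes expensive when it runs on every method entry.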
  • 26. Approach #2: Byte Code Instrumentation
    • There are several libraries out there for modifying .class files
      • BCEL from Apache
      • ASM from ObjectWeb
    • ASM was much easier to use than BCEL
    • Wrote an Ant task to go through class files and add call to a tabulating routine that examined the stack
  • 27. Approach #2: Byte Code Instrumentation
    • It did work, but it was unbearably slow – typically about 1000x slower, and sometimes worse
    • This is because getting a stack dump is inherently slow
      • Stack dumps are on threads
      • Threads are implemented natively
      • To get a dump of the stack, the thread needs to be stopped and the JVM has to pause
    • To be viable, PEA cannot use thread stack dumps
  • 28. Approach #3: Aspect-Oriented Programming
    • Use AOP to accomplish tasks similar to those of the byte code instrumentation
      • Track method entry/exit and maintain a mirror of the stack in the app
      • Calculate and record distance from a test at every method entry
    • Avoids the overhead of getting stack dumps from the thread
  • 29. Approach #3: Aspect-Oriented Programming
    • Unsatisfactory – in fact, a complete failure
    • Method exits are all over the place in a method
      • Method exits in the byte code do not always correspond to the source structure, particularly where exceptions are concerned
      • Introducing an aspect behavior at each exit point can increase the size of the byte code by up to 30%
  • 30. Approach #3: Aspect-Oriented Programming
    • Expanding methods by 30% can (and did) cause them to bump into Java’s 64K limit on method bytecode
      • Instrumented classes would not load
    • In addition, AspectJ required you to either:
      • Recompile the source and tests using AspectJ’s compiler; or
      • Create and use your own aspecting-on-the-fly classloader
      • Either way you still hit the 64K barrier
  • 31. Approach #4: Debugger
    • The idea is to write a debugger that monitors the tests in one JVM as they run in another
    • The debugger can track method entries and exits as they happen and keep the stack straight
    • The code being tested, and the tests themselves, do not need to be aware that the debugger is watching them
  • 32. Approach #4: Debugger
    • The Java SE JDK includes an architecture called JPDA
      • Java Platform Debugger Architecture
    • This architecture allows one JVM to debug another JVM over sockets, shared files, etc.
    • It provides an Object-Oriented API for the debugging JVM
      • Models the debugged JVM as a POJO
      • Provides call-backs for events as they occur in the debugged JVM
  • 33. Approach #4: Debugger
    • JPDA allows you to
      • Specify which events you want to be notified about
      • Specify which packages, etc. should be monitored
      • Pause and restart the other process
      • Inspect the variables, call stacks, etc. of the other process (as long as it’s paused)
    • No additional libraries required!
  • 34. Putting JPDA to work
    • First I wrote a debugger process to attach to an already running JVM
      • By using the <parallel> task in Ant, I can simultaneously launch the debugger and the JUnit tests
      • The debugger will attach to and monitor the other JVM
      • Register interest in callbacks on method entry and exit, exceptions, and JVM death
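A rough sketch of the attach-and-register step using the JDI classes in `com.sun.jdi` (module `jdk.jdi` on current JDKs). This is not PEA's actual code: the class name, the host and port values, and the `java.*` exclusion filter are assumptions for illustration. The target JVM is assumed to have been started with something like `-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000`.

```java
import com.sun.jdi.Bootstrap;
import com.sun.jdi.VirtualMachine;
import com.sun.jdi.connect.AttachingConnector;
import com.sun.jdi.connect.Connector;
import com.sun.jdi.connect.IllegalConnectorArgumentsException;
import com.sun.jdi.request.EventRequestManager;
import com.sun.jdi.request.ExceptionRequest;
import com.sun.jdi.request.MethodEntryRequest;
import com.sun.jdi.request.MethodExitRequest;
import java.io.IOException;
import java.util.Map;

class PeaAttachSketch {
    // Finds the standard socket-based attaching connector.
    static AttachingConnector socketConnector() {
        for (AttachingConnector c : Bootstrap.virtualMachineManager().attachingConnectors()) {
            if ("com.sun.jdi.SocketAttach".equals(c.name())) {
                return c;
            }
        }
        throw new IllegalStateException("no socket attaching connector available");
    }

    // Attaches to a JVM that is already listening for a debugger on host:port.
    static VirtualMachine attach(String host, String port)
            throws IOException, IllegalConnectorArgumentsException {
        AttachingConnector connector = socketConnector();
        Map<String, Connector.Argument> args = connector.defaultArguments();
        args.get("hostname").setValue(host);
        args.get("port").setValue(port);
        return connector.attach(args);
    }

    // Registers interest in method entry/exit and exceptions, skipping JDK classes.
    // VM death is reported on the event queue without a separate request.
    static void requestEvents(VirtualMachine vm) {
        EventRequestManager manager = vm.eventRequestManager();

        MethodEntryRequest entry = manager.createMethodEntryRequest();
        entry.addClassExclusionFilter("java.*");
        entry.enable();

        MethodExitRequest exit = manager.createMethodExitRequest();
        exit.addClassExclusionFilter("java.*");
        exit.enable();

        ExceptionRequest exceptions = manager.createExceptionRequest(null, true, true);
        exceptions.enable();
    }
}
```

After `requestEvents`, a debugger would typically loop on `vm.eventQueue().remove()`, handle each event in the returned set, and call `resume()` on the set until a death or disconnect event arrives.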
  • 35. Putting JPDA to work
    • As methods enter and exit, push and pop entries onto a stack maintained in the debugger
      • This effectively mirrors the real stack in the tests
      • Ignore certain packages beyond developer control (such as the JDK itself)
  • 36. Putting JPDA to work
    • As each method is entered, calculate and record its distance in the stack from a test
    • Shut down when the other JVM dies
    • Just before shutdown, write all the recorded data to a file
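The mirrored stack and distance bookkeeping can be sketched without any debugger at all. In the sketch below, `methodEntered` and `methodExited` are hypothetical callbacks that a JPDA event loop would invoke, and treating any class whose name ends in "Test" as a test is an assumption made only for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;

class MirrorStack {
    // Mirror of the debugged JVM's call stack, innermost frame first.
    private final Deque<String> frames = new ArrayDeque<>();
    // Number of method entries seen at each test distance.
    private final Map<Integer, Integer> callsByDistance = new TreeMap<>();

    // Assumption for this sketch: a class is a test if its name ends in "Test".
    static boolean isTest(String className) {
        return className.endsWith("Test");
    }

    void methodEntered(String className) {
        frames.push(className);
        if (isTest(className)) {
            return; // the test's own frames are not measured
        }
        int distance = 0;
        for (String frame : frames) { // walks from the innermost frame outward
            if (isTest(frame)) {
                callsByDistance.merge(distance, 1, Integer::sum);
                return;
            }
            distance++;
        }
        // No test on the stack: the call is not recorded.
    }

    void methodExited() {
        frames.pop();
    }

    Map<Integer, Integer> callsByDistance() {
        return callsByDistance;
    }
}
```

For example, entering `FooTest`, then `A`, then `B`, exiting both, and entering `A` again yields two calls at distance 1 and one call at distance 2.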
  • 37. Putting JPDA to work
    • Remember – an Ant JUnit task can fork multiple JVMs (one per test if you want), so we need to monitor each one
    • Multiple JVMs mean multiple files of recorded data that need to be accumulated after all the tests are complete
    • Produce XML file of accumulated results
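The accumulation step could be sketched as below. The per-JVM data is modelled as a map from test distance to call count, and the XML element and attribute names are invented for illustration; they are not PEA's actual output format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ResultMerger {
    // Sums the per-distance call counts recorded by each monitored JVM.
    static Map<Integer, Integer> merge(List<Map<Integer, Integer>> perJvm) {
        Map<Integer, Integer> total = new TreeMap<>();
        for (Map<Integer, Integer> one : perJvm) {
            for (Map.Entry<Integer, Integer> e : one.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return total;
    }

    // Renders the totals as XML; element and attribute names are hypothetical.
    static String toXml(Map<Integer, Integer> total) {
        StringBuilder xml = new StringBuilder("<results>\n");
        for (Map.Entry<Integer, Integer> e : total.entrySet()) {
            xml.append("  <calls distance=\"").append(e.getKey())
               .append("\" count=\"").append(e.getValue()).append("\"/>\n");
        }
        return xml.append("</results>\n").toString();
    }

    public static void main(String[] args) {
        Map<Integer, Integer> first = new TreeMap<>();
        first.put(1, 2);
        Map<Integer, Integer> second = new TreeMap<>();
        second.put(1, 3);
        second.put(2, 1);
        List<Map<Integer, Integer>> all = new ArrayList<>();
        all.add(first);
        all.add(second);
        System.out.print(toXml(merge(all)));
    }
}
```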
  • 38. Results of using JPDA
    • Performance is way better than using byte code instrumentation
      • Running with monitoring on slows execution by 100x or less, depending on the code
    • Ant script is kind of complicated
      • JUnit tests and PEA must be run with forked JVMs
      • Special JVM parameters for the debugged process are required
      • JUnit and PEA must be started simultaneously using the <parallel> task (which many people don’t know about)
  • 39. Results of using JPDA
    • The byte code being monitored is completely unchanged
      • No special instrumentation or preparatory build step is required
    • XML file comes out with details about how many method calls were made at what test distance
  • 40. Results file example
  • 41. Report Sample
  • 42. Code Review: Let’s roll that beautiful Java footage!
  • 43. Future Plans
    • Implement assertion density tracking
    • Tweak performance
    • Make it easier to run the tests with PEA running
      • Perhaps subclass Ant’s <junit> task to run with PEA?
    • Documentation (ick)
    • Eat my own dog food and use the tool to measure my own JUnit tests
    • Sell for $2.5 billion to Scott McNealy and Jonathan Schwartz and retire to Jamaica
  • 44. Lessons Learned (so far)
    • What seems impossible sometimes isn’t
    • Creativity is absolutely crucial to solving problems (as opposed to just implementing solutions)
    • JPDA is cool – and I had never heard of it
      • I never thought I’d be able to write a debugger but JPDA made it easy
    • ASM library is also cool – much nicer than BCEL
  • 45. Wanna help?
    • I’d welcome anyone who wants to participate
    • Contributing to an open-source project looks good on your résumé (hint, hint…)
  • 46. Resources
    • Java Platform Debugger Architecture
    • ASM – Byte code processing library
    • BCEL – Byte code processing library
    • AspectJ – Aspect-oriented Java extension
    • JUnit – unit testing framework
  • 47. Resources
    • Emma – code coverage tool
    • Cobertura – code coverage tool
    • JCoverage – code coverage tool
    • Clover – code coverage tool
    Thanks for listening!