IME-USP

The Relationship Between Test Code and Production Code

Mauricio Aniche
University of São Paulo (USP)
aniche@ime.usp.br
Who Am I?
• PhD student at the University of São Paulo
  - Master's thesis defended in April 2012
• Software developer
  - Consulting for companies such as VeriFone and Sony
  - Nowadays: Caelum
• Open source
  - Restfulie.NET
• Author of the 1st Test-Driven Development book in Brazilian Portuguese (in my unbiased opinion, the best TDD book ever!)
Unit Tests and Code Quality
• There is great synergy between a testable class and a well-designed class (Feathers, 2007)
• Writing unit tests can become complex as the interactions between software components grow out of control (McGregor, 2001)
• Agile practitioners state that unit tests are a way to validate and improve class design (Beck et al, 2001)
What Am I Going to Say?
• A little bit about my master's thesis
• The very first step of my PhD
• The tool I am working on
1st part: TDD and Class Design
• Does the practice of TDD influence the quality of class design?
• Mixed study with ~20 experienced developers from industry
  - 33% have 6 to 10 years of experience
  - 6 different companies in 3 different cities
• Developers were asked to implement a set of problems, both using and not using TDD
  - Exercises dealt with coupling, cohesion, and encapsulation problems
Quantitative Analysis
• 264 production classes
  - 831 methods / 2520 lines of code
• 73 test classes
  - 225 methods / 1832 lines of code
• Wilcoxon test to compare the difference between both groups
Show me the p-value!
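To illustrate the kind of comparison behind that p-value, here is a minimal sketch using Apache Commons Math's WilcoxonSignedRankTest; the paired metric values below are made-up placeholders, not the study's data:

import org.apache.commons.math3.stat.inference.WilcoxonSignedRankTest;

public class CompareGroups {
    public static void main(String[] args) {
        // Hypothetical paired samples: one design-metric value per exercise,
        // solved with and without TDD (placeholder numbers).
        double[] withTdd    = { 3.0, 5.0, 2.0, 4.0, 6.0, 3.0 };
        double[] withoutTdd = { 4.0, 6.0, 3.0, 6.0, 7.0, 4.0 };

        WilcoxonSignedRankTest wilcoxon = new WilcoxonSignedRankTest();
        // an exact p-value is feasible for small samples like this one
        double p = wilcoxon.wilcoxonSignedRankTest(withTdd, withoutTdd, true);
        System.out.printf("p-value: %.4f%n", p);
    }
}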
Quantitative Analysis
• Filtering by their experience in TDD
  - No statistical significance
• Specialists' opinion
  - Two different specialists reviewed all the generated code, without knowing whether it was produced with or without TDD
  - They evaluated it in terms of "class design", "testability", and "simplicity", using a Likert scale from 1 to 5
  - No difference in their evaluation
Qualitative Analysis
• Interviews with ~10 developers
• All of them said that "TDD does not guide you to a better class design by itself; the experience in OO and class design makes such a difference"
• Some patterns emerged
Patterns of Feedback
In my PhD
• My idea is to check whether the presence of those patterns in a unit test really implies bad production code
  - MSR techniques
  - Open source repositories for exploratory purposes and industry repositories for the final study
2nd part: Unit Tests and Asserts
• Every unit test contains three parts
  - Set up the scenario
  - Invoke the behavior under test
  - Validate the expected output
• Assert instructions
  - assertEquals(expected, calculated);
  - assertTrue(), assertFalse(), and so on
• No limit on the number of asserts per test
A little piece of code
class InvoiceTest {
  @Test
  public void shouldCalculateTaxes() {
    // (i) setting up the scenario
    Invoice inv = new Invoice(5000.0);

    // (ii) invoking the behavior
    double tax = inv.calculateTaxes();

    // (iii) validating the output
    // (a delta is needed when comparing doubles in JUnit)
    assertEquals(5000 * 0.06, tax, 0.0001);
  }
}
Why would…
… a test contain more than one assert?
• Is it a smell of bad code/design?
Research Design
• We selected 22 projects
  - 19 from ASF
  - 3 from a Brazilian consultancy
• Data extraction from all projects
  - Code metrics
• Statistical test
• Qualitative analysis
Data Extraction
• Test code
  - Number of asserts per test
  - Production method being tested
• Production code
  - Cyclomatic Complexity (McCabe, 1976)
  - Number of method invocations (Li and Henry, 1993)
  - Lines of Code
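To give a flavor of the test-code extraction, below is a minimal sketch (not the study's actual tooling) that counts asserts per @Test method using the JavaParser library:

import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.body.MethodDeclaration;
import com.github.javaparser.ast.expr.MethodCallExpr;

public class AssertCounter {
    public static void main(String[] args) {
        String source =
            "class InvoiceTest {\n" +
            "  @Test public void shouldCalculateTaxes() {\n" +
            "    double tax = new Invoice(5000.0).calculateTaxes();\n" +
            "    assertEquals(5000 * 0.06, tax, 0.0001);\n" +
            "  }\n" +
            "}\n";

        CompilationUnit cu = StaticJavaParser.parse(source);
        for (MethodDeclaration m : cu.findAll(MethodDeclaration.class)) {
            // only consider methods annotated with @Test
            if (!m.getAnnotationByName("Test").isPresent()) continue;
            long asserts = m.findAll(MethodCallExpr.class).stream()
                    .filter(c -> c.getNameAsString().startsWith("assert"))
                    .count();
            System.out.println(m.getNameAsString() + ": " + asserts + " assert(s)");
        }
    }
}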
Heuristic to Extract the Production Method

class InvoiceTest {
  @Test
  public void shouldCalculateTaxes() {
    // (i) setting up the scenario
    Invoice inv = new Invoice(5000.0);

    // (ii) invoking the behavior
    double tax = inv.calculateTaxes();

    // (iii) validating the output
    assertEquals(5000 * 0.06, tax, 0.0001);
  }
}

class Invoice {
  public double calculateTaxes() {
    // something…
  }
}
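The slides do not spell out the heuristic itself, so the sketch below is only one plausible reading of it: take the last non-assert method call that appears before the first assert in the test body (JavaParser again):

import com.github.javaparser.ast.body.MethodDeclaration;
import com.github.javaparser.ast.expr.MethodCallExpr;

public class ProductionMethodHeuristic {
    // A guess at the heuristic, not necessarily the study's actual rule:
    // the production method under test is the last non-assert call
    // made before the first assert.
    static String methodUnderTest(MethodDeclaration test) {
        String candidate = null;
        for (MethodCallExpr call : test.findAll(MethodCallExpr.class)) {
            String name = call.getNameAsString();
            if (name.startsWith("assert")) break; // stop at the first assert
            candidate = name;                     // remember the latest call
        }
        return candidate; // "calculateTaxes" for the example above
    }
}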
Asserts Distribution in Selected Projects [chart omitted]

Results of the Test [chart omitted]
Why more than 1 assert?
• 130 tests randomly selected
• Qualitative analysis (examples of the two most common categories follow below):
  - More than one assert for the same object (40.4%)
  - Different inputs to the same method (38.9%)
  - List/Array (9.9%)
  - Others (6.8%)
  - Extra assert to check if object is not null (3.8%)
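For illustration, hypothetical tests in the two most common categories could look like this (not code from the studied projects; getAmount() is an assumed accessor on the Invoice example class):

// (1) More than one assert for the same object
@Test
public void shouldFillInvoiceFields() {
    Invoice inv = new Invoice(5000.0);
    assertEquals(5000.0, inv.getAmount(), 0.0001);
    assertEquals(5000 * 0.06, inv.calculateTaxes(), 0.0001);
}

// (2) Different inputs to the same method
@Test
public void shouldCalculateTaxesForDifferentAmounts() {
    assertEquals(60.0, new Invoice(1000.0).calculateTaxes(), 0.0001);
    assertEquals(300.0, new Invoice(5000.0).calculateTaxes(), 0.0001);
}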
"Asserted Objects"
• We coined the term "asserted objects"
  - It counts not the number of asserts in a unit test, but the number of different object instances that are being asserted

    assertEquals(10, obj.getA());
    assertEquals(20, obj.getB());

    Counts as 1 "asserted object"
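A minimal sketch of that count, assuming we approximate an "asserted object" by the distinct receivers (e.g. obj in obj.getA()) appearing inside assert calls; the real implementation may well be more elaborate:

import com.github.javaparser.ast.body.MethodDeclaration;
import com.github.javaparser.ast.expr.MethodCallExpr;
import java.util.HashSet;
import java.util.Set;

public class AssertedObjects {
    static int count(MethodDeclaration test) {
        Set<String> receivers = new HashSet<>();
        for (MethodCallExpr call : test.findAll(MethodCallExpr.class)) {
            if (!call.getNameAsString().startsWith("assert")) continue;
            // collect receivers of nested calls such as obj.getA()
            for (MethodCallExpr inner : call.findAll(MethodCallExpr.class)) {
                inner.getScope().ifPresent(s -> receivers.add(s.toString()));
            }
        }
        return receivers.size(); // 1 for the two-assert example above
    }
}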
Distribution of Asserted Objects [chart omitted]

Results of the Test [chart omitted]
Findings
• Counting the number of asserts in a unit test does not give valuable feedback about code quality
  - But counting the number of asserted objects may provide useful information
  - However, the difference between both groups was not "big"
• A possible explanation:
  - Methods with higher cyclomatic complexity, more lines of code, and more method invocations contain many different paths, and developers prefer to exercise all of them in a single unit test rather than splitting them across many tests
My current problem
• How to statistically identify whether test code is a "unit test" or an "integration/system test"?
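One naive static heuristic (my illustration, not something proposed on the slides) would flag test files that import typical I/O, database, or network packages as integration/system tests:

import com.github.javaparser.ast.CompilationUnit;

public class TestKindHeuristic {
    // Very rough: real unit tests can also import these packages,
    // so this is only a starting point.
    static boolean looksLikeIntegrationTest(CompilationUnit cu) {
        return cu.getImports().stream()
                .map(i -> i.getNameAsString())
                .anyMatch(name -> name.startsWith("java.io")
                               || name.startsWith("java.net")
                               || name.startsWith("java.sql")
                               || name.startsWith("javax.persistence"));
    }
}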
3rd Step: Metric Miner
• Started as a command-line tool to calculate code metrics in Git repositories
  - As you can guess, I needed that for my master's
• An undergraduate student ported my tool to a web-based system
  - Much more interesting!
What does it do?
• A tool that facilitates MSR studies
• Already contains the entire Apache repository cloned
• Researchers can write a new metric and just plug it into the system
• Later on, they can execute an SQL query and extract data
• They can also run a statistical test on two sets of existing data
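I don't have the tool's real plugin API at hand, so the sketch below is a purely hypothetical illustration of what "write a new metric and plug it in" could look like; the Metric interface and its methods are invented:

// Hypothetical plugin contract, invented for illustration;
// this is not Metric Miner's actual API.
public interface Metric {
    String name();
    // computes the metric for one source file at a given commit
    double calculate(String commitId, String filePath, String sourceCode);
}

// Example metric: plain lines of code.
class LinesOfCode implements Metric {
    public String name() { return "loc"; }
    public double calculate(String commitId, String filePath, String sourceCode) {
        return sourceCode.split("\n").length;
    }
}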
Pros and Cons
• You do not need to spend your own computing resources
  - The power of cloud computing (thanks, Locaweb!)
• Still slow
  - We need to parallelize the metric execution
  - Go for Google's BigQuery (~300 GB of data)
Contact Information
• Mauricio Aniche
  aniche@ime.usp.br / @mauricioaniche
• TDD no Mundo Real
  http://www.tddnomundoreal.com.br
• Software Engineering & Collaborative Systems Research Lab (LAPESSC)
  http://lapessc.ime.usp.br/