The relationship between test and production code quality (@ SIG)


Published: the slides of my presentation about my research topics at SIG, in 2013.


  1. The relationship between test code and production code
     Mauricio Aniche, University of São Paulo (USP), IME-USP
  2. Who Am I?
     • PhD student at the University of São Paulo; master's thesis defended in April 2012
     • Software developer: consultancy for companies such as VeriFone and Sony; nowadays at Caelum
     • Open source: Restfulie.NET
     • First Test-Driven Development book in Brazilian Portuguese (in my non-biased opinion, the best TDD book ever!)
  3. Unit Tests and Code Quality
     • There is great synergy between a testable class and a well-designed class (Feathers, 2007)
     • Writing unit tests can become complex as the interactions between software components grow out of control (McGregor, 2001)
     • Agile practitioners state that unit tests are a way to validate and improve class design (Beck et al., 2001)
  4. What Am I Going to Say?
     • A little bit about my master's thesis
     • The very first step of my PhD
     • The tool I am working on
  5. 1st Part: TDD and Class Design
     • Does the practice of TDD influence the quality of class design?
     • Mixed study with ~20 experienced developers from industry
       • 33% have 6 to 10 years of experience
       • 6 different companies in 3 different cities
     • Developers were asked to implement a set of problems, with and without TDD
       • Exercises dealt with coupling, cohesion, and encapsulation problems
  6. Quantitative Analysis
     • 264 production classes: 831 methods / 2,520 lines of code
     • 73 test classes: 225 methods / 1,832 lines of code
     • Wilcoxon test to compare the difference between both groups
  7. Show me the p-value!
  8. Quantitative Analysis
     • Filtering by their experience in TDD: no statistical significance
     • Specialists' opinion
       • Two different specialists reviewed all generated code, without knowing whether it was produced with or without TDD
       • They evaluated it in terms of "class design", "testability", and "simplicity", using a Likert scale from 1 to 5
       • No difference in their evaluation
  9. Qualitative Analysis
     • Interviews with ~10 developers
     • All of them said that "TDD does not guide you to a better class design by itself; the experience in OO and class design makes such a difference"
     • Some patterns emerged
  10. Patterns of Feedback
  11. In My PhD
     • My idea is to check whether the presence of those patterns in a unit test really implies bad production code
       • MSR techniques
       • Open source repositories for exploratory purposes, and industry repositories for the final study
  12. 2nd Part: Unit Tests and Asserts
     • Every unit test contains three parts
       • Set up the scenario
       • Invoke the behavior under test
       • Validate the expected output
     • Assert instructions
       • assertEquals(expected, calculated);
       • assertTrue(), assertFalse(), and so on
     • There is no limit on the number of asserts per test
  13. A little piece of code

      class InvoiceTest {
          @Test
          public void shouldCalculateTaxes() {
              // (i) setting up the scenario
              Invoice inv = new Invoice(5000.0);
              // (ii) invoking the behavior
              double tax = inv.calculateTaxes();
              // (iii) validating the output
              assertEquals(5000 * 0.06, tax);
          }
      }
  14. Why would… a test contain more than one assert? Is it a smell of bad code/design?
  15. Research Design
     • We selected 22 projects: 19 from the ASF, 3 from a Brazilian consultancy
     • Data extraction from all projects (code metrics)
     • Statistical test
     • Qualitative analysis
  16. Data Extraction
     • Test code: number of asserts per test; production method being tested
     • Production code: cyclomatic complexity (McCabe, 1976); number of method invocations (Li and Henry, 1993); lines of code
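As an aside, McCabe's cyclomatic complexity for the slide above can be approximated very simply on source text: 1 plus the number of branching points. The sketch below is my own illustration, not the author's extraction tool, and the keyword-counting approach is a rough textual approximation rather than a proper AST-based measure.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough, illustrative approximation of McCabe's cyclomatic complexity:
// 1 + number of branching points found in the method body text.
// A real tool would parse the AST instead of matching keywords.
public class CyclomaticComplexity {
    private static final Pattern BRANCH =
        Pattern.compile("\\b(if|for|while|case|catch)\\b|&&|\\|\\|");

    public static int of(String methodBody) {
        Matcher m = BRANCH.matcher(methodBody);
        int branches = 0;
        while (m.find()) branches++;
        return 1 + branches;
    }
}
```

For example, a straight-line body scores 1, while `if (a && b) ...` scores 3 (the `if` and the `&&` each add a decision point).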
  17. Heuristic to Extract the Production Method

      class InvoiceTest {
          @Test
          public void shouldCalculateTaxes() {
              // (i) setting up the scenario
              Invoice inv = new Invoice(5000.0);
              // (ii) invoking the behavior
              double tax = inv.calculateTaxes();
              // (iii) validating the output
              assertEquals(5000 * 0.06, tax);
          }
      }

      class Invoice {
          public double calculateTaxes() {
              // something…
          }
      }
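One simple way to implement a heuristic like the one on this slide is to take the last production-method invocation that appears before the first assert in the test body (here, `calculateTaxes`). This is my assumption about how such a heuristic could work, not necessarily the author's exact rule:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative name-based heuristic (an assumption, not the author's tool):
// the production method under test is taken to be the last method call
// that appears before the first assert in the test body.
public class TestedMethodHeuristic {
    private static final Pattern CALL = Pattern.compile("\\.(\\w+)\\s*\\(");

    public static String testedMethod(String testBody) {
        int firstAssert = testBody.indexOf("assert");
        String before = firstAssert >= 0
            ? testBody.substring(0, firstAssert)
            : testBody;
        Matcher m = CALL.matcher(before);
        String last = null;
        while (m.find()) last = m.group(1);  // keep the last invocation seen
        return last;
    }
}
```

On the InvoiceTest body above, this returns "calculateTaxes"; constructor calls like `new Invoice(...)` are skipped because they have no receiver.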
  18. Asserts Distribution in Selected Projects
  19. Results of the Test
  20. Why more than 1 assert?
     • 130 tests randomly selected
     • Qualitative analysis:
       • More than one assert for the same object (40.4%)
       • Different inputs to the same method (38.9%)
       • List/array (9.9%)
       • Others (6.8%)
       • Extra assert to check whether the object is not null (3.8%)
  21. "Asserted Objects"
     • We coined the term "asserted objects": it counts not the number of asserts in a unit test, but the number of different object instances being asserted
     • Example:
       assertEquals(10, obj.getA());
       assertEquals(20, obj.getB());
       counts as 1 "asserted object"
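The metric on this slide can be sketched as counting distinct receivers inside assert calls. Again, this regex-based sketch is my own illustration of the idea, not the author's implementation:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch (not the author's tool): count "asserted objects"
// as the number of distinct receivers of method calls that occur inside
// assert* invocations, e.g. obj in assertEquals(10, obj.getA()).
public class AssertedObjects {
    private static final Pattern ASSERT_CALL =
        Pattern.compile("assert\\w+\\s*\\(([^;]*)\\)\\s*;");
    private static final Pattern RECEIVER =
        Pattern.compile("(\\w+)\\.\\w+\\s*\\(");

    public static int count(String testBody) {
        Set<String> receivers = new HashSet<>();
        Matcher call = ASSERT_CALL.matcher(testBody);
        while (call.find()) {
            Matcher recv = RECEIVER.matcher(call.group(1));
            while (recv.find()) {
                receivers.add(recv.group(1));
            }
        }
        return receivers.size();
    }
}
```

With this sketch, the two asserts on `obj` from the slide count as one asserted object, while asserts on two different variables count as two.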
  22. Distribution of Asserted Objects
  23. Results of the Test
  24. Findings
     • Counting the number of asserts in a unit test does not give valuable feedback about code quality
       • But counting the number of asserted objects may provide useful information
       • However, the difference between both groups was not "big"
     • A possible explanation: methods with higher cyclomatic complexity, more lines of code, and more method invocations contain many different paths, and developers prefer to exercise all of them in a single unit test rather than splitting them across many tests
  25. My Current Problem
     • How can we statistically identify whether a test is a "unit test" or an "integration/system test"?
  26. 3rd Step: Metric Miner
     • Started as a command-line tool to calculate code metrics in Git repositories
       • As you can guess, I needed that for my master's
     • An undergraduate student ported my tool to a web-based system: much more interesting!
  27. What Does It Do?
     • A tool that facilitates MSR studies
     • Already contains the entire Apache repository, cloned
     • A researcher can write a new metric and simply plug it into the system
     • Later on, they can execute an SQL query and extract the data
     • They can also run a statistical test on two sets of existing data
  28. Pros and Cons
     • You do not need to spend your own computing resources: the power of cloud computing (thanks, Locaweb!)
     • Still slow
       • We need to parallelize the metric execution
       • Or go for Google's BigQuery (~300 GB of data)
  29. Contact Information
     • Mauricio Aniche / @mauricioaniche
     • TDD no Mundo Real
     • Software Engineering & Collaborative Systems Research Lab (LAPESSC)