
Test Automation Day 2018

My talk, “Besides the obvious tools: improving your testing with state-of-the-art techniques”, at Test Automation Day 2018 in Rotterdam.

Published in: Technology

  1. 1. Besides the obvious tools: improving your testing with state-of-the-art techniques Maurício Aniche m.f.aniche@tudelft.nl @mauricioaniche Photo by Sora Sagano https://unsplash.com/photos/WA-QRL5wDMw
  2. 2. Content and License • This presentation can be found at: http://www.mauricioaniche.com/talks/2018/tad • You can use it and modify it. • You always have to give credits to the original author. • You agree not to sell it or make profit in any way with this.
  3. 3. Jeroen Castelein • Mozhan Soltani • Annibale Panichella • Joop Aué • Maikel Lobbezoo • Rick Wieman • Sicco Verwer • Felienne Hermans • Davide Spadini • Alberto Bacchelli • Arie van Deursen • Kristín Fjóla • Peter Evers • Qianqian Zhu
  4. 4. • First job as a developer in 2004 • First important project in 2016 • First important bug: 2016 • Tests are important! A little story Photo by Michael Mims https://unsplash.com/photos/0ZL0O-eDOpU
  5. 5. TEST ANALYSIS & TEST DESIGN clipart by j4p4n, adlerweb https://openclipart.org/detail/297959/standing-robot https://openclipart.org/detail/262444/bubble-person
  6. 6. “Testing is different from writing tests. Developers write tests as a way to give them space to think and confidence for refactoring. Testing focuses on finding bugs. Both should be done.” https://medium.com/@mauricioaniche/testing-vs-writing-tests-d817bffea6bc
  7. 7. The literature on test oracles has introduced techniques for oracle automation, including modelling, specifications, contract-driven development and metamorphic testing. When none of these is completely adequate, the final source of test oracle information remains the human, who may be aware of informal specifications, expectations, norms and domain specific information that provide informal oracle guidance.
  8. 8. TEST ANALYSIS & TEST DESIGN Find systematic and automated ways to design and execute tests!
  9. 9. Topics of today • Structural testing and MC/DC • Log monitoring and passive learning • Search-based software testing • Mutation testing • Fuzzing • Property-based testing • Code review • Static analysis tools
  10. 10. Who are you? • Software developers? • Software testers? • What are your expectations here today? • Fill this out: https://bit.ly/tad2018 clipart by GDJ https://openclipart.org/detail/230150/crowd-of-kids
  11. 11. Structural Testing clipart by J_Alves https://openclipart.org/detail/61405/threonine-amino-acid
  12. 12. Given the points of two different players, the program must return the number of points the one who wins has! public int play(int left, int right) { int ln = left; int rn = right; if(ln > 21) ln = 0; if(rn > 21) rn = 0; if(ln > rn) return rn; else return ln; }
  13. 13. public int play(int left, int right) { int ln = left; int rn = right; if(ln > 21) ln = 0; if(rn > 21) rn = 0; if(ln > rn) return rn; else return ln; } First criterion: “going through all the lines”. If our test suite exercises all the lines, we are happy.
  14. 14. public int play(int left, int right) { int ln = left; int rn = right; if(ln > 21) ln = 0; if(rn > 21) rn = 0; if(ln > rn) return rn; else return ln; } First criterion: “going through all the lines”. If our test suite exercises all the lines, we are happy. T1 = (30, 30)
  15. 15. public int play(int left, int right) { 1 int ln = left; 2 int rn = right; 3 if(ln > 21) 4 ln = 0; 5 if(rn > 21) 6 rn = 0; 7 if(ln > rn) 8 return rn; 9 else 10 return ln; } First criterion: “going through all the lines”. If our test suite exercises all the lines, we are happy. T1 = (30, 30): 9 / 10 = 90% line coverage (line 8 is never reached)
  16. 16. public int play(int left, int right) { 1 int ln = left; 2 int rn = right; 3 if(ln > 21) 4 ln = 0; 5 if(rn > 21) 6 rn = 0; 7 if(ln > rn) 8 return rn; 9 else 10 return ln; } First criterion: “going through all the lines”. If our test suite exercises all the lines, we are happy. T1 = (30, 30) T2 = (10, 9) <-- left player wins, which makes the condition on line 7 true
  17. 17. public int play(int left, int right) { 1 int ln = left; 2 int rn = right; 3 if(ln > 21) 4 ln = 0; 5 if(rn > 21) 6 rn = 0; 7 if(ln > rn) 8 return rn; 9 else 10 return ln; } First criterion: “going through all the lines”. If our test suite exercises all the lines, we are happy. T1 = (30, 30) T2 = (10, 9) <-- left player wins: 10 / 10 = 100% line coverage
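A minimal sketch of the two inputs T1 and T2, assuming the method sits in a class called `BlackJack` and is made `static` for brevity (both are assumptions, not part of the slides). The method body is copied from the slides unchanged.

```java
class BlackJack {
    // Method body exactly as shown on the slides.
    static int play(int left, int right) {
        int ln = left;
        int rn = right;
        if (ln > 21)
            ln = 0;
        if (rn > 21)
            rn = 0;
        if (ln > rn)
            return rn;
        else
            return ln;
    }

    public static void main(String[] args) {
        // T1: both players exceed 21, both scores reset to 0, the else branch runs.
        System.out.println(play(30, 30)); // prints 0
        // T2: ln > rn is true, so the one remaining line (return rn) runs.
        System.out.println(play(10, 9));  // prints 9
    }
}
```

Together the two inputs execute all ten lines. Note that, with the code as transcribed, `play(10, 9)` returns 9, while the stated requirement asks for the winner's points (10). Executing a line does not check its result, so each test still needs an assertion on the returned value.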
  18. 18. 9/10 = 90%, 5/6 = 83%... From now on, I’ll write as many lines as I can!! clipart by GDJ https://openclipart.org/detail/230143/female-engineer-9
  19. 19. Given a sentence, you should count the number of words that end with either an “s” or an “r”. A word ends when a non-letter appears.
  20. 20. int words = 0; char last = ' '; for(int i = 0; i < str.length(); i++) { if(!Character.isLetter(str.charAt(i)) && (last == 's' || last == 'r')) words++; last = str.charAt(i); } if(last == 's' || last == 'r') words++; return words; [Control-flow graph (CFG) of this code, with true/false labels on the edges leaving the loop condition and each if] We should cover all the branches (arrows)
  21. 21. int words = 0; char last = ' '; for(int i = 0; i < str.length(); i++) { if(!Character.isLetter(str.charAt(i)) && (last == 's' || last == 'r')) words++; last = str.charAt(i); } if(last == 's' || last == 'r') words++; return words; [Same CFG] Input: “cats|dogs”
  22. 22. int words = 0; char last = ' '; for(int i = 0; i < str.length(); i++) { if(!Character.isLetter(str.charAt(i)) && (last == 's' || last == 'r')) words++; last = str.charAt(i); } if(last == 's' || last == 'r') words++; return words; [Same CFG] Input: “cats|dog”
  23. 23. Branch coverage means we exercise all the branches!
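A runnable sketch of the word-counting code with the two inputs from the slides. The class name `CountWords`, the method name `count` and the explicit braces are assumptions added for illustration.

```java
class CountWords {
    // Counts words ending in 's' or 'r'; a word ends when a non-letter appears.
    static int count(String str) {
        int words = 0;
        char last = ' ';
        for (int i = 0; i < str.length(); i++) {
            // Decision 1: a non-letter right after an 's' or 'r' closes a matching word.
            if (!Character.isLetter(str.charAt(i)) && (last == 's' || last == 'r')) {
                words++;
            }
            last = str.charAt(i);
        }
        // Decision 2: the last word has no trailing non-letter, so check it here.
        if (last == 's' || last == 'r') {
            words++;
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(count("cats|dogs")); // prints 2
        System.out.println(count("cats|dog"));  // prints 1
    }
}
```

The first input takes the loop condition both ways, decision 1 both ways and decision 2 on its true side. The second input adds the false side of decision 2, which is the branch the first input misses.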
  24. 24. I wonder if that’s enough…
  25. 25. [CFG with the compound condition split into separate nodes: !Character.isLetter(str.charAt(i)), last == 's', last == 'r', each with its own true and false edge, followed by words++ and last = str.charAt(i)] If we “explode” the if into its several conditions, we have more paths to explore!
  26. 26. [CFG of the whole method with every condition split into its own node: i < str.length(); !Character.isLetter(str.charAt(i)), last == 's' and last == 'r' inside the loop; last == 's' and last == 'r' after the loop. Each of the six nodes has a true and a false edge.]
  27. 27. Ok, condition coverage seems to cover more than branch coverage!
  28. 28. If we aim for condition coverage, are we testing all the paths?
  29. 29. Path Coverage: (A && (B | C))
      Test  A  B  C  Outcome
      1     T  T  T  T
      2     T  T  F  T
      3     T  F  T  T
      4     T  F  F  F
      5     F  T  T  F
      6     F  T  F  F
      7     F  F  T  F
      8     F  F  F  F
  30. 30. Can we actually achieve 100% path coverage?
  31. 31. The number of paths can still grow exponentially: if (a) { S1; } if (b) { S2; } if (c) { S3; } ... if (x) { Sn; } • The subpaths through this control flow can include or exclude each of the statements Si, so that in total N branches result in 2^N paths that must be traversed • Choosing input data to force execution of one particular path may be very difficult, or even impossible if the conditions are not independent
  32. 32. Can we test just the important combinations?
  33. 33. Modified Condition/ Decision Coverage (MC/DC)
  34. 34. (A && (B | C))
      Test  A  B  C  Outcome
      1     T  T  T  T
      2     T  T  F  T
      3     T  F  T  T
      4     T  F  F  F
      5     F  T  T  F
      6     F  T  F  F
      7     F  F  T  F
      8     F  F  F  F
  35. 35. (A && (B | C))
      Test  A  B  C  Outcome
      1     T  T  T  T
      2     T  T  F  T
      3     T  F  T  T
      4     T  F  F  F
      5     F  T  T  F
      6     F  T  F  F
      7     F  F  T  F
      8     F  F  F  F
      Independence pairs: A = {1, 5}, {2, 6}, {3, 7} (they are the same! We don’t need them all) • B = {2, 4} • C = {3, 4} • Final = {2, 3, 4, 6}
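The independence pairs can be recomputed mechanically. The sketch below is one possible way to do it, not a tool from the talk: for each condition it searches for two tests that differ only in that condition and produce different outcomes. The class and method names are hypothetical; the test numbering follows the truth table (test 1 = TTT, test 8 = FFF).

```java
import java.util.ArrayList;
import java.util.List;

class McdcPairs {
    // Decision under test: A && (B || C).
    static boolean decision(boolean a, boolean b, boolean c) {
        return a && (b || c);
    }

    // Maps a test number from the truth table to its inputs {A, B, C}.
    static boolean[] inputs(int test) {
        int bits = 8 - test; // test 1 -> 111, test 8 -> 000
        return new boolean[] { (bits & 4) != 0, (bits & 2) != 0, (bits & 1) != 0 };
    }

    // Independence pairs for one condition (0 = A, 1 = B, 2 = C).
    static List<String> pairsFor(int condition) {
        List<String> pairs = new ArrayList<>();
        for (int t1 = 1; t1 <= 8; t1++) {
            for (int t2 = t1 + 1; t2 <= 8; t2++) {
                boolean[] x = inputs(t1);
                boolean[] y = inputs(t2);
                boolean onlyThisDiffers = true;
                for (int i = 0; i < 3; i++) {
                    // The chosen condition must differ; every other one must be equal.
                    if ((x[i] != y[i]) != (i == condition)) {
                        onlyThisDiffers = false;
                    }
                }
                boolean outcomeFlips =
                        decision(x[0], x[1], x[2]) != decision(y[0], y[1], y[2]);
                if (onlyThisDiffers && outcomeFlips) {
                    pairs.add("{" + t1 + ", " + t2 + "}");
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println("A = " + pairsFor(0)); // prints A = [{1, 5}, {2, 6}, {3, 7}]
        System.out.println("B = " + pairsFor(1)); // prints B = [{2, 4}]
        System.out.println("C = " + pairsFor(2)); // prints C = [{3, 4}]
    }
}
```

B and C each have a single pair, so tests 2, 3 and 4 are mandatory. Any one of A's pairs that reuses test 2 or 3 then completes the set, which gives {2, 3, 4, 6} (or, equally valid, {2, 3, 4, 7}).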
  36. 36. So, for N conditions, I always have only N+1 tests! That’s definitely better than 2^N!!
  37. 37. McCabe’s Cyclomatic Complexity • C = |E| - |N| + 2 • C = # decision points + 1 • C = # of decision-statements + 1 • C > 10: method too complex [McCabe, 1976] • C correlated with #lines of code [Figure: control-flow graph with nodes 1 to 7]
  38. 38. McCabe for Testing? No empirical evidence that it is better than just decision coverage. How many tests? • Branch: 2 tests • All paths: 4 tests • McCabe: 3 tests [Figure: control-flow graph with nodes 1 to 7] McCabe: Easy to count, limited usefulness as coverage metric
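A small worked instance of C = |E| - |N| + 2. The edge list is an assumption: the slide only shows a graph with seven nodes, and two if/else decisions in sequence is one shape consistent with the counts given (2 tests for branch coverage, 4 paths, McCabe 3).

```java
class CyclomaticComplexity {
    // C = |E| - |N| + 2 for a single connected control-flow graph.
    static int of(int[][] edges, int nodeCount) {
        return edges.length - nodeCount + 2;
    }

    // Assumed graph: two if/else decisions in sequence, nodes numbered 1 to 7.
    // Node 1 and node 4 are the decision points.
    static final int[][] TWO_DECISIONS = {
        {1, 2}, {1, 3}, {2, 4}, {3, 4},
        {4, 5}, {4, 6}, {5, 7}, {6, 7}
    };

    public static void main(String[] args) {
        // 8 edges - 7 nodes + 2 = 3, which also equals 2 decision points + 1.
        System.out.println(of(TWO_DECISIONS, 7)); // prints 3
    }
}
```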
  39. 39. Strategy Subsumption • Hierarchy shown, strongest first: Path coverage, MC/DC, Branch + Condition Coverage, Branch Coverage, Statement Coverage • Strategy X subsumes strategy Y if all elements that Y exercises are also exercised by X • No conclusive results on relative bug-finding effectiveness have been established.
  40. 40. What do YOU think: Do we need 100% code coverage?
  41. 41. Programmer: “I am ready to write some unit tests. What code coverage should I aim for?” Master: “Don’t worry about coverage, just write some good tests.” Testivus on Code Coverage. Alberto Savoia. https://www.artima.com/weblogs/viewpost.jsp?thread=204677 clipart by 10_boss, bibbleycheese https://openclipart.org/detail/202573/my-yoda https://openclipart.org/detail/248493/pretzel-ninja
  42. 42. Programmer: “I am ready to write some unit tests. What code coverage should I aim for?” Master: “How many grains of rice should I put in that [boiling water] pot?” Programmer: “It depends on how many people you need to feed, how hungry they are, what other food you are serving, how much rice you have available, and so on.” Master: “Exactly!” Testivus on Code Coverage. Alberto Savoia. https://www.artima.com/weblogs/viewpost.jsp?thread=204677
  43. 43. Programmer: “I am ready to write some unit tests. What code coverage should I aim for?” Master: “80% and no less!” Testivus on Code Coverage. Alberto Savoia. https://www.artima.com/weblogs/viewpost.jsp?thread=204677
  44. 44. The first programmer is new and just getting started with testing. Right now he has a lot of code and no tests. He has a long way to go; focusing on code coverage at this time would be depressing and quite useless. He’s better off just getting used to writing and running some tests. He can worry about coverage later. Testivus on Code Coverage. Alberto Savoia. https://www.artima.com/weblogs/viewpost.jsp?thread=204677
  45. 45. The second programmer, on the other hand, is quite experienced both at programming and testing. When I replied by asking her how many grains of rice I should put in a pot, I helped her realize that the amount of testing necessary depends on a number of factors, and she knows those factors better than I do – it’s her code after all. There is no single, simple, answer, and she’s smart enough to handle the truth and work with that. Testivus on Code Coverage. Alberto Savoia. https://www.artima.com/weblogs/viewpost.jsp?thread=204677
  46. 46. The third programmer wants only simple answers – even when there are no simple answers … and then does not follow them anyway. Testivus on Code Coverage. Alberto Savoia. https://www.artima.com/weblogs/viewpost.jsp?thread=204677
  47. 47. Mutation testing Gif by h1flosse https://openclipart.org/detail/190026/mutant
  48. 48. Imagine your code is a small town, where crimes happen from time to time… Photo by Jesus in Taiwan https://unsplash.com/photos/c6aunWXHZZ0
  49. 49. Imagine your code is a small town, where crimes happen from time to time… Let’s simulate crimes and see if the cops can catch them! clipart by kolbasun https://openclipart.org/detail/219619/ninja-cop
  50. 50. City -> Program Crime -> Bugs in code Police -> Unit testing Fake crime -> Mutation Testing
  51. 51. Original: public int play(int left, int right) { int ln = left; int rn = right; if(ln > 21) ln = 0; if(rn > 21) rn = 0; if(ln > rn) return rn; else return ln; } Mutant (rn > 21 changed to rn < 21): public int play(int left, int right) { int ln = left; int rn = right; if(ln > 21) ln = 0; if(rn < 21) rn = 0; if(ln > rn) return rn; else return ln; }
  52. 52. Original: public int play(int left, int right) { int ln = left; int rn = right; if(ln > 21) ln = 0; if(rn > 21) rn = 0; if(ln > rn) return rn; else return ln; } Mutant (rn > 21 changed to rn < 21): public int play(int left, int right) { int ln = left; int rn = right; if(ln > 21) ln = 0; if(rn < 21) rn = 0; if(ln > rn) return rn; else return ln; } If your tests still pass on the mutant, this is no good!
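A hand-rolled sketch of what a mutation tool automates, using the original and the mutant from the slide. The class `MutationDemo`, the helper `suitePasses` and the two test suites are hypothetical. Expected values are taken from what the original code currently returns, since the slides do not give expected outputs.

```java
import java.util.function.IntBinaryOperator;

class MutationDemo {
    // Original code from the slide.
    static int original(int left, int right) {
        int ln = left;
        int rn = right;
        if (ln > 21) ln = 0;
        if (rn > 21) rn = 0;
        if (ln > rn) return rn; else return ln;
    }

    // Mutant from the slide: `rn > 21` replaced by `rn < 21`.
    static int mutant(int left, int right) {
        int ln = left;
        int rn = right;
        if (ln > 21) ln = 0;
        if (rn < 21) rn = 0;
        if (ln > rn) return rn; else return ln;
    }

    // Each test case is {left, right, expected}.
    static final int[][] WEAK_SUITE = { {30, 30, 0} };
    static final int[][] STRONGER_SUITE = { {30, 30, 0}, {10, 9, 9} };

    // Returns true when the implementation passes every case in the suite.
    static boolean suitePasses(IntBinaryOperator play, int[][] suite) {
        for (int[] c : suite) {
            if (play.applyAsInt(c[0], c[1]) != c[2]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The mutant survives the weak suite: the tests cannot tell it apart.
        System.out.println(suitePasses(MutationDemo::mutant, WEAK_SUITE));     // prints true
        // The stronger suite kills it: play(10, 9) now returns 0 instead of 9.
        System.out.println(suitePasses(MutationDemo::mutant, STRONGER_SUITE)); // prints false
    }
}
```

A surviving mutant points at a missing test or a missing assertion. A real tool generates many such mutants and reports the fraction that the suite kills.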
  53. 53. Common mutants • Replace arithmetic operator (+, -, *, /, …) • Replace relational operators (>, >=, <, <=, ==, !=, …) • Replace constants (a -> a+1)
  54. 54. As a research field • Since the 70s • Benefits: • Better fault exposing capability • A good alternative to real faults • Limitations: • High computational power • Undecidable Equivalent Mutant Problem • Mutants for other problems • SQL
  55. 55. In order to alleviate the computational issues, we present a diff-based probabilistic approach to mutation analysis that drastically reduces the number of mutants by omitting lines of code without statement coverage and lines that are determined to be uninteresting
  56. 56. Mutations: http://pitest.org/quickstart/mutators/
  57. 57. Is (preventive) testing enough? Maybe not… clipart by dani ela https://openclipart.org/detail/229476/14-flowers
  58. 58. Context: Payments Payment Provider
  59. 59. DEV OPS Logs are our current bridge!
  60. 60. One Billion Log Lines a Day: Monitoring using the ELK Stack • Logstash: Unify different logging sources • Elasticsearch: Search and filter large log data • Kibana: Visual interactive dashboard Image credit: www.neteye-blog.com
  61. 61. Poll: Java Exceptions in a Payment System Your payment system in production generates 1 billion log lines per day. How many errors / warnings with exceptions do you expect to see? A. None. “We have a zero exception policy.” B. 1 Thousand. “Some exceptions are unavoidable.” C. 1 Million. “Most exceptions are harmless.” D. 1 Billion. “We only log errors and exceptions.” Adyen, Nov 2016: ~1,000,000 per day
  62. 62. Complex API Integration • Payment APIs are complex • Integration faults are easily made • Merchant needs assistance with API usage • Merchant may not notice mistakes • 2.5M http error responses per month • What can we learn from them?
  63. 63. 11 Common Causes for API Error Responses: Integrators are responsible for most API integration problems!
  64. 64. 11 Common Causes for API Error Responses: Integrators are responsible for most API integration problems! Understand your errors
  65. 65. Payment Terminals Payment Provider
  66. 66. Point of sale terminal variability • Card brands • Card entry modes (chip, swipe, contactless) • Currency conversion • Loyalty points • Validation type (pin, signature) • Issuer responses (declined, insufficient balance) • Cancellations (shopper, merchant)
  67. 67. Passive learning: identifying system behavior from observations, and representing it in the smallest possible model. Example log (one transaction):
      20170101160001 Adyen version: ******
      20170101160002 Starting TX/amt=10001/currency=978
      20170101160003 Starting EMV
      20170101160004 EMV started
      20170101160005 Magswipe opened
      20170101160006 CTLS started
      20170101160007 Transaction initialised
      20170101160008 Run TX as EMV transaction
      20170101160009 Application selected app:******
      20170101160010 read_application_data succeeded
      20170101160011 data_authentication succeeded
      20170101160012 validate 0
      20170101160013 DCC rejected
      20170101160014 terminal_risk_management succeeded
      20170101160015 verify_card_holder succeeded
      20170101160016 generate_first_ac succeeded
      20170101160017 Authorizing online
      20170101160018 Data returned by the host succeeded
      20170101160019 Transaction authorized by card
      20170101160020 Approved receipt printed
      20170101160021 pos_result_code:APPROVED
      20170101160022 Final status: Approved
      Rick Wieman, Maurício Aniche, Willem Lobbezoo, Sicco Verwer and Arie van Deursen. An Experience Report on Applying Passive Learning in a Large-Scale Payment Company. ICSME Industry Track, 2017. https://automatonlearning.net/ DFASAT / FlexFringe: Heule & Verwer, ICGI 2010
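To give a feel for learning a model from logs, here is a deliberately simplified sketch. It is not the state-merging approach of DFASAT / FlexFringe: it only strips the timestamp from each line and records which event was observed directly after which. The class `TransitionModel` and its methods are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class TransitionModel {
    // Drops the leading timestamp from a line such as "20170101160003 Starting EMV".
    static String event(String line) {
        int space = line.indexOf(' ');
        return space < 0 ? line : line.substring(space + 1);
    }

    // Builds "event -> set of events seen right after it" from a list of traces.
    static Map<String, Set<String>> learn(List<List<String>> traces) {
        Map<String, Set<String>> next = new TreeMap<>();
        for (List<String> trace : traces) {
            for (int i = 0; i + 1 < trace.size(); i++) {
                next.computeIfAbsent(event(trace.get(i)), k -> new TreeSet<>())
                    .add(event(trace.get(i + 1)));
            }
        }
        return next;
    }

    public static void main(String[] args) {
        List<List<String>> traces = List.of(
            List.of("20170101160003 Starting EMV", "20170101160004 EMV started"),
            List.of("20170101160103 Starting EMV", "20170101160104 Magswipe opened"));
        System.out.println(learn(traces));
        // prints {Starting EMV=[EMV started, Magswipe opened]}
    }
}
```

Even a crude model like this lets a domain expert spot a transition that should never occur. Real passive-learning tools go further and merge equivalent states to keep the model small.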
  68. 68. Use Inferred Models to Analyze: Bugs in Test Phase • Terminal asked for PIN • AND asked for signature • Domain expert noted this unwanted behavior in inferred model. • Fixed before it went into production
  69. 69. Use Inferred Models to Analyze: Differences Between Card Brands Twice as many chip errors Informed merchant about issue.
  70. 70. Use Inferred Models to Analyze: Time out problems Timeout Improved performance under network instability by adding targeted retry mechanism
  71. 71. Can the machine generate tests for us? Automated test generation! clipart by bingenberg https://openclipart.org/detail/229476/14-flowers
  72. 72. [Control-flow graph with nodes 1 to 10]
  73. 73. [Control-flow graph with nodes 1 to 10] Input: (1,2,3)
  74. 74. [Control-flow graph with nodes 1 to 10] @Test public void test(){ /* Constructor (init) */ /* Method Calls */ /* Assertions (check) */ }
  75. 75. [Control-flow graph with nodes 1 to 10] @Test public void test(){ Triangle t = new Triangle(1,2,3); /* Method Calls */ /* Assertions (check) */ }
  76. 76. [Control-flow graph with nodes 1 to 10] @Test public void test(){ Triangle t = new Triangle(1,2,3); t.computeTriangleType(); /* Assertions (check) */ }
  77. 77. [Control-flow graph with nodes 1 to 10] @Test public void test(){ Triangle t = new Triangle(1,2,3); t.computeTriangleType(); String typ = t.getType(); assertTrue(typ.equals("SCALENE")); }
  78. 78. Random testing 1. Pick one of the available constructors (with random input) 2. Pick one or more public methods (with random input) 3. Generate the assertions by checking the final state of the object using get methods clipart by 10binary https://openclipart.org/detail/175047/february-11-2013
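The three steps can be sketched as follows. The `Triangle` class here is a stand-in written for this example (the slides only show calls to `computeTriangleType()` and `getType()`), and `RandomTestGenerator` is a hypothetical name; real tools discover constructors and methods through reflection.

```java
import java.util.Random;

// Hypothetical class under test; only the method names come from the slides.
class Triangle {
    private final int a, b, c;
    private String type = "UNKNOWN";

    Triangle(int a, int b, int c) {
        this.a = a;
        this.b = b;
        this.c = c;
    }

    void computeTriangleType() {
        if (a == b && b == c) type = "EQUILATERAL";
        else if (a == b || b == c || a == c) type = "ISOSCELES";
        else type = "SCALENE";
    }

    String getType() {
        return type;
    }
}

class RandomTestGenerator {
    // Returns the source text of one generated test body.
    static String generate(Random random) {
        // Step 1: call a constructor with random input.
        int a = 1 + random.nextInt(10);
        int b = 1 + random.nextInt(10);
        int c = 1 + random.nextInt(10);
        Triangle t = new Triangle(a, b, c);
        // Step 2: call a public method.
        t.computeTriangleType();
        // Step 3: read the final state through a getter and turn it into an assertion.
        String observed = t.getType();
        return "Triangle t = new Triangle(" + a + ", " + b + ", " + c + ");\n"
             + "t.computeTriangleType();\n"
             + "assertEquals(\"" + observed + "\", t.getType());";
    }

    public static void main(String[] args) {
        System.out.println(generate(new Random()));
    }
}
```

Because the assertion is derived from the observed state, the generated test records what the code currently does rather than what it should do. A human still has to judge whether that behavior is correct.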
  79. 79. Fuzzing tests in practice
  80. 80. Genetic Algorithm [Flowchart: Initialization, Fitness Calculations, Terminate? (Yes / No), Selection, Crossover, Mutation, Elitism]
  81. 81. [Control-flow graph with nodes 1 to 10] (2,2,3) -> <1,2,4> (2,3,3) -> <1,5,7,8>
  82. 82. [Control-flow graph with nodes 1 to 10] (2,2,3) -> <1,2,4> (2,3,3) -> <1,5,7,8> Fitness = Approach + Distance • Approach = # of control nodes between the point where the execution diverged and the target. • Distance = the branch distance at the control node where the execution diverged, i.e. how far that condition was from taking the other branch, normalized as n/(n+1).
  83. 83. [Control-flow graph with nodes 1 to 10] (2,2,3) -> <1,2,4> = 2 + [1/(1+1)] = 2.5 (2,3,3) -> <1,5,7,8> = 0 + [1/(1+1)] = 0.5 Fitness = Approach + Distance • Approach = # of control nodes between the point where the execution diverged and the target. • Distance = the branch distance at the control node where the execution diverged, i.e. how far that condition was from taking the other branch, normalized as n/(n+1).
  84. 84. [Control-flow graph with nodes 1 to 10] (2,2,3) -> <1,2,4> = 2 + [1/(1+1)] = 2.5 (2,3,3) -> <1,5,7,8> = 0 + [1/(1+1)] = 0.5 <-- better! (lower fitness is closer to the target) Fitness = Approach + Distance • Approach = # of control nodes between the point where the execution diverged and the target. • Distance = the branch distance at the control node where the execution diverged, i.e. how far that condition was from taking the other branch, normalized as n/(n+1).
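The arithmetic on the slide, written out as code. This is only a sketch of the formula; the class and method names are hypothetical, and the approach levels (2 and 0) and the raw distance (1) are the values used on the slide.

```java
class Fitness {
    // n / (n + 1) maps any non-negative raw distance into the range [0, 1),
    // so the distance can never outweigh one whole approach level.
    static double normalize(double rawDistance) {
        return rawDistance / (rawDistance + 1);
    }

    // Fitness = Approach + Distance; lower values are closer to the target.
    static double fitness(int approachLevel, double rawBranchDistance) {
        return approachLevel + normalize(rawBranchDistance);
    }

    public static void main(String[] args) {
        System.out.println(fitness(2, 1)); // (2,2,3): prints 2.5
        System.out.println(fitness(0, 1)); // (2,3,3): prints 0.5
    }
}
```

The search keeps the candidate with the lower value, which is why (2,3,3) is preferred as a starting point for the next generation.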
  85. 85. Genetic Algorithm [Flowchart: Initialization, Fitness Calculations, Terminate? (Yes / No), Selection, Crossover, Mutation, Elitism]
  86. 86. Fraser, Gordon, and Andrea Arcuri. "Evosuite: automatic test suite generation for object-oriented software." Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering. ACM, 2011.
  87. 87. Testing SQL • Query: SELECT Name FROM Product WHERE Price > 20 • Coverage criterion: 1. False (Price = 19) 2. Boundary (Price = 20) 3. True (Price = 21) • Test database, table Product:
      Name  Price
      -     19
      -     20
      -     21
  88. 88. Testing SQL • Query: SELECT * FROM `account` LEFT JOIN `user` AS `assignedUser` ON account.assigned_user_id = assigneduser.id LEFT JOIN `user` AS `modifiedBy` ON account.modified_by_id = modifiedby.id LEFT JOIN `user` AS `createdBy` ON account.created_by_id = createdby.id LEFT JOIN `entity_email_address` AS `emailAddressesMiddle` ON account.id = emailaddressesmiddle.entity_id AND emailaddressesmiddle.deleted = '0' AND emailaddressesmiddle.primary = '1' AND emailaddressesmiddle.entity_type = 'Account' LEFT JOIN `email_address` AS `emailAddresses` ON emailaddresses.id = emailaddressesmiddle.email_address_id AND emailaddresses.deleted = '0' LEFT JOIN `entity_phone_number` AS `phoneNumbersMiddle` ON account.id = phonenumbersmiddle.entity_id AND phonenumbersmiddle.deleted = '0' AND phonenumbersmiddle.primary = '1' AND phonenumbersmiddle.entity_type = 'Account' LEFT JOIN `phone_number` AS `phoneNumbers` ON phonenumbers.id = phonenumbersmiddle.phone_number_id AND phonenumbers.deleted = '0' WHERE (( account.name LIKE 'Besha%' OR account.id IN (SELECT entity_id FROM entity_email_address JOIN email_address ON email_address.id = entity_email_address.email_address_id WHERE entity_email_address.deleted = 0 AND entity_email_address.entity_type = 'Account' AND email_address.deleted = 0 AND email_address.name LIKE 'Besha%') )) AND account.deleted = '0' • 42 coverage rules
  89. 89. EvoSQL [Diagram relating Query, Database Schema, SQLFpc, Coverage Rules, EvoSQL and Test Data] Jeroen Castelein, Maurício Aniche, Mozhan Soltani, Annibale Panichella, Arie van Deursen. Search-Based Test Data Generation for SQL Queries. ICSE 2018.
  90. 90. Study Context 2,135 queries / 4 systems: • Alura, e-learning platform • EspoCRM, open source software for customer relations • SuiteCRM, open source software for customer relations • ERPNext, open source resource planning software for enterprises.
  91. 91. EvoSQL Evaluation Outcomes • 100% of targets covered for 98% of the queries • On average 86% covered for the remaining 2% • Usually within seconds • Outperforms biased and random alternatives: • Biased random can handle 90% of simple queries (< 10 rules) • Biased random often finds no solution for complex queries (10+ rules)
  92. 92. Property- Based Testing clipart by GDJ https://openclipart.org/detail/232264/colorful-fleur-de-lis-fractal-3
  93. 93. Alan Turing on Assertions (wo)
  94. 94. Assertions Defined: An assertion is a Boolean expression at a specific point in a program which will be true unless there is a bug in the program. http://wiki.c2.com/?WhatAreAssertions Assertions in the program: They hold for any execution of that point. Unlike a test-code assertion, which holds for one execution only.
  95. 95. The Java (C, C++, …) assert Statement: “assert” boolean-expression [“:” string ] • If boolean-expression is true, do nothing. • If it is false, throw an AssertionError, with the string as message.
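A short Java illustration of the statement. The method `alignedSize` is made up for this example; the power-of-two check mirrors the kind of condition seen in assertion-heavy code bases. In Java, assertions are disabled by default and are switched on with the `-ea` JVM flag.

```java
class AssertDemo {
    // Rounds size up to the next multiple of alignment.
    static int alignedSize(int size, int alignment) {
        // Holds for every execution of this point unless a caller has a bug.
        assert alignment > 0 && (alignment & (alignment - 1)) == 0
                : "Alignment must be a power of two";
        return (size + alignment - 1) / alignment * alignment;
    }

    public static void main(String[] args) {
        System.out.println(alignedSize(10, 8)); // prints 16
        // With `java -ea AssertDemo`, the next call throws an AssertionError.
        // Without -ea the assert is skipped and the call returns a value.
        System.out.println(alignedSize(10, 6));
    }
}
```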
  96. 96. LLVM Assertion Examples (BitcodeReader.cpp) assert(BlockAddrFwdRefs.empty() && "Unresolved blockaddress fwd references"); assert(Ty == V->getType() && "Type mismatch in constant table!"); assert((Ty == 0 || Ty == V->getType()) && "Type mismatch in value table!"); assert(It != ResolveConstants.end() && It->first == *I); assert(isa<ConstantExpr>(UserC) && "Must be a ConstantExpr."); assert(V->getType()->isMetadataTy() && "Type mismatch in value table!"); assert((!Alignment || isPowerOf2_32(Alignment)) && "Alignment must be a power of two."); assert((Record[i] == 3 || Record[i] == 4) && "Invalid attribute group entry"); assert(Record[i] == 0 && "Kind string not null terminated"); assert(Record[i] == 0 && "Value string not null terminated"); assert(ResultTy && "Didn't read a type?"); assert(TypeList[NumRecords] == 0 && "Already read type?"); assert(NextBitCode == bitc::METADATA_NAMED_NODE); (void)NextBitCode; assert((CT != LandingPadInst::Catch || !isa<ArrayType>(Val->getType())) && "Catch clause has a invalid type!"); assert((CT != LandingPadInst::Filter || isa<ArrayType>(Val->getType())) && "Filter clause has invalid type!"); assert(DFII != DeferredFunctionInfo.end() && "Deferred function not found!"); assert(DeferredFunctionInfo.count(F) && "No info to read function later?"); assert(M == TheModule && "Can only Materialize the Module this BitcodeReader is attached to."); https://blog.regehr.org/archives/1091
  97. 97. Thinking in Assertions • Method preconditions: • Propositions that must hold before calling the method • Method postconditions • Propositions that are guaranteed to hold after the method has finished • Structural invariants • Properties over the state of an object throughout the object’s lifetime • Helps to improve / reason about design • Can be turned into assertions that can be checked at run time • Supports the testing process
  98. 98. Formal Specifications via Hoare Triples • Any execution of A, • starting in a state where P holds • will terminate in a state where Q holds { P } A { Q } { preconditions } Method { postconditions }
  99. 99. Precondition Design • The “strength” of your preconditions is a design choice. • The weaker your precondition • The more situations your method needs to handle • The less thinking the client needs to do (easier to use) • However, with weak preconditions: • The server will always do the checking • This may be redundant: checks also done if we’re sure they’ll pass.
  100. 100. If client invokes a (server) method and meets its preconditions, the server guarantees the postcondition will hold. Examples: File has been created; Player has been moved; Points have been added; Resulting tile is never null. clipart by floEdelmann https://openclipart.org/detail/260432/beach-chair
  101. 101. If you (as client) invoke a (server) method without meeting its preconditions, anything can happen. E.g.: Null pointer exception clipart by tzunghaor https://openclipart.org/detail/166696/nuclear-explosion
  102. 102. Design By Contract • Contract metaphor: • Contract: an explicit statement of the rights and obligations between a client and a server • Server perspective: • If you call me and meet my precondition, I ensure that after returning I deliver a state in which my postcondition holds • If not, you’re on your own. Bertrand Meyer, Applying "Design by Contract", IEEE Computer 25, 10, October 1992, pages 40-51
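A small sketch of a contract written as a Hoare triple and checked at run time. The integer square root is an example chosen here, not one from the talk. The checks throw exceptions instead of using `assert` so that they run even when JVM assertions are disabled; that is a design choice in this sketch, and it corresponds to the server always checking its precondition.

```java
class Contracts {
    // { x >= 0 }  r = isqrt(x)  { r * r <= x  &&  x < (r + 1) * (r + 1) }
    static int isqrt(int x) {
        // Precondition: the client must pass a non-negative number.
        if (x < 0) {
            throw new IllegalArgumentException("precondition violated: x >= 0");
        }
        int r = 0;
        while ((long) (r + 1) * (r + 1) <= x) {
            r++;
        }
        // Postcondition: r is the largest integer whose square does not exceed x.
        long root = r;
        if (root * root > x || x >= (root + 1) * (root + 1)) {
            throw new IllegalStateException("postcondition violated");
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(isqrt(10)); // prints 3
        System.out.println(isqrt(16)); // prints 4
    }
}
```

With a stronger contract the precondition check could be dropped and documented instead, leaving the client responsible for never passing a negative number.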
  103. 103. Bertrand Meyer’s Seven Principles of Software Testing 1. To test a program is to try to make it fail. 2. Tests are no substitute for specifications 3. Any failed execution must yield a test case 4. Determining success or failure of tests must be an automatic process (4.b: via contracts) Bertrand Meyer, IEEE Software, 2008. Required Reading!
  104. 104. Seven Principles of Software Testing 5. An effective testing process must include both manually and automatically produced test cases. 6. Test strategies must be empirically validated 7. A testing strategy’s most important property is the number of faults it uncovers as a function of time.
  105. 105. Assertions Pro / Con Great • Support better testing • Make debugging easier (less distance) • Executable comments • “Gateway drug to formal methods” Less than Great • Slow down code • Make programs incorrect when used improperly • Might trick some of us lazy programmers into using them to implement error handling • Are commonly misunderstood http://blog.regehr.org/archives/1091 Required reading
  106. 106. Property-Based Testing • Think of “properties” (assertions) for functions • Let a “generator” produce a series of random input values for the function • For each random input, check the assertions.
  107. 107. Property: the length of concatenated strings equals the sum of the lengths of the individual strings. QuickCheck will generate 100 random strings to check this property.
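The same property checked with a hand-written generator, so that the example needs no library. This is a sketch of the idea only; it does not use QuickCheck or any of its ports, and the names `ConcatProperty`, `randomString`, `holds` and `check` are hypothetical.

```java
import java.util.Random;

class ConcatProperty {
    // Generator: a random lowercase string of length 0 to 19.
    static String randomString(Random random) {
        int length = random.nextInt(20);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }

    // Property: length of the concatenation equals the sum of the lengths.
    static boolean holds(String a, String b) {
        return (a + b).length() == a.length() + b.length();
    }

    // Checks the property on `runs` random pairs; a fixed seed makes failures reproducible.
    static boolean check(int runs, long seed) {
        Random random = new Random(seed);
        for (int i = 0; i < runs; i++) {
            String a = randomString(random);
            String b = randomString(random);
            if (!holds(a, b)) {
                System.out.println("Counterexample: \"" + a + "\" and \"" + b + "\"");
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(check(100, 42L)); // prints true
    }
}
```

Property-based testing libraries add what this sketch lacks, such as richer generators and shrinking a failing input to a minimal counterexample.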
  108. 108. Can tools help us find bugs automatically? Yes, even without running the code! clipart by Machovka https://openclipart.org/detail/2676/lady-bug
  109. 109. Examples of bugs • Equals checks for incompatible operand • HE: Class defines equals() but not hashCode() • RpC: Repeated conditional tests • FL: Method performs math using floating point precision • RANGE: Array offset is out of bounds (RANGE_ARRAY_OFFSET) • Etc etc… • Full list: https://spotbugs.readthedocs.io/en/latest/bugDescriptions.html#
  110. 110. Linters are prevalent • OSS systems have been intensively using linters. • Tools are highly flexible, and developers have different strategies to configure them. • Challenge: false positives. • You should develop your own!! • Bugs specific to your context, e.g., config files. Beller, Moritz, et al. "Analyzing the state of static analysis: A large-scale evaluation in open source software." Software Analysis, Evolution, and Reengineering (SANER), 2016 IEEE 23rd International Conference on. Vol. 1. IEEE, 2016. Tómasdóttir, K. F., Aniche, M., & Deursen, A. V. (2017, October). Why and how JavaScript developers use linters. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (pp. 578-589). IEEE Press.
  111. 111. Why Developers Use Linters
  112. 112. Importance of the different rules • First list: 1. Stylistic Issues 2. Best Practices 3. Variables 4. Possible Errors 5. Node.js & CommonJS 6. ECMAScript 6 7. Strict Mode • Second list: 1. Possible Errors 92.5% 2. Best Practices 89% 3. ECMAScript 6 86.7% 4. Variables 86.4% 5. Stylistic Issues 78.2% 6. Node.js & CommonJS 62.6% 7. Strict Mode 57.8%
  113. 113. Code review in test files! Test files are almost 2 times less likely to be discussed during code review when reviewed together with production files!! Davide Spadini, Maurício Aniche, Magiel Bruntink, Margaret-Anne Storey, Alberto Bacchelli. When Testing Meets Code Review: Why and How Developers Review Tests. ICSE 2018.
  114. 114. Code review in test files! Little on finding more bugs! [Two bar charts of review comment categories, horizontal axis 0% to 30%: Code improvement, Understanding, Social communication, Defect, Knowledge transfer, Misc] Davide Spadini, Maurício Aniche, Magiel Bruntink, Margaret-Anne Storey, Alberto Bacchelli. When Testing Meets Code Review: Why and How Developers Review Tests. ICSE 2018.
  115. 115. Learning software testing is challenging! clipart by frankes https://openclipart.org/detail/190242/comic-girl-tini-at-school
  116. 116. Common mistakes • Test coverage (20.87%) • Maintainability of test code (20.42%) • Understanding test concepts (15.35%) • Boundary testing (12.95%) • State-based testing (12.39%) • Assertions (8.93%) • Mock Objects (5.87%) • Tools (4.21%)
  117. 117. Difficult topics [Diverging stacked bar chart of survey answers, one bar per question, number of respondents in parentheses: JUnit tests Q1 (83), Arrange-Act-Assert Q2 (81), Choose the test level Q3 (84), Mock Objects Q4 (84), Boundary Testing Q5 (84), Structural testing Q6 (82), Apply MC/DC Q7 (83), State-based testing Q8 (81), Best practices Q9 (81), Testability Q10 (81), TDD Q11 (81), Design by contracts Q12 (81), Acceptance tests Q13 (81), How much to test Q14 (80), Defensive programming Q15 (81), Exploratory Testing Q16 (80), Avoid flaky tests Q17 (81), Minimum set of tests Q18 (80)] Maurício Aniche, Felienne Hermans, Arie van Deursen. An Exploratory Study on Challenges in Software Testing Education. TU Delft. In submission.
  118. 118. How to Learn? [Diverging stacked bar chart of survey answers, one bar per question, number of respondents in parentheses: Lectures Q1 (83), Guest lectures Q2 (83), Live coding Q3 (83), Interaction Q4 (83), PragProg book Q5 (80), ISTQB book Q6 (81), Labwork Q7 (83), Support from TAs Q8 (82), Related papers Q9 (79), AMA sessions Q10 (82), Midterm exam Q11 (81)] People do not like books and papers… Maurício Aniche, Felienne Hermans, Arie van Deursen. An Exploratory Study on Challenges in Software Testing Education. TU Delft. In submission.
  119. 119. The majority of projects and users [from 416 participants and 1,337,872 intervals] do not practice testing actively. We should change it. Moritz Beller, Georgios Gousios, Annibale Panichella, Andy Zaidman. When, How, and Why Developers (Do Not) Test in Their IDEs. FSE 2015. clipart by laobc https://openclipart.org/detail/65257/sad-baby
  120. 120. Topics of today • Structural testing and MC/DC • Log monitoring and passive learning • Search-based software testing • Mutation testing • Fuzzing • Property-based testing • Code review • Static analysis tools Maurício Aniche m.f.aniche@tudelft.nl @mauricioaniche http://www.mauricioaniche.com/talks/2018/tad