Software Testing with Caipirinhas and Stroopwafels

In this presentation, I talk about ways of making sure your software works that go beyond traditional testing: log analytics and DevOps, static analysis tools, automated test generation, mistakes in web API integration, and challenges in software testing education.

I gave this talk to several Brazilian companies in December/2017.

Published in: Technology

  1. 1. Maurício Aniche m.f.aniche@tudelft.nl @mauricioaniche Software testing with caipirinhas and stroopwafels
  2. 2. 🇳🇱 Jeroen Castelein 🇮🇷 Mozhan Soltani 🇮🇹 Annibale Panichella 🇳🇱 Joop Aué 🇳🇱 Maikel Lobbezoo 🇳🇱 Rick Wieman 🇳🇱 Sicco Verwer 🇳🇱 Felienne Hermans 🇮🇹 Davide Spadini 🇮🇹 🇨🇭 Alberto Bacchelli 🇳🇱 Arie van Deursen Kristín Fjóla 🇳🇱 Peter Evers
  3. 3. How to make sure your software works? Software testing! More: • Log analytics • Static analysis And more: • Test generation • Code review • Test code quality • Production code quality
  4. 4. Context: Payments Payment Provider
  5. 5. DEV OPS
  6. 6. One Billion Log Lines a Day: Monitoring using the ELK Stack • Logstash: Unify different logging sources • Elasticsearch: Search and filter large log data • Kibana: Visual interactive dashboard Image credit: www.neteye-blog.com
  7. 7. Poll: Java Exceptions in a Payment System Your payment system in production generates 1 billion log lines per day. How many errors / warnings with exceptions do you expect to see? A. None. “We have a zero exception policy.” B. 1 Thousand. “Some exceptions are unavoidable.” C. 1 Million. “Most exceptions are harmless.” D. 1 Billion. “We only log errors and exceptions.” Adyen, Nov 2016: ~1,000,000 per day
  8. 8. Logness: Extract, Cluster, Tag • Extract features: • application name, class name, exception • Remove details: • literal numbers, (encryption) hashes • Cluster: • Same payment identifier in 15min window • Same features • longest common substring above threshold • Tag as severe, known (monitored, bug), and unknown Peter Evers, Maurício Aniche, Arie van Deursen, Maikel Lobbezoo. Finding Relevant Errors in Massive Payment Log Data. TU Delft, 2017, in preparation. 1,000,000 err log lines --> 250 exception clusters
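The extract, remove-details, and cluster steps on this slide can be sketched roughly as follows. This is a simplified illustration and not the Logness implementation: the log tuple format, the regular expressions, and the names `normalize` and `cluster` are assumptions, and clustering here only groups entries with identical features (it ignores the 15-minute payment window and the longest-common-substring threshold).

```python
import re
from collections import defaultdict

# Hypothetical patterns for the "remove details" step.
HEX_HASH = re.compile(r"\b[0-9a-fA-F]{16,}\b")  # long hex strings, e.g. hashes
NUMBER = re.compile(r"\d+")                      # literal numbers

def normalize(message: str) -> str:
    """Replace hashes and literal numbers with placeholders."""
    message = HEX_HASH.sub("<HASH>", message)
    return NUMBER.sub("<NUM>", message)

def cluster(entries):
    """Group entries sharing application, class, exception and normalized message."""
    clusters = defaultdict(list)
    for app, cls, exc, msg in entries:
        clusters[(app, cls, exc, normalize(msg))].append(msg)
    return clusters

# Made-up log entries: (application name, class name, exception, message).
entries = [
    ("checkout", "CardParser", "NumberFormatException",
     "Cannot parse card 1234 for payment 987"),
    ("checkout", "CardParser", "NumberFormatException",
     "Cannot parse card 5678 for payment 654"),
    ("terminal", "PosClient", "IOException",
     "Timeout after 3000 ms, token deadbeefdeadbeefdeadbeef"),
]
clusters = cluster(entries)
```

The two `CardParser` messages differ only in their numbers, so they end up in the same cluster; the `IOException` entry forms its own.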
  9. 9. Issues Found in Research Period First credit cards starting with 95 and with 19 digits: long overflow! Merchant configuration error. All payments stalled. Discovered before being noticed by merchant Firewall configuration problem: Server unreachable. Discovered before merchants were assigned to this server Server update incompatible with legacy point of sale terminals. Customer could buy, but merchant received no money. IOException triggered.
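The first issue on this slide is worth a quick arithmetic check. Assuming the card number was parsed into a signed 64-bit integer (such as Java's `long`), the largest representable value is 9,223,372,036,854,775,807, itself a 19-digit number starting with 92, so a 19-digit number starting with 95 cannot fit. The card number below is made up for illustration.

```python
LONG_MAX = 2**63 - 1  # largest signed 64-bit value (Java's Long.MAX_VALUE)

# Hypothetical 19-digit card number starting with 95.
card_number = int("95" + "0" * 17)

fits_in_long = card_number <= LONG_MAX
print(len(str(LONG_MAX)), fits_in_long)  # prints: 19 False
```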
  10. 10. Complex API Integration • Payment APIs are complex • Integration faults are easily made • Merchant needs assistance with API usage • Merchant may not notice mistakes • 2.5M HTTP error responses per month • What can we learn from them?
  11. 11. 2.5M Errors to 69 Fault Cases FC12 Contract not found Replication latency. FC24 iDEAL communication error FC42 Invalid paRes from issuer FC1 ApplePay token amount-mismatch FC5 Billing address problem (Country 0) FC62 Unable to decrypt data FC14 Could not read XML stream. FC15 Couldn’t parse expiry year Joop Aué, Maurício Aniche, Arie van Deursen, Maikel Lobbezoo An Exploratory Study on Faults in Web API Integration in a Large-Scale Payment Company . TU Delft, 2017. Submitted.
  12. 12. 11 Common Causes for API Error Responses Integrators are definitely the ones mainly responsible for API integration problems!
  13. 13. API Integration Recommendations • API Consumer: • Actually handle all error codes returned by provider • API Producer: • Document which error codes can be returned under what circumstances • Offer easy-to-use test harness for integrations created by consumers • Make explicit which error codes are ‘retriable’ • Enrich returned error codes with actionable info (for consumer or end user) • Offer Error Dashboard for API consumer offering live insight in error handling • API Researcher: • Rethink API usability in this context
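On the consumer side, "actually handle all error codes" and "make explicit which error codes are retriable" might look like the sketch below. The error code names, the `send` callable, and `call_with_retries` are all hypothetical; a real provider documents its own codes and retry semantics.

```python
import time

# Hypothetical error codes; a real provider documents its own set.
RETRIABLE = {"TIMEOUT", "SERVICE_UNAVAILABLE"}
FATAL = {"INVALID_CARD", "CONTRACT_NOT_FOUND"}

def call_with_retries(send, max_attempts=3, delay_seconds=0.0):
    """Call send() and handle every documented error code explicitly."""
    for attempt in range(1, max_attempts + 1):
        code = send()
        if code == "OK":
            return "OK"
        if code in FATAL:
            return f"failed: {code}"  # retrying will not help
        if code in RETRIABLE:
            if attempt < max_attempts:
                time.sleep(delay_seconds)  # back off before the next attempt
            continue
        # Anything else is a code the integration never planned for.
        raise ValueError(f"undocumented error code: {code}")
    return "failed: retries exhausted"

# Simulated provider: first call times out, second succeeds.
responses = iter(["TIMEOUT", "OK"])
result = call_with_retries(lambda: next(responses))
```

Raising on unknown codes is a deliberate choice in this sketch: it surfaces gaps in the integration early instead of silently treating them as success or failure.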
  14. 14. Payment Terminals Payment Provider
  15. 15. Point of sale terminal variability • Card brands • Card entry modes (chip, swipe, contactless) • Currency conversion • Loyalty points • Validation type (pin, signature) • Issuer responses (declined, insufficient balance) • Cancellations (shopper, merchant)
  16. 16. Passive learning Identifying system behavior from observations, and representing it in the smallest possible model.
      20170101160001 Adyen version: ******
      20170101160002 Starting TX/amt=10001/currency=978
      20170101160003 Starting EMV
      20170101160004 EMV started
      20170101160005 Magswipe opened
      20170101160006 CTLS started
      20170101160007 Transaction initialised
      20170101160008 Run TX as EMV transaction
      20170101160009 Application selected app:******
      20170101160010 read_application_data succeeded
      20170101160011 data_authentication succeeded
      20170101160012 validate 0
      20170101160013 DCC rejected
      20170101160014 terminal_risk_management succeeded
      20170101160015 verify_card_holder succeeded
      20170101160016 generate_first_ac succeeded
      20170101160017 Authorizing online
      20170101160018 Data returned by the host succeeded
      20170101160019 Transaction authorized by card
      20170101160020 Approved receipt printed
      20170101160021 pos_result_code:APPROVED
      20170101160022 Final status: Approved
      Rick Wieman, Maurício Aniche, Willem Lobbezoo, Sicco Verwer and Arie van Deursen. An Experience Report on Applying Passive Learning in a Large-Scale Payment Company. ICSME Industry Track, 2017 https://automatonlearning.net/ DFASAT / FlexFringe Heule & Verwer, ICGI 2010
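To give a feel for what "identifying system behavior from observations" means, the sketch below counts which log event directly follows which across a set of terminal traces. This is a much simpler stand-in than the state-merging done by DFASAT / FlexFringe, and the helper names `strip_timestamp` and `transition_counts` are assumptions made for this illustration.

```python
from collections import Counter

def strip_timestamp(line: str) -> str:
    """Drop the leading timestamp so only the event text remains."""
    return line.split(" ", 1)[1]

def transition_counts(traces):
    """Count how often one event directly follows another across all traces."""
    counts = Counter()
    for trace in traces:
        events = ["<start>"] + [strip_timestamp(line) for line in trace]
        counts.update(zip(events, events[1:]))
    return counts

# A short excerpt of the terminal log shown on the slide, observed twice.
trace = [
    "20170101160003 Starting EMV",
    "20170101160004 EMV started",
    "20170101160005 Magswipe opened",
]
model = transition_counts([trace, trace])
```

Rare transitions in such a model (for example, a PIN prompt followed by a signature prompt) are the kind of thing a domain expert can then inspect.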
  17. 17. Use Inferred Models to Analyze: Bugs in Test Phase • Terminal asked for PIN • AND asked for signature • Domain expert noted this unwanted behavior in inferred model. • Fixed before it went into production
  18. 18. Use Inferred Models to Analyze: Differences Between Card Brands Twice as many chip errors Informed merchant about issue.
  19. 19. Use Inferred Models to Analyze: Time out problems Timeout Improved performance under network instability by adding targeted retry mechanism
  20. 20. Log Analysis in Research 1. Abstraction Seeing the bigger picture 2. Detection Finding errors and anomalies 3. Enhancing More effective logging practices 4. Parsing Extracting message templates 5. Modeling Message ordering and protocols 6. Scaling Dealing with many many logs 7. Visualizing Put the eyes to use Joop Aué, Maurício Aniche, Arie van Deursen. Log Analysis from A-Z: A Literature Survey. TU Delft, 2017, in preparation. Identified 73 core papers. Venues: SIGOPS SOSP ACM TOCS Usenix WASL Usenix OSDI IEEE ISSRE ICSE
  21. 21. Testing can be hard… • Lack of testability. • Hard to think about all the corner cases. • You never know if your tests are good enough.
  22. 22. Fraser, Gordon, and Andrea Arcuri. "Evosuite: automatic test suite generation for object-oriented software." Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering. ACM, 2011.
  23. 23. SQL Query SELECT Name FROM Product WHERE Price > 20 Name Price Towel 15 Lawn mower 40 1kg Caviar 900 Coffee cup 1 Name Price Towel 15 Lawn mower 40 1kg Caviar 900 Coffee cup 1 Database Table: Product Output Name Lawn mower 1kg Caviar
  24. 24. Testing SQL Query SELECT Name FROM Product WHERE Price > 20 Name Price - 19 - 20 - 21 Test Database Table: Product Coverage Criterion 1. False Price = 19 2. Boundary Price = 20 3. True Price = 21
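The three coverage targets on this slide translate directly into a small test database. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the row names are placeholders invented for the example (the slide leaves the Name column empty).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (Name TEXT, Price INTEGER)")

# One row per coverage target: false (19), boundary (20), true (21).
conn.executemany(
    "INSERT INTO Product (Name, Price) VALUES (?, ?)",
    [("below", 19), ("boundary", 20), ("above", 21)],
)

rows = conn.execute("SELECT Name FROM Product WHERE Price > 20").fetchall()
names = [name for (name,) in rows]
```

Only the row priced at 21 satisfies `Price > 20`; the boundary row at 20 is the one that would catch a mistaken `>=`.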
  25. 25. Testing SQL Query SELECT * FROM `account` LEFT JOIN `user` AS `assignedUser` ON account.assigned_user_id = assigneduser.id LEFT JOIN `user` AS `modifiedBy` ON account.modified_by_id = modifiedby.id LEFT JOIN `user` AS `createdBy` ON account.created_by_id = createdby.id LEFT JOIN `entity_email_address` AS `emailAddressesMiddle` ON account.id = emailaddressesmiddle.entity_id AND emailaddressesmiddle.deleted = '0' AND emailaddressesmiddle.primary = '1' AND emailaddressesmiddle.entity_type = 'Account' LEFT JOIN `email_address` AS `emailAddresses` ON emailaddresses.id = emailaddressesmiddle.email_address_id AND emailaddresses.deleted = '0' LEFT JOIN `entity_phone_number` AS `phoneNumbersMiddle` ON account.id = phonenumbersmiddle.entity_id AND phonenumbersmiddle.deleted = '0' AND phonenumbersmiddle.primary = '1' AND phonenumbersmiddle.entity_type = 'Account' LEFT JOIN `phone_number` AS `phoneNumbers` ON phonenumbers.id = phonenumbersmiddle.phone_number_id AND phonenumbers.deleted = '0' WHERE (( account.name LIKE 'Besha%' OR account.id IN (SELECT entity_id FROM entity_email_address JOIN email_address ON email_address.id = entity_email_address.email_address_id WHERE entity_email_address.deleted = 0 AND entity_email_address.entity_type = 'Account' AND email_address.deleted = 0 AND email_address.name LIKE 'Besha%') )) AND account.deleted = '0' x 42 Coverage Rules 
  26. 26. Other Approaches Coverage Rule Query SELECT Name FROM Product WHERE Price = 20 Column Constraints Name - Price = 20 Constraint Satisfaction Problem Coverage Rule Query SELECT Name FROM Product WHERE Price > 50 AND Price < 100 Column Constraints Name - Price > 50 and < 100 Constraint Satisfaction Problem
  27. 27. Limitations Subqueries SELECT Name FROM Product WHERE Price < (SELECT MAX(Price) FROM UserPrice WHERE UserId = 1) String Constraints SELECT Price FROM Product WHERE Name = 'Towel' OR Name LIKE '%Caviar%' 84% of our evaluation
  28. 28. Using a Database Query ”detailed execution” log
  29. 29. Using a Database Target Query SELECT Name FROM Product WHERE Price = 20 Table: Product, Dataset 1 Name Price Towel 15 Coffee 4 Fitness value: 20 - 15 = 5 Table: Product, Dataset 2 Name Price Caviar 900 Fitness value: 900 - 20 = 880 (880 > 5, so Dataset 1 is closer to the target)
  30. 30. Using a Database Target Query SELECT Name FROM Product WHERE Price = 20 Name Price Towel 20 Coffee 4 Table: Product Dataset 1 Fitness value: 0 20 - 20
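The fitness values on these two slides are consistent with a simple distance measure: how far the closest row's Price is from the constant in the query. The function below is a sketch under that assumption; the fitness function actually used by the tool may differ (for instance in how it normalizes or combines several predicates).

```python
def fitness(prices, target=20):
    """Smallest distance between any row's Price and the target; 0 means covered."""
    return min(abs(price - target) for price in prices)

dataset_1 = [15, 4]  # Towel, Coffee
dataset_2 = [900]    # Caviar
improved = [20, 4]   # Towel repriced to 20, as on the second slide

scores = (fitness(dataset_1), fitness(dataset_2), fitness(improved))
print(scores)  # prints: (5, 880, 0)
```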
  31. 31. Using a Database LEFT JOIN RIGHT JOIN INNER JOIN GROUP BY EXISTS HAVING WHERE LIKE MAX SUM IN NOT >= <> <= OR COUNT
  32. 32. MC/DC Coverage on SQL Queries Javier Tuya, Maria Suarez-Cabal and Claudio de la Riva. Full predicate coverage for testing SQL database queries. Software Testing, Verification and Reliability, 2010.
  33. 33. Genetic Algorithm Initialization Fitness Calculations Terminate? Selection Crossover Mutation Elitism Yes No
  34. 34. Initialization Coverage Rule Query SELECT Name FROM Product WHERE Price = 20 Name Price af08u4 -5461 1ruhaev 491 Table: Product Random Individual Name Price af08u4 20 1ruhaev 491 Table: Product Seeded Individual
  35. 35. Crossover Name Price Towel 15 Parent 1 Name Cat Coffee Drinks Name Price Coffee 4 Parent 2 Name Cat Glass Food T1 T2 T1 T2 Name Price Towel 15 Offspring 1 Name Cat Coffee Drinks Name Price Coffee 4 Offspring 2 Name Cat Glass Food T1 T2 T1 T2
  36. 36. Mutation – Add Name Price Towel 15 Caviar 400 Table: Product Name Price Towel 15 Caviar 400 Nail -23 Table: Product
  37. 37. Mutation – Duplicate Name Price Towel 15 Caviar 400 Table: Product Name Price Towel 15 Caviar 400 Towel 15 Table: Product
  38. 38. Mutation – Remove Name Price Towel 15 Caviar 400 Table: Product Name Price Towel 15 Table: Product
  39. 39. Mutation – Change Name Price Towel 15 Caviar 400 Table: Product Name Price Towul 15 Caviar 400 Table: Product
  40. 40. Mutation – Seeded Change Name Price Towel 15 Caviar 400 Table: Product Name Price Coffee 15 Caviar 400 Table: Product Coverage Rule Query SELECT Price FROM Product WHERE Name = ‘Coffee’
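The genetic algorithm loop and the mutation operators from the preceding slides can be put together in a toy search for data that covers `WHERE Price = 20`. This is an illustrative sketch, not EvoSQL itself: it handles a single table and a single rule, leaves out crossover (with one table there is nothing to exchange), uses truncation selection as a stand-in for whatever selection the tool applies, and all function names and parameter values are assumptions.

```python
import random

TARGET = 20  # constant taken from the coverage rule: WHERE Price = 20

def fitness(table):
    """Distance of the closest row to the target; an empty table is worst."""
    if not table:
        return float("inf")
    return min(abs(price - TARGET) for _, price in table)

def random_row(rng):
    name = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(6))
    return (name, rng.randint(-10_000, 10_000))

def mutate(table, rng):
    """Apply one of the mutation operators to a copy of the table."""
    table = list(table)
    op = rng.choice(["add", "duplicate", "remove", "change", "seeded_change"])
    if op == "add" or not table:
        table.append(random_row(rng))
    elif op == "duplicate":
        table.append(rng.choice(table))
    elif op == "remove":
        table.pop(rng.randrange(len(table)))
    else:
        i = rng.randrange(len(table))
        name, price = table[i]
        if op == "change":
            price += rng.randint(-10, 10)
        else:
            price = TARGET  # seeded change: reuse a constant from the query
        table[i] = (name, price)
    return table

def evolve(generations=50, population_size=10, seed=0):
    rng = random.Random(seed)
    population = [[random_row(rng)] for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        if fitness(population[0]) == 0:
            break  # terminate: the coverage rule is satisfied
        elite = population[0]  # elitism: the best individual always survives
        parents = population[: population_size // 2]  # truncation selection
        population = [elite] + [
            mutate(rng.choice(parents), rng) for _ in range(population_size - 1)
        ]
    return min(population, key=fitness)

best = evolve()
```

Because the elite individual is always carried over, the best fitness in the population can never get worse from one generation to the next.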
  41. 41. EvoSQL EvoSQL SQLFpc Test Data Query Database Schema Coverage Rules Jeroen Castelein, Maurício Aniche, Mozhan Soltani, Annibale Panichella, Arie van Deursen. Search-Based Test Data Generation for SQL Queries. ICSE 2018.
  42. 42. Study Context 2,135 queries / 4 systems: • Alura, e-learning platform • EspoCRM, open source software for customer relations • SuiteCRM, open source software for customer relations • ERPNext, open source resource planning software for enterprises.
  43. 43. Study Context Coverage Rules 1-2 3-4 5-6 7-8 9-10 11-15 16-20 21+ # Queries 656 382 408 346 114 107 51 71 84%
  44. 44. EvoSQL Evaluation Outcomes • 100% of targets covered for 98% of the queries • On average 86% covered for the remaining 2% • Usually within seconds • Outperforms biased and random alternatives: • Biased random can handle 90% of simple queries (< 10 rules) • Biased random often finds no solution for complex queries (10+ rules)
  45. 45. Developers love and hate linters!
  46. 46. Configurations ESLint output
  47. 47. • What do developers expect from such tools? Why do they use them? • How do they configure such flexible tools? • What challenges do developers face? Kristín Fjóla Tómasdóttir, Maurício Aniche, Arie van Deursen. Why and How JavaScript Developers Use Linters. ASE 2017. The Adoption of JavaScript Linters. TU Delft. In preparation.
  48. 48. Interviewing Developers Goal - Reasons for using a linter - Methods to configure a linter - Challenges Method - Grounded Theory - 13 questions Data - 15 developers - Top 120 JS GitHub projects
  49. 49. Data - 86,366 JavaScript projects - 9,548 ESLint configuration files Analyzing Configuration Files Goal - Prevalence of configurations - Most common rules Method - GHTorrent & Google BigQuery - Tool to parse files
  50. 50. Surveying Developers Goal - Reasons for using a linter - Methods to configure - Most important rules - Challenges Method - Questionnaire - Open and closed questions - Distributed in JS communities Data - 337 responses - Reddit, Echo JS, Facebook, Twitter
  51. 51. Why Developers Use Linters
  52. 52. Importance of the different rules 1. Stylistic Issues 2. Best Practices 3. Variables 4. Possible Errors 5. Node.js & CommonJS 6. ECMAScript 6 7. Strict Mode 1. Possible Errors 92.5% 2. Best Practices 89% 3. ECMAScript 6 86.7% 4. Variables 86.4% 5. Stylistic Issues 78.2% 6. Node.js & CommonJS 62.6% 7. Strict Mode 57.8%
  53. 53. Stylistic Issues quotes 60.6% semi 48.1% indent 43.3% How Developers Configure Linters
  54. 54. Possible Errors no-dupe-keys 39.2% no-unreachable 37.2% How Developers Configure Linters
  55. 55. Best Practices eqeqeq 42.7% no-eval 36.9% How Developers Configure Linters
  56. 56. How Developers Configure Linters Variables no-undef 40.6% no-unused-vars 40.3%
  57. 57. What Challenges Developers Face
  58. 58. To Mock or Not To Mock?
  59. 59. http://www.mauricioaniche.com/2014/06/mockar-ou-nao-mockar/
  60. 60. When to mock? • Infrastructure is often mocked. • There was no clear trend on domain objects. • Complicated classes are mocked. • Classes that are too coupled are mocked. Davide Spadini, M. Finavaro Aniche, Magiel Bruntink, Alberto Bacchelli. To Mock or Not To Mock? An Empirical Study on Mocking Practices. MSR 2017. Mock Objects For Testing Java Systems: Why and How Developers Use Them, and How They Evolve. EMSE. In submission.
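As an illustration of "infrastructure is often mocked", the sketch below replaces a mail gateway with a mock while testing a small domain class. The studies cited on this slide looked at Java systems; Python's standard `unittest.mock` is used here only as a convenient stand-in, and `InvoiceService` and its mailer dependency are hypothetical.

```python
from unittest.mock import Mock

class InvoiceService:
    """Hypothetical domain class that depends on infrastructure (a mail gateway)."""

    def __init__(self, mailer):
        self.mailer = mailer

    def bill(self, customer_email, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.mailer.send(customer_email, f"You owe {amount} EUR")
        return amount

mailer = Mock()  # stands in for the real mail infrastructure
service = InvoiceService(mailer)
billed = service.bill("ana@example.com", 40)

# Verify the interaction with the infrastructure without sending any mail.
mailer.send.assert_called_once_with("ana@example.com", "You owe 40 EUR")
```

The business rule (positive amounts only) is exercised for real, while the slow and external part is replaced; the cost is that the test now depends on the exact call made to `send`.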
  61. 61. Mocks are introduced from the very beginning of the test class! Davide Spadini, M. Finavaro Aniche, Magiel Bruntink, Alberto Bacchelli. To Mock or Not To Mock? An Empirical Study on Mocking Practices. MSR 2017. Mock Objects For Testing Java Systems: Why and How Developers Use Them, and How They Evolve. EMSE. In submission.
  62. 62. Challenges • Dealing with coupling • Mocking in legacy systems • Non-testable/Hard-to-test classes • Untestable dependencies Davide Spadini, M. Finavaro Aniche, Magiel Bruntink, Alberto Bacchelli. To Mock or Not To Mock? An Empirical Study on Mocking Practices. MSR 2017. Mock Objects For Testing Java Systems: Why and How Developers Use Them, and How They Evolve. EMSE. In submission.
  63. 63. 50% of changes in a mock occur because the production code changed! Coupling is strong! Davide Spadini, M. Finavaro Aniche, Magiel Bruntink, Alberto Bacchelli. To Mock or Not To Mock? An Empirical Study on Mocking Practices. MSR 2017. Mock Objects For Testing Java Systems: Why and How Developers Use Them, and How They Evolve. EMSE. In submission.
  64. 64. ATTENTION: THE MOST IMPORTANT LESSON ABOUT WRITING AUTOMATED UNIT TESTS IS ABOUT TO COME!
  65. 65. There’s a correlation between complex code and hard-to-test code. Aniche, M., Gerosa, M. “Does test-driven development improve class design? A qualitative study on developers’ perceptions”. Journal of the Brazilian Computer Society. 2015, 21:15. Bruntink, Magiel, and Arie Van Deursen. "Predicting class testability using object-oriented metrics." Fourth IEEE International Workshop on Source Code Analysis and Manipulation, 2004.
  66. 66. https://www.facebook.com/notes/kent-beck/unit-tests/1726369154062608/
  67. 67. How I (Maurício) do the trade-off Unit tests Integration tests System tests Manual All business rules should be tested here. Avoid at all cost. But do it when needed. Complex integrations with external services. Main/Risky flow of the app tested. You will come up with your own way of thinking!
  68. 68. It’s your job to decide the best test to write!
  69. 69. A catalogue of patterns For Web Testing Fixture API ID in HTML Move Fast, Move Slow … Aniche, M., Guerra, E., Gerosa, M. “A Set of Patterns to Improve Code Quality of Automated Functional Tests of Web Applications”. 21st Conference on Pattern Languages of Programs. 2014.
  70. 70. Code review in test files! Test files are almost 2 times less likely to be discussed during code review when reviewed together with production files!! Davide Spadini, Maurício Aniche, Magiel Bruntink, Margaret-Anne Storey, Alberto Bacchelli. When Testing Meets Code Review: Why and How Developers Review Tests. ICSE 2018.
  71. 71. Code review in test files! Little on finding more bugs! Davide Spadini, Maurício Aniche, Magiel Bruntink, Margaret-Anne Storey, Alberto Bacchelli. When Testing Meets Code Review: Why and How Developers Review Tests. ICSE 2018.
  72. 72. A main concern of reviewers is understanding whether the test covers all the paths of the production code and to ensure tests’ maintainability and readability. Lack of good tooling support!
  73. 73. Learning software testing is challenging!
  74. 74. Common mistakes Maurício Aniche, Felienne Hermans, Arie van Deursen. An Exploratory Study on Challenges in Software Testing Education. TU Delft. In submission. • Test coverage (20.87%) • Maintainability of test code (20.42%) • Understanding test concepts (15.35%) • Boundary testing (12.95%) • State-based testing (12.39%) • Assertions (8.93%) • Mock Objects (5.87%) • Tools (4.21%)
  75. 75. Difficult topics Maurício Aniche, Felienne Hermans, Arie van Deursen. An Exploratory Study on Challenges in Software Testing Education. TU Delft. In submission.
  76. 76. How to Learn? Maurício Aniche, Felienne Hermans, Arie van Deursen. An Exploratory Study on Challenges in Software Testing Education. TU Delft. In submission. People do not like books and papers…
  77. 77. Challenges Maurício Aniche, Felienne Hermans, Arie van Deursen. An Exploratory Study on Challenges in Software Testing Education. TU Delft. In submission. • Apply tools and techniques for the first time. • How, what, and how much to test. • Understanding the system under test. • Motivation and experience. • Software testing theory. • Testability mindset.
  78. 78. The majority of projects and users [from 416 participants and 1,337,872 intervals] do not practice testing actively. We should change it. Moritz Beller, Georgios Gousios, Annibale Panichella, Andy Zaidman. When, How, and Why Developers (Do Not) Test in Their IDEs. FSE 2015.
  79. 79. Topics we discussed today! • Log Analytics and DevOps • Web API integration mistakes • Testing SQL queries • To Mock or Not To Mock? • Code review in test files • Challenges in learning software testing Maurício Aniche (m.f.aniche@tudelft.nl / @mauricioaniche)
