Software Testing with Caipirinhas and Stroopwafels

Maurício Aniche
m.f.aniche@tudelft.nl
@mauricioaniche

🇳🇱 Jeroen Castelein
🇮🇷 Mozhan Soltani 🇮🇹 Annibale Panichella
🇳🇱 Joop Aué 🇳🇱 Maikel Lobbezoo 🇳🇱 Rick Wieman
🇳🇱 Sicco Verwer
🇳🇱 Felienne Hermans 🇮🇹 Davide Spadini 🇮🇹 🇨🇭 Alberto Bacchelli
🇳🇱 Arie van Deursen Kristín Fjóla
🇳🇱 Peter Evers

How to make sure your software
works?
Software testing!
More:
• Log analytics
• Static analysis
And more:
• Test generation
• Code review
• Test code quality
• Production code quality

Context:
Payments
Payment
Provider

One Billion Log Lines a Day:
Monitoring using the ELK Stack
• Logstash: Unify different logging sources
• Elasticsearch: Search and filter large log data
• Kibana: Visual interactive dashboard
Image credit: www.neteye-blog.com

Poll: Java Exceptions in a Payment
System
Your payment system in production generates 1
billion log lines per day. How many errors / warnings
with exceptions do you expect to see?
A. None. “We have a zero exception policy.”
B. 1 Thousand. “Some exceptions are unavoidable.”
C. 1 Million. “Most exceptions are harmless.”
D. 1 Billion. “We only log errors and exceptions.”
Adyen, Nov 2016:
~1,000,000 per
day

Logness: Extract, Cluster, Tag
• Extract features:
• application name, class name, exception
• Remove details:
• literal numbers, (encryption) hashes
• Cluster:
• Same payment identifier in 15min window
• Same features
• longest common substring above threshold
• Tag as severe, known (monitored, bug), and
unknown
Peter Evers, Maurício Aniche, Arie van Deursen, Maikel Lobbezoo.
Finding Relevant Errors in Massive Payment Log Data. TU Delft, 2017, in preparation.
1,000,000 error log lines --> 250 exception clusters
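The extract, remove-details and cluster steps can be sketched roughly as follows. This is a minimal illustration, not the Logness implementation: the field names, the regular expressions and the placeholder tokens are assumptions, and the 15-minute payment window and the longest-common-substring threshold are left out.

```python
import re
from collections import defaultdict

# Illustrative patterns (assumptions): long hex strings stand in for hashes,
# and any remaining digit run counts as a literal number.
HASH_RE = re.compile(r"\b[0-9a-fA-F]{16,}\b")
NUMBER_RE = re.compile(r"\d+")


def remove_details(message):
    """Replace hashes and literal numbers with placeholders."""
    message = HASH_RE.sub("<HASH>", message)
    return NUMBER_RE.sub("<NUM>", message)


def cluster(log_entries):
    """Group entries that share application, class, exception and normalized message."""
    clusters = defaultdict(list)
    for entry in log_entries:
        key = (
            entry["application"],   # hypothetical field names
            entry["class_name"],
            entry["exception"],
            remove_details(entry["message"]),
        )
        clusters[key].append(entry)
    return clusters
```

Two log lines that differ only in amounts or identifiers end up in the same cluster, which is what makes a reduction from many lines to few clusters possible.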

Issues Found in Research Period
First credit cards
starting with 95 and
with 19 digits:
long overflow!
Merchant configuration error.
All payments stalled.
Discovered before being
noticed by merchant
Firewall configuration
problem: Server unreachable.
Discovered before merchants
were assigned to this server
Server update incompatible with
legacy point of sale terminals.
Customer could buy, but merchant
received no money. IOException
triggered.
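The first issue above (19-digit card numbers starting with 95) can be checked with simple arithmetic. The sketch assumes the card number was held in a signed 64-bit integer such as a Java long; the card numbers used are made up.

```python
# Largest value of a signed 64-bit integer (for example a Java long).
SIGNED_64_BIT_MAX = 2**63 - 1  # 9,223,372,036,854,775,807: 19 digits, starts with 92


def fits_in_signed_64_bit(card_number: str) -> bool:
    """Return True if the card number can be stored in a signed 64-bit integer."""
    return int(card_number) <= SIGNED_64_BIT_MAX


# Made-up card numbers, for illustration only.
nineteen_digits_starting_with_95 = "95" + "0" * 17
nineteen_digits_starting_with_92 = "92" + "0" * 17
sixteen_digits = "4" + "0" * 15
```

A 19-digit number starting with 95 is larger than the maximum, while shorter numbers, and 19-digit numbers starting with 92 or less, may still fit. That would explain why such a problem only shows up when the first cards in that range appear.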

Complex API Integration
• Payment APIs are complex
• Integration faults are easily made
• Merchant needs assistance with API
usage
• Merchant may not notice mistakes
• 2.5M HTTP error responses per month
• What can we learn from them?

2.5M Errors to 69 Fault Cases
• FC12: Contract not found. Replication latency.
• FC24: iDEAL communication error
• FC42: Invalid paRes from issuer
• FC1: ApplePay token amount-mismatch
• FC5: Billing address problem (Country 0)
• FC62: Unable to decrypt data
• FC14: Could not read XML stream.
• FC15: Couldn’t parse expiry year
Joop Aué, Maurício Aniche, Arie van Deursen, Maikel Lobbezoo
An Exploratory Study on Faults in Web API Integration in a Large-Scale Payment Company. TU Delft, 2017. Submitted.

11 Common Causes for API Error Responses
Integrators are clearly the main party responsible for API integration problems!

API Integration Recommendations
• API Consumer:
• Actually handle all error codes returned by provider
• API Producer:
• Document which error codes can be returned under what
circumstances
• Offer easy-to-use test harness for integrations created by
consumers
• Make explicit which error codes are ‘retriable’
• Enrich returned error codes with actionable info (for
consumer or end user)
• Offer Error Dashboard for API consumer offering live insight in
error handling
• API Researcher:
• Rethink API usability in this context
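A consumer that follows these recommendations could decide what to do with every error it receives, along the lines of the sketch below. The ApiError fields (code, retriable, action) and the error codes are hypothetical; they illustrate a provider that marks errors as retriable and attaches actionable information, not any real payment API.

```python
from dataclasses import dataclass


@dataclass
class ApiError:
    """Hypothetical error payload returned by a provider."""
    code: str
    retriable: bool   # the provider makes explicit whether a retry makes sense
    action: str       # actionable info for the consumer or end user ("" if none)


def next_step(error: ApiError, attempt: int, max_attempts: int = 3) -> str:
    """Decide how the consumer reacts, so that no error code is silently ignored."""
    if error.retriable and attempt < max_attempts:
        return "retry"
    if error.action:
        return "show: " + error.action
    # Anything else is surfaced instead of being swallowed.
    return "escalate: unhandled error code " + error.code
```

The final branch matters most: an error code the consumer does not know still ends up somewhere visible.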

Payment
Terminals
Payment
Provider

Point of sale terminal variability
• Card brands
• Card entry modes
(chip, swipe, contactless)
• Currency conversion
• Loyalty points
• Validation type (pin, signature)
• Issuer responses
(declined, insufficient balance)
• Cancellations
(shopper, merchant)

Passive learning
Identifying system behavior from observations,
and representing it in the smallest possible model.
20170101160001 Adyen version: ******
20170101160002 Starting TX/amt=10001/currency=978
20170101160003 Starting EMV
20170101160004 EMV started
20170101160005 Magswipe opened
20170101160006 CTLS started
20170101160007 Transaction initialised
20170101160008 Run TX as EMV transaction
20170101160009 Application selected app:******
20170101160010 read_application_data succeeded
20170101160011 data_authentication succeeded
20170101160012 validate 0
20170101160013 DCC rejected
20170101160014 terminal_risk_management succeeded
20170101160015 verify_card_holder succeeded
20170101160016 generate_first_ac succeeded
20170101160017 Authorizing online
20170101160018 Data returned by the host succeeded
20170101160019 Transaction authorized by card
20170101160020 Approved receipt printed
20170101160021 pos_result_code:APPROVED
20170101160022 Final status: Approved
Rick Wieman, Maurício Aniche, Willem Lobbezoo, Sicco Verwer and Arie van Deursen.
An Experience Report on Applying Passive Learning in a Large-Scale Payment Company. ICSME Industry Track, 2017
https://automatonlearning.net/
DFASAT / FlexFringe
Heule & Verwer, ICGI 2010
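To give an idea of what "identifying behavior from observations" starts from, the sketch below folds log traces into a prefix tree: one state per distinct prefix of observed events. State-merging learners typically start from such a tree and then merge states to get a smaller model. The merging step is omitted here, and this is not the DFASAT / FlexFringe implementation. The traces are shortened, made-up versions of the log above.

```python
def build_prefix_tree(traces):
    """Build a prefix tree from event sequences.

    Returns a dict mapping state id -> {event: next state id}; state 0 is the root.
    """
    transitions = {0: {}}
    for trace in traces:
        state = 0
        for event in trace:
            next_state = transitions[state].get(event)
            if next_state is None:
                # First time this prefix is seen: create a new state.
                next_state = len(transitions)
                transitions[state][event] = next_state
                transitions[next_state] = {}
            state = next_state
    return transitions


# Shortened, illustrative traces (event names taken from the log lines above).
traces = [
    ["EMV started", "validate 0", "Final status: Approved"],
    ["EMV started", "validate 0", "Final status: Declined"],
]
tree = build_prefix_tree(traces)
```

The two traces share their first two events, so they share the first two transitions and only branch at the final status.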

Use Inferred Models to Analyze:
Bugs in Test Phase
• Terminal asked for PIN
• AND asked for signature
• Domain expert noted this unwanted
behavior in inferred model.
• Fixed before it went into production

Differences Between Card Brands
Twice as many chip errors
Informed
merchant
about issue.

Time-out problems
Improved performance under network instability by adding a targeted retry mechanism

Log Analysis in Research
1. Abstraction: Seeing the bigger picture
2. Detection: Finding errors and anomalies
3. Enhancing: More effective logging practices
4. Parsing: Extracting message templates
5. Modeling: Message ordering and protocols
6. Scaling: Dealing with many, many logs
7. Visualizing: Put the eyes to use
Joop Aué, Maurício Aniche, Arie van Deursen.
Log Analysis from A-Z: A Literature Survey. TU Delft, 2017, in preparation.
Identified 73
core papers.
Venues:
SIGOPS SOSP
ACM TOCS
Usenix WASL
Usenix OSDI
IEEE ISSRE
ICSE

Testing can be hard…
• Lack of testability.
• Hard to think about all the
corner cases.
• You never know if your tests
are good enough.

Fraser, Gordon, and Andrea Arcuri. "Evosuite: automatic test suite generation for object-oriented software." Proceedings of
the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering. ACM, 2011.

SQL
Query
SELECT Name
FROM Product
WHERE Price > 20
Database
Table: Product
Name        Price
Towel       15
Lawn mower  40
1kg Caviar  900
Coffee cup  1
Output
Name
Lawn mower
1kg Caviar

Testing SQL
Query
SELECT Name
FROM Product
WHERE Price > 20
Test Database
Table: Product
Name  Price
-     19
-     20
-     21
Coverage Criterion
1. False: Price = 19
2. Boundary: Price = 20
3. True: Price = 21
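The three coverage targets can be exercised against a real database engine. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the product names are placeholders, since the slide leaves them open.

```python
import sqlite3


def names_over_20(prices):
    """Load one product per price into an in-memory table and run the query under test."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Product (Name TEXT, Price INTEGER)")
    conn.executemany(
        "INSERT INTO Product (Name, Price) VALUES (?, ?)",
        [(f"item{price}", price) for price in prices],  # placeholder names
    )
    rows = conn.execute("SELECT Name FROM Product WHERE Price > 20").fetchall()
    conn.close()
    return [name for (name,) in rows]
```

With a price of 19 or 20 the query returns nothing, and with 21 it returns the row. A mutated query such as `Price >= 20` would be caught by the boundary row.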

Testing SQL
Query
SELECT *
FROM `account`
LEFT JOIN `user` AS `assignedUser` ON account.assigned_user_id = assigneduser.id
LEFT JOIN `user` AS `modifiedBy` ON account.modified_by_id = modifiedby.id
LEFT JOIN `user` AS `createdBy` ON account.created_by_id = createdby.id
LEFT JOIN `entity_email_address` AS `emailAddressesMiddle`
ON account.id = emailaddressesmiddle.entity_id
AND emailaddressesmiddle.deleted = '0'
AND emailaddressesmiddle.primary = '1'
AND emailaddressesmiddle.entity_type = 'Account'
LEFT JOIN `email_address` AS `emailAddresses`
ON emailaddresses.id = emailaddressesmiddle.email_address_id
AND emailaddresses.deleted = '0'
LEFT JOIN `entity_phone_number` AS `phoneNumbersMiddle`
ON account.id = phonenumbersmiddle.entity_id
AND phonenumbersmiddle.deleted = '0'
AND phonenumbersmiddle.primary = '1'
AND phonenumbersmiddle.entity_type = 'Account'
LEFT JOIN `phone_number` AS `phoneNumbers`
ON phonenumbers.id = phonenumbersmiddle.phone_number_id
AND phonenumbers.deleted = '0'
WHERE (( account.name LIKE 'Besha%'
OR account.id IN (SELECT entity_id
FROM entity_email_address
JOIN email_address
ON email_address.id = entity_email_address.email_address_id
WHERE entity_email_address.deleted = 0
AND entity_email_address.entity_type = 'Account'
AND email_address.deleted = 0
AND email_address.name LIKE 'Besha%') ))
AND account.deleted = '0'
x 42 Coverage Rules


Other Approaches
Coverage Rule Query
SELECT Name
FROM Product
WHERE Price = 20
Column Constraints
Name -
Price = 20
Constraint Satisfaction Problem
Coverage Rule Query
SELECT Name
FROM Product
WHERE Price > 50 AND Price < 100
Column Constraints
Name -
Price > 50 and < 100
Constraint Satisfaction Problem

Limitations
Subqueries
SELECT Name
FROM Product
WHERE Price < (SELECT MAX(Price) FROM UserPrice WHERE UserId = 1)
String Constraints
SELECT Price
FROM Product
WHERE Name = 'Towel' OR Name LIKE '%Caviar%'
84% of our evaluation

Using a Database
Query
“detailed execution” log

Using a Database
Target Query
SELECT Name
FROM Product
WHERE Price = 20
Dataset 1
Table: Product
Name    Price
Towel   15
Coffee  4
Fitness value: 5 (20 - 15)
>
Dataset 2
Table: Product
Name    Price
Caviar  900
Fitness value: 880 (900 - 20)

Using a Database
Target Query
SELECT Name
FROM Product
WHERE Price = 20
Dataset 1
Table: Product
Name    Price
Towel   20
Coffee  4
Fitness value: 0 (20 - 20)
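The fitness values on these two slides follow from a simple distance: how far the closest row is from satisfying Price = 20. The function below is a simplified sketch for this single equality predicate only. The tool itself derives its fitness from the database's execution log, which is not modelled here.

```python
def fitness(rows, target_price=20):
    """Distance of the best row to satisfying `Price = target_price` (0 means covered)."""
    return min(abs(price - target_price) for _name, price in rows)


dataset_1 = [("Towel", 15), ("Coffee", 4)]
dataset_2 = [("Caviar", 900)]
dataset_1_after_change = [("Towel", 20), ("Coffee", 4)]
```

Lower is better: Dataset 1 (distance 5) is closer to covering the target than Dataset 2 (distance 880), and once Towel costs 20 the distance drops to 0.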

Using a Database
LEFT JOIN
RIGHT JOIN
INNER JOIN
GROUP BY
EXISTS
HAVING
WHERE
LIKE
MAX SUM
IN
NOT
>=
<>
<=
OR
COUNT

MC/DC Coverage on SQL Queries
Javier Tuya, Maria Suarez-Cabal and Claudio de la Riva. Full predicate coverage for testing SQL database queries. Software
Testing, Verification and Reliability, 2010.

Genetic Algorithm
Initialization → Fitness Calculations → Terminate?
• Yes: stop
• No: Selection → Crossover → Mutation → Elitism → Fitness Calculations
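The loop in this flowchart can be sketched generically as below. This is a textbook-style genetic algorithm written for illustration, not the EvoSQL code: the selection scheme (keep the better half), the population size and the toy problem at the bottom are all assumptions.

```python
import random


def evolve(fitness, random_individual, mutate, crossover,
           population_size=20, max_generations=200, seed=0):
    """Minimize `fitness`; stop when an individual reaches 0 or the budget runs out."""
    rng = random.Random(seed)
    # Initialization
    population = [random_individual(rng) for _ in range(population_size)]
    for _ in range(max_generations):
        # Fitness calculations
        population.sort(key=fitness)
        best = population[0]
        # Terminate?
        if fitness(best) == 0:
            return best
        # Selection: the better half become parents (an assumed, simple scheme).
        parents = population[: population_size // 2]
        # Elitism: the best individual survives unchanged.
        offspring = [best]
        while len(offspring) < population_size:
            parent_a, parent_b = rng.sample(parents, 2)
            # Crossover, then mutation
            offspring.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = offspring
    return min(population, key=fitness)


# Toy problem (assumption): an individual is a list of two prices, the target is Price = 20.
def toy_fitness(prices):
    return min(abs(price - 20) for price in prices)


def toy_random_individual(rng):
    return [rng.randint(-1000, 1000) for _ in range(2)]


def toy_crossover(a, b, rng):
    return a[:1] + b[1:]


def toy_mutate(prices, rng):
    i = rng.randrange(len(prices))
    return prices[:i] + [prices[i] + rng.randint(-10, 10)] + prices[i + 1:]
```

Because of elitism, the best fitness in the population never gets worse from one generation to the next.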

Initialization
Coverage Rule Query
SELECT Name
FROM Product
WHERE Price = 20
Name Price
af08u4 -5461
1ruhaev 491
Table: Product
Random Individual
Name Price
af08u4 20
1ruhaev 491
Table: Product
Seeded Individual
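The difference between the random and the seeded individual is where the values come from: a seeded row reuses constants that appear in the query. A rough sketch, assuming numeric constants are pulled out with a naive regular expression (a real tool would parse the SQL properly):

```python
import random
import re
import string


def extract_numeric_seeds(query):
    """Naively collect integer literals from the query text (would also match digits in identifiers)."""
    return [int(token) for token in re.findall(r"-?\d+", query)]


def random_row(rng):
    """A fully random (Name, Price) row."""
    name = "".join(rng.choice(string.ascii_lowercase + string.digits) for _ in range(6))
    return {"Name": name, "Price": rng.randint(-10000, 10000)}


def seeded_row(rng, seeds):
    """A random row whose Price is taken from the constants found in the query."""
    row = random_row(rng)
    row["Price"] = rng.choice(seeds)
    return row
```

For `WHERE Price = 20` the only seed is 20, so a seeded row satisfies the predicate immediately, while a random price would almost never hit it.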

Crossover
Parent 1
T1 (Name, Price): Towel 15
T2 (Name, Cat): Coffee Drinks
Parent 2
T1 (Name, Price): Coffee 4
T2 (Name, Cat): Glass Food
Offspring 1
T1 (Name, Price): Towel 15
T2 (Name, Cat): Coffee Drinks
Offspring 2
T1 (Name, Price): Coffee 4
T2 (Name, Cat): Glass Food
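One plausible reading of this slide is that crossover works at the level of whole tables: for each table, the two offspring either keep their parents' data or exchange it. The sketch below implements that reading as an assumption, since the slide does not spell out the exact operator.

```python
import random


def crossover(parent1, parent2, rng):
    """Per table, either keep or swap the parents' rows (assumed table-level crossover).

    Both parents are dicts mapping table name -> list of rows and must have the same tables.
    """
    child1, child2 = {}, {}
    for table in parent1:
        if rng.random() < 0.5:
            # Swap: each child takes this table from the other parent.
            child1[table], child2[table] = list(parent2[table]), list(parent1[table])
        else:
            # Keep: each child takes this table from its own parent.
            child1[table], child2[table] = list(parent1[table]), list(parent2[table])
    return child1, child2


parent_1 = {"T1": [("Towel", 15)], "T2": [("Coffee", "Drinks")]}
parent_2 = {"T1": [("Coffee", 4)], "T2": [("Glass", "Food")]}
```

Whatever the random choices are, no rows are lost or invented: for every table, the two offspring together hold exactly the data the two parents held.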

Mutation – Add
Name Price
Towel 15
Caviar 400
Table: Product
Name Price
Towel 15
Caviar 400
Nail -23
Table: Product

Mutation – Duplicate
Name Price
Towel 15
Caviar 400
Table: Product
Name Price
Towel 15
Caviar 400
Towel 15
Table: Product

Mutation – Remove
Name Price
Towel 15
Caviar 400
Table: Product
Name Price
Towel 15
Table: Product

Mutation – Change
Name Price
Towel 15
Caviar 400
Table: Product
Name Price
Towul 15
Caviar 400
Table: Product

Mutation – Seeded Change
Name Price
Towel 15
Caviar 400
Table: Product
Name Price
Coffee 15
Caviar 400
Table: Product
Coverage Rule Query
SELECT Price
FROM Product
WHERE Name = 'Coffee'
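The five mutation operators shown on the last slides (add, duplicate, remove, change, seeded change) can be sketched on a single table as follows. The value ranges, and the way a value is changed (one character of the name), are assumptions for illustration. Every function returns a new list, leaves its input untouched, and expects a non-empty table, except for add.

```python
import random
import string


def random_row(rng):
    name = "".join(rng.choice(string.ascii_lowercase) for _ in range(5))
    return (name, rng.randint(-1000, 1000))


def mutate_add(rows, rng):
    """Add: append a new random row."""
    return rows + [random_row(rng)]


def mutate_duplicate(rows, rng):
    """Duplicate: append a copy of an existing row."""
    return rows + [rng.choice(rows)]


def mutate_remove(rows, rng):
    """Remove: drop one row."""
    index = rng.randrange(len(rows))
    return rows[:index] + rows[index + 1:]


def mutate_change(rows, rng):
    """Change: replace one character in the name of one row."""
    index = rng.randrange(len(rows))
    name, price = rows[index]
    pos = rng.randrange(len(name))
    name = name[:pos] + rng.choice(string.ascii_lowercase) + name[pos + 1:]
    return rows[:index] + [(name, price)] + rows[index + 1:]


def mutate_seeded_change(rows, rng, seeded_names):
    """Seeded change: replace the name of one row with a constant taken from the query."""
    index = rng.randrange(len(rows))
    _old_name, price = rows[index]
    return rows[:index] + [(rng.choice(seeded_names), price)] + rows[index + 1:]
```

For the coverage rule `WHERE Name = 'Coffee'`, calling `mutate_seeded_change(rows, rng, ["Coffee"])` produces a matching row in one step, where random character changes would need many.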

EvoSQL
EvoSQL
SQLFpc
Test Data
Query
Database Schema
Coverage
Rules
Jeroen Castelein, Maurício Aniche, Mozhan Soltani, Annibale Panichella, Arie Van Deursen
Search-Based Test Data Generation for SQL Queries. ICSE 2018.

Study Context
2,135 queries / 4 systems:
• Alura, e-learning platform
• EspoCRM, open source software for customer relations
• SuiteCRM, open source software for customer relations
• ERPNext, open source resource planning software for
enterprises.

Study Context
Coverage Rules  1-2  3-4  5-6  7-8  9-10  11-15  16-20  21+
# Queries       656  382  408  346  114   107    51     71
84%
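The 84% on this slide is not labelled. One reading that matches the numbers is the share of queries with at most 8 coverage rules, which can be checked from the table. This interpretation is an assumption, not something the slide states.

```python
# Number of queries per coverage-rule bucket, copied from the table above.
queries_per_bucket = {
    "1-2": 656, "3-4": 382, "5-6": 408, "7-8": 346,
    "9-10": 114, "11-15": 107, "16-20": 51, "21+": 71,
}

total_queries = sum(queries_per_bucket.values())
up_to_eight_rules = sum(queries_per_bucket[b] for b in ("1-2", "3-4", "5-6", "7-8"))
share_up_to_eight = up_to_eight_rules / total_queries
```

The buckets add up to the 2,135 queries of the study, and 1,792 of them (about 84%) have 8 coverage rules or fewer.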

EvoSQL Evaluation Outcomes
• 100% of targets covered for 98% of the queries
• On average 86% covered for the remaining 2%
• Usually within seconds
• Outperforms biased and random alternatives:
• Biased random can handle 90% of simple queries (< 10
rules)
• Biased random often finds no solution for complex
queries (10+ rules)

Developers love and hate linters!

• What do developers expect from such tools? Why do they use them?
• How do they configure such flexible tools?
• What challenges do developers face?
Kristín Fjóla Tómasdóttir, Maurício Aniche, Arie Van Deursen.
Why and How JavaScript Developers Use Linters. ASE 2017.
The Adoption of JavaScript Linters. TU Delft. In preparation.

Interviewing Developers
Goal
- Reasons for using a
linter
- Methods to configure
a linter
- Challenges
Method
- Grounded Theory
- 13 questions
Data
- 15 developers
- Top 120 JS GitHub
projects

Analyzing Configuration Files
Goal
- Prevalence of configurations
- Most common rules
Method
- GHTorrent & Google BigQuery
- Tool to parse files
Data
- 86,366 JavaScript projects
- 9,548 ESLint configuration files

Surveying Developers
Goal
- Reasons for using a
linter
- Methods to configure
- Most important rules
- Challenges
Method
- Questionnaire
- Open and closed
questions
- Distributed in JS
communities
Data
- 337 responses
- Reddit, Echo JS,
Facebook, Twitter

Importance of the different rules
1. Stylistic Issues
2. Best Practices
3. Variables
4. Possible Errors
5. Node.js &
CommonJS
6. ECMAScript 6
7. Strict Mode
1. Possible Errors 92.5%
2. Best Practices 89%
3. ECMAScript 6 86.7%
4. Variables 86.4%
5. Stylistic Issues 78.2%
6. Node.js & CommonJS 62.6%
7. Strict Mode 57.8%

How Developers Configure Linters
Stylistic Issues
quotes 60.6%
semi 48.1%
indent 43.3%

Possible Errors
no-dupe-keys 39.2%
no-unreachable 37.2%

Best Practices
eqeqeq 42.7%
no-eval 36.9%

Variables
no-undef 40.6%
no-unused-vars 40.3%

What Challenges Developers Face

http://www.mauricioaniche.com/2014/06/mockar-ou-nao-mockar/

When to mock?
• Infrastructure is often mocked.
• There was no clear trend on domain objects.
• Complicated classes are mocked.
• Classes that are too coupled are mocked.
Davide Spadini, M. Finavaro Aniche, Magiel Bruntink, Alberto Bacchelli.
To Mock or Not To Mock? An Empirical Study on Mocking Practices. MSR 2017.
Mock Objects For Testing Java Systems: Why and How Developers Use Them, and How They Evolve. EMSE. In submission.
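As an illustration of the first point (infrastructure is often mocked), the sketch below tests a small class against a mocked payment gateway. The study looked at Java systems; Python's standard unittest.mock is used here only to keep the example short, and InvoiceService and its gateway are made-up names.

```python
from unittest.mock import Mock


class InvoiceService:
    """Made-up domain class that depends on an external payment gateway."""

    def __init__(self, gateway):
        self.gateway = gateway

    def pay(self, invoice_id, amount_cents):
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        response = self.gateway.charge(invoice_id, amount_cents)
        return response["status"] == "APPROVED"


def test_pay_returns_true_when_gateway_approves():
    # The gateway is infrastructure: replace it with a mock instead of calling a real service.
    gateway = Mock()
    gateway.charge.return_value = {"status": "APPROVED"}

    service = InvoiceService(gateway)

    assert service.pay("inv-1", 1000) is True
    gateway.charge.assert_called_once_with("inv-1", 1000)
```

The business rule (reject non-positive amounts, interpret the status) stays under test, while the slow and unreliable part is replaced.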

Mocks are introduced from the
very beginning of the test class!

Challenges
• Dealing with coupling
• Mocking in legacy systems
• Non-testable/Hard-to-test classes
• Untestable dependencies

50% of changes in a mock occur
because the production code
changed! Coupling is strong!

ATTENTION:
THE MOST IMPORTANT LESSON
ABOUT WRITING AUTOMATED UNIT TESTS
IS ABOUT TO COME!

There’s a correlation between complex code and hard-to-test code.
Aniche, M., Gerosa, M. “Does test-driven development improve class design? A qualitative study on developers’
perceptions”. Journal of the Brazilian Computer Society. 2015, 21:15.
Bruntink, Magiel, and Arie Van Deursen. "Predicting class testability using object-oriented metrics." Fourth IEEE
International Workshop on Source Code Analysis and Manipulation, 2004.

https://www.facebook.com/notes/kent-beck/unit-tests/1726369154062608/

How I (Maurício) do the trade-off
• Unit tests: All business rules should be tested here.
• Integration tests: Complex integrations with external services.
• System tests: Main/risky flow of the app tested.
• Manual: Avoid at all costs. But do it when needed.
You will come up with your own way of thinking!

It’s your job to decide the best
test to write!

A catalogue of patterns for web testing
• Fixture API
• ID in HTML
• Move Fast, Move Slow
• …
Aniche, M., Guerra, E., Gerosa, M. “A Set of Patterns to Improve Code Quality of Automated Functional Tests of Web
Applications”. 21st Conference on Pattern Languages of Programs. 2014.

Code review in test files!
Test files are almost 2 times less likely to be discussed
during code review when reviewed together with
production files!!
Davide Spadini, Maurício Aniche, Magiel Bruntink, Margaret-Anne Storey, Alberto Bacchelli. When Testing Meets Code
Review: Why and How Developers Review Tests. ICSE 2018.

Code review in test files!
Little on
finding more
bugs!

A main concern of reviewers is understanding whether the test covers all the
paths of the production code, and ensuring the tests’ maintainability and readability.
Lack of good tooling
support!

Learning software testing is challenging!

Common mistakes
Maurício Aniche, Felienne Hermans, Arie van Deursen. An Exploratory Study on Challenges in Software Testing
Education. TU Delft. In submission.
• Test coverage (20.87%)
• Maintainability of test code (20.42%)
• Understanding test concepts (15.35%)
• Boundary testing (12.95%)
• State-based testing (12.39%)
• Assertions (8.93%)
• Mock Objects (5.87%)
• Tools (4.21%)

Difficult topics

How to Learn?
People do not like books and papers…

Challenges
• Apply tools and techniques for the first time.
• How, what, and how much to test.
• Understanding the system under test.
• Motivation and experience.
• Software testing theory.
• Testability mindset.

The majority of projects and users [from 416
participants and 1,337,872 intervals] do not practice
testing actively.
We should change it.
Moritz Beller, Georgios Gousios, Annibale Panichella, Andy Zaidman. When, How, and Why Developers (Do Not) Test in Their IDEs. FSE 2015.

Topics we discussed today!
• Log Analytics and DevOps
• Web API integration mistakes
• Testing SQL queries
• To Mock or Not To Mock?
• Code review in test files
• Challenges in learning software testing
Maurício Aniche (m.f.aniche@tudelft.nl / @mauricioaniche)
