Predictive Testing

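$(x + a)^n = \sum_{k=0}^{n} \binom{n}{k} x^k a^{n-k}$
$f(x) = a_0 + \sum_{n=1}^{\infty} \left( a_n \cos\frac{n\pi x}{L} + b_n \sin\frac{n\pi x}{L} \right)$
$\cos\alpha + \cos\beta = 2 \cos\tfrac{1}{2}(\alpha + \beta) \cos\tfrac{1}{2}(\alpha - \beta)$
$i\hbar \frac{d}{dt} |\Psi(t)\rangle = \hat{H} |\Psi(t)\rangle$
$\Psi(\mathbf{r}, t) = \frac{1}{(\sqrt{2\pi})^3} \int \Phi(\mathbf{k}) e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \, d^3k$
$a^2 + b^2 = c^2$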
ING Bank Netherlands | Devoxx Antwerp | November 2019
Herminio Vazquez
ING Bank | Mortgages Tribe
@canimus
How much testing is enough?

Let’s start with some figures…

[Figure: bar chart of search hits, in thousands (axis 0 to 300), for "How much testing is enough?" and "P vs NP?". Source: Google]

born (got free milk = 1)
work (got paid = 53)
live (got utility bills = 13)

But, what do you do?
Machine Learning Expert
• Gather data to create a numeric representation of a problem
• Fit a function that describes the problem while minimizing its error
• Improve the function until computing it no longer requires a lot of energy, time or money
• Close the gap between academia and industry without scaring stakeholders

Roadmap to Predictive Testing
• Testing Mortgages
• Industry Support for handling change
• Testing in the old days
• Risk Formula
• Meet: The Domain Expert
• Convert problem to Math
• The Predictive Machine
• Oh Yeah! +90% Coverage baby!
• Setting up your predictive testing model

Talking about homes feels like this…
• Dream Homes
• Flowers
• Customers happy
Photo by Jesse Roberts on Unsplash

Testing Mortgages feels like the effort required to build this...
• Data intensive process
• Large environment
• Seriously integrated
• Complex rule set
Photo by Adli Wahid on Unsplash

Architecture
[Figure: architecture diagram with the following boxes]
• Workflow Management System
• Document Management System
• Contract Management System
• Client Data Check
• Credit Check
• National Guarantee Check
• Price Check
• Intermediary Portal Check
• Fraud Check
• Mortgage Network Check
• Document Inflow Check
• Document Outflow Check
• Postcode Check
• Int-x, Int-y, Int-z

To get it right, we needed to:
• Reduce complexity
• Outsmart automation
• Orchestrate data
• Travel in time
• Profile risk
• Develop confidence
Photo by Sven Mieke on Unsplash

Did you test it?
Test-Driven Development

It works on my machine…
DevOps

Continuous improvement
Metrics

Once upon a time in the Testing Industry…

Theory Vacuum
or System Critical Apps
Orthogonal Arrays
Defect Paradox
Equivalence Partitions
Pairwise Testing
Decision Tables
State Transition Diagrams
Exploratory Testing

Domain Expert…
• Logical Associations
• Uncover dependencies
• Driven by experience
• Naturally biased
Photo by Hunters Race on Unsplash

Analog Test Model
[Figure: test cases T1 to T9 mapped onto the model areas Collateral, Interfaces, Calculations, Validations, Process, Product, Events, Parties and Documents]

But, what is the problem with analog?
Gotchas
• Unquantifiable risk
• The regression test suite keeps growing, and the technical debt grows with it
• Coverage remains unresolved, or is explained in terms of screens, fields or interfaces
• A change advisory board does not exist in Agile, but there is a group of go-to people for the last gut-feeling check: should we release?

$R = I \times P$
R = Risk
I = Impact
P = Probability
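
As a minimal sketch of the formula, the snippet below ranks a few flows by R = I × P. The flow names, the probabilities and the 1 to 5 impact scale are hypothetical values chosen only for illustration; they do not come from the talk.

```python
# Hypothetical flows with an assumed probability (0..1) and impact score (1..5)
flows = {
    "submit_application": {"probability": 0.60, "impact": 5},
    "upload_document": {"probability": 0.30, "impact": 2},
    "change_address": {"probability": 0.10, "impact": 1},
}


def risk(impact, probability):
    """Risk as the product of impact and probability (R = I x P)."""
    return impact * probability


# Rank flows from highest to lowest risk
ranked = sorted(
    ((name, risk(v["impact"], v["probability"])) for name, v in flows.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

Sorting by the product rather than by either factor alone is what lets a frequent, moderately harmful flow outrank a rare one.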

Step 1: Probabilities
• Search for transaction tables or log files with system events
• Access logs in web servers are a great source for calculating odds
• Search for tables with columns of the [datetime] data type
• Calculate row counts to estimate how often each event occurs
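
The idea of turning row counts into odds can be sketched with the standard library alone. The log rows and stem paths below are made up for illustration; a real access log would be read from a file or a table.

```python
from collections import Counter

# Hypothetical access-log rows: (timestamp, stem)
rows = [
    ("2019-11-01 10:01:24", "/mortgage/apply"),
    ("2019-11-01 10:01:29", "/mortgage/apply"),
    ("2019-11-01 10:09:33", "/mortgage/offer"),
    ("2019-11-01 10:12:02", "/mortgage/apply"),
    ("2019-11-01 10:15:47", "/documents/upload"),
]

# Row count per stem
counts = Counter(stem for _, stem in rows)

# Relative frequency of each stem, used as an estimate of its probability
total = sum(counts.values())
probabilities = {stem: n / total for stem, n in counts.items()}

for stem, p in sorted(probabilities.items(), key=lambda item: -item[1]):
    print(f"{stem}: {p:.2f}")
```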

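Code Example:
import pandas as pd
import matplotlib.pyplot as plt

# Load the web server access log (one row per request)
df = pd.read_csv("access-log.csv")

# Keep business requests only: drop static assets and health checks
requests = df[
    (~df.stem.str.match(".*eot$")) &
    (~df.stem.str.match("isalive.aspx$")) &
    (~df.stem.str.match(".*js$")) &
    (~df.stem.str.match(".*png$")) &
    (~df.stem.str.match(".*css$")) &
    (~df.stem.str.match(".*jpg$")) &
    (~df.stem.str.match(".*ico$")) &
    (~df.stem.str.match(".*gif$")) &
    (~df.stem.str.match(".*xml$")) &
    (~df.stem.str.match(".*woff$"))
]

# Count requests per (stem, user) and pivot into a stem x user matrix,
# because imshow needs a 2-D array
workload = requests.groupby(["stem", "user_id"]).size().unstack(fill_value=0)
plt.imshow(workload, aspect="auto")
plt.show()
Python 3.7.5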

[Figure: Workload Distribution heatmap; axes: User, Stem]

From Workflow to Graph Theory…

Query Example:
-- search for immutable records
SELECT
    *
FROM
    INFORMATION_SCHEMA.COLUMNS
WHERE
    DATA_TYPE LIKE 'datetime'
MS SQL Server 2017

Query Example:
-- search for ETL star model anchors
SELECT
    fromStatus,
    toStatus,
    userId,
    changedAt,
    contract
FROM
    WorkflowEvent
WHERE
    changedAt BETWEEN '2019-11-01' AND '2019-12-01'
MS SQL Server 2017

Table
changedAt            userId  fromStatus  toStatus  contract
2019-01-01 10:01:24  1       A           B         100001
2019-01-01 10:01:29  23      B           C         100001
2019-01-01 10:09:33  16      C           D         100001

Graph
[Figure: directed graph of workflow statuses; node labels: A, B, C, D, E, F, H, M, O, Z]

Adjacency Matrix
   A  B  C  D  E  F
A  0  1  0  0  0  0
B  0  0  1  0  0  0
C  0  0  0  1  1  0
D  0  0  0  0  0  1
E  0  0  0  0  0  0
F  0  0  0  0  0  0
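
The step from status transitions to an adjacency matrix can be sketched in plain Python. The edge list below is read off the adjacency matrix shown on the slide (A→B, B→C, C→D, C→E, D→F); the helper name `matmul` is introduced here for illustration only.

```python
# Edges read off the adjacency matrix on the slide: (fromStatus, toStatus)
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("C", "E"), ("D", "F")]

# Fix a node order so every status maps to a row/column index
nodes = sorted({node for edge in edges for node in edge})
index = {node: i for i, node in enumerate(nodes)}

# matrix[i][j] == 1 when there is a transition from node i to node j
matrix = [[0] * len(nodes) for _ in nodes]
for source, target in edges:
    matrix[index[source]][index[target]] = 1


def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Entry (i, j) of the squared matrix counts the two-step routes from i to j
two_step = matmul(matrix, matrix)

for node, row in zip(nodes, matrix):
    print(node, row)
```

Squaring the matrix is the hand-rolled equivalent of a matrix power of 2, which is why the adjacency form is convenient for counting routes through the workflow.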

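Code Example:
import numpy as np
import networkx as nx
import pandas as pd

# Build the directed workflow graph edge by edge
G = nx.DiGraph()
G.add_edge("A", "B")
G.add_edge("B", "C")
# more edges...

# Alternative: load the edges from the WorkflowEvent query.
# QUERY (SQL text) and conn (database connection) must be defined beforehand.
G1 = nx.from_pandas_edgelist(
    pd.read_sql(QUERY, conn),
    source="fromStatus", target="toStatus", create_using=nx.DiGraph
)

M = nx.to_numpy_array(G)  # adjacency matrix
np.linalg.matrix_power(M, 2)  # paths of length 2
np.linalg.matrix_power(M, 3)  # paths of length 3
Python 3.7.5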

Code Example:
# Graph metrics: degree (incoming plus outgoing edges per node)
G.degree()
DiDegreeView({'A': 1, 'B': 2, 'C': 1})
# Graph metrics: betweenness centrality
nx.betweenness_centrality(G)
{'A': 0.0, 'B': 0.5, 'C': 0.0}
Python 3.7.5

Step 2: Impact
• This depends on the risk appetite of your stakeholders
• A good start will be revenue and reputation
• For ING, being a safe and secure bank is key
• Research your range, from compliance to competitive advantage

1. Mortgage Application
<mortgage>
  <parties>
    <person/>
    <person/>
  </parties>
  <provider/>
  <property>
    <address>
      <street/>
      <number/>
      <postcode/>
    </address>
  </property>
  <income/>
</mortgage>
<mortgage>
  <parties>
    <person/>
    <person/>
  </parties>
  <provider/>
  <property>
    <address>
      <street/>
      <number/>
      <postcode/>
    </address>
  </property>
  <income/>
</mortgage>
2. Time-series Model
[Figure: time-series representation of a mortgage application; axes: Level, Time]
3. Euclidean distance is cheap
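
One plausible reading of this slide, offered as an assumption rather than the author's method, is that each XML document becomes a series of nesting levels in document order ("Level" over "Time"), and two applications are then compared by Euclidean distance. The function names, the zero-padding of the shorter series and the second sample document are hypothetical.

```python
import math
import xml.etree.ElementTree as ET


def depth_series(xml_text):
    """Return the nesting level of every element, in document order."""
    series = []

    def walk(node, level):
        series.append(level)
        for child in node:
            walk(child, level + 1)

    walk(ET.fromstring(xml_text), 1)
    return series


def euclidean(a, b):
    """Euclidean distance; the shorter series is padded with zeros (assumption)."""
    size = max(len(a), len(b))
    a = a + [0] * (size - len(a))
    b = b + [0] * (size - len(b))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# First document follows the structure on the slide; the second is a made-up variant
app_one = (
    "<mortgage><parties><person/><person/></parties><provider/>"
    "<property><address><street/><number/><postcode/></address></property>"
    "<income/></mortgage>"
)
app_two = (
    "<mortgage><parties><person/></parties><provider/>"
    "<property><address><street/><postcode/></address></property>"
    "<income/></mortgage>"
)

series_one = depth_series(app_one)
series_two = depth_series(app_two)
print(series_one)
print(series_two)
print(euclidean(series_one, series_two))
```

The distance is a single pass over two short integer lists, which is the sense in which it is cheap compared with structural tree or graph matching.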

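Stochastic Matrix
$$P = \begin{pmatrix} P_{1,1} & P_{1,2} & P_{1,3} & P_{1,4} \\ P_{2,1} & P_{2,2} & P_{2,3} & P_{2,4} \\ P_{3,1} & P_{3,2} & P_{3,3} & P_{3,4} \\ P_{4,1} & P_{4,2} & P_{4,3} & P_{4,4} \end{pmatrix}$$
VS.
Risk Profile
[Figure: risk profile chart; axes: Impact, Probability]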
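
A row-stochastic matrix can be obtained by dividing each row of transition counts by its total, so that every row sums to 1. The sketch below does that and then weights each transition probability by an impact score to give a simple risk view. The counts, the impact values and the idea of attaching impact to the target state are all assumptions made for illustration.

```python
# Hypothetical transition counts between four workflow states
counts = [
    [0, 8, 2, 0],
    [0, 0, 6, 4],
    [0, 0, 0, 5],
    [0, 0, 0, 1],
]


def row_stochastic(rows):
    """Normalize each row so that it sums to 1 (rows of zeros stay zero)."""
    result = []
    for row in rows:
        total = sum(row)
        result.append([value / total if total else 0.0 for value in row])
    return result


P = row_stochastic(counts)

# Assumed impact score of ending up in each target state
impact = [1, 3, 5, 2]

# Risk of each transition: probability of the transition times impact of its target
size = len(P)
risk = [[P[i][j] * impact[j] for j in range(size)] for i in range(size)]

for row in P:
    print([round(value, 2) for value in row])
```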

Achievements:
• 92% coverage of high-demand flows
• Reduced regression suite execution time from days to hours
• Risk-neutrality
• Informed decisions
• Collapse automation entropy

In summary:
• Predict tests when coverage is uncertain, regardless of the test level
• Begin with volumes & probabilities, as they are relatively easier to obtain than impact
• Map your problem domain to a numeric representation
• Domain expertise is good, but nothing beats Expert + Data
• If you are automating everything, you need predictive testing ;-)
Photo by Hunters Race on Unsplash

Gotchas
• XML Classification
Maciej Piernik (2019) Pattern-based clustering and classification of XML data
• Graph Similarity
Danai Koutra, Ankur Parikh (2011) Algorithms for graph similarity and subgraph matching
• Mining Graph Data
Charu Aggarwal, Haixun Wang (Springer 2010) Managing and Mining Graph Data
• XML vs. JSON
Zia Ul Haq, Gul Faraz Khan, Tazar Hussain (2013) A comprehensive analysis of XML and JSON web technologies
• Stochastic Matrices
Branko Curgus, Robert Jewett (2007) Somewhat stochastic matrices

