An introduction to how machine learning can help you decide how much testing is enough. It covers the risk formula, with references on how to assess impact and calculate probabilities across a complex domain.
5. born (got free milk = 1)
work (got paid = 53)
live (got utility bills = 13)
6. Machine
Learning
Expert
• Gather data to create a numeric
representation of a problem
• Fit a function that describes the
problem while minimizing its error
• Improve the function until
computing it no longer costs
a lot of energy, time or
money
• Close the gap between academia
and industry without scaring
stakeholders
But, what do you do?
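As a rough illustration of "gather data, then fit a function minimizing its error", here is a minimal sketch of an ordinary least-squares line fit. The data (changed components versus defects found) and the function names are made up for the example; they are not from the talk.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_squared_error(xs, ys, slope, intercept):
    """Average squared distance between the fitted line and the data."""
    return mean((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))


# Hypothetical numeric representation: changed components vs. defects found.
changes = [1, 2, 3, 4, 5]
defects = [2, 4, 6, 8, 10]

slope, intercept = fit_line(changes, defects)
error = mean_squared_error(changes, defects, slope, intercept)
print(slope, intercept, error)
```

Real data will not fit perfectly, so the error stays above zero; "improving the function" then means trading a lower error against the cost of computing it.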
8. Roadmap to
Predictive Testing
• Testing Mortgages
• Industry Support for handling change
• Testing in the old days
• Risk Formula
• Meet: The Domain Expert
• Convert problem to Math
• The Predictive Machine
• Oh Yeah! +90% Coverage baby!
• Setting up your predictive testing model
10. Talking about homes
feels like this…
• Dream Homes
• Flowers
• Customers happy
Photo by Jesse Roberts on Unsplash
11. Testing Mortgages feels
like the effort required
to build this...
• Data intensive process
• Large environment
• Seriously integrated
• Complex rule set
Photo by Adli Wahid on Unsplash
13. To get it right, we needed to:
• Reduce complexity
• Outsmart automation
• Orchestrate data
• Travel in time
• Profile risk
• Develop confidence
Photo by Sven Mieke on Unsplash
26. Gotchas
• Unquantifiable risk
• Regression test suite keeps
growing, and the technical
debt with it
• Coverage remains unresolved, or
explained in terms of screens,
fields or interfaces
• A change advisory board does not
exist in Agile, but there is a group
of go-to people for the last
gut-feeling check: should we release?
But, what is the
problem with analog?
29. Step 1:
Probabilities
• Search for transaction tables or
log files with system events
• Access logs in web servers are a
great source for calculating odds
• Search tables with [datetime]
data type
• Calculate row counts
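The idea of turning log row counts into odds can be sketched as below. This is only an illustration: the request paths are invented, and a real access log would first need parsing into one path per request.

```python
from collections import Counter

# Hypothetical access-log extract: one requested path per logged request.
requests = [
    "/quote", "/quote", "/apply", "/quote",
    "/apply", "/sign", "/quote", "/apply",
]

# Row counts per path, then relative frequencies as probability estimates.
counts = Counter(requests)
total = sum(counts.values())
probabilities = {path: n / total for path, n in counts.items()}

for path, p in sorted(probabilities.items(), key=lambda item: -item[1]):
    print(f"{path}: {p:.3f}")
```

The more events the log covers, the more stable these estimates become, so a longer time window is usually preferable to a single day.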
34. Query Example:
-- search for immutable records
SELECT
*
from
INFORMATION_SCHEMA.COLUMNS
where
DATA_TYPE like 'datetime'
MS SQL Server 2017
35. Query Example:
-- search for ETL star model anchors
SELECT
fromStatus,
toStatus,
userId,
changedAt,
contract
from
WorkflowEvent
where
changedAt BETWEEN '2019-11-01' and '2019-12-01'
MS SQL Server 2017
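Once rows like those from the WorkflowEvent query are available, transition probabilities can be estimated by counting. The sketch below assumes each row is a dict with `fromStatus` and `toStatus` keys; the sample rows are invented, and `transition_probabilities` is a hypothetical helper name.

```python
from collections import Counter, defaultdict


def transition_probabilities(events):
    """Estimate P(toStatus | fromStatus) from observed workflow events."""
    counts = Counter((e["fromStatus"], e["toStatus"]) for e in events)
    totals = defaultdict(int)
    for (src, _dst), n in counts.items():
        totals[src] += n
    return {(src, dst): n / totals[src] for (src, dst), n in counts.items()}


# Hypothetical query result: status C moves to D three times and to E once.
events = [
    {"fromStatus": "A", "toStatus": "B"},
    {"fromStatus": "B", "toStatus": "C"},
    {"fromStatus": "C", "toStatus": "D"},
    {"fromStatus": "C", "toStatus": "D"},
    {"fromStatus": "C", "toStatus": "D"},
    {"fromStatus": "C", "toStatus": "E"},
]

probs = transition_probabilities(events)
print(probs[("C", "D")], probs[("C", "E")])
```

Each status's outgoing probabilities sum to 1, which is the row-stochastic form that the stochastic-matrix literature works with.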
36. Table

changedAt            userId  fromStatus  toStatus  contract
2019-01-01 10:01:24  1       A           B         100001
2019-01-01 10:01:29  23      B           C         100001
2019-01-01 10:09:33  16      C           D         100001

Graph

[Figure: directed graph of status transitions; node labels include A, B, C, D, E, F, H, M, O, Z]

Adjacency Matrix

   A B C D E F
A  0 1 0 0 0 0
B  0 0 1 0 0 0
C  0 0 0 1 1 0
D  0 0 0 0 0 1
E  0 0 0 0 0 0
F  0 0 0 0 0 0
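37. Code Example:
import numpy as np
import networkx as nx
import pandas as pd

# build the directed graph edge by edge
G = nx.DiGraph()
G.add_edge("A", "B")
G.add_edge("B", "C")
# more edges...

# alternative: load the edges from a query result
# (QUERY and conn are defined elsewhere; assumes the result has
# the fromStatus and toStatus columns of the WorkflowEvent query)
G1 = nx.from_pandas_edgelist(
    pd.read_sql(QUERY, conn),
    source="fromStatus",
    target="toStatus",
    create_using=nx.DiGraph(),
)

M = nx.to_numpy_array(G)  # adjacency matrix
np.linalg.matrix_power(M, 2)  # entry [i, j] counts paths of length 2 from i to j
np.linalg.matrix_power(M, 3)  # paths of length 3
Python 3.7.5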
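To see what the matrix powers mean, the sketch below applies them to the six-node adjacency matrix from the previous slide (A to F). It uses plain Python lists so it runs without dependencies; `matmul`, `matrix_power` and `paths` are illustrative helpers that mirror what `np.linalg.matrix_power` computes.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def matrix_power(m, n):
    """Raise a square matrix to the n-th power by repeated multiplication."""
    size = len(m)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, m)
    return result


nodes = "ABCDEF"
# Adjacency matrix from the slide: edges A->B, B->C, C->D, C->E, D->F.
M = [
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]


def paths(length):
    """Map (start, end) to the number of paths with exactly `length` edges."""
    power = matrix_power(M, length)
    return {
        (nodes[i], nodes[j]): count
        for i, row in enumerate(power)
        for j, count in enumerate(row)
        if count
    }


print(paths(2))
print(paths(3))
```

A non-zero entry in the k-th power means a flow of k steps exists between two statuses, which gives a list of candidate end-to-end paths to test.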
40. Step 2:
Impact
• This depends on the risk
appetite of your stakeholders
• A good start will be revenue and
reputation
• For ING, being a safe and secure
bank is key
• Research your range, from
compliance to competitive
advantage
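Combining both steps, a common form of the risk formula multiplies probability by impact. The sketch below assumes that form; the flows, probabilities and the impact scale are invented, and the real impact scores would come from your stakeholders.

```python
# Hypothetical flows: probability estimated from logs (Step 1),
# impact scored by stakeholders on an assumed 1-10 scale (Step 2).
flows = {
    "A-B-C-D-F": {"probability": 0.75, "impact": 4},
    "A-B-C-E": {"probability": 0.25, "impact": 10},
    "A-B": {"probability": 0.05, "impact": 2},
}


def risk(flow):
    """Assumed risk formula: probability multiplied by impact."""
    return flow["probability"] * flow["impact"]


# Rank flows so the riskiest ones are tested first.
ranking = sorted(flows, key=lambda name: risk(flows[name]), reverse=True)

for name in ranking:
    print(f"{name}: risk = {risk(flows[name]):.2f}")
```

Note how a rare but high-impact flow can rank close to a frequent, low-impact one, which is why both inputs are needed.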
45. Achievements:
• 92% coverage of high-demand flows
• Reduced regression suite
execution time from days to hours
• Risk-neutrality
• Informed decisions
• Collapse automation entropy
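One way to express coverage in terms of demand rather than screens or fields is to weight each flow by its observed volume. This is a sketch under that assumption, not necessarily how the figure above was calculated; the volumes and flow names are invented.

```python
# Hypothetical transaction volumes per flow, e.g. from a month of workflow events.
volumes = {
    "A-B-C-D-F": 700,
    "A-B-C-E": 150,
    "A-B": 100,
    "A-B-C": 50,
}

# Flows exercised by the regression suite (also hypothetical).
tested = {"A-B-C-D-F", "A-B-C-E"}


def weighted_coverage(volumes, tested):
    """Share of total observed volume that passes through tested flows."""
    covered = sum(n for flow, n in volumes.items() if flow in tested)
    return covered / sum(volumes.values())


coverage = weighted_coverage(volumes, tested)
print(f"{coverage:.0%} of observed volume is covered")
```

With this measure, two tests out of four flows cover far more than half of the real usage, which is the argument for keeping the suite small.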
47. In summary:
• Predict tests when coverage is uncertain,
regardless of the test level
• Begin with volumes & probabilities, as they
are easier to obtain than impact
• Map your problem domain to a numeric
representation
• Domain expertise is good. But nothing
beats: Expert + Data
• If you are automating everything, you need
predictive testing ;-)
Photo by Hunters Race on Unsplash
49. Gotchas
• XML Classification
Maciej Piernik (2019) Pattern-based clustering and classification of XML Data
• Graph Similarity
Danai Koutra, Ankur Parikh (2011) Algorithms for Graph Similarity and Subgraph Matching
• Mining Graph Data
Charu Aggarwal, Haixun Wang (Springer 2010) Managing and Mining Graph Data
• XML VS. JSON
Zia Ul Haq, Gul Faraz Khan, Tazar Hussain (2013) A Comprehensive analysis of XML and JSON web
technologies
• Stochastic Matrices
Branko Curgus, Robert Jewett (2007) Somewhat stochastic matrices