QA Fest 2019. Mykyta Krychko. Testing Applications That Use AI

Talk topic
KYIV 2019
Mykyta Krychko
ML application testing
QA CONFERENCE #1 IN UKRAINE

About me
Krychko Mykyta
Performance test architect
Likes:
- Puzzles
- ML
- Data science
- Data science puzzles
Nik.krichko@gmail.com

Our hero
Onufrious
- Millennial
- Tank driver
- Traveller
- Cannot live without a smartphone
- Game of Thrones fan

Data science errors
The Neural Net Tank Urban Legend
https://www.gwern.net/Tanks

Data science errors
UBER - supply and demand

Data science errors
Recommendation system from one IP

Data science errors
Why do I see ads about pregnancy termination?

Errors
• Critical
• Fraud
• Unethical

Errors: based on type of ML
Supervised
Unsupervised

ML TASKS
SUPERVISED:
• Classification
• Regression
• Forecast
UNSUPERVISED:
• Clustering
• Outlier detection
• Dimensionality reduction

CLASSIFICATION
Define object class
Hotdog / not hotdog

CLASSIFICATION
TRAIN data set
Main hormone   Long hair   has_hotdog   Sex
testosterone   0           1            male
estrogen       1           0            female
estrogen       1           0            female

CLASSIFICATION
TRAIN data set
Main hormone   Long hair   has_hotdog   Sex
estrogen       1           0            female
estrogen       1           0            female
Imbalanced data:
20% female
80% male

Imbalanced data
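Options: over-sampling the minority class, under-sampling the majority class, or both
library(ROSE)
# method is one of "over", "under", or "both"
undersampling_result <- ovun.sample(Class ~ .,
                                    data = Dataset,
                                    method = "under")
# the resampled data frame is available as undersampling_result$data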
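The resampling options above can also be sketched in base R without ROSE. This is a minimal illustration, not what `ovun.sample` does internally; the toy `dataset`, its `id` and `sex` columns, and the seed are assumptions made up to mirror the 80% male / 20% female split.

```r
# Hypothetical toy data mirroring the 80% male / 20% female split
set.seed(42)  # make the random draws reproducible
dataset <- data.frame(
  id  = 1:100,
  sex = factor(rep(c("male", "female"), times = c(80, 20)))
)

majority <- dataset[dataset$sex == "male", ]
minority <- dataset[dataset$sex == "female", ]

# Over-sampling: draw minority rows with replacement until the classes match
oversampled <- rbind(
  majority,
  minority[sample(nrow(minority), nrow(majority), replace = TRUE), ]
)

# Under-sampling: keep only as many majority rows as there are minority rows
undersampled <- rbind(
  majority[sample(nrow(majority), nrow(minority)), ],
  minority
)

print(table(oversampled$sex))
print(table(undersampled$sex))
```

Over-sampling keeps every original row but repeats minority rows, so a tester should split off the validation set before resampling, otherwise duplicates leak between train and test.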

CLASSIFICATION
Real life dataset
Main hormone   Long hair   has_hotdog   Sex
estrogen       1           0
estrogen       1           0
testosterone   0           1
testosterone   0           0
estrogen       1           1
testosterone   0           1
testosterone   0           0

CLASSIFICATION
Define object class
Main hormone   Long hair   has_hotdog   Sex
estrogen       1           0            female
estrogen       1           0            female
testosterone   0           0            female
estrogen       1           1            male

Regression
Define an object's value based on other values
Price depends on:
supply
demand
weekday
time
rush hour
weather

How to test
Boundary values
Outlier detection (anomaly detection)
Use GANs
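A boundary-value check for a regression model might look like the sketch below. Everything in it is an assumption for illustration: the synthetic `train` data, the linear `price_model`, the chosen boundary inputs, and the business rule that a price must be finite and non-negative.

```r
# Hypothetical training data: prices observed only under "normal" conditions
train <- expand.grid(demand = seq(10, 50, by = 10),
                     supply = seq(10, 50, by = 10))
train$price <- 20 + 2 * train$demand - 0.5 * train$supply +
  sin(seq_len(nrow(train)))  # small deterministic wiggle instead of random noise

price_model <- lm(price ~ demand + supply, data = train)

# Boundary and out-of-range inputs, some far outside the training range
boundary_cases <- data.frame(
  demand = c(0, 0, 1000, 10, 50),
  supply = c(0, 1000, 0, 10, 50)
)
boundary_cases$predicted <- predict(price_model, newdata = boundary_cases)

# Assumed business rule: a price must be finite and non-negative
boundary_cases$ok <- is.finite(boundary_cases$predicted) &
  boundary_cases$predicted >= 0

print(boundary_cases)
```

In this toy setup the model extrapolates to a negative price when supply is extreme and demand is zero, which is the kind of finding a boundary test is meant to surface.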

Clustering
How to find classes?
How many classes?

Outlier detection
Local anomaly
Global anomaly
Other cluster

Outlier detection
library(dbscan)
furniture_lof <- lof(scale(furniture), k = 5)
Interpreting LOF
LOF is a ratio of densities
LOF > 1: more likely to be anomalous
LOF ≤ 1: less likely to be anomalous
Large LOF values indicate more isolated points
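To make the "ratio of densities" reading concrete, here is a small base-R sketch of the LOF calculation. It is a simplified illustration and not the dbscan implementation; the function name `lof_scores` and the toy points (a 5 x 5 grid plus one far-away point) are assumptions.

```r
# Simplified LOF: compare each point's local density with its neighbours' density
lof_scores <- function(x, k = 5) {
  d <- as.matrix(dist(x))
  diag(d) <- Inf                      # a point is not its own neighbour
  n <- nrow(d)
  # distance to the k-th nearest neighbour of every point
  k_dist <- apply(d, 1, function(row) sort(row)[k])
  # neighbourhood: all points within the k-distance (ties included)
  neighbours <- lapply(seq_len(n), function(i) which(d[i, ] <= k_dist[i]))
  # local reachability density: inverse of the mean reachability distance
  lrd <- sapply(seq_len(n), function(i) {
    nb <- neighbours[[i]]
    reach <- pmax(k_dist[nb], d[i, nb])
    1 / mean(reach)
  })
  # LOF: mean neighbour density divided by the point's own density
  sapply(seq_len(n), function(i) mean(lrd[neighbours[[i]]]) / lrd[i])
}

# Hypothetical data: a dense grid plus one isolated point (row 26)
grid <- expand.grid(x = 0:4, y = 0:4)
points_xy <- rbind(grid, data.frame(x = 20, y = 20))

scores <- lof_scores(points_xy, k = 5)
print(round(scores, 2))
```

The isolated point gets by far the largest score, while points inside the grid stay close to 1.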

Outlier detection
library(h2o)
# Assumes h2o.init() has been called and that train_dataset and
# test_dataset are H2OFrames.
# Train a deep autoencoder on "normal" training data (y is ignored)
anomaly_model <- h2o.deeplearning(
  x = names(train_dataset),
  training_frame = train_dataset,
  activation = "Tanh",
  autoencoder = TRUE,
  hidden = c(50, 20, 50),
  sparse = TRUE,
  l1 = 1e-4,
  epochs = 100)
# Compute the reconstruction error per row
# (MSE between output and input layers)
Detected_anomalies <- h2o.anomaly(anomaly_model, test_dataset)

Outlier detection
library(isofor)
# nt is the number of trees; nt = 1 grows a single tree
Isofor_model <- iForest(train_dataset, nt = 1)
Isofor_score <- predict(Isofor_model, newdata = test_dataset)

GAN
[Figure: grid of binary (0/1) values; A1 * A2 * A3 * A4]

GAN
[Figure: grid of binary (0/1) values; A1 * A2 * A3 * A4; Pooling]

ML testing
OBJECT: ML app, model, data, process
SUBJECT: QA engineer, data analyst, data scientist, ML engineer
GOAL: find unexpected behavior of the object in order to improve it
What are ML application errors?
Wrong:
decision -- binary or multi-class classification
prediction -- regression, forecasting
answer (generation) -- speech generation, picture generation
Not enough accuracy (precision and recall):
particular situation -- detecting (the edges of) an object (detecting a target in medicine or on a battlefield)
large amount of data -- ROC-AUC
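The metrics named above can be computed by hand in base R. The `actual` labels, the model `score` values, and the 0.5 threshold below are made-up toy values for illustration; AUC is computed with the rank-sum (Mann-Whitney) formula.

```r
# Hypothetical ground truth (1 = positive) and model scores
actual <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
score  <- c(0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1)

# Turn scores into decisions with an assumed threshold of 0.5
predicted <- as.integer(score >= 0.5)

tp <- sum(predicted == 1 & actual == 1)  # true positives
fp <- sum(predicted == 1 & actual == 0)  # false positives
fn <- sum(predicted == 0 & actual == 1)  # false negatives

precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
accuracy  <- mean(predicted == actual)

# ROC-AUC: probability that a random positive outranks a random negative
n_pos <- sum(actual == 1)
n_neg <- sum(actual == 0)
auc <- (sum(rank(score)[actual == 1]) - n_pos * (n_pos + 1) / 2) /
  (n_pos * n_neg)

print(c(precision = precision, recall = recall, accuracy = accuracy, auc = auc))
```

For these toy values the result is precision 0.75, recall 0.75, accuracy 0.8 and AUC 0.875. Precision and recall depend on the chosen threshold, while AUC summarizes the ranking across all thresholds.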

Changes in testing philosophy
Traditional software              ML software
Some FIXED expected results       Some PROBABLE value
Sorted list for all situations    Arranged list for a particular situation
one IN, one OUT                   multiple IN, multiple OUT
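The difference between the two columns can be shown as two kinds of assertion. This is a sketch under assumptions: the built-in `iris` data stands in for real application data, the logistic regression is an arbitrary example model, and the `min_accuracy` threshold of 0.8 is a hypothetical acceptance criterion.

```r
# Traditional check: one input, one fixed expected output
stopifnot(identical(sort(c(3, 1, 2)), c(1, 2, 3)))

# ML check: the outcome is probabilistic, so assert a quality threshold
two_species <- droplevels(iris[iris$Species != "setosa", ])

# Deterministic split: odd rows for training, even rows for testing
train_rows <- seq(1, nrow(two_species), by = 2)
train <- two_species[train_rows, ]
test  <- two_species[-train_rows, ]

model <- glm(Species ~ Petal.Length + Petal.Width,
             data = train, family = binomial)

# For a two-level factor, glm models the probability of the second level
prob_virginica <- predict(model, newdata = test, type = "response")
predicted <- ifelse(prob_virginica > 0.5, "virginica", "versicolor")
accuracy <- mean(predicted == test$Species)

min_accuracy <- 0.8  # hypothetical acceptance threshold
stopifnot(accuracy >= min_accuracy)
print(accuracy)
```

The second assertion can keep passing while individual predictions change between model versions, which is why single-case expected results are not enough for the ML part.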

Common data science mistakes
• Cherry-Picking
• Data Dredging
• False Causality
• Cobra Effect
• Survivorship Bias
• Gerrymandering
• Sampling Bias
• Gambler's Fallacy
• Hawthorne Effect
• Regression Fallacy
• Simpson's Paradox
• McNamara Fallacy
• Overfitting
• Publication Bias
• Relying only on Summary Metrics (Anscombe's quartet)

What can be tested
• Data
• Feature
• Entities
• Model
• Phases
• Performance
• Workflow

Application workflow
UI -> Not ML part -> ML part -> Not ML part -> UI
UI:
interact with the user
gather data
return data
validate right answers
inform the user about possible errors
check whether the user knows enough to judge an answer right or wrong

UI -> Not ML part -> ML part -> Not ML part -> UI
Not ML part:
transform data
integration with third-party systems
API actions
form answers
add business rules
filtering and wrangling
error handling
outlier detection
outlier handling
missing data handling
invalidation of new rules by ML actions

UI -> Not ML part -> ML part -> Not ML part -> UI
ML part:
integrations
end-to-end (system)
reinforcement process
handling of new or missing data (rules)
Requires:
a large amount of data
different supervised situations
full automation

QA engineer task:
Interpret cases when the application:
- does not work
- is not accurate enough
- works in a non-standard situation
Detect situations in which the application can harm others
Gather data for decision-making
Interpret negative cases on outliers:
- false positive
- false negative
Prepare special controversial data for validating the system:
- pictures with specific objects
- noise
Prepare controversial situations in which the application can generate errors
Test and research existing solutions (Kaggle)
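Preparing noisy data for validation can be sketched as a simple robustness check: perturb the inputs and count how many predictions flip. The nearest-centroid classifier on the built-in `iris` data, the helper names `predict_species` and `flip_rate`, and the noise levels are all assumptions chosen to keep the example self-contained.

```r
set.seed(1)  # reproducible noise
features <- as.matrix(iris[, 1:4])

# Stand-in model: nearest class centroid (rows = species, columns = features)
centroids <- apply(features, 2, function(col) tapply(col, iris$Species, mean))

predict_species <- function(x) {
  apply(x, 1, function(row) {
    # squared distance from this row to each class centroid
    d <- colSums((t(centroids) - row)^2)
    names(which.min(d))
  })
}

baseline <- predict_species(features)

# Share of predictions that change when Gaussian noise is added to the inputs
flip_rate <- function(noise_sd) {
  noisy <- features + rnorm(length(features), sd = noise_sd)
  mean(predict_species(noisy) != baseline)
}

noise_levels <- c(0, 0.1, 0.5, 1)
rates <- sapply(noise_levels, flip_rate)
print(data.frame(noise_sd = noise_levels, flip_rate = rates))
```

A flip rate that rises sharply at small noise levels is a candidate for a "not accurate enough" report, with the noisy rows attached as reproduction data.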

ML steps to reproduce:
All entities that were wrongly classified:
- requires understanding why
- understand their cluster
- make it possible to detect them separately
Wrong measurement metric:
- accuracy on a large amount of data
Validate the system by giving it controversial data
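Collecting the wrongly classified entities for a report could look like the following sketch. The `results` data frame is hypothetical, shaped like the classification tables earlier in the deck, and the `actual` and `predicted` columns are assumed to come from an evaluation run.

```r
# Hypothetical evaluation results
results <- data.frame(
  main_hormone = c("estrogen", "estrogen", "testosterone", "estrogen", "testosterone"),
  long_hair    = c(1, 1, 0, 1, 0),
  has_hotdog   = c(0, 0, 0, 1, 1),
  actual       = c("female", "female", "male", "female", "male"),
  predicted    = c("female", "female", "female", "male", "male")
)

# Confusion matrix: rows are the true class, columns the predicted class
confusion <- table(actual = results$actual, predicted = results$predicted)

# The entities to attach to the bug report
misclassified <- results[results$actual != results$predicted, ]

# Group the misclassified rows by feature combination to see where they cluster
error_groups <- aggregate(
  count ~ main_hormone + long_hair + has_hotdog,
  data = transform(misclassified, count = 1),
  FUN = sum
)

print(confusion)
print(misclassified)
print(error_groups)
```

The grouped view answers the "understand their cluster" step: if most errors share one feature combination, that combination becomes the reproduction case.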

BUG (issue) report
Statuses:
• does not work
• works incorrectly
• not accurate enough
• not accurate
• not fast enough
• WORKS ON DEV SAMPLE

Who wants to know more?
If we collect at least 200 interested requests, we will create a small course (smart talk or meetup series) on this topic.
https://forms.gle/sYM1Rhc5MZXi76Di9
