SlideShare a Scribd company logo
2021/09/02
Masaaki Uesaka
Tests for AI model
Today’s topic
I would like to introduce the testing of machine learning models
based on the contents of the book [Sato et al, 2021].
But as a result of surveying papers related to the contents of the book, the contents have expanded
beyond the book.
I will skip the detail in my talk and please read this slide and search the references if you are interested in this
topic.
We focus on the following four topics:
• Metamorphic test
• Neuron coverage
• Adversarial minimum distortion (最大安全半径)
• Formal verification (網羅検証)
[Sato et al, 2021] 佐藤直人・小川秀人・來間啓伸・明神智之 (2021).『AIソフトウェアのテストー答えのない答え合わせ [4つの手法]』リックテレコム
We can access the code in [Sato et al, 2021] by purchasing it.
(based on keras & tensorflow. not pytorch)
Problems in testing of machine learning
Output
Behaviour
In ordinary software, (unit) test is done by checking whether the output of function in the
specified input is equal to expected one.
Some techniques for software tests
Equivalence class test
• Divide the input spaces to equivalence classes in which the
expected outputs are same.
• Take a representative from each equivalence class
• Create the test case for the representatives.
Boundary value test
• Divide the input spaces to equivalence classes
• Take the nearest values to neighboring classes for each class
• Create the test cases for these boundary values
• This is applicable if the input space is ordered.
Input value Expected output
0〜9 1
10〜99 2
100〜999 3
1000〜9999 4
e.g. The function which returns the
number of digits in decimal
notation
Problems in test for machine learning model
Can we apply the software test techniques for machine learning model?
dog
cat
Machine
Learning
Model
We do not know the entire boundary of the dog images
and cat images.
dog cat
other
?
In the vase space of images, we cannot determine the
boundaries of each category in advance.
We cannot apply the boundary test.
Problems in test for machine learning model
Can we apply the software test techniques for machine learning model?
dog
cat
Machine
Learning
Model
• The expected behavior of the classification depends
on the human judgements.
• Sometimes output of the model differs from the
human judgements.
Deterministic assertion test is not applicable
At most, we can make the statistical one:
Assert that the probability that the labeling of human and ML model coincides is high enough to
make it practical.
→ Precision, Recall, F1 score,…
Problems in test for machine learning model
If we admit the probabilistic assertion, we encounter the problem of adversarial attack.
[Goodfellow et al., 2015]
ML models are easily fooled by small perturbations unrecognizable to humans.
Methods of Adversarial Attacks
Fast Gradient Sign Method [Goodfellow et al, 2015] Corresponding label of x
Cost function of training
Carlini-Wagner Method [Carlini & Wagner, 2017]
Trained model
Projected Gradient Descent Method [Madry et al, 2018]
Measure of correspondence
[Goodfellow et al., 2015] Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and Harnessing Adversarial Examples. ICLR.
[Carlini & Wagner, 2017] Carlini, N., & Wagner, D. (2017). Towards Evaluating the Robustness of Neural Networks. 2017 IEEE Symposium on Security and Privacy (SP), 39–57.
[Madry et al, 2018] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2018). Towards Deep Learning Models Resistant to Adversarial Attacks. ICLR
Intermediate Summary
In ML model, the ordinary software test is difficult to apply because
(1) Tests cannot be deterministic; they can only be evaluated statistically.
(2) In the first place, it is not possible to identify the equivalence classes and their
boundaries in advance in a very high dimensional data space.
(3) Even if we accept statistical evaluation, the existence of adversarial attacks does not
guarantee the robustness of the inference.
These are due to the inductive nature of the ML model construction.
• There are no clear functional requirements. (The expected output for the input cannot be
perfectly assumed.)
• Because it learns only from a limited amount of data (though large), it is not perfectly clear
whether the trained model will make valid inferences in other cases (even if generalization
performance is improved).
What is the metamorphic test?
Metamorphic test itself is proposed in [Chen et al, 1998].
This technique is to uncover software error in the production phase and in the absence of test oracle.
Example: Shortest path problem on the weighted graph
undirected weighted graph with vertices V and edges E with distance function d
[Chen et al, 1998] Chen, T. Y., Cheung, S. C., & Yiu, S. M. (2020). Metamorphic Testing: A New Approach for Generating Next Test Cases. ArXiv:2002.12543
What is the metamorphic test?
• Of course, if the graph 𝐺 is small, we can find manually the
shortest path. So we can make the test case manually.
• But in production phase, 𝐺 is generally large. We cannot
construct the test case for such large graph!!
• How can we test for real-world input in the production phase?
→Metamorphic testing!!!
[Chen et al, 1998] Chen, T. Y., Cheung, S. C., & Yiu, S. M. (2020). Metamorphic Testing: A New Approach for Generating Next Test Cases. ArXiv:2002.12543
Examples of metamorphic test
Suppose that
Then the following statements should also be valid:
[Chen et al, 1998] Chen, T. Y., Cheung, S. C., & Yiu, S. M. (2020). Metamorphic Testing: A New Approach for Generating Next Test Cases. ArXiv:2002.12543
Examples of metamorphic test
Assertion 1
Assertion 2
[Chen et al, 1998] Chen, T. Y., Cheung, S. C., & Yiu, S. M. (2020). Metamorphic Testing: A New Approach for Generating Next Test Cases. ArXiv:2002.12543
Assertion 3
Formulation
the function to be tested
Ordinary test
Assert
Metamorphic test
Assert
We don’t need the
expected output.
Input
Modified
input
Output
Output
f
Metamorphic
relation
Input Output
Expected
Output
f
Application of metamorphic test
Application in search engine [Zhou et al, 2016]
Relation name Input modification Expected relation
MPSite If the domain of pi
is di
,
”A” → “A & site:di
”
The result still contains pi
.
MPTitle If the title of pi
is ti
,
”A” → “A & ti
”
The result still contains pi
.
MPReverseJD If A have the form “A1
& A2
& … & An
”,
“A1
& A2
& … & An
” → “An
& … & A 2
& A1
”
The results of A and reversed one have
large similarity.
(measured by Jaccard coefficient)
Search query
A
result
p1
,…,pn
Examples of metamorphic relations
Metamorphic test is designed in order to estimate the quality of search engine.
[Zhou et al, 2016] Zhou, Z. Q., Xiang, S., & Chen, T. Y. (2016). Metamorphic Testing for Software Quality Assessment: A Study of Search Engines.
IEEE Transactions on Software Engineering, 42(3), 264–284.
Metamorphic test in ML model
Also in machine learning system, the concept of metamorphic test is applicable.
Example in image recognition for autonomous cars (DeepTest) [Tian et al, 2018]
Applying the several transformation which is expected not to be affected to the
steering angle, the metamorphic test oracles are created.
They find the erroneous behaviors by these synthetic images.
Remark: They also use the neuron coverage for finding erroneous behavior
(explain later)
Erroneous behaviors found
by metamorphic test
[Tian et al, 2018] Tian, Y., Pei, K., Jana, S., & Ray, B. (2018). DeepTest. Proceedings of the 40th International Conference on Software Engineering, 2018-May, 303–314.
Other examples of metamorphic test
[Zhang et al, 2018] constructs the metamorphic test oracles by synthetic images created by GAN.
[Zhang et al, 2018] Zhang, M., Zhang, Y., Zhang, L., Liu, C., & Khurshid, S. (2018). DeepRoad: GAN-Based Metamorphic Testing and Input Validation Framework for Autonomous Driving
Systems. 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), 132–142.
Other examples of metamorphic test
[Zhou et al, 2019] uses metamorphic test on
LIDAR data in the Baidu Apollo autonomous
car system.
By adding random data outside the ROI, they
find the fatal behavior that the pedestrian or
other car in the ROI cannot be detected by
small random noise.
[Zhou et al, 2019] Zhou, Z. Q., & Sun, L. (2019). Metamorphic testing of driverless
cars. Communications of the ACM, 62(3), 61–67.
Other examples of metamorphic test
[He et al, 2020] uses metamorphic test on machine translation.
I live on campus and am tall.
(erroneous behavior)
[He et al, 2020] He, P., Meister, C., & Su, Z. (2020). Structure-invariant testing for machine translation.
Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, 961–973.
The number of “activated” neurons
What is neuron coverage?
Deep neural network
The number of neurons
Activation function like ReLU
What is neuron coverage?
NC is proposed in [Pei et al, 2017].
Idea: analogy of coverage rate
If the neuron value is large, this neuron strongly influences
the output through the downstream layers.
Low NC implies that a significant number of neurons have
no effect on the output for the test data set.
In order to test the effect on the output for all neurons, we
need to create test data that will have a high NC.
[Pei et al, 2017] Pei, K., Cao, Y., Yang, J., & Jana, S. (2017). DeepXplore. Proceedings of the 26th Symposium on Operating Systems Principles, 1–18.
How to create the high NC test data?
[Pei et al, 2017] proposes to use the gradient ascending and it is widely used.
Update the data in the
direction of increasing the
value of inactive neurons
Application of neuron coverage
DeepXplore in [Pei et al, 2017] Erroneous behaviors by changing light conditions
Make the outputs of NNs differently
Raise NC
[Pei et al, 2017] Pei, K., Cao, Y., Yang, J., & Jana, S. (2017). DeepXplore. Proceedings of the 26th Symposium on Operating Systems Principles, 1–18.
Application of neuron coverage
[Tian et al, 2018] DeepTest (bis.)
They use the NC for finding the erroneous behavior like DeepXplore.
Erroneous behaviors found
by metamorphic test
Metamorphic test and neuron
coverage can be combined to
find errors efficiently.
Is neuron coverage meaningful?
Some studies indicate that increasing NC is not necessarily meaningful for testing adversarial attack.
[Harel-Canada et al, 2020]
Generate adversarial images that also improve NC.
Carlini-Wagner type adversarial attack
(We can also choose other attacks)
Kullback-Leibler divergence
Uniform distribution in [0,1]
This term measures how different the output of the
j-th layer is from uniform distribution
This also measures whether the neurons in the
j-th layer is unbiasedly activated
= high NC
[Harel-Canada et al, 2020] Harel-Canada, F., Wang, L., Gulzar, M. A., Gu, Q., & Kim, M. (2020). Is neuron coverage a meaningful measure for testing deep neural networks?
Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 851–862.
Is neuron coverage meaningful?
For these high-NC adversarial images, they measures the following indices:
Naturalness measures how natural the generated image is.
• Frèchet Inception Distance: a measure of similarity between two dataset of images.
• Inception Score: a measure of naturalness based on salience and diversity
Output impartiality measures how the predictions of the model for adversarial images are biased.
Attack Success Rate is the classification error rate for adversarial test data set.
[Harel-Canada et al, 2020] Harel-Canada, F., Wang, L., Gulzar, M. A., Gu, Q., & Kim, M. (2020). Is neuron coverage a meaningful measure for testing deep neural networks?
Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 851–862.
Is neuron coverage meaningful?
Attack success rate Naturalness Output Impartiality
They cannot support the hypothesis that these indices is positively correlated with NC.
[Harel-Canada et al, 2020] Harel-Canada, F., Wang, L., Gulzar, M. A., Gu, Q., & Kim, M. (2020). Is neuron coverage a meaningful measure for testing deep neural networks?
Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 851–862.
Is neuron coverage meaningful?
Speaker’s Remark
• Contexts seems to be different between DeepTest & DeepXplore and [Harel-Canada et al, 2020].
• Deeptest & DeepXplore
• The deformation method of image is restricted and recognizable by human
• Finding erroneous behavior of the model is the main concern.
• [Harel-Canada et al, 2020]
• The deformation is unrecognizable by human.
• Finding adversarial attack is the main concern, which is more severe.
• NC+metamorphic test effectively detects erroneous behavior in response to changes in data that are
not in the test dataset but are likely to be real and recognizable by humans.
• For adversarial attack, naturalness and output impartiality are more useful indices.
Minimal adversarial distortion
How can we assure the robustness of ML model including adversarial attack?
In order to do it, it is sufficient to estimate how much noise is allowed to be added without change the
inference results.
Decision boundary
Decision boundary
Decision boundary
x
r
Can we calculate minimal adversarial
distortion in advance?
We can estimate the lower bound.
Lower bound of adversarial minimum distortion
CROWN [Zhang et al, NIPS 2018]
Moreover, these two functions are calculated explicitly as
The proof can be done by backward
induction on the layer number j and
some elementary calculations (if you
have the energy).
The proof for the simple case is given in
[Sato et al, 2021].
[Zhang et al, NIPS 2018] Zhang, H., Weng, T. W., Chen, P. Y., Hsieh, C. J., & Daniel, L. (2018). Efficient neural network robustness certification with general activation functions.
NeurIPS, 4939–4948.
Lower bound of adversarial minimum distortion
Prescribed range of
the neuron values
[Zhang et al, NIPS 2018] Zhang, H., Weng, T. W., Chen, P. Y., Hsieh, C. J., & Daniel, L. (2018). Efficient neural network robustness certification with general activation functions.
NeurIPS, 4939–4948.
Lower bound of adversarial minimum distortion
Moreover, these two functions are calculated explicitly as
Then if c is the inferenced label of the input x,
we have the lower bound of adversarial minimum distortion by
[Zhang et al, 2018]
[Zhang et al, NIPS 2018] Zhang, H., Weng, T. W., Chen, P. Y., Hsieh, C. J., & Daniel, L. (2018). Efficient neural network robustness certification with general activation functions.
NeurIPS, 4939–4948.
Other techniques to find the lower bound
[Boopathy et al, 2019] extends [Zhang et al, 2018] to CNN case.
The code is available as CNN-Cert on https://github.com/IBM/CNN-Cert
[Boopathy et al, 2019] Boopathy, A., Weng, T.-W., Chen, P.-Y., Liu, S., & Daniel, L. (2019). CNN-Cert: An Efficient Framework for Certifying
Robustness of Convolutional Neural Networks. Proceedings of the AAAI Conference on Artificial Intelligence, 33, 3240–3247.
[Weng et al, 2019] Weng, L., Chen, P.-Y., Nguyen, L., Squillante, M., Boopathy, A., Oseledets, I., & Daniel, L. (2019). PROVEN: Verifying
Robustness of Neural Networks with a Probabilistic Approach. PMLR.
Then the following inequality holds for any a:
Can we check the assertion directly?
Example: We construct the machine learning model predicting housing prices based on housing
attributes.
Can we check the following assertion?
If the living area is more than 6,000 square feet, the inferred result is more than $400,000.
At first glance, it looks like it can't be done...
• It is impossible to test on all possible data where the living area is more than 6000 square feet;
• No matter how many data are tested, there is no guarantee that the above claim will hold 100% of
the time.
However, given that we are trying to find counterexamples to the above claim, it would be nice if we
could find at least one example as follows:
Find the data where the living area is more than 6000 square feet but the inferred result is less
than or equal to $400,000.
ML model
Find the data x such that
“and”
Formal verification
From this logical formula, we consider eliminating f.
It may seem impossible, but for example, if f is a deep neural
network, then only affine transformations and activation functions
(ReLU) appear in the definition of f.
So, for example, when the activation function is ReLU, if you try hard, you should be able to eliminate f from
the above equation and write it down using only linear inequalities and "and/or".
In this way, we can reduce the problem to find the counterexample to the satisfiability problem:
Satisfiability problem:
For given propositional logical formula, determine if there exists the true/false assignment to atomic formula
so that the whole proposition is made true.
This problem is NP-complete but efficient algorithms are being actively researched.
Application of formal verification
[Huang et al, 2017] Huang, X., Kwiatkowska, M., Wang, S., & Wu, M. (2017). Safety Verification of Deep Neural Networks. In R. Majumdar & V. Kunčak (Eds.), Computer Aided
Verification (pp. 3–29). Springer International Publishing.
[Huang et al, 2017] applies formal verification to find the adversarial images.
Application of formal verification
[Sato et al, 2020] SATO, N., KURUMA, H., NAKAGAWA, Y., & OGAWA, H. (2020). Formal Verification of a Decision-Tree Ensemble Model and Detection of Its Violation Ranges. IEICE
Transactions on Information and Systems, E103.D(2), 363–378.
[Sato et al, 2020] applies formal verification to decision tree ensemble model and find the violation range
(the range of the values which does not satisfy the assertion)
Application result to housesales prediction problem
Summary of four testing method
Metamorphic test uses the relationship between the output of the base input and that of the modified input.
The erroneous behavior can be captured.
Increasing the neuron coverage is helpful to find the erroneous examples efficiently by combining
metamorphic test.
If you are worried about adversarial attack, you may check the adversarial minimum distortion. The lower
bound of this quantity can be estimated like backpropagation.
For more comprehensive search of counterexamples for some assertion, the formal verification is helpful.
Behaviorist's perspective
Focus on how the ML model
behaves.
Metamorphic test
Neuron Coverage
Anatomist’s perspective
Follow the detailed structure of the
ML model.
Adversarial minimum distortion
Formal verification
This kind of detailed
analysis is possible due to
the fact that the model
structure of DNNs and
the like is a composition
of simple functions.
Practical viewpoint
Metamorphic test is easy to introduce in application.
• We do not need the special libraries or special codes.
• The key is to define the metamorphic relation appropriately according to the target domain.
• Driving support system: make the images foggy, rainy, rotated, ...etc
• Chatbot: change a question into a different expression without changing the meaning.
• ...
• It is possible to use parameter search to find defects in the model without relying on neuron coverage.
CNN-Cert is available to estimate the adversarial minimum distortion.
• This is based on tensorflow and the applicable models are limited.
More important thing is how to raise the robustness and quality of ML model based on these testing
method.
• Erroneous behavior
• data augmentation (we can utilize the metamorphic test to augment the data)
• preprocessing
• Adversarial attack
• adversarial training
References (I)
[Sato et al, 2021] 佐藤直人・小川秀人・來間啓伸・明神智之 (2021).『AIソフトウェアのテストー答えのない答え
合わせ [4つの手法]』リックテレコム
[Goodfellow et al., 2015] Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and Harnessing
Adversarial Examples. In Y. Bengio & Y. LeCun (Eds.), 3rd International Conference on Learning
Representations.
[Carlini & Wagner, 2017] Carlini, N., & Wagner, D. (2017). Towards Evaluating the Robustness of Neural
Networks. 2017 IEEE Symposium on Security and Privacy (SP), 39–57.
[Madry et al, 2018] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2018). Towards Deep
Learning Models Resistant to Adversarial Attacks. 6th International Conference on Learning
Representations.
[Chen et al, 1998] Chen, T. Y., Cheung, S. C., & Yiu, S. M. (2020). Metamorphic Testing: A New Approach
for Generating Next Test Cases. ArXiv:2002.12543
[Zhou et al, 2016] Zhou, Z. Q., Xiang, S., & Chen, T. Y. (2016). Metamorphic Testing for Software Quality
Assessment: A Study of Search Engines. IEEE Transactions on Software Engineering, 42(3), 264–284.
References (II)
[Tian et al, 2018] Tian, Y., Pei, K., Jana, S., & Ray, B. (2018). DeepTest. Proceedings of the 40th
International Conference on Software Engineering, 2018-May, 303–314.
[Zhang et al, 2018] Zhang, M., Zhang, Y., Zhang, L., Liu, C., & Khurshid, S. (2018). DeepRoad:
GAN-Based Metamorphic Testing and Input Validation Framework for Autonomous Driving Systems.
2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), 132–142.
[Zhou et al, 2019] Zhou, Z. Q., & Sun, L. (2019). Metamorphic testing of driverless cars. Communications
of the ACM, 62(3), 61–67.
[He et al, 2020] He, P., Meister, C., & Su, Z. (2020). Structure-invariant testing for machine translation.
Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, 961–973.
[Pei et al, 2017] Pei, K., Cao, Y., Yang, J., & Jana, S. (2017). DeepXplore. Proceedings of the 26th
Symposium on Operating Systems Principles, 1–18.
[Harel-Canada et al, 2020] Harel-Canada, F., Wang, L., Gulzar, M. A., Gu, Q., & Kim, M. (2020). Is
neuron coverage a meaningful measure for testing deep neural networks? Proceedings of the 28th ACM
Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of
Software Engineering, 851–862.
References (III)
[Zhang et al, NIPS2018] Zhang, H., Weng, T. W., Chen, P. Y., Hsieh, C. J., & Daniel, L. (2018). Efficient
neural network robustness certification with general activation functions. Advances in Neural Information
Processing Systems, 2018-Decem(NeurIPS), 4939–4948.
[Boopathy et al, 2019] Boopathy, A., Weng, T.-W., Chen, P.-Y., Liu, S., & Daniel, L. (2019). CNN-Cert: An
Efficient Framework for Certifying Robustness of Convolutional Neural Networks. Proceedings of the
AAAI Conference on Artificial Intelligence, 33, 3240–3247.
[Weng et al, 2019] Weng, L., Chen, P.-Y., Nguyen, L., Squillante, M., Boopathy, A., Oseledets, I., &
Daniel, L. (2019). PROVEN: Verifying Robustness of Neural Networks with a Probabilistic Approach. In
K. Chaudhuri & R. Salakhutdinov (Eds.), Proceedings of the 36th International Conference on Machine
Learning (Vol. 97, pp. 6727–6736). PMLR.
[Huang et al, 2017] Huang, X., Kwiatkowska, M., Wang, S., & Wu, M. (2017). Safety Verification of Deep
Neural Networks. In R. Majumdar & V. Kunčak (Eds.), Computer Aided Verification (pp. 3–29). Springer
International Publishing.
[Sato et al, 2020] SATO, N., KURUMA, H., NAKAGAWA, Y., & OGAWA, H. (2020). Formal Verification of
a Decision-Tree Ensemble Model and Detection of Its Violation Ranges. IEICE Transactions on
Information and Systems, E103.D(2), 363–378.
43

More Related Content

What's hot

Naver learning to rank question answer pairs using hrde-ltc
Naver learning to rank question answer pairs using hrde-ltcNaver learning to rank question answer pairs using hrde-ltc
Naver learning to rank question answer pairs using hrde-ltc
NAVER Engineering
 
Evolutionary Design of Swarms (SSCI 2014)
Evolutionary Design of Swarms (SSCI 2014)Evolutionary Design of Swarms (SSCI 2014)
Evolutionary Design of Swarms (SSCI 2014)
Benjamin Bengfort
 
One shot learning
One shot learningOne shot learning
One shot learning
Vuong Ho Ngoc
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learning
Tonmoy Bhagawati
 
Brief introduction to Machine Learning
Brief introduction to Machine LearningBrief introduction to Machine Learning
Brief introduction to Machine Learning
CodeForFrankfurt
 
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From ScratchPPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
Jisang Yoon
 
DagdelenSiriwardaneY..
DagdelenSiriwardaneY..DagdelenSiriwardaneY..
DagdelenSiriwardaneY..
butest
 
Machine Learning for Dummies (without mathematics)
Machine Learning for Dummies (without mathematics)Machine Learning for Dummies (without mathematics)
Machine Learning for Dummies (without mathematics)
ActiveEon
 
Policy Based reinforcement Learning for time series Anomaly detection
Policy Based reinforcement Learning for time series Anomaly detectionPolicy Based reinforcement Learning for time series Anomaly detection
Policy Based reinforcement Learning for time series Anomaly detection
Kishor Datta Gupta
 
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate HelpdeskDeep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Saurabh Saxena
 
Presentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data MiningPresentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data Mining
butest
 
Deep Learning for Stock Prediction
Deep Learning for Stock PredictionDeep Learning for Stock Prediction
Deep Learning for Stock Prediction
Lim Zhi Yuan (Zane)
 
Improving Spam Mail Filtering Using Classification Algorithms With Partition ...
Improving Spam Mail Filtering Using Classification Algorithms With Partition ...Improving Spam Mail Filtering Using Classification Algorithms With Partition ...
Improving Spam Mail Filtering Using Classification Algorithms With Partition ...
IRJET Journal
 
Few shot learning/ one shot learning/ machine learning
Few shot learning/ one shot learning/ machine learningFew shot learning/ one shot learning/ machine learning
Few shot learning/ one shot learning/ machine learning
ﺁﺻﻒ ﻋﻠﯽ ﻣﯿﺮ
 
InfoGAIL
InfoGAIL InfoGAIL
InfoGAIL
Sungjoon Choi
 
Introduction to Machine learning ppt
Introduction to Machine learning pptIntroduction to Machine learning ppt
Introduction to Machine learning ppt
shubhamshirke12
 
Applying Deep Learning to Enhance Momentum Trading Strategies in Stocks
Applying Deep Learning to Enhance Momentum Trading Strategies in StocksApplying Deep Learning to Enhance Momentum Trading Strategies in Stocks
Applying Deep Learning to Enhance Momentum Trading Strategies in Stocks
Lawrence Takeuchi
 
Deep Learning in Recommender Systems - RecSys Summer School 2017
Deep Learning in Recommender Systems - RecSys Summer School 2017Deep Learning in Recommender Systems - RecSys Summer School 2017
Deep Learning in Recommender Systems - RecSys Summer School 2017
Balázs Hidasi
 
Generative Models for General Audiences
Generative Models for General AudiencesGenerative Models for General Audiences
Generative Models for General Audiences
Sangwoo Mo
 
Steganalysis of LSB Embedded Images Using Gray Level Co-Occurrence Matrix
Steganalysis of LSB Embedded Images Using Gray Level Co-Occurrence MatrixSteganalysis of LSB Embedded Images Using Gray Level Co-Occurrence Matrix
Steganalysis of LSB Embedded Images Using Gray Level Co-Occurrence Matrix
CSCJournals
 

What's hot (20)

Naver learning to rank question answer pairs using hrde-ltc
Naver learning to rank question answer pairs using hrde-ltcNaver learning to rank question answer pairs using hrde-ltc
Naver learning to rank question answer pairs using hrde-ltc
 
Evolutionary Design of Swarms (SSCI 2014)
Evolutionary Design of Swarms (SSCI 2014)Evolutionary Design of Swarms (SSCI 2014)
Evolutionary Design of Swarms (SSCI 2014)
 
One shot learning
One shot learningOne shot learning
One shot learning
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learning
 
Brief introduction to Machine Learning
Brief introduction to Machine LearningBrief introduction to Machine Learning
Brief introduction to Machine Learning
 
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From ScratchPPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
 
DagdelenSiriwardaneY..
DagdelenSiriwardaneY..DagdelenSiriwardaneY..
DagdelenSiriwardaneY..
 
Machine Learning for Dummies (without mathematics)
Machine Learning for Dummies (without mathematics)Machine Learning for Dummies (without mathematics)
Machine Learning for Dummies (without mathematics)
 
Policy Based reinforcement Learning for time series Anomaly detection
Policy Based reinforcement Learning for time series Anomaly detectionPolicy Based reinforcement Learning for time series Anomaly detection
Policy Based reinforcement Learning for time series Anomaly detection
 
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate HelpdeskDeep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
 
Presentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data MiningPresentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data Mining
 
Deep Learning for Stock Prediction
Deep Learning for Stock PredictionDeep Learning for Stock Prediction
Deep Learning for Stock Prediction
 
Improving Spam Mail Filtering Using Classification Algorithms With Partition ...
Improving Spam Mail Filtering Using Classification Algorithms With Partition ...Improving Spam Mail Filtering Using Classification Algorithms With Partition ...
Improving Spam Mail Filtering Using Classification Algorithms With Partition ...
 
Few shot learning/ one shot learning/ machine learning
Few shot learning/ one shot learning/ machine learningFew shot learning/ one shot learning/ machine learning
Few shot learning/ one shot learning/ machine learning
 
InfoGAIL
InfoGAIL InfoGAIL
InfoGAIL
 
Introduction to Machine learning ppt
Introduction to Machine learning pptIntroduction to Machine learning ppt
Introduction to Machine learning ppt
 
Applying Deep Learning to Enhance Momentum Trading Strategies in Stocks
Applying Deep Learning to Enhance Momentum Trading Strategies in StocksApplying Deep Learning to Enhance Momentum Trading Strategies in Stocks
Applying Deep Learning to Enhance Momentum Trading Strategies in Stocks
 
Deep Learning in Recommender Systems - RecSys Summer School 2017
Deep Learning in Recommender Systems - RecSys Summer School 2017Deep Learning in Recommender Systems - RecSys Summer School 2017
Deep Learning in Recommender Systems - RecSys Summer School 2017
 
Generative Models for General Audiences
Generative Models for General AudiencesGenerative Models for General Audiences
Generative Models for General Audiences
 
Steganalysis of LSB Embedded Images Using Gray Level Co-Occurrence Matrix
Steganalysis of LSB Embedded Images Using Gray Level Co-Occurrence MatrixSteganalysis of LSB Embedded Images Using Gray Level Co-Occurrence Matrix
Steganalysis of LSB Embedded Images Using Gray Level Co-Occurrence Matrix
 

Similar to Test for AI model

Machine learning testing survey, landscapes and horizons, the Cliff Notes
Machine learning testing  survey, landscapes and horizons, the Cliff NotesMachine learning testing  survey, landscapes and horizons, the Cliff Notes
Machine learning testing survey, landscapes and horizons, the Cliff Notes
Heemeng Foo
 
Testing survey by_directions
Testing survey by_directionsTesting survey by_directions
Testing survey by_directions
Tao He
 
A Software Measurement Using Artificial Neural Network and Support Vector Mac...
A Software Measurement Using Artificial Neural Network and Support Vector Mac...A Software Measurement Using Artificial Neural Network and Support Vector Mac...
A Software Measurement Using Artificial Neural Network and Support Vector Mac...
ijseajournal
 
(Structural) Feature Interactions for Variability-Intensive Systems Testing
(Structural) Feature Interactions for Variability-Intensive Systems Testing (Structural) Feature Interactions for Variability-Intensive Systems Testing
(Structural) Feature Interactions for Variability-Intensive Systems Testing
Gilles Perrouin
 
Software Testing: Issues and Challenges of Artificial Intelligence & Machine ...
Software Testing: Issues and Challenges of Artificial Intelligence & Machine ...Software Testing: Issues and Challenges of Artificial Intelligence & Machine ...
Software Testing: Issues and Challenges of Artificial Intelligence & Machine ...
gerogepatton
 
Software Testing: Issues and Challenges of Artificial Intelligence & Machine ...
Software Testing: Issues and Challenges of Artificial Intelligence & Machine ...Software Testing: Issues and Challenges of Artificial Intelligence & Machine ...
Software Testing: Issues and Challenges of Artificial Intelligence & Machine ...
gerogepatton
 
SOFTWARE TESTING: ISSUES AND CHALLENGES OF ARTIFICIAL INTELLIGENCE & MACHINE ...
SOFTWARE TESTING: ISSUES AND CHALLENGES OF ARTIFICIAL INTELLIGENCE & MACHINE ...SOFTWARE TESTING: ISSUES AND CHALLENGES OF ARTIFICIAL INTELLIGENCE & MACHINE ...
SOFTWARE TESTING: ISSUES AND CHALLENGES OF ARTIFICIAL INTELLIGENCE & MACHINE ...
ijaia
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...
Anubhav Jain
 
Smriti's research paper
Smriti's research paperSmriti's research paper
Smriti's research paper
Smriti Tikoo
 
IRJET- Deep Learning Model to Predict Hardware Performance
IRJET- Deep Learning Model to Predict Hardware PerformanceIRJET- Deep Learning Model to Predict Hardware Performance
IRJET- Deep Learning Model to Predict Hardware Performance
IRJET Journal
 
IRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
IRJET- Analysis of PV Fed Vector Controlled Induction Motor DriveIRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
IRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
IRJET Journal
 
AN ENVIRONMENT FOR NON-DUPLICATE TEST GENERATION FOR WEB BASED APPLICATION
AN ENVIRONMENT FOR NON-DUPLICATE TEST GENERATION FOR WEB BASED APPLICATIONAN ENVIRONMENT FOR NON-DUPLICATE TEST GENERATION FOR WEB BASED APPLICATION
AN ENVIRONMENT FOR NON-DUPLICATE TEST GENERATION FOR WEB BASED APPLICATION
ecij
 
IEEE Pattern analysis and machine intelligence 2016 Title and Abstract
IEEE Pattern analysis and machine intelligence 2016 Title and AbstractIEEE Pattern analysis and machine intelligence 2016 Title and Abstract
IEEE Pattern analysis and machine intelligence 2016 Title and Abstract
tsysglobalsolutions
 
internship project1 report
internship project1 reportinternship project1 report
internship project1 report
sheyk98
 
Automated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design ProblemsAutomated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design Problems
Anubhav Jain
 
Exploratory testing STEW 2016
Exploratory testing STEW 2016Exploratory testing STEW 2016
Exploratory testing STEW 2016
Per Runeson
 
Principles of effort estimation
Principles of effort estimationPrinciples of effort estimation
Principles of effort estimation
CS, NcState
 
Transcription Factor DNA Binding Prediction
Transcription Factor DNA Binding PredictionTranscription Factor DNA Binding Prediction
Transcription Factor DNA Binding Prediction
UT, San Antonio
 
Validation and Verification of SYSML Activity Diagrams Using HOARE Logic
Validation and Verification of SYSML Activity Diagrams Using HOARE Logic Validation and Verification of SYSML Activity Diagrams Using HOARE Logic
Validation and Verification of SYSML Activity Diagrams Using HOARE Logic
ijseajournal
 
Predicting Fault-Prone Files using Machine Learning
Predicting Fault-Prone Files using Machine LearningPredicting Fault-Prone Files using Machine Learning
Predicting Fault-Prone Files using Machine Learning
Guido A. Ciollaro
 

Similar to Test for AI model (20)

Machine learning testing survey, landscapes and horizons, the Cliff Notes
Machine learning testing  survey, landscapes and horizons, the Cliff NotesMachine learning testing  survey, landscapes and horizons, the Cliff Notes
Machine learning testing survey, landscapes and horizons, the Cliff Notes
 
Testing survey by_directions
Testing survey by_directionsTesting survey by_directions
Testing survey by_directions
 
A Software Measurement Using Artificial Neural Network and Support Vector Mac...
A Software Measurement Using Artificial Neural Network and Support Vector Mac...A Software Measurement Using Artificial Neural Network and Support Vector Mac...
A Software Measurement Using Artificial Neural Network and Support Vector Mac...
 
(Structural) Feature Interactions for Variability-Intensive Systems Testing
(Structural) Feature Interactions for Variability-Intensive Systems Testing (Structural) Feature Interactions for Variability-Intensive Systems Testing
(Structural) Feature Interactions for Variability-Intensive Systems Testing
 
Software Testing: Issues and Challenges of Artificial Intelligence & Machine ...
Software Testing: Issues and Challenges of Artificial Intelligence & Machine ...Software Testing: Issues and Challenges of Artificial Intelligence & Machine ...
Software Testing: Issues and Challenges of Artificial Intelligence & Machine ...
 
Software Testing: Issues and Challenges of Artificial Intelligence & Machine ...
Software Testing: Issues and Challenges of Artificial Intelligence & Machine ...Software Testing: Issues and Challenges of Artificial Intelligence & Machine ...
Software Testing: Issues and Challenges of Artificial Intelligence & Machine ...
 
SOFTWARE TESTING: ISSUES AND CHALLENGES OF ARTIFICIAL INTELLIGENCE & MACHINE ...
SOFTWARE TESTING: ISSUES AND CHALLENGES OF ARTIFICIAL INTELLIGENCE & MACHINE ...SOFTWARE TESTING: ISSUES AND CHALLENGES OF ARTIFICIAL INTELLIGENCE & MACHINE ...
SOFTWARE TESTING: ISSUES AND CHALLENGES OF ARTIFICIAL INTELLIGENCE & MACHINE ...
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...
 
Smriti's research paper
Smriti's research paperSmriti's research paper
Smriti's research paper
 
IRJET- Deep Learning Model to Predict Hardware Performance
IRJET- Deep Learning Model to Predict Hardware PerformanceIRJET- Deep Learning Model to Predict Hardware Performance
IRJET- Deep Learning Model to Predict Hardware Performance
 
IRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
IRJET- Analysis of PV Fed Vector Controlled Induction Motor DriveIRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
IRJET- Analysis of PV Fed Vector Controlled Induction Motor Drive
 
AN ENVIRONMENT FOR NON-DUPLICATE TEST GENERATION FOR WEB BASED APPLICATION
AN ENVIRONMENT FOR NON-DUPLICATE TEST GENERATION FOR WEB BASED APPLICATIONAN ENVIRONMENT FOR NON-DUPLICATE TEST GENERATION FOR WEB BASED APPLICATION
AN ENVIRONMENT FOR NON-DUPLICATE TEST GENERATION FOR WEB BASED APPLICATION
 
IEEE Pattern analysis and machine intelligence 2016 Title and Abstract
IEEE Pattern analysis and machine intelligence 2016 Title and AbstractIEEE Pattern analysis and machine intelligence 2016 Title and Abstract
IEEE Pattern analysis and machine intelligence 2016 Title and Abstract
 
internship project1 report
internship project1 reportinternship project1 report
internship project1 report
 
Automated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design ProblemsAutomated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design Problems
 
Exploratory testing STEW 2016
Exploratory testing STEW 2016Exploratory testing STEW 2016
Exploratory testing STEW 2016
 
Principles of effort estimation
Principles of effort estimationPrinciples of effort estimation
Principles of effort estimation
 
Transcription Factor DNA Binding Prediction
Transcription Factor DNA Binding PredictionTranscription Factor DNA Binding Prediction
Transcription Factor DNA Binding Prediction
 
Validation and Verification of SYSML Activity Diagrams Using HOARE Logic
Validation and Verification of SYSML Activity Diagrams Using HOARE Logic Validation and Verification of SYSML Activity Diagrams Using HOARE Logic
Validation and Verification of SYSML Activity Diagrams Using HOARE Logic
 
Predicting Fault-Prone Files using Machine Learning
Predicting Fault-Prone Files using Machine LearningPredicting Fault-Prone Files using Machine Learning
Predicting Fault-Prone Files using Machine Learning
 

More from Arithmer Inc.

コーディネートレコメンド
コーディネートレコメンドコーディネートレコメンド
コーディネートレコメンド
Arithmer Inc.
 
最適化
最適化最適化
最適化
Arithmer Inc.
 
Arithmerソリューション紹介 流体予測システム
Arithmerソリューション紹介 流体予測システムArithmerソリューション紹介 流体予測システム
Arithmerソリューション紹介 流体予測システム
Arithmer Inc.
 
Weakly supervised semantic segmentation of 3D point cloud
Weakly supervised semantic segmentation of 3D point cloudWeakly supervised semantic segmentation of 3D point cloud
Weakly supervised semantic segmentation of 3D point cloud
Arithmer Inc.
 
Arithmer NLP 自然言語処理 ソリューション紹介
Arithmer NLP 自然言語処理 ソリューション紹介Arithmer NLP 自然言語処理 ソリューション紹介
Arithmer NLP 自然言語処理 ソリューション紹介
Arithmer Inc.
 
Arithmer Robo Introduction
Arithmer Robo IntroductionArithmer Robo Introduction
Arithmer Robo Introduction
Arithmer Inc.
 
Arithmer AIチャットボット
Arithmer AIチャットボットArithmer AIチャットボット
Arithmer AIチャットボット
Arithmer Inc.
 
Arithmer R3 Introduction
Arithmer R3 Introduction Arithmer R3 Introduction
Arithmer R3 Introduction
Arithmer Inc.
 
VIBE: Video Inference for Human Body Pose and Shape Estimation
VIBE: Video Inference for Human Body Pose and Shape EstimationVIBE: Video Inference for Human Body Pose and Shape Estimation
VIBE: Video Inference for Human Body Pose and Shape Estimation
Arithmer Inc.
 
Arithmer Inspection Introduction
Arithmer Inspection IntroductionArithmer Inspection Introduction
Arithmer Inspection Introduction
Arithmer Inc.
 
全力解説!Transformer
全力解説!Transformer全力解説!Transformer
全力解説!Transformer
Arithmer Inc.
 
Arithmer NLP Introduction
Arithmer NLP IntroductionArithmer NLP Introduction
Arithmer NLP Introduction
Arithmer Inc.
 
Introduction of Quantum Annealing and D-Wave Machines
Introduction of Quantum Annealing and D-Wave MachinesIntroduction of Quantum Annealing and D-Wave Machines
Introduction of Quantum Annealing and D-Wave Machines
Arithmer Inc.
 
Arithmer OCR Introduction
Arithmer OCR IntroductionArithmer OCR Introduction
Arithmer OCR Introduction
Arithmer Inc.
 
Arithmer Dynamics Introduction
Arithmer Dynamics Introduction Arithmer Dynamics Introduction
Arithmer Dynamics Introduction
Arithmer Inc.
 
ArithmerDB Introduction
ArithmerDB IntroductionArithmerDB Introduction
ArithmerDB Introduction
Arithmer Inc.
 
Summarizing videos with Attention
Summarizing videos with AttentionSummarizing videos with Attention
Summarizing videos with Attention
Arithmer Inc.
 
3D human body modeling from RGB images
3D human body modeling from RGB images3D human body modeling from RGB images
3D human body modeling from RGB images
Arithmer Inc.
 
YOLACT
YOLACTYOLACT
Object Pose Estimation
Object Pose EstimationObject Pose Estimation
Object Pose Estimation
Arithmer Inc.
 

More from Arithmer Inc. (20)

コーディネートレコメンド
コーディネートレコメンドコーディネートレコメンド
コーディネートレコメンド
 
最適化
最適化最適化
最適化
 
Arithmerソリューション紹介 流体予測システム
Arithmerソリューション紹介 流体予測システムArithmerソリューション紹介 流体予測システム
Arithmerソリューション紹介 流体予測システム
 
Weakly supervised semantic segmentation of 3D point cloud
Weakly supervised semantic segmentation of 3D point cloudWeakly supervised semantic segmentation of 3D point cloud
Weakly supervised semantic segmentation of 3D point cloud
 
Arithmer NLP 自然言語処理 ソリューション紹介
Arithmer NLP 自然言語処理 ソリューション紹介Arithmer NLP 自然言語処理 ソリューション紹介
Arithmer NLP 自然言語処理 ソリューション紹介
 
Arithmer Robo Introduction
Arithmer Robo IntroductionArithmer Robo Introduction
Arithmer Robo Introduction
 
Arithmer AIチャットボット
Arithmer AIチャットボットArithmer AIチャットボット
Arithmer AIチャットボット
 
Arithmer R3 Introduction
Arithmer R3 Introduction Arithmer R3 Introduction
Arithmer R3 Introduction
 
VIBE: Video Inference for Human Body Pose and Shape Estimation
VIBE: Video Inference for Human Body Pose and Shape EstimationVIBE: Video Inference for Human Body Pose and Shape Estimation
VIBE: Video Inference for Human Body Pose and Shape Estimation
 
Arithmer Inspection Introduction
Arithmer Inspection IntroductionArithmer Inspection Introduction
Arithmer Inspection Introduction
 
全力解説!Transformer
全力解説!Transformer全力解説!Transformer
全力解説!Transformer
 
Arithmer NLP Introduction
Arithmer NLP IntroductionArithmer NLP Introduction
Arithmer NLP Introduction
 
Introduction of Quantum Annealing and D-Wave Machines
Introduction of Quantum Annealing and D-Wave MachinesIntroduction of Quantum Annealing and D-Wave Machines
Introduction of Quantum Annealing and D-Wave Machines
 
Arithmer OCR Introduction
Arithmer OCR IntroductionArithmer OCR Introduction
Arithmer OCR Introduction
 
Arithmer Dynamics Introduction
Arithmer Dynamics Introduction Arithmer Dynamics Introduction
Arithmer Dynamics Introduction
 
ArithmerDB Introduction
ArithmerDB IntroductionArithmerDB Introduction
ArithmerDB Introduction
 
Summarizing videos with Attention
Summarizing videos with AttentionSummarizing videos with Attention
Summarizing videos with Attention
 
3D human body modeling from RGB images
3D human body modeling from RGB images3D human body modeling from RGB images
3D human body modeling from RGB images
 
YOLACT
YOLACTYOLACT
YOLACT
 
Object Pose Estimation
Object Pose EstimationObject Pose Estimation
Object Pose Estimation
 

Recently uploaded

Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Webinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data WarehouseWebinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data Warehouse
Federico Razzoli
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
David Brossard
 
Project Management Semester Long Project - Acuity
Project Management Semester Long Project - AcuityProject Management Semester Long Project - Acuity
Project Management Semester Long Project - Acuity
jpupo2018
 

Recently uploaded (20)

Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Webinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data WarehouseWebinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data Warehouse
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
 
Project Management Semester Long Project - Acuity
Project Management Semester Long Project - AcuityProject Management Semester Long Project - Acuity
Project Management Semester Long Project - Acuity
 

Test for AI model

  • 2. Today’s topic I would like to introduce the testing of machine learning models based on the contents of the book [Sato et al, 2021]. But as a result of surveying papers related to the contents of the book, the contents have expanded beyond the book. I will skip the detail in my talk and please read this slide and search the references if you are interested in this topic. We focus on the following four topics: • Metamorphic test • Neuron coverage • Adversarial minimum distortion (最大安全半径) • Formal verification (網羅検証) [Sato et al, 2021] 佐藤直人・小川秀人・來間啓伸・明神智之 (2021).『AIソフトウェアのテストー答えのない答え合わせ [4つの手法]』リックテレコム We can access the code in [Sato et al, 2021] by purchasing it. (based on keras & tensorflow. not pytorch)
  • 3. Problems in testing of machine learning Output Behaviour In ordinary software, (unit) test is done by checking whether the output of function in the specified input is equal to expected one.
  • 4. Some techniques for software tests Equivalence class test • Divide the input spaces to equivalence classes in which the expected outputs are same. • Take a representative from each equivalence class • Create the test case for the representatives. Boundary value test • Divide the input spaces to equivalence classes • Take the nearest values to neighboring classes for each class • Create the test cases for these boundary values • This is applicable if the input space is ordered. Input value Expected output 0〜9 1 10〜99 2 100〜999 3 1000〜9999 4 e.g. The function which returns the number of digits in decimal notation
  • 5. Problems in test for machine learning model Can we apply the software test techniques for machine learning model? dog cat Machine Learning Model We do not know the entire boundary of the dog images and cat images. dog cat other ? In the vase space of images, we cannot determine the boundaries of each category in advance. We cannot apply the boundary test.
  • 6. Problems in test for machine learning model Can we apply the software test techniques for machine learning model? dog cat Machine Learning Model • The expected behavior of the classification depends on the human judgements. • Sometimes output of the model differs from the human judgements. Deterministic assertion test is not applicable At most, we can make the statistical one: Assert that the probability that the labeling of human and ML model coincides is high enough to make it practical. → Precision, Recall, F1 score,…
  • 7. Problems in test for machine learning model If we admit the probabilistic assertion, we encounter the problem of adversarial attack. [Goodfellow et al., 2015] ML models are easily fooled by small perturbations unrecognizable to humans.
  • 8. Methods of Adversarial Attacks Fast Gradient Sign Method [Goodfellow et al, 2015] Corresponding label of x Cost function of training Carlini-Wagner Method [Carlini & Wagner, 2017] Trained model Projected Gradient Descent Method [Madry et al, 2018] Measure of correspondence [Goodfellow et al., 2015] Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and Harnessing Adversarial Examples. ICLR. [Carlini & Wagner, 2017] Carlini, N., & Wagner, D. (2017). Towards Evaluating the Robustness of Neural Networks. 2017 IEEE Symposium on Security and Privacy (SP), 39–57. [Madry et al, 2018] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2018). Towards Deep Learning Models Resistant to Adversarial Attacks. ICLR
  • 9. Intermediate Summary In ML model, the ordinary software test is difficult to apply because (1) Tests cannot be deterministic; they can only be evaluated statistically. (2) In the first place, it is not possible to identify the equivalence classes and their boundaries in advance in a very high dimensional data space. (3) Even if we accept statistical evaluation, the existence of adversarial attacks does not guarantee the robustness of the inference. These are due to the inductive nature of the ML model construction. • There are no clear functional requirements. (The expected output for the input cannot be perfectly assumed.) • Because it learns only from a limited amount of data (though large), it is not perfectly clear whether the trained model will make valid inferences in other cases (even if generalization performance is improved).
  • 10. What is the metamorphic test? Metamorphic test itself is proposed in [Chen et al, 1998]. This technique is to uncover software error in the production phase and in the absence of test oracle. Example: Shortest path problem on the weighted graph undirected weighted graph with vertices V and edges E with distance function d [Chen et al, 1998] Chen, T. Y., Cheung, S. C., & Yiu, S. M. (2020). Metamorphic Testing: A New Approach for Generating Next Test Cases. ArXiv:2002.12543
  • 11. What is the metamorphic test? • Of course, if the graph 𝐺 is small, we can find manually the shortest path. So we can make the test case manually. • But in production phase, 𝐺 is generally large. We cannot construct the test case for such large graph!! • How can we test for real-world input in the production phase? →Metamorphic testing!!! [Chen et al, 1998] Chen, T. Y., Cheung, S. C., & Yiu, S. M. (2020). Metamorphic Testing: A New Approach for Generating Next Test Cases. ArXiv:2002.12543
  • 12. Examples of metamorphic test Suppose that Then the following statements should also be valid: [Chen et al, 1998] Chen, T. Y., Cheung, S. C., & Yiu, S. M. (2020). Metamorphic Testing: A New Approach for Generating Next Test Cases. ArXiv:2002.12543
  • 13. Examples of metamorphic test Assertion 1 Assertion 2 [Chen et al, 1998] Chen, T. Y., Cheung, S. C., & Yiu, S. M. (2020). Metamorphic Testing: A New Approach for Generating Next Test Cases. ArXiv:2002.12543 Assertion 3
  • 14. Formulation the function to be tested Ordinary test Assert Metamorphic test Assert We don’t need the expected output. Input Modified input Output Output f Metamorphic relation Input Output Expected Output f
  • 15. Application of metamorphic test Application in search engine [Zhou et al, 2016] Relation name Input modification Expected relation MPSite If the domain of pi is di , ”A” → “A & site:di ” The result still contains pi . MPTitle If the title of pi is ti , ”A” → “A & ti ” The result still contains pi . MPReverseJD If A have the form “A1 & A2 & … & An ”, “A1 & A2 & … & An ” → “An & … & A 2 & A1 ” The results of A and reversed one have large similarity. (measured by Jaccard coefficient) Search query A result p1 ,…,pn Examples of metamorphic relations Metamorphic test is designed in order to estimate the quality of search engine. [Zhou et al, 2016] Zhou, Z. Q., Xiang, S., & Chen, T. Y. (2016). Metamorphic Testing for Software Quality Assessment: A Study of Search Engines. IEEE Transactions on Software Engineering, 42(3), 264–284.
  • 16. Metamorphic test in ML model Also in machine learning system, the concept of metamorphic test is applicable. Example in image recognition for autonomous cars (DeepTest) [Tian et al, 2018] Applying the several transformation which is expected not to be affected to the steering angle, the metamorphic test oracles are created. They find the erroneous behaviors by these synthetic images. Remark: They also use the neuron coverage for finding erroneous behavior (explain later) Erroneous behaviors found by metamorphic test [Tian et al, 2018] Tian, Y., Pei, K., Jana, S., & Ray, B. (2018). DeepTest. Proceedings of the 40th International Conference on Software Engineering, 2018-May, 303–314.
  • 17. Other examples of metamorphic test [Zhang et al, 2018] constructs the metamorphic test oracles by synthetic images created by GAN. [Zhang et al, 2018] Zhang, M., Zhang, Y., Zhang, L., Liu, C., & Khurshid, S. (2018). DeepRoad: GAN-Based Metamorphic Testing and Input Validation Framework for Autonomous Driving Systems. 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), 132–142.
  • 18. Other examples of metamorphic test [Zhou et al, 2019] uses metamorphic test on LIDAR data in the Baidu Apollo autonomous car system. By adding random data outside the ROI, they find the fatal behavior that the pedestrian or other car in the ROI cannot be detected by small random noise. [Zhou et al, 2019] Zhou, Z. Q., & Sun, L. (2019). Metamorphic testing of driverless cars. Communications of the ACM, 62(3), 61–67.
  • 19. Other examples of metamorphic test [He et al, 2020] uses metamorphic test on machine translation. I live on campus and am tall. (erroneous behavior) [He et al, 2020] He, P., Meister, C., & Su, Z. (2020). Structure-invariant testing for machine translation. Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, 961–973.
  • 20. The number of “activated” neurons What is neuron coverage? Deep neural network The number of neurons Activation function like ReLU
  • 21. What is neuron coverage? NC is proposed in [Pei et al, 2017]. Idea: analogy of coverage rate If the neuron value is large, this neuron strongly influences the output through the downstream layers. Low NC implies that a significant number of neurons have no effect on the output for the test data set. In order to test the effect on the output for all neurons, we need to create test data that will have a high NC. [Pei et al, 2017] Pei, K., Cao, Y., Yang, J., & Jana, S. (2017). DeepXplore. Proceedings of the 26th Symposium on Operating Systems Principles, 1–18.
  • 22. How to create the high NC test data? [Pei et al, 2017] proposes to use the gradient ascending and it is widely used. Update the data in the direction of increasing the value of inactive neurons
  • 23. Application of neuron coverage DeepXplore in [Pei et al, 2017] Erroneous behaviors by changing light conditions Make the outputs of NNs differently Raise NC [Pei et al, 2017] Pei, K., Cao, Y., Yang, J., & Jana, S. (2017). DeepXplore. Proceedings of the 26th Symposium on Operating Systems Principles, 1–18.
  • 24. Application of neuron coverage [Tian et al, 2018] DeepTest (bis.) They use the NC for finding the erroneous behavior like DeepXplore. Erroneous behaviors found by metamorphic test Metamorphic test and neuron coverage can be combined to find errors efficiently.
  • 25. Is neuron coverage meaningful? Some studies indicate that increasing NC is not necessarily meaningful for testing adversarial attack. [Harel-Canada et al, 2020] Generate adversarial images that also improve NC. Carlini-Wagner type adversarial attack (We can also choose other attacks) Kullback-Leibler divergence Uniform distribution in [0,1] This term measures how different the output of the j-th layer is from uniform distribution This also measures whether the neurons in the j-th layer is unbiasedly activated = high NC [Harel-Canada et al, 2020] Harel-Canada, F., Wang, L., Gulzar, M. A., Gu, Q., & Kim, M. (2020). Is neuron coverage a meaningful measure for testing deep neural networks? Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 851–862.
  • 26. Is neuron coverage meaningful? For these high-NC adversarial images, they measures the following indices: Naturalness measures how natural the generated image is. • Frèchet Inception Distance: a measure of similarity between two dataset of images. • Inception Score: a measure of naturalness based on salience and diversity Output impartiality measures how the predictions of the model for adversarial images are biased. Attack Success Rate is the classification error rate for adversarial test data set. [Harel-Canada et al, 2020] Harel-Canada, F., Wang, L., Gulzar, M. A., Gu, Q., & Kim, M. (2020). Is neuron coverage a meaningful measure for testing deep neural networks? Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 851–862.
  • 27. Is neuron coverage meaningful? Attack success rate Naturalness Output Impartiality They cannot support the hypothesis that these indices is positively correlated with NC. [Harel-Canada et al, 2020] Harel-Canada, F., Wang, L., Gulzar, M. A., Gu, Q., & Kim, M. (2020). Is neuron coverage a meaningful measure for testing deep neural networks? Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 851–862.
  • 28. Is neuron coverage meaningful? Speaker’s Remark • Contexts seems to be different between DeepTest & DeepXplore and [Harel-Canada et al, 2020]. • Deeptest & DeepXplore • The deformation method of image is restricted and recognizable by human • Finding erroneous behavior of the model is the main concern. • [Harel-Canada et al, 2020] • The deformation is unrecognizable by human. • Finding adversarial attack is the main concern, which is more severe. • NC+metamorphic test effectively detects erroneous behavior in response to changes in data that are not in the test dataset but are likely to be real and recognizable by humans. • For adversarial attack, naturalness and output impartiality are more useful indices.
  • 29. Minimal adversarial distortion How can we assure the robustness of ML model including adversarial attack? In order to do it, it is sufficient to estimate how much noise is allowed to be added without change the inference results. Decision boundary Decision boundary Decision boundary x r Can we calculate minimal adversarial distortion in advance? We can estimate the lower bound.
  • 30. Lower bound of adversarial minimum distortion CROWN [Zhang et al, NIPS 2018] Moreover, these two functions are calculated explicitly as The proof can be done by backward induction on the layer number j and some elementary calculations (if you have the energy). The proof for the simple case is given in [Sato et al, 2021]. [Zhang et al, NIPS 2018] Zhang, H., Weng, T. W., Chen, P. Y., Hsieh, C. J., & Daniel, L. (2018). Efficient neural network robustness certification with general activation functions. NeurIPS, 4939–4948.
  • 31. Lower bound of adversarial minimum distortion Prescribed range of the neuron values [Zhang et al, NIPS 2018] Zhang, H., Weng, T. W., Chen, P. Y., Hsieh, C. J., & Daniel, L. (2018). Efficient neural network robustness certification with general activation functions. NeurIPS, 4939–4948.
  • 32. Lower bound of adversarial minimum distortion Moreover, these two functions are calculated explicitly as Then if c is the inferenced label of the input x, we have the lower bound of adversarial minimum distortion by [Zhang et al, 2018] [Zhang et al, NIPS 2018] Zhang, H., Weng, T. W., Chen, P. Y., Hsieh, C. J., & Daniel, L. (2018). Efficient neural network robustness certification with general activation functions. NeurIPS, 4939–4948.
  • 33. Other techniques to find the lower bound [Boopathy et al, 2019] extends [Zhang et al, 2018] to CNN case. The code is available as CNN-Cert on https://github.com/IBM/CNN-Cert [Boopathy et al, 2019] Boopathy, A., Weng, T.-W., Chen, P.-Y., Liu, S., & Daniel, L. (2019). CNN-Cert: An Efficient Framework for Certifying Robustness of Convolutional Neural Networks. Proceedings of the AAAI Conference on Artificial Intelligence, 33, 3240–3247. [Weng et al, 2019] Weng, L., Chen, P.-Y., Nguyen, L., Squillante, M., Boopathy, A., Oseledets, I., & Daniel, L. (2019). PROVEN: Verifying Robustness of Neural Networks with a Probabilistic Approach. PMLR. Then the following inequality holds for any a:
  • 34. Can we check the assertion directly? Example: We construct the machine learning model predicting housing prices based on housing attributes. Can we check the following assertion? If the living area is more than 6,000 square feet, the inferred result is more than $400,000. At first glance, it looks like it can't be done... • It is impossible to test on all possible data where the living area is more than 6000 square feet; • No matter how many data are tested, there is no guarantee that the above claim will hold 100% of the time. However, given that we are trying to find counterexamples to the above claim, it would be nice if we could find at least one example as follows: Find the data where the living area is more than 6000 square feet but the inferred result is less than or equal to $400,000. ML model Find the data x such that “and”
  • 35. Formal verification From this logical formula, we consider eliminating f. It may seem impossible, but for example, if f is a deep neural network, then only affine transformations and activation functions (ReLU) appear in the definition of f. So, for example, when the activation function is ReLU, if you try hard, you should be able to eliminate f from the above equation and write it down using only linear inequalities and "and/or". In this way, we can reduce the problem to find the counterexample to the satisfiability problem: Satisfiability problem: For given propositional logical formula, determine if there exists the true/false assignment to atomic formula so that the whole proposition is made true. This problem is NP-complete but efficient algorithms are being actively researched.
  • 36. Application of formal verification [Huang et al, 2017] Huang, X., Kwiatkowska, M., Wang, S., & Wu, M. (2017). Safety Verification of Deep Neural Networks. In R. Majumdar & V. Kunčak (Eds.), Computer Aided Verification (pp. 3–29). Springer International Publishing. [Huang et al, 2017] applies formal verification to find the adversarial images.
  • 37. Application of formal verification [Sato et al, 2020] SATO, N., KURUMA, H., NAKAGAWA, Y., & OGAWA, H. (2020). Formal Verification of a Decision-Tree Ensemble Model and Detection of Its Violation Ranges. IEICE Transactions on Information and Systems, E103.D(2), 363–378. [Sato et al, 2020] applies formal verification to decision tree ensemble model and find the violation range (the range of the values which does not satisfy the assertion) Application result to housesales prediction problem
  • 38. Summary of four testing method Metamorphic test uses the relationship between the output of the base input and that of the modified input. The erroneous behavior can be captured. Increasing the neuron coverage is helpful to find the erroneous examples efficiently by combining metamorphic test. If you are worried about adversarial attack, you may check the adversarial minimum distortion. The lower bound of this quantity can be estimated like backpropagation. For more comprehensive search of counterexamples for some assertion, the formal verification is helpful. Behaviorist's perspective Focus on how the ML model behaves. Metamorphic test Neuron Coverage Anatomist’s perspective Follow the detailed structure of the ML model. Adversarial minimum distortion Formal verification This kind of detailed analysis is possible due to the fact that the model structure of DNNs and the like is a composition of simple functions.
  • 39. Practical viewpoint Metamorphic test is easy to introduce in application. • We do not need the special libraries or special codes. • The key is to define the metamorphic relation appropriately according to the target domain. • Driving support system: make the images foggy, rainy, rotated, ...etc • Chatbot: change a question into a different expression without changing the meaning. • ... • It is possible to use parameter search to find defects in the model without relying on neuron coverage. CNN-Cert is available to estimate the adversarial minimum distortion. • This is based on tensorflow and the applicable models are limited. More important thing is how to raise the robustness and quality of ML model based on these testing method. • Erroneous behavior • data augmentation (we can utilize the metamorphic test to augment the data) • preprocessing • Adversarial attack • adversarial training
  • 40. References (I) [Sato et al, 2021] 佐藤直人・小川秀人・來間啓伸・明神智之 (2021).『AIソフトウェアのテストー答えのない答え 合わせ [4つの手法]』リックテレコム [Goodfellow et al., 2015] Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and Harnessing Adversarial Examples. In Y. Bengio & Y. LeCun (Eds.), 3rd International Conference on Learning Representations. [Carlini & Wagner, 2017] Carlini, N., & Wagner, D. (2017). Towards Evaluating the Robustness of Neural Networks. 2017 IEEE Symposium on Security and Privacy (SP), 39–57. [Madry et al, 2018] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2018). Towards Deep Learning Models Resistant to Adversarial Attacks. 6th International Conference on Learning Representations. [Chen et al, 1998] Chen, T. Y., Cheung, S. C., & Yiu, S. M. (2020). Metamorphic Testing: A New Approach for Generating Next Test Cases. ArXiv:2002.12543 [Zhou et al, 2016] Zhou, Z. Q., Xiang, S., & Chen, T. Y. (2016). Metamorphic Testing for Software Quality Assessment: A Study of Search Engines. IEEE Transactions on Software Engineering, 42(3), 264–284.
  • 41. References (II) [Tian et al, 2018] Tian, Y., Pei, K., Jana, S., & Ray, B. (2018). DeepTest. Proceedings of the 40th International Conference on Software Engineering, 2018-May, 303–314. [Zhang et al, 2018] Zhang, M., Zhang, Y., Zhang, L., Liu, C., & Khurshid, S. (2018). DeepRoad: GAN-Based Metamorphic Testing and Input Validation Framework for Autonomous Driving Systems. 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), 132–142. [Zhou et al, 2019] Zhou, Z. Q., & Sun, L. (2019). Metamorphic testing of driverless cars. Communications of the ACM, 62(3), 61–67. [He et al, 2020] He, P., Meister, C., & Su, Z. (2020). Structure-invariant testing for machine translation. Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, 961–973. [Pei et al, 2017] Pei, K., Cao, Y., Yang, J., & Jana, S. (2017). DeepXplore. Proceedings of the 26th Symposium on Operating Systems Principles, 1–18. [Harel-Canada et al, 2020] Harel-Canada, F., Wang, L., Gulzar, M. A., Gu, Q., & Kim, M. (2020). Is neuron coverage a meaningful measure for testing deep neural networks? Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 851–862.
  • 42. References (III) [Zhang et al, NIPS2018] Zhang, H., Weng, T. W., Chen, P. Y., Hsieh, C. J., & Daniel, L. (2018). Efficient neural network robustness certification with general activation functions. Advances in Neural Information Processing Systems, 2018-Decem(NeurIPS), 4939–4948. [Boopathy et al, 2019] Boopathy, A., Weng, T.-W., Chen, P.-Y., Liu, S., & Daniel, L. (2019). CNN-Cert: An Efficient Framework for Certifying Robustness of Convolutional Neural Networks. Proceedings of the AAAI Conference on Artificial Intelligence, 33, 3240–3247. [Weng et al, 2019] Weng, L., Chen, P.-Y., Nguyen, L., Squillante, M., Boopathy, A., Oseledets, I., & Daniel, L. (2019). PROVEN: Verifying Robustness of Neural Networks with a Probabilistic Approach. In K. Chaudhuri & R. Salakhutdinov (Eds.), Proceedings of the 36th International Conference on Machine Learning (Vol. 97, pp. 6727–6736). PMLR. [Huang et al, 2017] Huang, X., Kwiatkowska, M., Wang, S., & Wu, M. (2017). Safety Verification of Deep Neural Networks. In R. Majumdar & V. Kunčak (Eds.), Computer Aided Verification (pp. 3–29). Springer International Publishing. [Sato et al, 2020] SATO, N., KURUMA, H., NAKAGAWA, Y., & OGAWA, H. (2020). Formal Verification of a Decision-Tree Ensemble Model and Detection of Its Violation Ranges. IEICE Transactions on Information and Systems, E103.D(2), 363–378.
  • 43. 43