Robust Testing Strategies for Machine Learning Models
Agile Testing Alliance Hyderabad Meet
TMI Networks, Hyderabad, 22 July 2023
Tilottama Goswami, Ph.D. (University Of Hyderabad)
Professor, Department Of Information Technology
Vasavi College Of Engineering
Hyderabad, INDIA
Agenda
1. Motivation
2. Real-World Examples of ML Model Failures
3. Key Factors for Robust ML Model Testing
4. Conclusion
Motivation
Fourth Industrial Revolution
o Digital Transformation
o AI & Automation – Intelligent Systems
o Data Usage – Privacy, Security, Ethics
o Social Transformation – Quality of Life
The Fourth Industrial Revolution DEMANDS Robust Testing Strategies for:
o Performance
o Security
o Reliability
o Seamless Integration & Deployment
o Building Trust
Impact of Industrial Revolution 4.0 in Real-World Scenarios
Robotic Process Automation vs. Machine Learning
Robotic Process Automation
o Repetitive, rule-based tasks
o Structured data and pre-programmed rules
o Not adaptable to handle variations
o No cognitive capabilities
o Struggles with unstructured data – audio/image/text
o Example: data entry & transactional tasks

Machine Learning
o Learns from data to make predictions
o Handles unstructured data
o Predictions on new, unseen data
o Adaptable and evolves with changes – flexible
o Complex cognitive tasks – reasoning
o Unstructured data – image recognition, language translation
o Example: sentiment analysis & pattern recognition
AI & CV
https://medium.com/swlh/a-beginners-guide-to-understanding-the-buzz-words-ai-ml-nlp-deep-learning-computer-vision-a877ee1c2cde
The goal of AI is to capture the collective intelligence of humans and do a given task better than any individual human
can ever do
Computer Vision Tasks
Object Detection: Vehicle
Object Recognition: Car
Object Tracking: Speed Limit
https://www.optisolbusiness.com/insight/an-overview-of-image-segmentation-part-1
Classification
Semantic Segmentation
Classification &
Localization
Instance Segmentation
Feature Extractor
Optical Character Recognition
Computer Vision Tasks with Natural Language Processing
Real-World Examples of ML Model Failures
IBM Watson's Cancer Treatment Recommendations
Amazon's AI Recruitment Tool
Google Photos' Racist Labelling
Tesla's Autopilot Accidents
Microsoft's Tay Chatbot
Real-World Examples of ML Model Failures
o IBM Watson's Cancer Treatment Recommendations – erroneous recommendations; lacked proper testing and validation
o Amazon's AI Recruitment Tool – bias against female candidates
o Google Photos' Racist Labelling – limitation of training data and biased training
o Tesla's Autopilot Accidents – real-time decision making in complex environments
o Microsoft's Tay Chatbot – learnt offensive and inappropriate conversations from tweets
Challenges
A. IBM Watson's Cancer Treatment Recommendations
1. Challenges with training data and complexity of cancer treatment
2. Interpretation of unstructured data and limited contextual
understanding
3. Lessons learned and improvements made
B. Microsoft's Tay Chatbot
1. Vulnerability to manipulation and lack of contextual
understanding
2. Rapid learning and amplification of bias
3. Importance of human oversight and responsibility
Challenges
C. Google Photos' Racist Labeling / Amazon’s AI Recruitment Tool
1. Biased training data and insufficient testing
2. Limited diversity in development teams
3. Ethical considerations and response to the incident
D. Tesla's Autopilot Accidents
1. Overreliance on the Autopilot system and
inattentive driving
2. System limitations and edge cases
3. Regulatory and legal challenges
Key Factors for Robust ML Model Testing
1. Bias-Variance Trade-off (Overfitting/Underfitting)
2. Comprehensive Training Data
3. Hyperparameter Tuning
4. Validation and Evaluation Techniques
5. Adversarial Testing
6. Continuous Monitoring and Maintenance
1. Bias-Variance Trade-off: Overfitting/Underfitting
Courtesy: Medium.com
Bias-Variance
The goal of any predictive modelling machine learning algorithm is to achieve low bias and low variance.
o Bias refers to the simplifying assumptions made by a model to make the target function easier to learn.
o Variance is the amount by which the estimate of the target function would change if different training data were used.
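To make the trade-off concrete, here is a minimal sketch (not from the talk; it assumes Python with NumPy and scikit-learn, and the synthetic data and polynomial degrees are illustrative only). A low-degree model underfits (high bias), a very high-degree model overfits (high variance), and the gap between training and validation error exposes the difference.

```python
# Minimal sketch: underfitting vs. overfitting on noisy synthetic data,
# by varying the polynomial degree of a regression model.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)   # noisy sine wave
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

for degree in (1, 4, 12):   # high bias, balanced, high variance
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    val_err = mean_squared_error(y_val, model.predict(X_val))
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  val MSE={val_err:.3f}")
```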
Address BIAS in ML (High Bias = Underfitting)
Goal: LOW BIAS – Building Ethical and Trustworthy AI Systems; Promote Fairness and Inclusivity
o Diverse and Representative Data
o Regularization & Post-Processing
o Fairness-Aware Algorithms
o Bias-Aware Evaluation
o Feature Engineering

Address VARIANCE in ML (High Variance = Overfitting)
Goal: LOW VARIANCE – Stable and Robust AI Systems; Generalization, with adequate variance to avoid underfitting
o Feature Engineering
o Regularization & Post-Processing
o Cross-Validation
o Ensemble Methods
o Early Stopping of Training
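Two of the variance-reduction levers listed above, regularization and cross-validation, can be exercised together in a few lines. This is a hedged sketch only: scikit-learn is assumed, and the synthetic dataset and alpha values are placeholders, not settings from the talk.

```python
# Minimal sketch: Ridge regularization plus 5-fold cross-validation
# as variance-reduction levers on an over-parameterized regression.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=100, n_features=50, noise=10.0, random_state=0)

for alpha in (0.0, 1.0, 10.0):          # alpha=0.0 means no regularization
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5,
                             scoring="neg_mean_squared_error")
    print(f"alpha={alpha:5.1f}  CV MSE={-scores.mean():8.1f} (+/- {scores.std():.1f})")
```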
Comprehensive Training Data
1. Importance of diverse, representative, and unbiased training data
2. Data quality, data augmentation, and addressing class imbalance
3. Rigorous Hyperparameter Tuning
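Building on point 2, here is a minimal sketch of one way to address class imbalance, using class weighting rather than plain accuracy-driven training. It assumes scikit-learn; the synthetic 95/5 split and the logistic-regression model are illustrative only.

```python
# Minimal sketch: the effect of class_weight="balanced" on an
# imbalanced binary classification problem.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)

print("plain model:\n", classification_report(y_te, plain.predict(X_te)))
print("class_weight='balanced':\n", classification_report(y_te, weighted.predict(X_te)))
```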
Rigorous Hyperparameter Tuning
1. Optimizing model performance through systematic exploration
2. Techniques such as grid search, random search, and Bayesian optimization
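The first two techniques can be sketched directly with scikit-learn. The estimator, search space, and scoring metric below are placeholders chosen for illustration, not recommendations from the talk.

```python
# Minimal sketch: grid search vs. randomized search over a small
# random-forest hyperparameter space, scored with cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)
param_grid = {"n_estimators": [100, 300], "max_depth": [3, 6, None]}

grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    param_grid, cv=5, scoring="f1")
grid.fit(X, y)
print("grid search best  :", grid.best_params_, round(grid.best_score_, 3))

rand = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                          param_grid, n_iter=4, cv=5, scoring="f1", random_state=0)
rand.fit(X, y)
print("random search best:", rand.best_params_, round(rand.best_score_, 3))
```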
Validation & Evaluation Techniques
1. Cross-validation and holdout validation for assessing model performance
2. Metrics selection, including accuracy, precision, recall, F1-score, and AUC-ROC
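To ground both points, here is a minimal evaluation sketch (scikit-learn assumed; the dataset and model are stand-ins) that reports several of the listed metrics under k-fold cross-validation, then a holdout split as a final check, since accuracy alone can hide failures on minority classes.

```python
# Minimal sketch: k-fold cross-validation and holdout validation,
# scored with multiple metrics rather than accuracy alone.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation with several metrics
cv = cross_validate(clf, X, y, cv=5,
                    scoring=["accuracy", "precision", "recall", "f1", "roc_auc"])
for name in ("accuracy", "precision", "recall", "f1", "roc_auc"):
    print(f"{name:>9}: {cv['test_' + name].mean():.3f}")

# Holdout validation as a final sanity check
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf.fit(X_tr, y_tr)
print("holdout AUC-ROC:", round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 3))
```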
Adversarial Testing
1. Uncovering vulnerabilities and weaknesses in ML
models
2. Crafting deceptive inputs and evaluating model
robustness
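A minimal sketch of the idea, under the assumption of a differentiable (here linear) model: craft an FGSM-style perturbation from the loss gradient and compare accuracy on clean versus perturbed inputs. The dataset and the perturbation budget eps are illustrative, not values from the talk.

```python
# Minimal sketch: FGSM-style adversarial inputs for logistic regression.
# For log-loss and a linear model, the gradient w.r.t. the input is (p - y) * w.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("clean accuracy      :", round(clf.score(X_te, y_te), 3))

w = clf.coef_.ravel()                       # model weights
p = clf.predict_proba(X_te)[:, 1]           # predicted P(y=1)
grad = (p - y_te)[:, None] * w              # d(log-loss)/d(input) per sample
eps = 0.5                                   # perturbation budget (assumed)
X_adv = X_te + eps * np.sign(grad)          # push each input along the gradient
print("adversarial accuracy:", round(clf.score(X_adv, y_te), 3))
```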
Continuous Monitoring &
Maintenance
1. Importance of ongoing model performance monitoring
2. Regular updates, retraining, and version control
3. Human-in-the-loop feedback
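One hedged sketch of what ongoing monitoring can look like in practice: a scheduled data-drift check that compares live feature distributions against the training snapshot with a two-sample Kolmogorov-Smirnov test, as one input to the retraining decision. The feature names, data, and alert threshold are hypothetical; SciPy is assumed.

```python
# Minimal sketch: per-feature data-drift check between a training
# snapshot and live data using a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_data = {"age": rng.normal(40, 10, 5000), "amount": rng.exponential(100, 5000)}
live_data  = {"age": rng.normal(46, 10, 1000), "amount": rng.exponential(100, 1000)}

DRIFT_P_VALUE = 0.01   # assumed alert threshold

for feature, train_col in train_data.items():
    stat, p = ks_2samp(train_col, live_data[feature])
    status = "DRIFT - consider retraining" if p < DRIFT_P_VALUE else "ok"
    print(f"{feature:>7}: KS={stat:.3f}  p={p:.4f}  -> {status}")
```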
Conclusion
Responsible Feature Engineering
Responsible development & deployment of ML models
Building RITE systems – Reliable, Inclusive, Trustworthy, and Ethical
Thank you
AgileTestingAlliance
TMI Networks
Vasavi College of Engineering
