Testing with
Machines: An
Back to Basics
Vivek and Jahnavi, Epsilon
8 December 2023
• Less than or equal to 1
• Between 2 and 5
• Between 5 and 10
• More than 10
Functional Testing Challenges
Test Case Prioritization: Limited resources and time for executing all test cases.
Defect Prediction: Identifying potential defects early in the development lifecycle.
Code Review Assistance: Ensuring code quality during development.
Regression Testing Optimization: Efficiently identifying impacted areas for regression
testing after changes.
Automation Test Script Generation: Writing and maintaining automated test scripts for
evolving applications
Test Case Prioritization and Defect Prediction Solutions
SVMs can assist in prioritizing test cases by predicting which test cases are more likely
to uncover defects. Test case attributes like code coverage, historical defect data, and
code complexity can be used as features for SVM-based prioritization.
SVMs can analyze historical data to predict the likelihood of defects in specific modules
or components. This can aid in focusing testing efforts on critical areas.
• Understanding the Terminology
• Machine Learning Overview
• Machine Learning Workflow
• Data Preparation
• Challenges in Data Preparation
• Data Quality Issues and Model Impact
• Training, Validation and Test Datasets
• Support Vector Machine – Introduction
• Why SVM for Functional Testing
• Use Case 1 – Test Case Prioritization
• Use Case 2 – Defect Prediction
Understanding Terminologies
Generative AI, a branch of artificial intelligence and a subset of Deep Learning, focuses on creating
models capable of generating new content that resembles existing data. These models aim to generate
content that is indistinguishable from what might be created by humans. Generative Adversarial Networks
(GANs) are popular examples of generative AI models that use deep neural networks to generate realistic
content such as images, text, or even music.
Example: Image Generation, Video Synthesis, Social Media Content Generation
AI is broadly defined as the ability of machines to mimic human behavior. It encompasses a broad range
of techniques and approaches aimed at enabling machines to perceive, reason, learn, and make
decisions. AI can be rule-based, statistical, or involve machine learning algorithms.
Example: Virtual Assistants, Healthcare Diagnosis and Imaging, Virtual Reality and Augmented
The term “ML” focuses on machines learning from data without the need for explicit programming.
Machine Learning algorithms leverage statistical techniques to automatically detect patterns and make
predictions or decisions based on historical data that they are trained on. While ML is a subset of AI, the
term was coined to emphasize the importance of data-driven learning and the ability of machines to
improve their performance through exposure to relevant data.
Example: Predictive text, recommendation systems, Time Series Forecasting
Definition of machine learning
Supervised learning vs
unsupervised learning
Common algorithms used in
machine learning
Real-world applications of
machine learning
Introduction to Machine Learning
What is Machine Learning-Contd.
Machine Learning
• Supervised Learning
• Classification
• Spam email classification example
• Regression
• Predicting software development
• Unsupervised Learning
• Clustering
• Grouping similar software modules
• Association
• Discovering relationships between software features
Explanation of supervised
Types of supervised learning:
classification and regression
Examples of supervised
learning algorithms: linear
regression, decision trees,
support vector machines
Training and testing process in
supervised learning
Supervised Learning
ML Models
Data Collection Emphasis on high-quality
data for effective ML
Data Preparation Transition to the next
Model Selection
Importance of choosing
appropriate ML
Training Models learn patterns
from the training data
Evaluation Assessment of model
Data Preparation in
ML Workflow
• Importance of Data Preparation
• Success of ML models depends on quality
• Steps Involved
• Data Cleaning
• Remove or handle missing values,
• Data Transformation
• Normalize or scale features
• Feature Engineering
• Create or modify features
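The cleaning and transformation steps above can be sketched in a few lines of Python. This is a minimal illustration and not the presenters' pipeline: the function names, the mean-imputation strategy, and the sample "lines of code" values are all assumptions made for the example.

```python
from statistics import mean

def impute_missing(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

def min_max_scale(column):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical "lines of code" feature with one missing value.
loc = [120, None, 300, 60]
loc_clean = impute_missing(loc)        # the gap is filled with the mean of 120, 300, 60
loc_scaled = min_max_scale(loc_clean)  # smallest value maps to 0.0, largest to 1.0
```

Mean imputation and min-max scaling are only one possible choice for each step; dropping incomplete rows or standardizing to zero mean and unit variance would fit the same slots.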
Challenges in Data
• Common Challenges
• Missing Data
• Example: Handling missing values in software
defect prediction
• Outliers
• Example: Impact of outliers in user behavior
• Feature Scaling
• Example: Unscaled features in code complexity
• Categorical Data Handling
• Example: Challenges of non-numeric data in
customer feedback analysis
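One common way to handle non-numeric data is one-hot encoding, where each category becomes its own 0/1 column. The sketch below is illustrative only; the `one_hot` helper and the feedback labels are hypothetical and not taken from the deck.

```python
def one_hot(values):
    """Map each category to a 0/1 indicator vector, one position per category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        encoded.append(row)
    return categories, encoded

# Hypothetical customer feedback labels.
feedback = ["negative", "positive", "neutral", "positive"]
categories, encoded = one_hot(feedback)
```

Each row of `encoded` has exactly one 1, in the position of that row's category within the sorted `categories` list.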
Data Quality Issues
and Model Impact
• Impact of Poor Data Quality
• Low-quality data leads to inaccurate models
• Examples of Data Quality Issues
• Inconsistent Data
• Example: Inconsistencies in user
feedback data
• Biased Data
• Example: Addressing biases in
demographic-specific data
• Noisy Data
• Example: Random variations in
performance metrics
Training, Validation,
and Test Datasets
• Need for Splitting Data
• Importance of dividing data for fair
model assessment
• Overview of Datasets
• Training Dataset
• Used for teaching the model
• Validation Dataset
• Fine-tuning hyperparameters
• Test Dataset
• Unbiased evaluation of the model
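A three-way split can be sketched as follows. The 60/20/20 fractions, the fixed seed, and the `split_dataset` name are assumptions chosen for illustration; the use cases later in the deck report an 80/20 train/validation split instead.

```python
import random

def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle rows and split them into train, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains is held out for testing
    return train, val, test

rows = list(range(100))  # stand-in for 100 labeled examples
train, val, test = split_dataset(rows)
```

Shuffling before slicing avoids a split that simply mirrors the order in which the data was collected.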
Training the model: learning with labeled data
Use of labeled data: known outcomes
Support Vector Machine
SVMs are a type of machine learning algorithm that have gained
popularity in recent years due to their effectiveness in solving complex
SVMs can be used for both classification and regression tasks.
The basic idea behind SVMs is to find the hyperplane that best
separates the data into different classes or, in regression, best
predicts the value of the target variable. In classification, the
hyperplane is chosen so that it maximizes the margin between the
two classes. In regression, it is chosen so that most predicted
values lie within a small tolerance of the actual values.
To handle data that is not linearly separable, SVMs use a technique
called the kernel trick, which implicitly maps the input data into a
higher-dimensional space where it becomes easier to find a
separating hyperplane. Common kernel functions include
linear, polynomial, radial basis function (RBF), and sigmoid.
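To make the kernel idea concrete, here is a minimal sketch of the RBF kernel, which scores the similarity of two points as exp(-gamma * squared distance). The sample points are hypothetical, and gamma = 0.01 is used only because it is the value reported later in the deck.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and z)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.01)   # identical points give 1.0
near = rbf_kernel([1.0, 2.0], [2.0, 3.0], gamma=0.01)   # close points stay near 1
far = rbf_kernel([1.0, 2.0], [10.0, 20.0], gamma=0.01)  # distant points decay toward 0
```

The kernel value is all the SVM needs; the higher-dimensional mapping itself is never computed explicitly.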
SVM for classification
• In classification, SVMs are used to separate data into different classes based on their attributes. SVMs can
be applied to various classification tasks in software testing due to their ability to handle complex decision
boundaries and non-linear relationships. Some applications are:
Software Defect Prediction:
SVMs can predict whether a piece of code is likely to contain defects based on features
such as code metrics, historical defect data, and complexity measures.
Anomaly Detection:
SVMs can be used to classify abnormal patterns in system behavior, identifying potential
software anomalies or performance issues.
Requirement Traceability:
SVMs can classify whether a given piece of code corresponds to a specific requirement,
facilitating traceability and impact analysis.
SVM for regression
• In regression, SVMs are used to predict numerical values based on input data. SVMs can be applied to
various regression tasks in software testing, providing solutions to predict and optimize continuous
outcomes. Here are some applications:
Test Suite Prioritization:
SVM regression can predict the priority or importance of test cases within a test suite, optimizing the order in
which test cases are executed based on predicted outcomes.
Resource Consumption Estimation:
SVM regression can be used to estimate resource consumption during the execution of software tests,
helping in resource allocation and planning.
Code Complexity Prediction:
SVM regression models can predict code complexity metrics, aiding in identifying potentially challenging or
error-prone code sections.
Defect Prediction
• Definition and Importance
• Predicts areas likely to contain defects
• Proactively identifies and addresses
high-risk areas
• Complements Test Case Prioritization
• Prioritization focuses on predicted
defect locations
• Defect prediction informs prioritization
Software Defect Prediction
Data Preparation
GHPR Dataset
• GHPR is a public dataset that identifies bug fixes based on Pull Requests (PRs) on GitHub
• It has a total of 6052 instances, of which 3026 are defective and 3026 are non-defective
• It provides 21 static metrics for each of the 6052 instances, which is the data used for the
baseline approaches
Source: Jiaxi Xu, Fei Wang, and Jun Ai, "Defect Prediction With Semantics and Context Features of
Codes Based on Graph Representation Learning," IEEE Transactions on Reliability, 2021.
Metric name | Known as
Coupling between objects | CBO
Weight method class | WMC
Depth inheritance tree | DIT
Response for a class | RFC
Lack of cohesion of methods | LCOM
Counts the number of methods | totalMethods
Counts the number of fields | totalFields
Lines of code | LOC
Quantity of returns | returnQty
Quantity of loops | loopQty
Quantity of comparisons | comparisonsQty
Quantity of try/catches | tryCatchQty
Quantity of parenthesized |
String literals | stringLiteralsQty
Quantity of number | numbersQty
Quantity of assignments | assignmentsQty
Quantity of math operations | mathOperationsQty
Quantity of variables | variablesQty
Max nested blocks | maxNestedBlocks
Number of unique words | uniqueWordsQty
Table: 21 static metrics
Feature Selection and Data Pre-processing
• Feature Selection:
Identify and select features that are crucial for defect prediction.
For our dataset, all 21 metrics are selected as features.
• Data pre-processing:
Before training the SVM model, the data is pre-processed.
The selected features are normalized to ensure that they have a consistent scale.
• Data Splitting:
The dataset was split into training and validation sets using an 80/20 split.
Training the SVM Model
• SVM Model Selection:
An appropriate SVM variant (linear, polynomial, or radial basis function kernel) is chosen based
on the characteristics of the dataset.
Since the relationship between input features and the target variable is non-linear and we want
our SVM model to capture complex patterns and decision boundaries in the data, we chose the
Radial Basis Function (RBF) kernel.
• Model Training:
The SVM model is trained on the training dataset.
The SVM algorithm aims to find the hyperplane that best separates instances of different classes.
• Hyperparameter Tuning:
Fine-tune the model's hyperparameters, such as the regularization parameter (C) and kernel
parameters, to optimize the model's performance.
To find the optimal values of hyperparameters, grid search is used, which exhaustively searches
through a manually specified subset of hyperparameter values.
Model Evaluation
• Our SVM model achieved an overall accuracy of approximately
72.9% on the evaluation dataset
• In predicting class 0 (Non-defective), the model demonstrated a
precision of 82%, recall of 60%, and an F1-score of 69%
• For class 1 (Defective), the precision was 68%, recall was 86%, and
the F1-score reached 76%.
Class | Precision | Recall | F1-score
Non-defective (0) | 82% | 60% | 69%
Defective (1) | 68% | 86% | 76%
Overall accuracy: 72.9%
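As a quick consistency check on the figures above, the F1-scores can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The `f1_score` helper is a hypothetical name, and the last line rests on an assumption: that the evaluation set is balanced like the full dataset, in which case accuracy equals the average of the two recalls.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported for each class.
f1_non_defective = f1_score(0.82, 0.60)  # rounds to 0.69
f1_defective = f1_score(0.68, 0.86)      # rounds to 0.76

# Assumption: equal numbers of defective and non-defective evaluation instances.
accuracy_if_balanced = (0.60 + 0.86) / 2  # about 0.73, close to the reported 72.9%
```

Both recomputed F1-scores match the reported values after rounding to two decimals.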
Model Evaluation
• Area Under the Curve (AUC) is
0.8, indicating a relatively high
discriminatory power of the model
• It also indicates that the model
has a good ability to distinguish
between positive and negative classes
Use Case - Test
Application of
SVMs applied to
prioritize test cases
Results and
Showcase of key
findings, possibly
through charts or
How SVMs
contributed to
enhancing test case
Lessons Learned
and Challenges
Insights gained
during the test case
Test Suite Prioritization
Data Preparation
Industrial Datasets:
• The Cisco dataset is a test suite used for testing video-conferencing systems, provided by Cisco
• The other two industrial datasets are used for testing industrial robotics applications provided by
ABB Robotics (Paint Control and Input/Output Control, noted as IOF/ROL)
Synthetic Datasets (Data Augmentation):
• The ratio of failed test executions is extremely low in the industrial datasets
• To address the problem of insufficient representation of relevant test cases in the industrial
datasets, the authors of [4] performed data augmentation. Specifically, they used SMOGN, a technique for
tackling imbalanced regression datasets by generating diverse new data points for the given data.
• The synthetic data generated was concatenated to the industrial datasets
• We have used this concatenated dataset from [4] for our use case
Feature Selection and Pre-processing
• Data preprocessing was necessary to format all three test suites in the same way, for example, to make
the number and type of features consistent across all the files
• Features selected were:
• Target Variable: Priority values for the test cases
• Our goal is to minimize the loss between the actual and predicted priority values so that test cases are
prioritized in the correct descending order
Column Name | Content
 | Average test case execution time computed across all its previous executions
E1, E2, E3 | Execution Status (0,1)
 | Previous last execution of the test case as date-time-string (Format: YYYY-MM-DD HH:ii)
 | Absolute difference of a test case execution status between the least recent and most recent CI cycle
 | Number of times a test case execution status has changed from pass to fail in all its previous executions
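Once a model has produced a priority value for each test case, prioritization reduces to sorting in descending order of that value. A minimal sketch, with hypothetical test case IDs and predicted values:

```python
# Hypothetical predicted priority values keyed by test case ID.
predicted_priority = {"TC-1": 0.12, "TC-2": 0.87, "TC-3": 0.45}

# Run the test cases with the highest predicted priority first.
execution_order = sorted(predicted_priority, key=predicted_priority.get, reverse=True)
```

This is why the regression error matters: predictions only need to be accurate enough to preserve the correct ordering.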
Training SVM Regression Model
• Data Split: The dataset was split into training and validation sets using
an 80/20 split.
• Choose SVM Regression:
The SVM regression variant, SVR (Support Vector Regressor), is chosen, which is
specifically designed for regression tasks.
The Radial Basis Function (RBF) kernel was chosen for the dataset.
• Train SVM Model: The SVM regression model is trained on the training set
• Hyperparameter tuning: To find the optimal values of hyperparameters
like C and gamma, grid search is used, which exhaustively
searches through a manually specified subset of hyperparameter values
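The exhaustive search described above can be sketched without any ML library. Everything here is illustrative: `grid_search` and `toy_validation_score` are hypothetical names, and the scoring function is a stand-in for "train a model with these settings and return its validation score". It is constructed only so the search has a clear maximum and says nothing about real model performance.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_validation_score(params):
    """Toy stand-in for a validation score; peaks at C=10, gamma=0.1 by construction."""
    return -((params["C"] - 10) ** 2) - (params["gamma"] - 0.1) ** 2

# Manually specified candidate values, as in a typical grid.
param_grid = {"C": [1, 10, 100, 1000], "gamma": [0.001, 0.01, 0.1]}
best_params, best_score = grid_search(param_grid, toy_validation_score)
```

The cost grows with the product of the list lengths (here 4 x 3 = 12 evaluations), which is why the candidate values are chosen by hand.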
Model Evaluation
• Our SVR model’s performance is evaluated using regression-specific metrics:
1. Mean Absolute Error (MAE): Measures the average absolute error between predicted and actual
values. This serves as the error measure for the regression.
2. R-squared (R2): Represents the proportion of the variance in the dependent variable that is predictable
from the independent variables.
The MAE and R2 score of our model came out to be 0.050 and 0.985, respectively:
• An MAE of 0.050 suggests that, on average, the absolute difference between the predicted and actual
'PRIORITY_VALUE' is small, which indicates a high level of accuracy in the predictions
• An R2 value of 0.985 means that approximately 98.5% of the variability in 'PRIORITY_VALUE' is captured
by the SVR model. This is an excellent result, indicating that our model is effectively capturing the
underlying patterns in the data.
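Both metrics are straightforward to compute by hand. The sketch below uses hypothetical actual and predicted values, not the model's data, and the function names are chosen for the example.

```python
def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def r2_score(actual, predicted):
    """1 minus the ratio of residual variation to total variation around the mean."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical values for illustration.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
mae = mean_absolute_error(actual, predicted)  # (0.5 + 0 + 0.5 + 0) / 4 = 0.25
r2 = r2_score(actual, predicted)              # 1 - 0.5 / 5.0 = 0.9
```

MAE is in the same units as the target, while R2 is unitless, so the two are usually read together.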
Model Evaluation
• To assess the regression analysis visually, a scatter plot is used:
▪ Blue Points (Actual Values): Each blue point represents the actual 'PRIORITY_VALUE' from the
test set.
▪ Diagonal Line: This line represents the ideal scenario where the predicted values perfectly
match the actual values.
▪ Red Cross Markers (Predicted Values): Each red cross marker represents the corresponding
predicted 'PRIORITY_VALUE' for a data point in the test set.
• Since the red markers follow the diagonal line closely, it suggests that the model is
making accurate predictions.
• Engage Audience
• Encourage questions and discussions
• Clarify any queries related to the experimental
• Main Takeaways
• The value of SVMs in enhancing functional
• Contribution of prioritization and defect
prediction to efficient testing
Experimental Design
• Experimental Setup
• Specify tools, environments, and
• Create diverse and representative datasets
Chosen Features
For prioritization:
code coverage,
historical defect data,
complexity metrics
For defect prediction:
code metrics,
historical bug data,
code changes
Importance of
Enhances SVM
model performance
Data Preparation
• Data Collection and Preparation
• Gather historical data, extract features,
create labeled dataset
• Ensure dataset represents the software
Training the SVM
• SVM Training Process
• Split dataset into training and testing sets
• Choose kernel, tune parameters for optimal
Evaluation Metrics
• Metrics for Evaluation
• Prioritization: accuracy, precision, recall, F1-
• Defect Prediction: similar metrics for identifying
defect-prone areas
• Experimental Results
• Showcase performance metrics and
• Highlight significant findings or trends
Comparison with
Other Methods
• Comparison
• Compare SVM-based approaches with
alternative methods
• Discuss strengths and weaknesses of
Feedback and
• Feedback from Testing Teams
• How SVM models were received and
• Refinements made based on feedback
Lessons Learned
• Key Lessons
• Importance of well-defined features and
representative datasets
• Adaptability of SVMs in addressing
testing challenges
Future Directions
• Future Directions
• Explore new features or data sources
• Consider improvements to SVM model or
alternative techniques
2. Khatibsyarbini M, Isa MA, Jawawi DN, Tumeng R. Test case prioritization approaches in regression
testing: A systematic literature review. Information and Software Technology 2017; 93: 74–93. doi:
3. Radial Basis Function (RBF) Kernel: The Go-To Kernel | by Sushanth Sreenivasa | Towards Data Science
4. A. Sharif, D. Marijan and M. Liaaen, "DeepOrder: Deep Learning for Test Case Prioritization in Continuous
Integration Testing," 2021 IEEE International Conference on Software Maintenance and Evolution
(ICSME), Luxembourg, 2021, pp. 525-534, doi: 10.1109/ICSME52107.2021.00053.
5. - PROMISE Dataset
6. GHPR_dataset/ at master · feiwww/GHPR_dataset · GitHub – GHPR Dataset
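• Optimal hyperparameter values chosen:
Use case | C | Gamma | Kernel
Software Defect Prediction | 1000 | 0.01 | RBF
Test Suite Prioritization | 100 | 0.01 | RBF
• Classification-specific evaluation metrics:
Metric | Description | Formula
Accuracy | Number of correctly classified data instances over the total number of instances | $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
Precision | Positive predictive value, should ideally be 1 | $\text{Precision} = \frac{TP}{TP + FP}$
Recall | Sensitivity or true positive rate, should ideally be 1 | $\text{Recall} = \frac{TP}{TP + FN}$
F1-score | Harmonic mean of precision and recall, becomes 1 only when precision and recall are both 1 | $\text{F1-score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
TP, TN, FP, FN: numbers of true positives, true negatives, false positives, and false negatives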
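The four classification metrics follow directly from the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The sketch below uses the standard definitions; the function name and the counts are hypothetical and unrelated to the deck's results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts for a 100-instance evaluation set.
metrics = classification_metrics(tp=40, tn=30, fp=20, fn=10)
```

With these counts, accuracy is 0.7, precision is 40/60, recall is 0.8, and F1 is 80/110.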
• Regression-specific evaluation metrics:
Metric | Description | Formula
Mean Absolute Error (MAE) | Average of the absolute differences between the predicted and actual values. Lower MAE indicates better model performance | $\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$
n: number of data points
$y_i$: actual value
$\hat{y}_i$: predicted value
R-squared (R2) score | Measures the proportion of the variance in the dependent variable that is predictable from the independent variables. A higher R2 score suggests a better fit of the model to the data | $R^2 = 1 - \frac{\sum_{i} (y_{true,i} - y_{pred,i})^2}{\sum_{i} (y_{true,i} - \bar{y}_{true})^2}$
$\bar{y}_{true}$: mean of actual target values
Kernel Functions and Hyperparameters
Hyperparameters: Hyperparameters are external configurations or settings for an ML model that are not learned from the training data but
rather set by the user. Some key hyperparameters for SVM:
o C (Regularization Parameter): The regularization parameter C trades off correct classification of training examples against maximizing the
decision function's margin. A smaller C encourages a larger margin but may misclassify some training points. A larger C penalizes
misclassifications more heavily but results in a smaller margin.
o Kernel: SVMs use a kernel function to implicitly map the input features into a higher-dimensional space. The choice of the kernel determines
the shape of the decision boundary. Common kernel functions include:
▪ Linear Kernel (kernel='linear'): Suitable for linearly separable data.
▪ Polynomial Kernel (kernel='poly'): Introduces polynomial features to handle non-linear decision boundaries.
▪ Radial Basis Function (RBF) Kernel (kernel='rbf'): Useful for capturing complex non-linear relationships.
▪ Sigmoid Kernel (kernel='sigmoid'): A separate kernel based on the hyperbolic tangent function.
o Gamma (Kernel Coefficient): Relevant for RBF, polynomial, and sigmoid kernels. It defines how far the influence of a single training example
reaches, with low values meaning 'far' and high values meaning 'close.' A small gamma leads to a smooth decision boundary, while a large gamma can
result in a more complex, wiggly boundary.
o Degree (Degree of the Polynomial Kernel): Relevant for polynomial kernels. It specifies the degree of the polynomial kernel function.
Higher degrees can capture more complex relationships but may also lead to overfitting.
o Class Weights: For imbalanced datasets, you can assign different weights to different classes to influence the optimization process.
ROC (Receiver Operating Characteristic) :
• It is a graphical representation used in binary classification to assess the performance of a classification model at
different threshold settings
• The ROC curve is a useful tool for visualizing the trade-off between sensitivity and specificity
• Area Under the ROC Curve (AUC-ROC):
• AUC-ROC represents the area under the ROC curve. A higher AUC-ROC value indicates better model performance.
• AUC-ROC = 0.5 corresponds to random guessing, while AUC-ROC = 1 indicates a perfect classifier.
• Threshold Setting:
• ROC curves are created by plotting the TPR against the FPR at various threshold settings for the model. Each point
on the curve corresponds to a different threshold.
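The threshold sweep can be sketched directly: for each candidate threshold, count true and false positives, convert them to rates, and integrate the resulting curve with the trapezoidal rule. The labels and scores are hypothetical, and the helper names are chosen for this example.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical ground-truth labels (1 = positive) and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
area = auc(points)  # 8/9: eight of nine positive/negative pairs are ranked correctly
```

The curve starts at (0, 0) and ends at (1, 1); the closer it bends toward the top-left corner, the larger the area.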
• Diagonal Line (Random Classifier):
• The diagonal line (from (0,0) to (1,1)) represents the performance of a random classifier that makes predictions by
chance.
• Optimal Point: