DATA ANALYTICS
Evaluation Metrics for Supervised Learning
Models of Machine Learning
Md. Main Uddin Rony
Software Developer, Infolytx, Inc.
Machine Learning Evaluation Metrics
ML Evaluation Metrics Are…..
● tied to Machine Learning Tasks
● methods that measure an algorithm’s performance and behavior
● helpful for deciding which model best meets the target performance
● helpful for tuning the model’s parameters so that it performs as well
as possible
Evaluation Metrics Types...
● There are various types of ML algorithms (classification, regression, ranking,
clustering)
● Different types of algorithm call for different types of evaluation metrics
● Some metrics can be useful for more than one type of algorithm
(Precision - Recall)
● This deck covers evaluation metrics for supervised learning models only
(classification, regression, ranking)
Classification Metrics
Classification Model Does...
Predict class labels given input data
In binary classification, there are two possible output classes (0 or 1, True
or False, Positive or Negative, Yes or No, etc.)
Spam detection for email is a good example of binary classification.
Some Popular Classification Metrics...
Accuracy
Confusion Matrix
Log-Loss
AUC
Accuracy
● Ratio between the number of correct predictions and total number of
predictions
● Example: Suppose we have 100 examples in the positive class and 200
examples in the negative class. Our model declares 80 out of 100
positives as positive correctly and 195 out of 200 negatives as negative
correctly.
● So, accuracy = (80 + 195)/(100 + 200) = 275/300 = 91.7%
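The arithmetic above can be checked with a minimal sketch. The helper name `accuracy` and the way the label lists are laid out are illustrative assumptions, not part of the original slides; only the counts (80 of 100 positives, 195 of 200 negatives) come from the example.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Rebuild the example: 100 positives (80 predicted correctly),
# then 200 negatives (195 predicted correctly).
y_true = [1] * 100 + [0] * 200
y_pred = [1] * 80 + [0] * 20 + [0] * 195 + [1] * 5

overall_accuracy = accuracy(y_true, y_pred)
print(f"accuracy = {overall_accuracy:.1%}")
```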
Confusion Matrix
● Shows a more detailed breakdown of correct and incorrect classifications for each
class.
● For the previous example, the confusion matrix looks like this:

                        Predicted as positive    Predicted as negative
Labeled as positive              80                       20
Labeled as negative               5                      195

● What is the accuracy for the positive class? And for the negative class?
● The positive class (80/100 = 80%) clearly has lower accuracy than the negative class (195/200 = 97.5%)
● That information is lost if we calculate only the overall accuracy.
Per-Class Accuracy
● Average per-class accuracy for the previous example:
(80% + 97.5%)/2 = 88.75%, which differs from the overall accuracy of 91.7%
Why important?
- It can paint a different picture when the classes have different numbers of
examples
- A class with more examples than the others dominates the accuracy
statistic, which produces a distorted picture
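A small sketch of how the confusion-matrix counts and the per-class accuracies could be computed for the same example. The function names are hypothetical helpers chosen for illustration; the label lists are reconstructed from the counts given in the slides.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(true_labels, predicted_labels))


def per_class_accuracy(true_labels, predicted_labels):
    """Accuracy computed separately for each true class."""
    counts = confusion_counts(true_labels, predicted_labels)
    result = {}
    for cls in sorted(set(true_labels)):
        total = sum(n for (t, _), n in counts.items() if t == cls)
        result[cls] = counts[(cls, cls)] / total
    return result


# Same example as before: 1 = positive, 0 = negative.
y_true = [1] * 100 + [0] * 200
y_pred = [1] * 80 + [0] * 20 + [0] * 195 + [1] * 5

matrix = confusion_counts(y_true, y_pred)
class_accuracy = per_class_accuracy(y_true, y_pred)
average_per_class = sum(class_accuracy.values()) / len(class_accuracy)

print(matrix[(1, 1)], matrix[(1, 0)])  # labeled positive row
print(matrix[(0, 1)], matrix[(0, 0)])  # labeled negative row
print(class_accuracy, average_per_class)
```

Because each class contributes equally to the average, the 200 negative examples no longer outweigh the 100 positive ones.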
Log-Loss
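Especially useful when the raw output of the classifier is a numeric probability
instead of a class label 0 or 1
Mathematically, log-loss for a binary classifier:
$$\text{log-loss} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]$$
where y_i is the true label (0 or 1) of data point i and p_i is the predicted probability that it belongs to class 1
Minimum is 0, reached when the predicted probabilities match the true labels exactly
Calculate it for a data point that the classifier predicts to belong to class 1 with
probability 0.51, and again with probability 1
Minimizing this value pushes the classifier toward confident, correct predictions, which generally improves its accuracy as well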
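The calculation suggested above can be sketched as follows. Several things here are assumptions rather than part of the slides: the true label of the data point is taken to be 1, the natural logarithm is used, and probabilities are clipped by a small `eps` so that log(0) never occurs.

```python
import math


def log_loss(true_labels, predicted_probs, eps=1e-15):
    """Average binary log-loss; predicted_probs are P(class = 1)."""
    total = 0.0
    for y, p in zip(true_labels, predicted_probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(true_labels)


# Assumed true label: 1 in both cases.
loss_unsure = log_loss([1], [0.51])   # barely on the right side
loss_certain = log_loss([1], [1.0])   # fully confident and right

# For contrast: a confident prediction that turns out to be wrong.
loss_wrong = log_loss([0], [0.99])

print(loss_unsure, loss_certain, loss_wrong)
```

Both of the first two predictions would count as correct under a 0.5 threshold, yet the hesitant one is penalized while the confident one is not; the confident wrong prediction is penalized far more heavily.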
AUC (Area Under Curve)
● The curve is the receiver operating
characteristic curve, or ROC curve
for short
● Provides nuanced details about the
behavior of the classifier
● Bad ROC curve covers very little area
● Good ROC curve has a lot of space
under it
● But, how?
AUC (contd..)
● So, what’s the advantage of using the ROC curve over a simpler metric?
The ROC curve visualizes all possible classification thresholds, whereas
other metrics only represent the error rate at a single threshold
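To make the "all possible thresholds" idea concrete, here is a minimal sketch that sweeps the threshold over every distinct score, records the (false positive rate, true positive rate) point at each step, and integrates the resulting curve with the trapezoid rule. The labels and scores are made-up illustration data, the function names are hypothetical, and the sketch assumes both classes are present.

```python
def roc_curve_points(true_labels, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold step by step."""
    positives = sum(true_labels)
    negatives = len(true_labels) - positives
    ranked = sorted(zip(scores, true_labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoid-rule area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical classifier scores (higher = more likely positive).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

roc = roc_curve_points(labels, scores)
auc = area_under_curve(roc)
print(roc)
print(auc)
```

A classifier that ranks every positive above every negative reaches an area of 1.0, which is the "lot of space under it" case described earlier.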
Ranking Metrics
Ranking ...
Is related to binary classification
Internet search is a good example of a system that acts as a ranker.
Given a query, it returns a ranked list of web pages relevant to that query
So, ranking here involves a binary classification of each page as “relevant to the query” or
“irrelevant to the query”
It also orders the results so that the most relevant results appear at the top
So, how can an underlying implementation take both into account?
Can we predict what ranking metrics will evaluate, and how?
Some Ranking Metrics..
Precision - Recall
Precision - Recall Curve and F1 Score
NDCG
Precision - Recall
Considering the scenario of web search result, Precision answers this
question:
“Out of the items that the ranker/classifier predicted to be relevant, how many are
truly relevant?”
Whereas, Recall answers this:
“Out of all the items that are truly relevant, how many are found by the
ranker/classifier?”
Precision - Recall (Contd..)
Calculation Example of Precision - Recall

                   Predicted as Negative    Predicted as Positive
Actual Negative         9760 (TN)                 140 (FP)
Actual Positive           40 (FN)                  60 (TP)

Total Negative = 9760 + 140 = 9900
Total Positive = 40 + 60 = 100
Total Negative prediction = 9760 + 40 = 9800
Total Positive prediction = 140 + 60 = 200
Precision = TP / (TP+FP)
= 60 / (60 + 140) = 30%
Recall = TP / (TP+FN)
= 60 / (60 + 40) = 60%
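The same calculation as a short sketch, using the four counts from the example. The function names are illustrative only.

```python
def precision(tp, fp):
    """Of everything predicted positive, the share that is truly positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Of everything truly positive, the share that was found."""
    return tp / (tp + fn)


# Counts from the example confusion matrix.
tn, fp, fn, tp = 9760, 140, 40, 60

example_precision = precision(tp, fp)
example_recall = recall(tp, fn)
print(f"precision = {example_precision:.0%}, recall = {example_recall:.0%}")
```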
Precision - Recall Curve
When the number of answers returned by
the ranker changes, the precision and
recall scores change as well
By plotting precision versus recall over a
range of k values, where k denotes the
number of results returned, we get the
precision - recall curve
Computing Precision-Recall Point
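One way to compute the points of a precision - recall curve is to cut the ranked list at each k and measure precision and recall for the top k results. The sketch below does that for a made-up ranking; the relevance flags are hypothetical, and the sketch assumes every relevant item appears somewhere in the list.

```python
def precision_recall_at_k(ranked_relevance, k):
    """Precision and recall when only the top-k results are returned.

    ranked_relevance: 0/1 flags in ranked order; assumed to contain
    every relevant item, so their sum is the total number of relevant items.
    """
    total_relevant = sum(ranked_relevance)
    hits = sum(ranked_relevance[:k])
    return hits / k, hits / total_relevant


# Hypothetical ranking of 10 results: 1 = relevant, 0 = not relevant.
ranked_relevance = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]

curve = [
    precision_recall_at_k(ranked_relevance, k)
    for k in range(1, len(ranked_relevance) + 1)
]
for k, (p, r) in enumerate(curve, start=1):
    print(f"k={k:2d}  precision={p:.2f}  recall={r:.2f}")
```

As k grows, recall can only stay flat or rise, while precision tends to fall once the relevant items have been exhausted.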
Interpolating a Recall/Precision Curve
Trade-off between Recall and Precision
F-Measure
A single measure of performance that takes both recall and
precision into account
Harmonic mean of recall (R) and precision (P):
$$F = \frac{2PR}{P + R}$$
Compared to the arithmetic mean, both need to be high for the harmonic mean to
be high
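A brief sketch comparing the harmonic mean with the arithmetic mean. The first pair of values is the 30% precision and 60% recall from the earlier calculation example; the second pair is a made-up, deliberately lopsided case.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def arithmetic_mean(precision, recall):
    return (precision + recall) / 2


# Values from the earlier precision - recall calculation example.
f_example = f_measure(0.30, 0.60)
mean_example = arithmetic_mean(0.30, 0.60)

# Hypothetical lopsided case: high precision, very low recall.
f_lopsided = f_measure(0.90, 0.10)
mean_lopsided = arithmetic_mean(0.90, 0.10)

print(f_example, mean_example)
print(f_lopsided, mean_lopsided)
```

In the lopsided case the arithmetic mean still looks respectable, while the harmonic mean is dragged down toward the weaker of the two values.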
NDCG
● Precision and recall treat all retrieved items equally.
● But do a relevant item in position 1 and a relevant item in position 5 carry
the same significance?
● Think about a web search result
● NDCG tries to take this scenario into account.
What?
● NDCG stands for Normalized Discounted Cumulative Gain
● First just focus on DCG (Discounted Cumulative Gain)
Discounted Cumulative Gain
● Popular measure for evaluating web search and related tasks.
● Discounts items that are further down the search result list
● Two assumptions:
- Highly relevant documents are more useful than marginally relevant
documents
- The lower the ranked position of a relevant document, the less useful it is
for the user, since it is less likely to be examined
Discounted Cumulative Gain
● Uses graded relevance as a measure of the usefulness, or gain, from
examining a document
● Gain is accumulated starting at the top of the ranking and may be
reduced, or discounted, at lower ranks
● Typical discount is 1/log (rank)
- With base 2, the discount at rank 4 is ½, and at rank 8 it is 1/3
Discounted Cumulative Gain
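● DCG is the total gain accumulated at a particular rank p:
$$DCG_p = rel_1 + \sum_{i=2}^{p} \frac{rel_i}{\log_2 i}$$
where rel_i is the graded relevance of the document at rank i
● Alternative formulation:
$$DCG_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2 (1 + i)}$$
- used by some web search companies
- emphasis on retrieving highly relevant documents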
* Equation used from Addison Wesley’s
DCG Example
● 10 ranked documents judged on 0-3 relevance scale:
3, 2, 3, 0, 0, 1, 2, 2, 3, 0
● discounted gain:
3, 2/1, 3/1.59, 0, 0, 1/ 2.59, 2/2.81, 2/3 , 3/3.17, 0
= 3, 2, 1.89, 0, 0, 0.39, 0.71, 0.67, 0.95, 0
● DCG:
3, 5, 6.89, 6.89, 6.89, 7.28, 7.99, 8.66, 9.61, 9.61
* Example used from Addison Wesley’s
presentation
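The worked example can be reproduced with a short sketch. It assumes the form of DCG used in the example itself: the gain at rank 1 is not discounted, and the gain at every later rank i is divided by log base 2 of i. Function names are illustrative.

```python
import math


def discounted_gains(relevances):
    """Gain at each rank: undiscounted at rank 1, rel / log2(rank) afterwards."""
    gains = []
    for rank, rel in enumerate(relevances, start=1):
        discount = 1.0 if rank == 1 else math.log2(rank)
        gains.append(rel / discount)
    return gains


def dcg_at_each_rank(relevances):
    """Running total of discounted gain, i.e. DCG at ranks 1..n."""
    totals = []
    running = 0.0
    for gain in discounted_gains(relevances):
        running += gain
        totals.append(running)
    return totals


# The 10 ranked documents from the example, judged on a 0-3 scale.
ranking = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]

gains = discounted_gains(ranking)
dcg_values = dcg_at_each_rank(ranking)
print([round(g, 2) for g in gains])
print([round(d, 2) for d in dcg_values])
```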
Normalized DCG
● Normalized version of discounted cumulative gain
● Often normalized by comparing the DCG at each rank with the DCG value
for the perfect ranking
● Normalized score always lies between 0.0 and 1.0
NDCG Example
● Let’s look back at the list of ranked documents judged on the relevance scale:
3, 2, 3, 0, 0, 1, 2, 2, 3, 0
● Perfect ranking:
3, 3, 3, 2, 2, 2, 1, 0, 0, 0
● Perfect discounted gain:
3, 3/1, 3/1.59, 2/2, 2/2.32, 2/ 2.59, 1/2.81, 0 , 0, 0
= 3, 3, 1.89, 1, 0.86, 0.77, 0.36, 0, 0, 0
NDCG Example
● Ideal DCG values:
3, 6, 7.89, 8.89, 9.75, 10.52, 10.88, 10.88, 10.88, 10.88
● NDCG values (divide actual by ideal):
3/3, 5/6, 6.89/7.89, 6.89/8.89, 6.89/9.75, 7.28/10.52,
7.99/10.88, 8.66/10.88, 9.61/10.88, 9.61/10.88
= 1, 0.83, 0.87, 0.78, 0.71, 0.69, 0.73, 0.8, 0.88, 0.88
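A sketch of the normalization step, again assuming the DCG form used in the example (no discount at rank 1, division by log base 2 of the rank afterwards). The ideal ranking is obtained by sorting the same relevance judgments in descending order. The values are computed without intermediate rounding, so a second decimal may differ slightly from a hand calculation that rounds at each step.

```python
import math


def dcg(relevances):
    """DCG of a ranked list: rel_1 + sum of rel_i / log2(i) for i >= 2."""
    return sum(
        rel if rank == 1 else rel / math.log2(rank)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg_at(relevances, p):
    """DCG at rank p divided by the DCG of the perfect ranking at rank p."""
    ideal = sorted(relevances, reverse=True)
    ideal_dcg = dcg(ideal[:p])
    if ideal_dcg == 0:
        return 0.0  # no relevant documents at all
    return dcg(relevances[:p]) / ideal_dcg


ranking = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]

ideal_dcg_values = [dcg(sorted(ranking, reverse=True)[:p]) for p in range(1, 11)]
ndcg_values = [ndcg_at(ranking, p) for p in range(1, 11)]
print([round(v, 2) for v in ideal_dcg_values])
print([round(v, 2) for v in ndcg_values])
```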
Regression Metrics
What Do Regression Tasks Do?
Model learns to predict numeric scores.
For example, we try to predict the price of a stock on future days given past
price history and other useful information
Some Regression Metrics..
RMSE (Root Mean Square Error)
Quantiles of Errors
RMSE
The most commonly used metric for regression tasks
Also known as RMSD ( root-mean-square deviation)
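This is defined as the square root of the average squared distance between
the actual score and the predicted score:
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
where y_i is the actual score, \hat{y}_i is the predicted score, and n is the number of data points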
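A minimal sketch of the definition above. The actual and predicted values are made-up numbers chosen only to illustrate the calculation.

```python
import math


def rmse(actual, predicted):
    """Square root of the mean squared difference between actual and predicted."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical scores.
actual = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]

# Squared errors: 0.25, 0.0, 2.25, 1.0 -> mean 0.875
example_rmse = rmse(actual, predicted)
print(example_rmse)
```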
Quantiles of Errors
RMSE is an average, so it is sensitive to large outliers.
If the regressor performs really badly on a single data point, the average
error can become large, so RMSE is not robust
Quantiles (or percentiles) of the errors are much more robust
because they are largely unaffected by large outliers
It’s important to look at the median absolute percentage error:
$$MAPE = \operatorname{median}\left( \left| \frac{y_i - \hat{y}_i}{y_i} \right| \right)$$
It gives us a relative measure of the typical error.
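The contrast between an average and a quantile can be seen in a small sketch. The data are hypothetical: the second set of predictions is identical to the first except for one wildly wrong value. The sketch assumes no actual value is zero, since the percentage error divides by it.

```python
import math
import statistics


def rmse(actual, predicted):
    """Square root of the mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def median_absolute_percentage_error(actual, predicted):
    """Median of |actual - predicted| / |actual|; actual values must be non-zero."""
    return statistics.median(abs((a - p) / a) for a, p in zip(actual, predicted))


# Hypothetical data: one prediction is replaced by a large outlier.
actual = [100, 200, 300, 400, 500]
clean_pred = [110, 190, 310, 390, 510]
outlier_pred = [1100, 190, 310, 390, 510]

rmse_clean = rmse(actual, clean_pred)
rmse_outlier = rmse(actual, outlier_pred)
mape_clean = median_absolute_percentage_error(actual, clean_pred)
mape_outlier = median_absolute_percentage_error(actual, outlier_pred)

print(rmse_clean, rmse_outlier)
print(mape_clean, mape_outlier)
```

In this particular data set the single bad prediction inflates RMSE many times over, while the median absolute percentage error does not move at all.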
Acknowledgement
Evaluating Machine Learning Models by Alice Zheng
Many slides in this section are adapted from Prof. Joydeep Ghosh (UT ECE)
who in turn adapted them from Prof. Dik Lee (Univ. of Science and Tech,
Hong Kong)
Tutorial of Data School on ROC Curves and AUC by Kevin Markham
Questions???
Thank You

 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Globus
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
Ortus Solutions, Corp
 

Recently uploaded (20)

Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should Know
 
Software Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdfSoftware Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdf
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
 

Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine Learning

  • 1. DATA ANALYTICS: Evaluation Metrics for Supervised Learning Models of Machine Learning
      Md. Main Uddin Rony, Software Developer, Infolytx, Inc.
  • 3. ML Evaluation Metrics Are...
      ● tied to machine learning tasks
      ● methods that determine an algorithm’s performance and behavior
      ● helpful for choosing the best model to meet the target performance
      ● helpful for tuning the model's parameters so that it yields the best-performing algorithm
  • 4. Evaluation Metrics Types...
      ● There are various types of ML algorithms (classification, regression, ranking, clustering)
      ● Different types of algorithm call for different types of evaluation metrics
      ● Some metrics are useful for more than one type of algorithm (e.g. precision and recall)
      ● This deck covers evaluation metrics for supervised learning models only (classification, regression, ranking)
  • 6. Classification Model Does...
      Predicts class labels given input data.
      In binary classification there are two possible output classes (0 or 1, True or False, Positive or Negative, Yes or No, etc.).
      Email spam detection is a good example of binary classification.
  • 7. Some Popular Classification Metrics...
      Accuracy
      Confusion Matrix
      Log-Loss
      AUC
  • 8. Accuracy
      ● Ratio of the number of correct predictions to the total number of predictions
      ● Example: suppose we have 100 examples in the positive class and 200 examples in the negative class. Our model correctly labels 80 of the 100 positives as positive and 195 of the 200 negatives as negative.
      ● So accuracy = (80 + 195)/(100 + 200) ≈ 91.7%
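The accuracy calculation above can be checked with a few lines of Python. This is a minimal sketch; the function name `accuracy` is chosen here for illustration and does not come from the slides.

```python
def accuracy(correct_positives, correct_negatives, total_examples):
    """Fraction of all predictions that are correct."""
    return (correct_positives + correct_negatives) / total_examples

# Numbers from the slide: 80 of 100 positives and 195 of 200 negatives are correct.
acc = accuracy(80, 195, 100 + 200)
print(f"accuracy = {acc:.1%}")
```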
  • 9. Confusion Matrix
      ● Shows a more detailed breakdown of correct and incorrect classifications for each class.
      ● For the previous example, the confusion matrix looks like this:
                               Predicted as positive   Predicted as negative
        Labeled as positive             80                      20
        Labeled as negative              5                     195
      ● What is the accuracy on the positive class? And on the negative class?
      ● The positive class clearly has lower accuracy (80/100 = 80%) than the negative class (195/200 = 97.5%)
      ● That information is lost if we calculate only the overall accuracy.
  • 10. Per-Class Accuracy
      ● Average per-class accuracy for the previous example: (80% + 97.5%)/2 = 88.75%, which differs from the overall accuracy of 91.7%
      Why is it important?
        - It can show a different picture when the classes have different numbers of examples
        - A class with more examples than the others dominates the overall accuracy statistic and so produces a distorted picture
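To make the difference between overall and per-class accuracy concrete, the sketch below recomputes both from the confusion matrix of the running example. The dictionary layout and function names are illustrative choices, not part of the original slides.

```python
# Confusion matrix from the running example: outer keys are true labels,
# inner keys are predicted labels.
confusion = {
    "positive": {"positive": 80, "negative": 20},
    "negative": {"positive": 5, "negative": 195},
}

def per_class_accuracy(matrix):
    """Map each class to the fraction of its examples that were predicted correctly."""
    return {label: row[label] / sum(row.values()) for label, row in matrix.items()}

def overall_accuracy(matrix):
    """Correct predictions over all predictions, ignoring class membership."""
    correct = sum(row[label] for label, row in matrix.items())
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total

per_class = per_class_accuracy(confusion)
average_per_class = sum(per_class.values()) / len(per_class)
overall = overall_accuracy(confusion)
print(per_class, average_per_class, overall)
```

The larger negative class pulls the overall figure above the per-class average, which is the distortion the slide describes.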
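  • 11. Log-Loss
      Very useful when the raw output of the classifier is a numeric probability instead of a class label of 0 or 1.
      Mathematically, the log-loss for a binary classifier is: $-\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]$, where $y_i$ is the true label (0 or 1) and $p_i$ is the predicted probability that example $i$ belongs to class 1.
      The minimum is 0, reached when the prediction and the true label match.
      Exercise: calculate it for a data point that the classifier predicts to belong to class 1 with probability 0.51, and again with probability 1.
      Minimizing this value pushes the classifier toward maximum accuracy.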
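The exercise on this slide can be worked through in code. The sketch below uses the standard binary log-loss with natural logarithms; the slide does not state the true label of the data point, so the first two calls assume it is 1 and the third shows the same 0.51 prediction when the true label is 0. The clipping constant `eps` is an implementation detail added here to avoid taking log(0).

```python
import math

def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Average binary log-loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

# Assumption: the true label is 1 in the first two cases.
near_miss = binary_log_loss([1], [0.51])   # barely on the correct side of 0.5
confident = binary_log_loss([1], [1.0])    # certain and correct
wrong_side = binary_log_loss([0], [0.51])  # same probability, but true label is 0
print(round(near_miss, 3), round(confident, 3), round(wrong_side, 3))
```

A confident correct prediction costs essentially nothing, while a hesitant one is penalized even when it lands on the right side of the threshold.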
  • 12. AUC (Area Under Curve)
      ● The curve is the receiver operating characteristic curve, or ROC curve for short
      ● It provides nuanced detail about the behavior of the classifier
      ● A bad ROC curve covers very little area
      ● A good ROC curve has a lot of area under it
      ● But how?
  • 19. AUC (contd.)
      ● So what is the advantage of using the ROC curve over a simpler metric? The ROC curve visualizes all possible classification thresholds, whereas other metrics represent the error rate at only a single threshold
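One way to see how the ROC curve covers every threshold is to build it by hand. The sketch below assumes the usual definitions (true positive rate on the vertical axis, false positive rate on the horizontal axis) and integrates with the trapezoidal rule; the labels and scores are made-up illustration data, not taken from the slides.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per distinct threshold, starting at (0, 0)."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    # Lowering the threshold step by step admits more examples as "positive".
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under a piecewise-linear curve given as (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical labels and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = area_under_curve(roc_points(y_true, scores))
print(round(auc, 3))
```

Each point on the curve corresponds to one threshold, so a single AUC number summarizes behavior across all of them.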
  • 21. Ranking...
      Is related to binary classification.
      Internet search is a good example of a ranker: given a query, it returns a ranked list of web pages relevant to that query.
      So ranking here can be viewed as a binary classification of each result as "relevant" or "irrelevant" to the query.
      It also orders the results so that the most relevant result appears on top.
      So what can the underlying implementation do to account for both?
      Can we predict what ranking metrics will evaluate, and how?
  • 22. Some Ranking Metrics...
      Precision - Recall
      Precision - Recall Curve and F1 Score
      NDCG
  • 23. Precision - Recall
      In the web search scenario, precision answers the question: “Out of the items that the ranker/classifier predicted to be relevant, how many are truly relevant?”
      Recall answers: “Out of all the items that are truly relevant, how many are found by the ranker/classifier?”
  • 24. Precision - Recall (Contd..)
  • 25. Calculation Example of Precision - Recall
                          Predicted as Negative   Predicted as Positive
        Actual Negative        9760 (TN)               140 (FP)
        Actual Positive          40 (FN)                60 (TP)
      Total Negative = 9760 + 140 = 9900
      Total Positive = 40 + 60 = 100
      Total Negative predictions = 9760 + 40 = 9800
      Total Positive predictions = 140 + 60 = 200
      Precision = TP / (TP + FP) = 60 / (60 + 140) = 30%
      Recall = TP / (TP + FN) = 60 / (60 + 40) = 60%
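The same calculation in Python, as a minimal sketch using the four cell counts from the table above. Accuracy is computed alongside for comparison; it is not on the slide, but it follows directly from the same counts.

```python
# Cell counts from the confusion matrix above.
tn, fp, fn, tp = 9760, 140, 40, 60

precision = tp / (tp + fp)                   # of predicted positives, how many are right
recall = tp / (tp + fn)                      # of actual positives, how many were found
accuracy = (tp + tn) / (tp + tn + fp + fn)   # for comparison only

print(f"precision={precision:.0%} recall={recall:.0%} accuracy={accuracy:.1%}")
```

With only 100 positives among 10,000 examples, accuracy stays high even though fewer than a third of the predicted positives are correct.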
  • 26. Precision - Recall Curve
      When the number of answers returned by the ranker changes, the precision and recall scores change too.
      Plotting precision against recall over a range of values of k, the number of results returned, gives the precision-recall curve.
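The points of such a curve can be computed by walking down the ranked list and recording precision and recall after each result. The sketch below is illustrative: the ranked relevance flags and the total number of relevant items are hypothetical values, and the function name is chosen here.

```python
def precision_recall_at_k(ranked_relevance, total_relevant):
    """For each cutoff k, return (k, precision@k, recall@k).

    ranked_relevance holds 1 for a relevant result and 0 otherwise, in ranked order.
    """
    curve = []
    hits = 0
    for k, is_relevant in enumerate(ranked_relevance, start=1):
        hits += is_relevant
        curve.append((k, hits / k, hits / total_relevant))
    return curve

# Hypothetical ranking: 5 results returned, 4 relevant items exist in total.
ranked = [1, 0, 1, 1, 0]
curve = precision_recall_at_k(ranked, total_relevant=4)
for k, precision, recall in curve:
    print(k, round(precision, 2), round(recall, 2))
```

Recall can only rise or stay flat as k grows, while precision moves up and down, which is what gives the curve its sawtooth shape.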
  • 29. Trade-off between Recall and Precision
  • 30. F-Measure
      A single measure of performance that takes both recall and precision into account.
      It is the harmonic mean of recall (R) and precision (P): $F = \frac{2RP}{R + P}$
      Unlike the arithmetic mean, the harmonic mean is high only when both recall and precision are high.
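A short sketch comparing the harmonic mean of precision and recall with their arithmetic mean. The first pair of values reuses the 30% precision and 60% recall from the earlier calculation example; the second pair is a made-up extreme case.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall from the earlier calculation example.
f1 = f1_score(0.30, 0.60)
arithmetic = (0.30 + 0.60) / 2

# Hypothetical extreme case: perfect precision, almost no recall.
f1_lopsided = f1_score(1.0, 0.01)
arithmetic_lopsided = (1.0 + 0.01) / 2

print(round(f1, 3), round(arithmetic, 3))
print(round(f1_lopsided, 3), round(arithmetic_lopsided, 3))
```

In the lopsided case the arithmetic mean still looks respectable, while the harmonic mean collapses toward the weaker of the two values.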
  • 31. NDCG
      ● Precision and recall treat all retrieved items equally.
      ● But do a relevant item in position 1 and a relevant item in position 5 carry the same significance?
      ● Think about a web search result
      ● NDCG tries to take this scenario into account.
  • 32. What?
      ● NDCG stands for Normalized Discounted Cumulative Gain
      ● First, just focus on DCG (Discounted Cumulative Gain)
  • 33. Discounted Cumulative Gain
      ● Popular measure for evaluating web search and related tasks.
      ● Discounts items that are further down the search result list
      ● Two assumptions:
        - Highly relevant documents are more useful than marginally relevant documents
        - The lower the ranked position of a relevant document, the less useful it is for the user, since it is less likely to be examined
  • 34. Discounted Cumulative Gain
      ● Uses graded relevance as a measure of the usefulness, or gain, from examining a document
      ● Gain is accumulated starting at the top of the ranking and may be reduced, or discounted, at lower ranks
      ● The typical discount is 1/log(rank)
        - With base 2, the discount at rank 4 is 1/2, and at rank 8 it is 1/3
  • 35. Discounted Cumulative Gain
      ● DCG is the total gain accumulated at a particular rank p:
      ● Alternative formulation:
        - used by some web search companies
        - emphasis on retrieving highly relevant documents
      * Equation used from Addison Wesley’s
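A sketch of the two formulations, assuming the commonly cited forms rather than a verbatim copy of the slide's equations. The first is consistent with the base-2 discount of 1/log(rank) described in this deck, with rank 1 left undiscounted; the second is the widely used variant that weights highly relevant documents more heavily. Here rel_i denotes the graded relevance of the document at rank i.

```latex
% Standard form: rank 1 is undiscounted, later ranks are divided by log2(rank).
\mathrm{DCG}_p = rel_1 + \sum_{i=2}^{p} \frac{rel_i}{\log_2 i}

% Alternative form that puts more weight on highly relevant documents.
\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2 (1 + i)}
```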
  • 36. DCG Example
      ● 10 ranked documents judged on a 0-3 relevance scale: 3, 2, 3, 0, 0, 1, 2, 2, 3, 0
      ● Discounted gain: 3, 2/1, 3/1.59, 0, 0, 1/2.59, 2/2.81, 2/3, 3/3.17, 0 = 3, 2, 1.89, 0, 0, 0.39, 0.71, 0.67, 0.95, 0
      ● DCG: 3, 5, 6.89, 6.89, 6.89, 7.28, 7.99, 8.66, 9.61, 9.61
      * Example used from Addison Wesley’s presentation
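The running totals in this example can be reproduced with a short function. This is a minimal sketch that assumes the form used in the example: the first document is not discounted and each later document is divided by log2 of its rank. The function name is chosen here for illustration.

```python
import math

def dcg_at_each_rank(relevances):
    """Running DCG: rank 1 undiscounted, rank i >= 2 divided by log2(i)."""
    total = 0.0
    running = []
    for rank, rel in enumerate(relevances, start=1):
        discount = 1.0 if rank == 1 else math.log2(rank)
        total += rel / discount
        running.append(total)
    return running

# Relevance judgments from the example, in ranked order.
judged = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
dcg = dcg_at_each_rank(judged)
print([round(value, 2) for value in dcg])
```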
  • 37. Normalized DCG
      ● Normalized version of discounted cumulative gain
      ● Often normalized by comparing the DCG at each rank with the DCG value for the perfect ranking
      ● The normalized score always lies between 0.0 and 1.0
  • 38. NDCG Example
      ● Recall the list of ranked documents judged on the relevance scale: 3, 2, 3, 0, 0, 1, 2, 2, 3, 0
      ● Perfect ranking: 3, 3, 3, 2, 2, 2, 1, 0, 0, 0
      ● Perfect discounted gain: 3, 3/1, 3/1.59, 2/2, 2/2.32, 2/2.59, 1/2.81, 0, 0, 0 = 3, 3, 1.89, 1, 0.86, 0.77, 0.36, 0, 0, 0
  • 39. NDCG Example
      ● Ideal DCG values: 3, 6, 7.89, 8.89, 9.75, 10.52, 10.88, 10.88, 10.88, 10.88
      ● NDCG values (divide actual by ideal): 3/3, 5/6, 6.89/7.89, 6.89/8.89, 6.89/9.75, 7.28/10.52, 7.99/10.88, 8.66/10.88, 9.61/10.88, 9.61/10.88 = 1, 0.83, 0.87, 0.78, 0.71, 0.69, 0.73, 0.8, 0.88, 0.88
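The normalization step can be sketched as follows: compute the running DCG of the actual ranking, compute it again for the same judgments sorted from most to least relevant, and divide rank by rank. The function names are illustrative, and the DCG form assumed is the one used in the example (rank 1 undiscounted, later ranks divided by log2 of the rank).

```python
import math

def dcg_at_each_rank(relevances):
    """Running DCG: rank 1 undiscounted, rank i >= 2 divided by log2(i)."""
    total = 0.0
    running = []
    for rank, rel in enumerate(relevances, start=1):
        discount = 1.0 if rank == 1 else math.log2(rank)
        total += rel / discount
        running.append(total)
    return running

def ndcg_at_each_rank(relevances):
    """Divide the actual running DCG by the running DCG of the perfect ranking."""
    actual = dcg_at_each_rank(relevances)
    ideal = dcg_at_each_rank(sorted(relevances, reverse=True))
    return [a / i if i > 0 else 0.0 for a, i in zip(actual, ideal)]

judged = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
ndcg = ndcg_at_each_rank(judged)
print([round(value, 2) for value in ndcg])
```

Because the ideal ranking is built from the same judgments, every value lies between 0 and 1, and a ranking that is already perfect scores 1 at every rank.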
  • 41. What Do Regression Tasks Do?
      The model learns to predict numeric scores.
      For example, we try to predict the price of a stock on future days given its past price history and other useful information.
  • 42. Some Regression Metrics...
      RMSE (Root Mean Square Error)
      Quantiles of Errors
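  • 43. RMSE
      The most commonly used metric for regression tasks.
      Also known as RMSD (root-mean-square deviation).
      Defined as the square root of the average squared distance between the actual score and the predicted score: $\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$, where $y_i$ is the actual score, $\hat{y}_i$ is the predicted score, and $n$ is the number of data points.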
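A minimal sketch of the definition in code. The actual and predicted values below are made up for illustration; they do not come from the slides.

```python
import math

def rmse(actual, predicted):
    """Square root of the mean squared difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical actual and predicted scores.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]
error = rmse(actual, predicted)
print(round(error, 4))
```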
  • 44. Quantiles of Errors
      RMSE is an average, so it is sensitive to large outliers: if the regressor performs really badly on a single data point, the average error can be large. It is not robust.
      Quantiles (or percentiles) are much more robust because they are not affected by large outliers.
      It is important to look at the median absolute percentage error: $\mathrm{MAPE} = \mathrm{median}\left( \left| (y_i - \hat{y}_i) / y_i \right| \right)$
      It gives a relative measure of the typical error.
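The contrast between an average-based metric and a quantile-based one shows up clearly on data with a single bad prediction. The sketch below uses made-up values in which the last prediction is far off; the function names are chosen here, and the percentage error assumes no actual value is zero.

```python
import math
import statistics

def rmse(actual, predicted):
    """Square root of the mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def median_absolute_percentage_error(actual, predicted):
    """Median of |actual - predicted| / |actual|; assumes no actual value is zero."""
    return statistics.median(abs((a - p) / a) for a, p in zip(actual, predicted))

# Hypothetical data: four reasonable predictions and one large outlier.
actual = [100, 200, 300, 400, 500]
predicted = [110, 190, 330, 400, 1000]

print(round(rmse(actual, predicted), 1))
print(median_absolute_percentage_error(actual, predicted))
```

The outlier dominates the RMSE, while the median reports that a typical prediction is off by about 10%.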
  • 45. Acknowledgement
      Evaluating Machine Learning Models by Alice Zheng
      Many slides in this section are adapted from Prof. Joydeep Ghosh (UT ECE), who in turn adapted them from Prof. Dik Lee (Univ. of Science and Tech, Hong Kong)
      Tutorial of Data School on ROC Curves and AUC by Kevin Markham