This document discusses various machine learning evaluation metrics for supervised learning models. It covers classification, regression, and ranking metrics. For classification, it describes accuracy, confusion matrix, log-loss, and AUC. For regression, it discusses RMSE and quantiles of errors. For ranking, it explains precision-recall, precision-recall curves, F1 score, and NDCG. The document provides examples and visualizations to illustrate how these metrics are calculated and used to evaluate model performance.
3. ML Evaluation Metrics Are…
● tied to specific machine learning tasks
● methods that quantify an algorithm's performance and behavior
● helpful for deciding which model best meets the target performance
● helpful for tuning model parameters to obtain the best-performing algorithm
4. Evaluation Metric Types...
● There are various types of ML algorithms (classification, regression, ranking, clustering)
● Different types of evaluation metrics suit different types of algorithms
● Some metrics are useful for more than one type of algorithm (e.g., precision and recall)
● We will cover evaluation metrics for supervised learning models only (classification, regression, ranking)
6. What a Classification Model Does...
It predicts class labels given input data.
In binary classification, there are two possible output classes (0 or 1, True or False, Positive or Negative, Yes or No, etc.).
Spam detection in email is a good example of binary classification.
8. Accuracy
● The ratio of the number of correct predictions to the total number of predictions
● Example: Suppose we have 100 examples in the positive class and 200 examples in the negative class. Our model correctly declares 80 of the 100 positives as positive and 195 of the 200 negatives as negative.
● So, accuracy = (80 + 195)/(100 + 200) = 91.7%
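The computation above can be sketched in a few lines; the counts are the slide's example counts.

```python
# Counts from the slide's example: 100 positives, 200 negatives.
tp = 80                      # positives correctly predicted positive
tn = 195                     # negatives correctly predicted negative
total = 100 + 200            # total number of examples

accuracy = (tp + tn) / total
print(f"{accuracy:.1%}")     # 91.7%
```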
9. Confusion Matrix
● Shows a more detailed breakdown of correct and incorrect classifications for each class.
● For our previous example, the confusion matrix looks like:

                        Predicted as positive   Predicted as negative
  Labeled as positive            80                      20
  Labeled as negative             5                     195

● What accuracy does the positive class have? And the negative class?
● Clearly, the positive class has lower accuracy than the negative class.
● That information is lost if we calculate only the overall accuracy.
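A minimal sketch of building such a breakdown by counting (actual, predicted) pairs; the toy labels below are assumed, not the slide's data.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count each (actual, predicted) combination for a binary classifier."""
    return Counter(zip(y_true, y_pred))

# Assumed toy labels: 10 positives (8 caught), 10 negatives (9 caught).
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 8 + [0] * 2 + [0] * 9 + [1] * 1

cm = confusion_counts(y_true, y_pred)
# (actual, predicted): TP, FN, FP, TN
print(cm[(1, 1)], cm[(1, 0)], cm[(0, 1)], cm[(0, 0)])  # 8 2 1 9
```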
10. Per-Class Accuracy
● Average per-class accuracy for the previous example: (80% + 97.5%)/2 = 88.75%, which differs from the overall accuracy
Why is this important?
- It can reveal a different picture when the classes have different numbers of examples
- A class with more examples than the others dominates the overall accuracy statistic, producing a distorted picture
11. Log-Loss
Very useful when the raw output of the classifier is a numeric probability instead of a class label 0 or 1.
Mathematically, the log-loss for a binary classifier is:

  log-loss = -(1/N) Σ_{i=1..N} [ y_i log(p_i) + (1 - y_i) log(1 - p_i) ]

where p_i is the predicted probability that example i belongs to class 1 and y_i is its true label.
The minimum is 0, attained when prediction and true label match up.
Exercise: calculate the log-loss for a data point the classifier predicts to belong to class 1 with probability 0.51, and again with probability 1.
Minimizing this value maximizes the accuracy of the classifier.
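A minimal sketch of binary log-loss, applied to the exercise's two probabilities (clipping probabilities away from 0 and 1 is a standard guard against log(0), not something the slides specify):

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Binary log-loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# The slide's exercise: a true class-1 point predicted with p=0.51 vs p=1.0.
print(round(log_loss([1], [0.51]), 4))  # 0.6733 -- barely better than chance
print(round(log_loss([1], [1.0]), 4))   # 0.0    -- confident and correct
```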
12. AUC (Area Under the Curve)
● The curve is the receiver operating characteristic curve, or ROC curve for short
● Provides nuanced detail about the behavior of the classifier
● A bad ROC curve covers very little area
● A good ROC curve has a lot of space under it
● But how?
19. AUC (contd.)
● So, what's the advantage of using the ROC curve over a simpler metric? The ROC curve visualizes all possible classification thresholds, whereas other metrics only represent your error rate for a single threshold.
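The area under that curve can be computed without drawing it: AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one (ties counting half). A minimal sketch of that rank-statistic view, with assumed toy scores:

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs the ranker orders
    correctly; ties count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Assumed toy data: higher score should mean "more likely positive".
y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y, s))  # 0.75
```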
21. Ranking...
Ranking is related to binary classification.
Internet search is a good example of a system that acts as a ranker: given a query, it returns a ranked list of web pages relevant to that query.
So ranking can be viewed as a binary classification of results into "relevant to the query" or "irrelevant to the query", while also ordering the results so that the most relevant ones appear on top.
So, what can an underlying implementation do to achieve both? And can we predict what ranking metrics evaluate, and how?
23. Precision - Recall
In the web search scenario, precision answers this question:
"Out of the items that the ranker/classifier predicted to be relevant, how many are truly relevant?"
Whereas recall answers this:
"Out of all the items that are truly relevant, how many are found by the ranker/classifier?"
25. Calculation Example of Precision - Recall

                       Predicted as negative   Predicted as positive
  Actual negative            9760 (TN)               140 (FP)
  Actual positive              40 (FN)                60 (TP)

Total negatives = 9760 + 140 = 9900
Total positives = 40 + 60 = 100
Total negative predictions = 9760 + 40 = 9800
Total positive predictions = 140 + 60 = 200

Precision = TP / (TP + FP) = 60 / (60 + 140) = 30%
Recall    = TP / (TP + FN) = 60 / (60 + 40) = 60%
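The same arithmetic, using the counts from the confusion matrix above:

```python
# Counts from the slide's confusion matrix.
tn, fp = 9760, 140
fn, tp = 40, 60

precision = tp / (tp + fp)   # 60 / 200
recall    = tp / (tp + fn)   # 60 / 100
print(f"precision={precision:.0%} recall={recall:.0%}")  # precision=30% recall=60%
```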
26. Precision - Recall Curve
When the number of answers returned by the ranker changes, the precision and recall scores also change.
By plotting precision versus recall over a range of values of k, where k denotes the number of results returned, we get the precision - recall curve.
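A sketch of how the curve's points arise, sweeping k over an assumed toy ranked list of 1/0 relevance labels:

```python
def precision_recall_at_k(ranked_relevance, k):
    """Precision and recall when only the top-k ranked results are returned.
    `ranked_relevance` holds 1/0 relevance labels in ranked order."""
    hits = sum(ranked_relevance[:k])
    total_relevant = sum(ranked_relevance)
    return hits / k, hits / total_relevant

# Assumed ranking: relevant items at positions 1, 2, and 4 of 6 results.
ranking = [1, 1, 0, 1, 0, 0]
for k in range(1, len(ranking) + 1):
    p, r = precision_recall_at_k(ranking, k)
    print(f"k={k}  precision={p:.2f}  recall={r:.2f}")
```

Each (recall, precision) pair printed above is one point on the curve; as k grows, recall can only rise while precision tends to fall.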
30. F-Measure
One measure of performance that takes both recall and precision into account.
It is the harmonic mean of recall and precision:

  F1 = 2 · (precision · recall) / (precision + recall)

Compared to the arithmetic mean, both values need to be high for the harmonic mean to be high.
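A one-function sketch, evaluated on the precision and recall from the earlier web-search example (30% and 60%):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.30, 0.60), 2))  # 0.4 -- dragged toward the lower of the two
print(round((0.30 + 0.60) / 2, 2))  # 0.45 -- the arithmetic mean, for contrast
```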
31. NDCG
● Precision and recall treat all retrieved items equally.
● But do a relevant item in position 1 and a relevant item in position 5 bear the same significance?
● Think about a web search result.
● NDCG takes this scenario into account.
32. What?
● NDCG stands for Normalized Discounted Cumulative Gain
● First, let's focus on DCG (Discounted Cumulative Gain)
33. Discounted Cumulative Gain
● A popular measure for evaluating web search and related tasks.
● Discounts items that are further down the search result list.
● Two assumptions:
- Highly relevant documents are more useful than marginally relevant documents
- The lower the ranked position of a relevant document, the less useful it is to the user, since it is less likely to be examined
34. Discounted Cumulative Gain
● Uses graded relevance as a measure of the usefulness, or gain, from examining a document
● Gain is accumulated starting at the top of the ranking and may be reduced, or discounted, at lower ranks
● The typical discount is 1/log(rank)
- With base 2, the discount at rank 4 is 1/2, and at rank 8 it is 1/3
35. Discounted Cumulative Gain
● DCG is the total gain accumulated at a particular rank p:

  DCG_p = rel_1 + Σ_{i=2..p} rel_i / log2(i)

● An alternative formulation:

  DCG_p = Σ_{i=1..p} (2^rel_i - 1) / log2(i + 1)

- used by some web search companies
- puts emphasis on retrieving highly relevant documents
* Equations used from Addison Wesley's
37. Normalized DCG
● The normalized version of discounted cumulative gain
● Often normalized by comparing the DCG at each rank with the DCG value for the perfect ranking
● The normalized score always lies between 0.0 and 1.0
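A minimal sketch using the 1/log2(rank) discount described in the slides, with rank 1 left undiscounted; the graded relevances are assumed toy values (higher = more relevant):

```python
import math

def dcg(relevances):
    """DCG with a 1/log2(rank) discount; the top rank is undiscounted."""
    total = 0.0
    for rank, rel in enumerate(relevances, start=1):
        discount = 1.0 if rank == 1 else 1.0 / math.log2(rank)
        total += rel * discount
    return total

def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Assumed ranking that puts the most relevant item last.
print(round(ndcg([1, 2, 3]), 3))  # 0.869
print(ndcg([3, 2, 1]))            # 1.0 -- already the perfect ordering
```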
41. What Do Regression Tasks Do?
The model learns to predict numeric scores.
For example, we might try to predict the price of a stock on future days given its past price history and other useful information.
43. RMSE
The most commonly used metric for regression tasks.
Also known as RMSD (root-mean-square deviation).
It is defined as the square root of the average squared distance between the actual score and the predicted score:

  RMSE = sqrt( (1/n) Σ_{i=1..n} (y_i - ŷ_i)² )
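A direct sketch of that definition; the actual/predicted scores below are assumed toy values:

```python
import math

def rmse(y_true, y_pred):
    """Square root of the mean squared difference between actual and
    predicted scores."""
    n = len(y_true)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / n)

# Assumed toy actual vs. predicted scores.
print(round(rmse([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]), 3))  # 0.612
```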
44. Quantiles of Errors
RMSE is an average, so it is sensitive to large outliers.
If the regressor performs really badly on even a single data point, the average error can be large; RMSE is not robust.
Quantiles (or percentiles) are much more robust, because they are not affected by large outliers.
It is important to look at the median absolute percentage error:

  MedAPE = median( |y_i - ŷ_i| / |y_i| )

It gives us a relative measure of the typical error.
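A sketch of this robustness, with assumed toy values in which one prediction is disastrous: the median absolute percentage error stays at the typical 10% even though the worst error is 300%. (The zero-target caveat in the comment is an assumption of this sketch, not something the slides address.)

```python
import statistics

def median_abs_percent_error(y_true, y_pred):
    """Median of |actual - predicted| / |actual|.
    Assumes no actual value is zero."""
    return statistics.median(abs(a - p) / abs(a)
                             for a, p in zip(y_true, y_pred))

y_true = [100, 100, 100, 100]
y_pred = [110, 90, 105, 400]   # one disastrous prediction
print(median_abs_percent_error(y_true, y_pred))  # 0.1
```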
45. Acknowledgements
● Evaluating Machine Learning Models by Alice Zheng
● Many slides in this section are adapted from Prof. Joydeep Ghosh (UT ECE), who in turn adapted them from Prof. Dik Lee (Univ. of Science and Tech, Hong Kong)
● Data School tutorial on ROC Curves and AUC by Kevin Markham