Presentation from PyData Warsaw 2019 by Jakub Czakon.
Choosing a proper metric is a crucial yet difficult part of the machine learning project. In this talk, you will learn about a number of common and lesser-known metrics and performance charts as well as typical decisions when it comes to choosing one for your project. Hopefully, with all that knowledge you will be fully equipped to deal with metric-related problems in your future projects!
Structure from motion is a computer vision technique used to recover the three-dimensional structure of a scene and the camera motion from a set of images. It can be used to build 3D models of scenes without any prior knowledge of the camera parameters or 3D locations of the scene points. Structure from motion involves detecting feature points in multiple images, matching the features between images, estimating the fundamental matrices between image pairs, and then optimizing a bundle adjustment problem to simultaneously compute the 3D structure and camera motion parameters. Some applications of structure from motion include 3D modeling, surveying, robot navigation, virtual and augmented reality, and visual effects.
Advantages and disadvantages of machine learning language, by business Corporate
This document discusses the advantages and disadvantages of machine learning. Key advantages are that machine learning can be applied across many industries, such as banking, healthcare, and retail; it allows the processing of large and complex data, can reduce cycle times, and uses resources efficiently. On the downside, acquiring and interpreting large datasets for machine learning algorithms can be challenging. Machine learning also requires a lot of training data and may be limited in unpredictable situations, since it relies on historical data patterns. It can also be susceptible to errors that are difficult to diagnose and correct.
The Cyrus-Beck algorithm is used for line clipping against non-rectangular convex polygons. It uses a parametric line equation to find the intersection points of the line with the polygon boundary. The algorithm computes a parameter value t for the line at each polygon edge, then substitutes the entering and leaving parameters into the parametric equation to find the clipped segment endpoints P'0 and P'1 that are visible within the polygon clipping window.
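The parametric clipping idea described above can be sketched in a few lines of Python. This is an illustrative implementation (not taken from the deck); it assumes the polygon is given as a list of (edge point, outward normal) pairs.

```python
def cyrus_beck(p0, p1, edges):
    """Clip segment p0->p1 against a convex polygon.

    edges: list of (point_on_edge, outward_normal) pairs.
    Returns the clipped endpoints (P'0, P'1), or None if fully outside.
    """
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    t_in, t_out = 0.0, 1.0
    for (ex, ey), (nx, ny) in edges:
        denom = nx * dx + ny * dy                     # n . D
        num = nx * (p0[0] - ex) + ny * (p0[1] - ey)   # n . (P0 - E)
        if denom == 0:                 # segment parallel to this edge
            if num > 0:                # entirely outside this half-plane
                return None
            continue
        t = -num / denom
        if denom < 0:                  # potentially entering the polygon
            t_in = max(t_in, t)
        else:                          # potentially leaving the polygon
            t_out = min(t_out, t)
        if t_in > t_out:               # entering after leaving: no visible part
            return None
    point = lambda t: (p0[0] + t * dx, p0[1] + t * dy)
    return point(t_in), point(t_out)
```

For the unit square, a horizontal segment from (-1, 0.5) to (2, 0.5) is clipped to (0, 0.5) and (1, 0.5), i.e. t_in = 1/3 and t_out = 2/3.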
Confusion matrix and classification evaluation metrics, by Minesh A. Jethva
This document discusses classification evaluation metrics and their limitations. It introduces the confusion matrix and metrics calculated from it such as precision, recall, F1-score, and accuracy. The summary highlights that these metrics can be "hacked" and misleading. More robust alternatives like balanced accuracy and MCC are presented that account for true negatives and are not as affected by class imbalance. Comprehensive reporting of multiple metrics from different perspectives is recommended for fully understanding a model's performance.
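The metrics mentioned above all derive from the four confusion-matrix cells; an illustrative Python sketch (not from the slides) shows how accuracy can look good on an imbalanced problem while balanced accuracy and MCC do not.

```python
import math

def confusion_metrics(tp, fp, fn, tn):
    """Common scores derived from a binary confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # sensitivity / true positive rate
    specificity = tn / (tn + fp)         # true negative rate
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    balanced_accuracy = (recall + specificity) / 2
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": accuracy, "balanced_accuracy": balanced_accuracy,
            "mcc": mcc}

# 100 samples, only 10 of them positive: the model finds just 1 positive.
m = confusion_metrics(tp=1, fp=1, fn=9, tn=89)
```

Here accuracy is 0.90, yet balanced accuracy is about 0.54 and MCC about 0.19, illustrating why reporting several metrics is recommended.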
Scoring Metrics for Classification Models, by KNIMESlides
You have trained a classification model with a highly sophisticated Machine Learning algorithm. Right. It is now time to evaluate its performance on test data, i.e. to score it.
A number of scoring metrics have been proposed over the years in different domains: sensitivity and specificity, precision and recall, accuracy, area under the curve, Cohen’s Kappa, and many more. Generally, they are based on values reported in a confusion matrix.
These slides are from a webinar we presented where we explore the concept of confusion matrix, true/false positives/negatives, and the related, most commonly used scoring metrics for classification models. We also demonstrate how to calculate all those metrics within KNIME Analytics Platform. https://www.knime.com/knime-software/knime-analytics-platform
View the webinar here: https://youtu.be/dOqRjeOv1VA
This document summarizes key concepts in machine learning evaluation including:
1. Common evaluation metrics like accuracy, precision, recall, and ROC curves.
2. Offline evaluation techniques like cross-validation to estimate model performance.
3. Hyperparameter tuning to optimize model configuration.
4. A/B testing to evaluate model impact by comparing control and experiment groups.
5. Causal effect measurement techniques like propensity score matching to estimate treatment effects.
Machine Learning Performance metrics for classification, by Kuppusamy P
The document discusses different performance metrics for evaluating outlier detection models: recall, false alarm rate, ROC curves, and AUC. ROC curves plot the true positive rate against the false positive rate. AUC measures the entire area under the ROC curve, indicating how well a model can distinguish between classes. F1 score provides a balanced measure of a model's precision and recall that is better than accuracy alone when classes are unevenly distributed. Accuracy can be high even when a model misses many true outliers, so F1 score is a more appropriate metric for outlier detection evaluation.
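AUC as described above has a convenient rank interpretation: it is the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. A small illustrative sketch (our own, not from the deck):

```python
def roc_auc(pos_scores, neg_scores):
    """AUC via the rank statistic: P(random positive outranks random negative).

    Ties count as half a win; this equals the area under the ROC curve.
    """
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

For example, positives scored [0.9, 0.8, 0.4] against negatives [0.5, 0.3, 0.2] win 8 of the 9 pairwise comparisons, giving AUC = 8/9.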
This document discusses feature selection concepts and methods. It defines features as attributes that determine which class an instance belongs to. Feature selection aims to select a relevant subset of features by removing irrelevant, redundant and unnecessary data. This improves learning accuracy, model performance and interpretability. The document categorizes feature selection algorithms as filter, wrapper or embedded methods based on how they evaluate feature subsets. It also discusses concepts like feature relevance, search strategies, successor generation and evaluation measures used in feature selection algorithms.
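A filter method of the kind categorized above scores each feature independently of any model and keeps the top-ranked subset. A minimal illustrative sketch using absolute Pearson correlation as the relevance score (our example, not from the document):

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def filter_select(features, y, k):
    """Filter method: rank features by |correlation| with the target, keep top k."""
    ranked = sorted(features,
                    key=lambda name: abs(pearson(features[name], y)),
                    reverse=True)
    return ranked[:k]
```

Wrapper methods would instead evaluate each candidate subset by training a model on it; embedded methods fold selection into training itself.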
This document provides an overview of decision tree algorithms for machine learning. It discusses key concepts such as:
- Decision trees can be used for classification or regression problems.
- They represent rules that can be understood by humans and used in knowledge systems.
- The trees are built by splitting the data into purer subsets based on attribute tests, using measures like information gain.
- Issues like overfitting are addressed through techniques like reduced error pruning and rule post-pruning.
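The information gain measure mentioned above can be computed from Shannon entropy; a short illustrative sketch (ours, not from the document):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, subsets):
    """Entropy reduction from splitting `parent` into the given subsets."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in subsets)
```

A split that separates the classes perfectly yields the full parent entropy as gain; a split that leaves each subset as mixed as the parent yields zero gain, so tree builders prefer high-gain attribute tests.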
Attention Is All You Need.
With these simple words, the Deep Learning industry was forever changed. Transformers were initially introduced in the field of Natural Language Processing to enhance language translation, but they demonstrated astonishing results even outside language processing. In particular, they recently spread in the Computer Vision community, advancing the state-of-the-art on many vision tasks. But what are Transformers? What is the mechanism of self-attention, and do we really need it? How did they revolutionize Computer Vision? Will they ever replace convolutional neural networks?
These and many other questions will be answered during the talk.
In this tech talk, we will discuss:
- A piece of history: Why did we need a new architecture?
- What is self-attention, and where does this concept come from?
- The Transformer architecture and its mechanisms
- Vision Transformers: An Image is worth 16x16 words
- Video Understanding using Transformers: the space + time approach
- The scale and data problem: Is Attention what we really need?
- The future of Computer Vision through Transformers
Speaker: Davide Coccomini, Nicola Messina
Website: https://www.aicamp.ai/event/eventdetails/W2021101110
This document summarizes key topics from a session on problem solving by search algorithms in artificial intelligence. It discusses uninformed search strategies like breadth-first search and depth-first search. It also covers informed, heuristic search strategies such as greedy best-first search and A* search which use heuristic functions to estimate distance to the goal. Examples are provided to illustrate best first search, and it describes how this algorithm expands nodes and uses priority queues to order nodes by estimated cost. The next session is slated to cover the A* search algorithm in more detail.
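The priority-queue mechanism described above is easy to sketch. This illustrative greedy best-first search (our code, not from the session) orders the frontier by the heuristic estimate alone; A* would order it by path cost plus heuristic instead.

```python
import heapq

def greedy_best_first(start, goal, neighbors, h):
    """Expand the frontier node with the smallest heuristic estimate h(n)."""
    frontier = [(h(start), start)]          # priority queue ordered by h
    came_from = {start: None}
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:                    # reconstruct the path to the goal
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for nxt in neighbors(node):
            if nxt not in came_from:        # push each node at most once
                came_from[nxt] = node
                heapq.heappush(frontier, (h(nxt), nxt))
    return None

# A tiny example graph with heuristic estimates of distance to goal "D".
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
est = {"A": 3, "B": 2, "C": 1, "D": 0}
path = greedy_best_first("A", "D", graph.__getitem__, est.__getitem__)
```

Here the search pops A, then prefers C (h=1) over B (h=2), reaching the goal via A-C-D.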
The document discusses artificial intelligence and pattern recognition. It introduces various pattern recognition concepts including defining a pattern, examples of patterns in different domains, and approaches to pattern recognition. It also provides an example of using discriminative methods to classify fish into salmon and sea bass using optical sensing and extracted features.
The document discusses capsule networks and their advantages over traditional convolutional neural networks. It covers the original capsule network proposed by Sabour et al. in 2017, as well as extensions like EM routing proposed by Hinton in 2018 and unsupervised training methods proposed by Rawlinson in 2018. Capsule networks represent entities as vectors whose magnitude represents presence and direction represents properties. Dynamic routing allows information to be routed between capsules based on agreement of their predictions.
Taking your machine learning workflow to the next level using Scikit-Learn Pi..., by Philip Goddard
This document discusses using scikit-learn pipelines to build machine learning workflows in a modular way. It describes how pipelines can encapsulate data preparation steps as well as model training. The document then provides a case study example of building a pipeline to predict customer churn. Key steps include designing pipeline components to handle different data types, writing custom transformers when needed, and using grid search cross-validation to tune hyperparameters of estimators added to the end of the pipeline.
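The pipeline-plus-grid-search pattern described above can be sketched as follows. This is a minimal illustrative example, not the case study itself: the data is synthetic and the label is a stand-in for churn.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                    # stand-in numeric features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # stand-in churn label

pipe = Pipeline([
    ("scale", StandardScaler()),                 # data-preparation step
    ("clf", LogisticRegression()),               # estimator at the end
])

# Hyperparameters of pipeline steps are addressed as <step>__<param>.
search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=3)
search.fit(X, y)
```

Because preprocessing lives inside the pipeline, the scaler is re-fit on each cross-validation training fold, which avoids leaking test-fold statistics into training.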
Hill Climbing Algorithm in Artificial Intelligence, by Bharat Bhushan
Hill Climbing Algorithm in Artificial Intelligence
The hill climbing algorithm is a local search algorithm that continuously moves in the direction of increasing elevation (value) to find the peak of the mountain, i.e., the best solution to the problem. It terminates when it reaches a peak where no neighbor has a higher value.
Hill climbing is a technique used for optimizing mathematical problems. One of the widely discussed examples is the Traveling Salesman Problem, in which we need to minimize the distance traveled by the salesman.
It is also called greedy local search, as it only looks at its best immediate neighbor state and not beyond.
A node in the hill climbing algorithm has two components: state and value.
Hill climbing is mostly used when a good heuristic is available.
In this algorithm, we do not need to maintain a search tree or graph, as it keeps only a single current state.
Features of Hill Climbing:
Following are some main features of the hill climbing algorithm:
Generate and test variant: Hill climbing is a variant of the generate-and-test method. The generate-and-test method produces feedback that helps decide which direction to move in the search space.
Greedy approach: Hill climbing search moves in the direction that optimizes the cost.
No backtracking: It does not backtrack through the search space, as it does not remember previous states.
State-space Diagram for Hill Climbing:
The state-space landscape is a graphical representation of the hill climbing algorithm, showing the various states of the algorithm plotted against the objective function or cost.
The y-axis shows the objective or cost function, and the x-axis shows the state space. If the y-axis represents cost, the goal of the search is to find the global (and local) minima; if it represents an objective function, the goal is to find the global (and local) maxima.
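The behavior described above, a single current state moving to its best neighbor until no neighbor improves, fits in a few lines. An illustrative sketch (ours, not from the deck), maximizing a simple objective:

```python
def hill_climbing(state, neighbors, value, max_steps=10_000):
    """Keep only the current state; move to the best neighbor while it improves."""
    for _ in range(max_steps):
        best = max(neighbors(state), key=value)
        if value(best) <= value(state):   # peak reached: no higher neighbor
            return state
        state = best                      # greedy move; no backtracking
    return state

# Maximize f(x) = -(x - 3)^2 over the integers, stepping by +/-1 from 0.
peak = hill_climbing(0, lambda x: [x - 1, x + 1], lambda x: -(x - 3) ** 2)
```

Note the algorithm's known weakness: on a landscape with several peaks it stops at whichever local maximum is reachable from the start state.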
http://imatge-upc.github.io/telecombcn-2016-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
The document discusses machine learning concepts including modeling, evaluation, model selection, training models, and addressing issues like overfitting and underfitting. It explains that modeling tries to emulate human learning through mathematical and statistical formulations. Evaluation methods like holdout, k-fold cross-validation, and leave-one-out cross-validation are used to select models and train them on datasets while avoiding overfitting or underfitting issues. Parametric models have fixed parameters while non-parametric models are based on training data.
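The k-fold scheme mentioned above partitions the data into k folds, each serving once as the test set; leave-one-out is the special case k = n. A stdlib-only illustrative sketch (ours, not from the document):

```python
def k_fold_splits(n, k):
    """Return (train_indices, test_indices) pairs; each sample is tested once."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        splits.append((train, test))
        start += size
    return splits
```

Averaging a model's score over the k held-out folds gives a lower-variance performance estimate than a single holdout split, at the cost of k training runs.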
This document provides an outline for a course on neural networks and fuzzy systems. The course is divided into two parts, with the first 11 weeks covering neural networks topics like multi-layer feedforward networks, backpropagation, and gradient descent. The document explains that multi-layer networks are needed to solve nonlinear problems by dividing the problem space into smaller linear regions. It also provides notation for multi-layer networks and shows how backpropagation works to calculate weight updates for each layer.
This document describes the syllabus for a course on operating systems. It includes 5 units that will cover topics like process management, CPU scheduling, deadlocks, memory management, file systems, and system calls. The course objectives are to introduce operating system concepts and design issues. Students will learn how to control access to computers and files, recognize user problems, and understand how programming languages, operating systems, and architectures interact.
This document discusses classifying handwritten digits using the MNIST dataset with a simple linear machine learning model. It begins by introducing the MNIST dataset of images and corresponding labels. It then discusses using a linear model with weights and biases to make predictions for each image. The weights represent a filter to distinguish digits. The model is trained using gradient descent to minimize the cross-entropy cost function by adjusting the weights and biases based on batches of training data. The goal is to improve the model's ability to correctly classify handwritten digit images.
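The training loop described above, linear scores from weights and biases, softmax predictions, and gradient descent on cross-entropy, can be sketched on a tiny synthetic stand-in for MNIST (2 features and 2 classes instead of 784 and 10; our code, not the document's):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))               # tiny stand-in "images"
y = (X[:, 0] > X[:, 1]).astype(int)         # stand-in labels
Y = np.eye(2)[y]                            # one-hot encoding

W = np.zeros((2, 2))                        # one weight "filter" per class
b = np.zeros(2)
lr = 0.5
for _ in range(200):                        # batch gradient descent
    logits = X @ W + b
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)       # softmax predictions
    grad = p - Y                            # d(cross-entropy)/d(logits)
    W -= lr * X.T @ grad / len(X)
    b -= lr * grad.mean(axis=0)

acc = ((X @ W + b).argmax(axis=1) == y).mean()
```

The gradient of the cross-entropy cost with respect to the logits reduces to predictions minus one-hot targets, which is why the update is so compact.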
Brief introduction to graph-based pattern recognition. It shows the advantages and disadvantages of using graphs and how existing pattern recognition techniques are adapted to graph space.
The document discusses artificial neural networks and backpropagation. It provides an overview of backpropagation algorithms, including how they were developed over time, the basic methodology of propagating errors backwards, and typical network architectures. It also gives examples of applying backpropagation to problems like robotics, space robots, handwritten digit recognition, and face recognition.
Applying Statistical Modeling and Machine Learning to Perform Time-Series For..., by PyData
Forecasting time-series data has applications in many fields, including finance, health, etc. There are potential pitfalls when applying classic statistical and machine learning methods to time-series problems. This talk will give folks the basic toolbox to analyze time-series data and perform forecasting using statistical and machine learning models, as well as interpret and convey the outputs.
The document presents an efficient FPGA implementation of convolution that reduces processing time through hardware computing. It implements the discrete linear convolution of two finite-length sequences. Existing systems use DSP processors, which consume more power and require more chip area at lower speed; the proposed system implements convolution with a VLSI architecture, consuming less power and requiring less chip area at higher speed. It also works for both signed and unsigned numbers.
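The operation being accelerated is simply discrete linear convolution; a software reference (illustrative, unrelated to the FPGA design itself) makes the definition concrete:

```python
def linear_convolution(x, h):
    """Discrete linear convolution; output length is len(x) + len(h) - 1."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj      # y[n] = sum_k x[k] * h[n - k]
    return y
```

The same definition covers signed inputs, since the multiply-accumulate is sign-agnostic; the hardware version parallelizes exactly these multiply-accumulate steps.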
24 Evaluation Metrics for Binary Classification.
For every metric, information about:
- The definition and the intuition behind it,
- A non-technical explanation you can communicate to business stakeholders,
- How to calculate or plot it,
- When you should use it.
Recommender Systems from A to Z – Model Evaluation, by Crossing Minds
The third meetup will be about evaluating different models for our recommender system. We will review the strategies we have to check whether a model is underfitting or overfitting. After that, we will present and analyze the losses typically used to train models in recommendation systems. We will compare regression, classification, and rank-based losses, and when it is convenient to use each one. Finally, we will cover the metrics typically used to evaluate the performance of different recommendation systems, and how to test that the models are giving good results in production.
This document provides an overview of decision tree algorithms for machine learning. It discusses key concepts such as:
- Decision trees can be used for classification or regression problems.
- They represent rules that can be understood by humans and used in knowledge systems.
- The trees are built by splitting the data into purer subsets based on attribute tests, using measures like information gain.
- Issues like overfitting are addressed through techniques like reduced error pruning and rule post-pruning.
Attention Is All You Need.
With these simple words, the Deep Learning industry was forever changed. Transformers were initially introduced in the field of Natural Language Processing to enhance language translation, but they demonstrated astonishing results even outside language processing. In particular, they recently spread in the Computer Vision community, advancing the state-of-the-art on many vision tasks. But what are Transformers? What is the mechanism of self-attention, and do we really need it? How did they revolutionize Computer Vision? Will they ever replace convolutional neural networks?
These and many other questions will be answered during the talk.
In this tech talk, we will discuss:
- A piece of history: Why did we need a new architecture?
- What is self-attention, and where does this concept come from?
- The Transformer architecture and its mechanisms
- Vision Transformers: An Image is worth 16x16 words
- Video Understanding using Transformers: the space + time approach
- The scale and data problem: Is Attention what we really need?
- The future of Computer Vision through Transformers
Speaker: Davide Coccomini, Nicola Messina
Website: https://www.aicamp.ai/event/eventdetails/W2021101110
This document summarizes key topics from a session on problem solving by search algorithms in artificial intelligence. It discusses uninformed search strategies like breadth-first search and depth-first search. It also covers informed, heuristic search strategies such as greedy best-first search and A* search which use heuristic functions to estimate distance to the goal. Examples are provided to illustrate best first search, and it describes how this algorithm expands nodes and uses priority queues to order nodes by estimated cost. The next session is slated to cover the A* search algorithm in more detail.
The document discusses artificial intelligence and pattern recognition. It introduces various pattern recognition concepts including defining a pattern, examples of patterns in different domains, and approaches to pattern recognition. It also provides an example of using discriminative methods to classify fish into salmon and sea bass using optical sensing and extracted features.
The document discusses capsule networks and their advantages over traditional convolutional neural networks. It covers the original capsule network proposed by Sabour et al. in 2017, as well as extensions like EM routing proposed by Hinton in 2018 and unsupervised training methods proposed by Rawlinson in 2018. Capsule networks represent entities as vectors whose magnitude represents presence and direction represents properties. Dynamic routing allows information to be routed between capsules based on agreement of their predictions.
Taking your machine learning workflow to the next level using Scikit-Learn Pi...Philip Goddard
This document discusses using scikit-learn pipelines to build machine learning workflows in a modular way. It describes how pipelines can encapsulate data preparation steps as well as model training. The document then provides a case study example of building a pipeline to predict customer churn. Key steps include designing pipeline components to handle different data types, writing custom transformers when needed, and using grid search cross-validation to tune hyperparameters of estimators added to the end of the pipeline.
Hill Climbing Algorithm in Artificial IntelligenceBharat Bhushan
Hill Climbing Algorithm in Artificial Intelligence
Hill climbing algorithm is a local search algorithm which continuously moves in the direction of increasing elevation/value to find the peak of the mountain or best solution to the problem. It terminates when it reaches a peak value where no neighbor has a higher value.
Hill climbing algorithm is a technique which is used for optimizing the mathematical problems. One of the widely discussed examples of Hill climbing algorithm is Traveling-salesman Problem in which we need to minimize the distance traveled by the salesman.
It is also called greedy local search as it only looks to its good immediate neighbor state and not beyond that.
A node of hill climbing algorithm has two components which are state and value.
Hill Climbing is mostly used when a good heuristic is available.
In this algorithm, we don't need to maintain and handle the search tree or graph as it only keeps a single current state.
Features of Hill Climbing:
Following are some main features of Hill Climbing Algorithm:
Generate and Test variant: Hill Climbing is the variant of Generate and Test method. The Generate and Test method produce feedback which helps to decide which direction to move in the search space.
Greedy approach: Hill-climbing algorithm search moves in the direction which optimizes the cost.
No backtracking: It does not backtrack the search space, as it does not remember the previous states.
State-space Diagram for Hill Climbing:
The state-space landscape is a graphical representation of the hill-climbing algorithm which is showing a graph between various states of algorithm and Objective function/Cost.
On Y-axis we have taken the function which can be an objective function or cost function, and state-space on the x-axis. If the function on Y-axis is cost then, the goal of search is to find the global minimum and local minimum. If the function of Y-axis is Objective function, then the goal of the search is to find the global maximum and local maximum.
http://imatge-upc.github.io/telecombcn-2016-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
The document discusses machine learning concepts including modeling, evaluation, model selection, training models, and addressing issues like overfitting and underfitting. It explains that modeling tries to emulate human learning through mathematical and statistical formulations. Evaluation methods like holdout, k-fold cross-validation, and leave-one-out cross-validation are used to select models and train them on datasets while avoiding overfitting or underfitting issues. Parametric models have fixed parameters while non-parametric models are based on training data.
This document provides an outline for a course on neural networks and fuzzy systems. The course is divided into two parts, with the first 11 weeks covering neural networks topics like multi-layer feedforward networks, backpropagation, and gradient descent. The document explains that multi-layer networks are needed to solve nonlinear problems by dividing the problem space into smaller linear regions. It also provides notation for multi-layer networks and shows how backpropagation works to calculate weight updates for each layer.
24 Evaluation Metrics for Binary Classification.
For every metric, information about:
- What is the definition and intuition behind it,
- The non-technical explanation that you can communicate to business stakeholders,
- How to calculate or plot it,
- When you should use it.
4. Intro
Evaluation Metric:
● is a model performance indicator/proxy
● (strongly) depends on the problem
● rarely maps 1-1 to your business problem
● is not a guarantee of performance on other metrics
10. False Positive Rate | Type I error
Class-based metrics
What is it?
● When we predict something that isn’t
● Fraction of false alerts
(confusion matrix figure: rows = Actual 0/1, columns = Predicted 0/1, cells = True Negative, False Positive, False Negative, True Positive)
11. False Positive Rate | Type I error
Class-based metrics
from sklearn.metrics import confusion_matrix
y_pred_class = y_pred_pos > threshold
tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class).ravel()
false_positive_rate = fp / (fp + tn)
How to calculate it?
12. False Positive Rate | Type I error
Class-based metrics
How to choose a threshold?
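One way to choose is to sweep candidate thresholds and watch how the false positive rate responds; a minimal sketch, where the toy y_true / y_pred_pos values are hypothetical stand-ins:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# hypothetical toy labels and scores
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_pred_pos = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])

fpr_by_threshold = {}
for threshold in [0.3, 0.5, 0.7]:
    y_pred_class = y_pred_pos > threshold
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class).ravel()
    fpr_by_threshold[threshold] = fp / (fp + tn)
# raising the threshold drives the false positive rate down
```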
13. False Positive Rate | Type I error
Class-based metrics
When to use it?
● rarely used alone but can be auxiliary metric
● if the cost of dealing with an alert is high
14. False Negative Rate | Type II error
Class-based metrics
What is it?
● When we don’t predict something
when it is
● fraction of missed fraudulent transactions
15. from sklearn.metrics import confusion_matrix
y_pred_class = y_pred_pos > threshold
tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class).ravel()
false_negative_rate = fn / (tp + fn)
False Negative Rate | Type II error
Class-based metrics
How to calculate it?
16. False Negative Rate | Type II error
Class-based metrics
How to choose a threshold?
17. False Negative Rate | Type II error
Class-based metrics
When to use it?
● rarely used alone but can be an auxiliary metric
● if the cost of missing a positive (letting a fraudulent transaction through) is high
18. True Negative Rate | Specificity
Class-based metrics
What is it?
● how good we are at predicting
negative class
● same axis as False Positive Rate
● How many non-fraudulent transactions
marked as clean
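How to calculate it? A sketch following the pattern of the other snippets; the toy y_true / y_pred_pos values are hypothetical stand-ins:

```python
from sklearn.metrics import confusion_matrix

# hypothetical toy labels and scores
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred_pos = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9]
threshold = 0.5

y_pred_class = [int(score > threshold) for score in y_pred_pos]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class).ravel()
true_negative_rate = tn / (tn + fp)  # specificity = 1 - false positive rate
```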
20. True Negative Rate | Specificity
Class-based metrics
How to choose a threshold?
21. True Negative Rate | Specificity
Class-based metrics
When to use it?
● rarely used alone but can be auxiliary metric
● When you want to feel good when you say
“you are healthy” or “this transaction is clean”
22. True Positive Rate | Recall | Sensitivity
Class-based metrics
What is it?
● how good we are at finding positive
class members
● put all guilty in prison
● same axis as False Negative Rate
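How to calculate it? A sketch following the pattern of the other snippets; the toy y_true / y_pred_pos values are hypothetical stand-ins:

```python
from sklearn.metrics import confusion_matrix, recall_score

# hypothetical toy labels and scores
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred_pos = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9]
threshold = 0.5

y_pred_class = [int(score > threshold) for score in y_pred_pos]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class).ravel()
true_positive_rate = tp / (tp + fn)
# or simply
recall = recall_score(y_true, y_pred_class)
```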
24. True Positive Rate | Recall | Sensitivity
Class-based metrics
How to choose a threshold?
25. True Positive Rate | Recall | Sensitivity
Class-based metrics
When to use it?
● Rarely used alone but can be auxiliary metric
● You want to catch all fraudulent transactions
● False alerts are cheap to process
26. Positive Predictive Value | Precision
Class-based metrics
What is it?
● how accurate are we when we say
positive class
● Only guilty people should be in prison
27. Positive Predictive Value | Precision
Class-based metrics
How to calculate it?
from sklearn.metrics import confusion_matrix, precision_score
y_pred_class = y_pred_pos > threshold
tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class).ravel()
positive_predictive_value = tp / (tp + fp)
# or simply
precision_score(y_true, y_pred_class)
29. Positive Predictive Value | Precision
Class-based metrics
When to use it?
● Rarely used alone but can be an auxiliary metric
● False alerts are expensive to process
● You want to catch only fraudulent transactions
30. Accuracy
Class-based metrics
What is it?
● Fraction of correctly classified
observations (positive and negative)
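How to calculate it? A sketch following the pattern of the other snippets; the toy y_true / y_pred_pos values are hypothetical stand-ins:

```python
from sklearn.metrics import confusion_matrix, accuracy_score

# hypothetical toy labels and scores
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred_pos = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9]
threshold = 0.5

y_pred_class = [int(score > threshold) for score in y_pred_pos]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
# or simply
accuracy_score(y_true, y_pred_class)
```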
33. Accuracy
Class-based metrics
When to use it?
● When your problem is balanced
● When every class is equally important to you
● When you need something easy-to-explain to stakeholders
34. F score
Class-based metrics
What is it?
● Combines Precision and Recall into one score
● Weighted harmonic mean
● Doesn’t care about True Negatives
● F1 score (beta=1) -> harmonic mean
● F2 score (beta=2) -> 2x emphasis on recall
35. F score
Class-based metrics
How to calculate it?
from sklearn.metrics import fbeta_score
y_pred_class = y_pred_pos > threshold
fbeta_score(y_true, y_pred_class, beta)
39. F score
Class-based metrics
When to use it?
F1 score
● my go-to metric for binary classification
● easy-to-explain to stakeholders
F2 score
● When you need to adjust precision recall tradeoff
● when finding positive fraudulent transactions is more important than being correct about it
40. Cohen Kappa
Class-based metrics
What is it?
● how much better your model is than a random classifier
● Observed agreement p0 : accuracy
● Expected agreement pe : accuracy of
the random classifier
● Random classifier: samples randomly
according to class frequencies
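The bullets above translate directly into the formula kappa = (p0 - pe) / (1 - pe); a sketch with hypothetical toy labels, checked against sklearn:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# hypothetical toy labels and predictions
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_pred_class = np.array([0, 0, 0, 1, 0, 1, 1, 1])

# observed agreement p0 is just accuracy
p0 = np.mean(y_true == y_pred_class)
# expected agreement pe: a classifier sampling according to class frequencies
pe = (np.mean(y_true == 1) * np.mean(y_pred_class == 1)
      + np.mean(y_true == 0) * np.mean(y_pred_class == 0))
kappa = (p0 - pe) / (1 - pe)

assert np.isclose(kappa, cohen_kappa_score(y_true, y_pred_class))
```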
41. Cohen Kappa
Class-based metrics
How to calculate it?
from sklearn.metrics import cohen_kappa_score
y_pred_class = y_pred_pos > threshold
cohen_kappa_score(y_true, y_pred_class)
43. Cohen Kappa
Class-based metrics
When to use it?
● Unpopular metric for classification
● Works well for unbalanced problems
● Good substitute for accuracy when you need a
metric that is easy to explain
44. Matthews Correlation Coefficient
Class-based metrics
What is it?
● Correlation between predicted classes
and the ground truth labels
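The correlation can also be written directly from the confusion-matrix counts; a sketch with hypothetical toy labels, checked against sklearn:

```python
import math
from sklearn.metrics import confusion_matrix, matthews_corrcoef

# hypothetical toy labels and predictions
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred_class = [0, 0, 0, 1, 0, 1, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class).ravel()
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

assert abs(mcc - matthews_corrcoef(y_true, y_pred_class)) < 1e-9
```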
45. Matthews Correlation Coefficient
Class-based metrics
How to calculate it?
from sklearn.metrics import matthews_corrcoef
y_pred_class = y_pred_pos > threshold
matthews_corrcoef(y_true, y_pred_class)
47. Matthews Correlation Coefficient
Class-based metrics
When to use it?
● Unpopular metric for classification
● Works well for unbalanced problems
● Good substitute for accuracy when you need
a metric that is easy to explain
48. Dollar-focused metrics
Class-based metrics
● Get the cost of a False Negative (letting a fraudulent transaction through)
● Get the cost of a False Positive (blocking a clean transaction)
● Find the optimal threshold in dollars $ -> optimize the business problem directly
● Blog post from Airbnb (link)
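The recipe above can be sketched as a threshold sweep; the per-error dollar costs and toy scores here are hypothetical:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# hypothetical per-error costs in dollars
COST_FN = 100.0  # fraudulent transaction let through
COST_FP = 5.0    # clean transaction blocked

# hypothetical toy labels and scores
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_pred_pos = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])

def dollar_cost(threshold):
    y_pred_class = y_pred_pos > threshold
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred_class).ravel()
    return fn * COST_FN + fp * COST_FP

# pick the threshold that minimizes total dollar cost
thresholds = np.linspace(0.0, 1.0, 101)
best = min(thresholds, key=dollar_cost)
```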
49. Fairness metrics
Class-based metrics
● Divide your dataset into groups based on a protected feature (race, sex, etc.) -> get privileged and unprivileged groups
● Calculate True Positive Rate for both groups
● Calculate the difference in TPR -> get Equality of Opportunity Metric
● Blog post on fairness (link)
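The steps above can be sketched as follows; the group labels, toy labels, and predictions are all hypothetical:

```python
import numpy as np

# hypothetical protected-feature groups: 1 = privileged, 0 = unprivileged
group = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_true = np.array([1, 1, 0, 1, 1, 1, 0, 1])
y_pred = np.array([1, 1, 0, 1, 1, 0, 0, 0])

def tpr(mask):
    # True Positive Rate within one group
    positives = (y_true == 1) & mask
    return np.mean(y_pred[positives] == 1)

# Equality of Opportunity: difference in TPR between the groups
equality_of_opportunity = tpr(group == 1) - tpr(group == 0)
```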
51. ROC curve
Score-based metrics
What is it?
● visualizes the tradeoff between true
positive rate (TPR) and false positive
rate (FPR).
● for every threshold, we calculate TPR
and FPR and plot it on one chart.
52. ROC curve
Score-based metrics
How to plot it?
from scikitplot.metrics import plot_roc
fig, ax = plt.subplots()
plot_roc(y_true, y_pred, ax=ax)
53. ROC curve
Score-based metrics
When to use it?
● You want to see model performance over all thresholds
● Want to visually compare multiple models
● Care equally about both positive and negative class
54. ROC AUC score
Score-based metrics
What is it?
● One number that summarizes ROC curve
● Area under the ROC curve (integral)
● Alternatively, rank correlation between
predictions and targets (link)
● Is looking at the entire confusion matrix
55. ROC AUC score
Score-based metrics
How to calculate it?
from sklearn.metrics import roc_auc_score
roc_auc_score(y_true, y_pred_pos)
56. ROC AUC score
Score-based metrics
When to use it?
● You care about ranking predictions (not about getting calibrated probabilities)
● Do not use when data heavily imbalanced and you care only about the positive class (link)
● When you care equally about the positive and negative classes
57. Precision Recall curve
Score-based metrics
What is it?
● visualizes the tradeoff between
precision and recall
● for every threshold, we calculate
precision and recall and plot it on one
chart
58. Precision Recall curve
Score-based metrics
How to plot it?
from scikitplot.metrics import plot_precision_recall
fig, ax = plt.subplots()
plot_precision_recall(y_true, y_pred, ax=ax)
59. Precision Recall curve
Score-based metrics
When to use it?
● You want to see model performance over all thresholds
● Want to visually compare multiple models
● Want to find a good threshold for class assignment
● Care more about the positive class
60. PR AUC score | Average precision
Score-based metrics
What is it?
● One number that summarizes Precision
Recall curve
● Area under the Precision Recall curve
(integral)
● Doesn’t look at True Negatives!
61. PR AUC score | Average precision
Score-based metrics
How to calculate it?
from sklearn.metrics import average_precision_score
average_precision_score(y_true, y_pred_pos)
62. PR AUC score | Average precision
Score-based metrics
When to use it?
● Data is heavily imbalanced and you care only about the positive class (link)
● You care more about the positive than negative class
● you want to choose the threshold that fits the business problem
● you want to communicate precision/recall decision
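Choosing a business-friendly threshold can be sketched with sklearn's precision_recall_curve; the toy scores and the 0.8 precision target are hypothetical:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# hypothetical toy labels and scores
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_pred_pos = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])

precision, recall, thresholds = precision_recall_curve(y_true, y_pred_pos)
# e.g. the lowest threshold that reaches at least 0.8 precision
ok = precision[:-1] >= 0.8
chosen = thresholds[ok][0]
```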
65. Brier score
Score-based metrics
When to use it?
● When you care about calibrated probabilities
● Why care about calibration?
from sklearn.metrics import roc_auc_score, brier_score_loss

y_true = [0, 1, 1, 0, 1, 1, 1, 0]
y_pred_v1 = [0.28, 0.35, 0.32, 0.29, 0.34, 0.38, 0.37, 0.31]
y_pred_v2 = [0.18, 0.95, 0.92, 0.19, 0.94, 0.98, 0.97, 0.21]

roc_auc_score(y_true, y_pred_v1), roc_auc_score(y_true, y_pred_v2)        # 1.000, 1.000
brier_score_loss(y_true, y_pred_v1), brier_score_loss(y_true, y_pred_v2)  # 0.295, 0.0158
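The Brier score is just the mean squared error between predicted probabilities and the 0/1 outcomes, which is why the well-calibrated `y_pred_v2` scores so much better. A minimal check using the slide's numbers:

```python
import numpy as np
from sklearn.metrics import brier_score_loss

y_true = np.array([0, 1, 1, 0, 1, 1, 1, 0])
y_pred_v2 = np.array([0.18, 0.95, 0.92, 0.19, 0.94, 0.98, 0.97, 0.21])

manual = np.mean((y_pred_v2 - y_true) ** 2)          # mean squared error of probabilities
sklearn_score = brier_score_loss(y_true, y_pred_v2)  # same value
```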
66. Cumulative gain chart
Score-based metrics
What is it?
● Shows how much your model gains over the random model
● Calculate it by:
○ Order predictions
○ Calculate the fraction of True Positives for your model and the random model
○ Plot them on one chart
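The steps above can be sketched directly in NumPy (toy labels and scores for illustration):

```python
import numpy as np

y_true = np.array([0, 1, 1, 0, 1, 1, 1, 0])
y_score = np.array([0.18, 0.95, 0.92, 0.19, 0.94, 0.98, 0.97, 0.21])

order = np.argsort(y_score)[::-1]               # rank predictions, best first
gain = np.cumsum(y_true[order]) / y_true.sum()  # fraction of positives captured so far
sampled = np.arange(1, len(y_true) + 1) / len(y_true)
# the random model captures x% of positives after sampling x% of the data,
# so its gain curve is simply `sampled` (the diagonal)
```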
67. Cumulative gain chart
Score-based metrics
How to plot it?
from scikitplot.metrics import plot_cumulative_gain
fig, ax = plt.subplots()
plot_cumulative_gain(y_true, y_pred, ax=ax)  # y_pred: predicted probabilities for each class
68. Cumulative gain chart
Score-based metrics
When to use it?
● You want to select the most promising customers or transactions from a dataset
● You are looking for a good cutoff point
● Can be a good addition to ROC AUC score
69. Lift chart
Score-based metrics
What is it?
● Shows how much your model gains over the random model
● Calculate it by:
○ Order predictions
○ Calculate the fraction of True Positives for your model and the random model
○ Calculate the ratio of your model’s gains over the random model’s
○ Plot them on one chart
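Lift is the cumulative gain divided by the random model's gain at each cutoff; continuing the toy example (made-up labels and scores):

```python
import numpy as np

y_true = np.array([0, 1, 1, 0, 1, 1, 1, 0])
y_score = np.array([0.18, 0.95, 0.92, 0.19, 0.94, 0.98, 0.97, 0.21])

order = np.argsort(y_score)[::-1]               # rank predictions, best first
gain = np.cumsum(y_true[order]) / y_true.sum()  # model's fraction of positives captured
sampled = np.arange(1, len(y_true) + 1) / len(y_true)

lift = gain / sampled  # how many times better than random at each cutoff
```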
70. Lift chart
Score-based metrics
How to plot it?
from scikitplot.metrics import plot_lift_curve
fig, ax = plt.subplots()
plot_lift_curve(y_true, y_pred, ax=ax)  # y_pred: predicted probabilities for each class
71. Lift chart
Score-based metrics
When to use it?
● You want to select the most promising customers or transactions from a dataset
● You are looking for a good cutoff point
● Can be a good addition to ROC AUC score
73. Precision vs Recall
Common This vs That
Precision
● Looks at True Positives and False Positives
● “Only the guilty should be in jail”
● Higher threshold:
○ fewer predicted positives
○ higher precision
○ lower recall
Recall
● Looks at True Positives and False Negatives
● “Put all the guilty in jail”
● Lower threshold:
○ more predicted positives
○ higher recall
○ lower precision
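The threshold effect can be checked directly (toy labels/scores; the two threshold values are arbitrary):

```python
from sklearn.metrics import precision_score, recall_score

y_true = [0, 1, 1, 0, 1, 1, 1, 0]
y_score = [0.28, 0.85, 0.62, 0.39, 0.74, 0.48, 0.9, 0.55]

for threshold in (0.4, 0.6):  # raising the threshold...
    y_pred = [int(s >= threshold) for s in y_score]
    p = precision_score(y_true, y_pred)  # ...pushes precision up
    r = recall_score(y_true, y_pred)     # ...and recall down
```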
74. F1 score vs Accuracy
Common This vs That
F1 score
● Doesn’t look at True Negatives
● You care more about the positive class
● Balances Precision and Recall
Accuracy
● Looks at all elements from the confusion matrix
● You care equally about the positive and negative class
● Very bad for imbalanced problems
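Why accuracy is very bad for imbalanced problems, in one toy example: a model that always predicts the majority class looks excellent by accuracy and useless by F1.

```python
from sklearn.metrics import accuracy_score, f1_score

# imbalanced toy data: 90 negatives, 10 positives
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100  # a useless model that always predicts the negative class

acc = accuracy_score(y_true, y_pred)            # looks great: 0.9
f1 = f1_score(y_true, y_pred, zero_division=0)  # reveals the failure: 0.0
```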
75. ROC AUC vs PR AUC
Common This vs That
ROC AUC
● Looks at all elements from the confusion matrix (for all thresholds)
● You care equally about the positive and negative class
● Data is not heavily imbalanced
PR AUC
● Doesn’t look at True Negatives (for all thresholds)
● You care more about the positive class
● Data is heavily imbalanced
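A sketch of why the two can disagree under heavy imbalance (the class ratio and score values below are made up): a handful of high-scoring negatives barely dents ROC AUC, because the many true negatives dominate it, but it halves the precision at full recall and so drags PR AUC down.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

# heavily imbalanced toy data: 5 positives, 95 negatives
y_true = np.array([1] * 5 + [0] * 95)
# positives score 0.9, but 5 negatives outrank them at 0.95
y_score = np.array([0.9] * 5 + [0.95] * 5 + [0.1] * 90)

roc_auc = roc_auc_score(y_true, y_score)           # still looks strong
pr_auc = average_precision_score(y_true, y_score)  # much more pessimistic
```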
76. F1 score vs ROC AUC
Common This vs That
F1 score
● Class-based metric -> you need to choose a threshold
● You care more about the positive than the negative class
● Doesn’t look at True Negatives
● Easier to communicate
ROC AUC
● Score-based metric
● Good for ranking predictions
● You care about both the positive and negative class
● Your data is not heavily imbalanced
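The class-based vs score-based distinction shows up directly in the API (toy data again): ROC AUC consumes raw scores, while F1 only exists after you commit to a threshold.

```python
from sklearn.metrics import f1_score, roc_auc_score

y_true = [0, 1, 1, 0, 1, 1, 1, 0]
y_score = [0.28, 0.85, 0.62, 0.39, 0.74, 0.48, 0.9, 0.55]

# score-based: works directly on the raw scores, no threshold needed
auc = roc_auc_score(y_true, y_score)

# class-based: a threshold must be chosen before F1 can be computed
y_pred = [int(s >= 0.5) for s in y_score]
f1 = f1_score(y_true, y_pred)
```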
78. Materials
● Slides available on twitter (@NeptuneML) linkedin/slideshare (link)
● Github repository github.com/neptune-ml (link)
● Blog post “24 Evaluation Metrics for Binary Classification (And
When to Use Them)”
● Blog post “F1 Score vs ROC AUC vs Accuracy vs PR AUC: Which
Evaluation Metric Should You Choose?”
Extras