This document summarizes a presentation on model evaluation given at the 4th annual Valencian Summer School in Machine Learning. It discusses why models must be evaluated to understand how well they will perform on new data and to identify their mistakes. Various evaluation metrics are introduced, including accuracy, precision, recall, F1 score, and the Phi coefficient. The dangers of evaluating on training data are explained, and techniques such as train-test splits and cross-validation are recommended to obtain realistic, less optimistic performance estimates. Regression metrics such as MAE, MSE, and R-squared error are also covered, along with evaluation techniques for specific problem types such as imbalanced classification, time series forecasting, and model selection.
The document provides an overview of machine learning concepts including supervised and unsupervised learning algorithms. It discusses splitting data into training and test sets, training algorithms on the training set, testing algorithms on the test set, and measuring performance. For supervised learning, it describes classification and regression tasks, the bias-variance tradeoff, and how supervised algorithms learn by minimizing a loss function. For unsupervised learning, it discusses clustering, representation learning, dimensionality reduction, and exploratory analysis use cases.
State of the Art in Machine Learning, by Thomas Dietterich, Distinguished Professor Emeritus in the School of EECS at Oregon State University and Chief Scientist of BigML.
*MLSEV 2020: Virtual Conference.
Searching for Anomalies, by Thomas Dietterich, Distinguished Professor Emeritus in the School of EECS at Oregon State University and Chief Scientist of BigML.
*MLSEV 2020: Virtual Conference.
1. The document discusses four case studies of mistakes the author made in data science projects. The first case involved incorrectly predicting mail return rates without considering sample sizes. The second was backtesting a trading strategy without accounting for data leakage. The third was developing statistical software without proper testing. The fourth was incorrectly calculating an A/B test statistic without considering sample size.
2. In each case, the author explains what went wrong and the lessons learned, such as considering sample sizes, understanding where data comes from, testing software appropriately, and not compounding uncertainties when calculating statistics. The author also discusses potential pitfalls in machine learning, like incorrectly sparsifying models or using PCA before regression.
This talk addresses product managers and discusses the basics of statistics and analytics, and ways to use them effectively in their products.
Video: https://youtu.be/Rsrp040DYKg (orientation is fixed after a few minutes)
April 22, 2017 - Product Folks! Meetup Amman, Jordan
This document provides an overview of model evaluation metrics for supervised machine learning models. It discusses the importance of evaluating models to assess their performance and usefulness. Common evaluation metrics are introduced, including accuracy, precision, recall, F1 score, confusion matrices and more. The document cautions that models should not be evaluated on the training data, as this can provide overly optimistic results, and advocates for train-test splits or cross validation. Real-world examples and demonstrations are provided.
An introduction to machine learning and statistics (Spotle.ai)
This document provides an overview of machine learning and predictive modeling. It begins by describing how predictive models can be used in various domains like healthcare, finance, telecom, and business. It then discusses the differences between machine learning and predictive modeling, noting that machine learning aims to allow machines to learn autonomously using feedback mechanisms, while predictive modeling focuses on building statistical models to predict outcomes. The document also uses examples like Microsoft's Tay chatbot to illustrate how machine learning systems can be exposed to real-world data to continuously learn and improve. It concludes by explaining how predictive analytics fits within machine learning as the starting point to build initial predictive models and continuously monitor and refine them.
Lecture 10 - Model Testing and Evaluation, a lecture in subject module Statis... (Maninda Edirisooriya)
Model Testing and Evaluation is a lesson on training different variants of ML models and evaluating them to select the best one. This was one of the lectures of a full course I taught at the University of Moratuwa, Sri Lanka, in the second half of 2023.
This document discusses various machine learning model validation techniques and ensemble methods such as bagging and boosting. It defines key concepts like overfitting, underfitting, bias-variance tradeoff, and different validation metrics. Cross validation techniques like k-fold and bootstrap are explained as ways to estimate model performance on unseen data. Bagging creates multiple models on resampled data and averages their predictions to reduce variance. Boosting iteratively adjusts weights of misclassified observations to build strong models, but risks overfitting. Gradient boosting and XGBoost are powerful ensemble methods.
Data-Driven off a Cliff: Anti-Patterns in Evidence-Based Decision Making (indeedeng)
The document discusses common anti-patterns in evidence-based decision making, including being impatient, taking shortcuts in sampling and analysis, focusing on a single metric, and believing too strongly in one's own conclusions. It provides examples of companies making misguided decisions due to these anti-patterns, such as ending A/B tests early, ignoring parts of a sample, overemphasizing short-term metrics, and overrelying on persuasive but incorrect stories. The document advocates being patient, rigorous in sampling and analysis, considering multiple relevant metrics, and acknowledging the potential for fallibility.
MLSEV Virtual. Automating Model Selection (BigML, Inc)
1) Bayesian parameter optimization uses machine learning to predict the performance of untrained models based on parameters from previous models to efficiently search the parameter space.
2) However, there are still important issues like choosing the right evaluation metric, ensuring no information leakage between training and test data, and selecting the appropriate model for the problem and available data.
3) Automated model selection requires sufficient data to make accurate predictions; with insufficient data, the process can fail.
DutchMLSchool. Models, Evaluations, and Ensembles (BigML, Inc)
DutchMLSchool. Introduction to Machine Learning, Models, Evaluations, and Ensembles (Supervised Learning I) - Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
Since it was introduced in 2014, Stats Engine has served as a fast, powerful, and easy-to-use foundation for tens of thousands of digital experiments. But how exactly does it work?
In this session, we will explain the key differences and advantages of Stats Engine by comparing and contrasting it with a familiar old friend: the t-test.
This document discusses best practices for evaluating supervised machine learning models, including:
1) The importance of splitting data into training and testing sets to avoid "memorizing" the data and get an accurate performance measure.
2) Common dataset splitting methods like linear and random splits.
3) The importance of metrics like accuracy, and how they can be misleading, especially for imbalanced datasets.
4) How different domains may value reducing different types of mistakes, like preferring fewer false negatives for medical diagnosis.
The document describes the 8-step data mining process:
1) Defining the problem, 2) Collecting data, 3) Preparing data, 4) Pre-processing, 5) Selecting an algorithm and parameters, 6) Training and testing, 7) Iterating models, 8) Evaluating the final model. It discusses issues like defining classification vs estimation problems, selecting appropriate inputs and outputs, and determining when sufficient data has been collected for modeling.
Statistical Learning and Model Selection (1).pptx (rajalakshmi5921)
This document discusses statistical learning and model selection. It introduces statistical learning problems, statistical models, the need for statistical modeling, and issues around evaluating models. Key points include: statistical learning involves using data to build a predictive model; a good model balances bias and variance to minimize prediction error; cross-validation is described as the ideal procedure for evaluating models without overfitting to the test data.
Ways to evaluate a machine learning model’s performance (Mala Deep Upadhaya)
Some of the ways to evaluate a machine learning model’s performance.
In Summary:
Confusion matrix: a representation of the True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN) in a matrix format.
Accuracy: becomes misleading when classes are imbalanced.
Precision: answers how often the model is right when it says it is right.
Recall: answers how many of the actual positives the model found, and how many it missed.
Specificity: like Recall, but the focus shifts to the negative instances.
F1 score: the harmonic mean of precision and recall, so the higher the F1 score, the better.
Precision-Recall (PR) curve: a curve of precision against recall for various threshold values.
ROC curve: a plot of TPR against FPR for various threshold values.
Module 4: Model Selection and Evaluation (Sara Hooker)
Delta Analytics is a 501(c)3 non-profit in the Bay Area. We believe that data is powerful, and that anybody should be able to harness it for change. Our teaching fellows partner with schools and organizations worldwide to work with students excited about the power of data to do good.
Welcome to the course! These modules will teach you the fundamental building blocks and the theory necessary to be a responsible machine learning practitioner in your own community. Each module focuses on accessible examples designed to teach you about good practices and the powerful (yet surprisingly simple) algorithms we use to model data.
To learn more about our mission or provide feedback, take a look at www.deltanalytics.org.
R - what do the numbers mean? #RStats This is the presentation for my Demo at Orlando Live60 AILIve. We go through statistics interpretation with examples
This document provides an overview of machine learning fundamentals and supervised learning with scikit-learn. It defines machine learning and discusses when it is appropriate to use compared to traditional programming. It also describes the different types of learning problems including supervised, unsupervised, semi-supervised and reinforcement learning. For supervised learning, it covers classification and regression problems as well as common applications. It then outlines the typical machine learning pipeline including data collection, preparation, model training, evaluation and addresses issues like overfitting and underfitting.
This document discusses several common problems with data handling and quality including building and testing models with the same data, confusion between biological and technical replicates, and identification and handling of outliers. It provides examples and explanations of key concepts such as experimental and sampling units, pseudo-replication, outliers versus high influence points, and leverage plots. The importance of proper data handling techniques like dividing data into training, test, and confirmation sets and using cross-validation is emphasized to avoid overfitting models and generating spurious findings.
This document discusses various evaluation measures used in machine learning, including accuracy, precision, recall, F1 score, and AUROC for classification problems. For regression problems, the output is continuous and no additional treatment is needed. Classification accuracy is defined as the number of correct predictions divided by the total predictions. The confusion matrix is used to calculate true positives, false positives, etc. Precision measures the fraction of positive predictions that are correct, while recall measures the fraction of actual positives that are identified. The F1 score balances precision and recall for imbalanced data. AUROC plots the true positive rate against the false positive rate.
This document discusses various performance metrics used to evaluate machine learning models, with a focus on classification metrics. It defines key metrics like accuracy, precision, recall, and specificity using a cancer detection example. Accuracy is only useful when classes are balanced, while precision focuses on limiting false positives and recall on limiting false negatives. The document emphasizes that the appropriate metric depends on the problem and whether minimizing false positives or false negatives is more important. Confusion matrices are also introduced as a way to visualize model performance.
Digital Transformation and Process Optimization in Manufacturing (BigML, Inc)
Keyanoush Razavidinani, Digital Services Consultant at A1 Digital, a BigML Partner, highlights why it is important to identify and reduce human bottlenecks that optimize processes and let you focus on important activities. Additionally, Guillem Vidal, Machine Learning Engineer at BigML completes the session by showcasing how Machine Learning is put to use in the manufacturing industry with a use case to detect factory failures.
The Road to Production: Automating your Anomaly Detectors - by jao (Jose A. Ortega), Co-Founder and Chief Technology Officer at BigML.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - ML for AML Compliance (BigML, Inc)
Machine Learning for Anti Money Laundering Compliance, by Kevin Nagel, Consultant and Data Scientist at INFORM.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - Multi Perspective Anomalies (BigML, Inc)
Multi Perspective Anomalies, by Jan W Veldsink, Master in the art of AI at Nyenrode, Rabobank, and Grio.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - My First Anomaly Detector (BigML, Inc)
The document discusses building an anomaly detector model to identify unusual transactions in a dataset. It describes loading transaction data with 31 features into the BigML platform and creating an anomaly detector model. The model scores new data and identifies the most anomalous fields to help detect fraud. Creating the anomaly detector involves interpreting the data, exploring the dataset distribution, and setting a threshold score to define what is considered anomalous.
DutchMLSchool 2022 - History and Developments in ML (BigML, Inc)
History and Present Developments in Machine Learning, by Tom Dietterich, Emeritus Professor of computer science at Oregon State University and Chief Scientist at BigML.
*Machine Learning School in The Netherlands 2022.
Introduction to End-to-End Machine Learning: Classification and Regression - Mercè Martín, VP of Bindings and Applications at BigML.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - A Data-Driven Company (BigML, Inc)
A Data-Driven Company: 21 Lessons for Large Organizations to Create Value from AI, by Richard Benjamins, Chief AI and Data Strategist at Telefónica.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - ML in the Legal Sector (BigML, Inc)
How Machine Learning Transforms and Automates Legal Services, by Arnoud Engelfriet, Co-Founder at Lynn Legal.
*Machine Learning School in The Netherlands 2022.
This document describes a proposed solution using machine learning and artificial intelligence to help create a safer stadium experience. The solution involves two parts: 1) linking access to stadiums to a verified identity through a fan app for preregistration, and 2) using AI/ML to help detect unwanted behaviors or events early. The rest of the document provides more details on the proposed smart video review framework, including using computer vision and audio analysis techniques to help identify issues like flares, flags, banners, chants including monkey chants. The goal is to help reviewers more efficiently identify potential problems but with privacy, ethics and human oversight.
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants (BigML, Inc)
Process Optimization in Manufacturing Plants, by Keyanoush Razavidinani, Digital Business Consultant at A1 Digital.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - Anomaly Detection at Scale (BigML, Inc)
Lessons Learned Applying Anomaly Detection at Scale, by Álvaro Clemente, Machine Learning Engineer at BigML.
*Machine Learning School in The Netherlands 2022.
DutchMLSchool 2022 - Citizen Development in AI (BigML, Inc)
The document discusses the need for citizen developers and humans in the AI/ML process. It notes that while technology and talent are important, company culture must also support broad data analytics and AI/ML adoption. It then provides examples of how involving domain experts can help attribute meaning to correlations and build better causal models to improve AI systems. The document advocates for a systems thinking approach and having humans in the loop to help AI/ML systems consider the wider context and avoid issues like bias.
This new feature is a continuation of and improvement on our previous Image Processing release. Now, Object Detection lets you go a step further with your image data and allows you to locate objects and annotate regions in your images. Once your image regions are defined, you can train and evaluate Object Detection models, make predictions with them, and automate end-to-end Machine Learning workflows on a single platform. To make that possible, BigML enables Object Detection by introducing the regions optype.
As with any other BigML feature, Object Detection is available from the BigML Dashboard, API, and WhizzML for automation. Object Detection is extremely helpful to tackle a wide range of computer vision use cases such as medical image analysis, quality control in manufacturing, license plate recognition in transportation, people detection in security surveillance, among many others.
This new release brings Image Processing to the BigML platform, a feature that enhances our offering to solve image data-driven business problems with remarkable ease of use. Because BigML treats images as any other data type, this unique implementation allows you to easily use image data alongside text, categorical, numeric, date-time, and items data types as input to create any Machine Learning model available in our platform, both supervised and unsupervised.
Now, it is easier than ever to solve a wide variety of computer vision and image classification use cases in a single platform: label your image data, train and evaluate your models, make predictions, and automate your end-to-end Machine Learning workflows. As with any other BigML feature, Image Processing is available from the BigML Dashboard, API, and WhizzML, and it can be applied to solve use cases such as medical image analysis, visual product search, security surveillance, and vehicle damage detection, among others.
Machine Learning in Retail: Know Your Customers' Customer. See Your Future (BigML, Inc)
This session presents a quite common situation for those working in food and beverage retail (FnB) and highlights interesting insights for reducing waste.
Speaker: Stephen Kinns, CEO and Co-Founder at catsAi.
*ML in Retail 2021: Webinar.
Machine Learning in Retail: ML in the Retail Sector (BigML, Inc)
This is an introductory session about the role that Machine Learning is playing in the retail sector and how it is being deployed across the different areas of this industry.
Speaker: Atakan Cetinsoy, VP of Predictive Applications at BigML.
*ML in Retail 2021: Webinar.
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot (BigML, Inc)
This presentation analyzes the role that Machine Learning plays in legal automation with a real-world Machine Learning application.
Speaker: Arnoud Engelfriet, Co-Founder at Lynn Legal.
*ML in GRC 2021: Virtual Conference.
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac... (BigML, Inc)
This is a real-life Machine Learning use case about integrated risk.
Speakers: Thomas Rengersen, Product Owner of the Governance Risk and Compliance Tool for Rabobank, and Thomas Alderse Baas, Co-Founder and Director of The Bowmen Group.
*ML in GRC 2021: Virtual Conference.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake (Walaa Eldin Moustafa)
Dynamic policy enforcement is becoming an increasingly important topic in today’s world, where data privacy and compliance are a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences. (3) They are context-aware, encoding a different set of transformations for different use cases. (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W... (Social Samosa)
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
Build applications with generative AI on Google Cloud (Márton Kodok)
We will explore Vertex AI Model Garden powered experiences and learn more about the integration of these generative AI APIs. We will see in action what the Gemini family of generative models offers developers for building and deploying AI-driven applications. Vertex AI includes a suite of foundation models, referred to as the PaLM and Gemini families of generative AI models, which come in different versions. We will cover how to use them via the API to: execute prompts in text and chat; cover multimodal use cases with image prompts; fine-tune and distill to improve knowledge domains; and run function calls with foundation models to optimize them for specific tasks. At the end of the session, developers will understand how to innovate with generative AI and develop apps in line with current generative AI industry trends.
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai... (Kaxil Naik)
Navigating today's data landscape isn't just about managing workflows; it's about strategically propelling your business forward. Apache Airflow has stood out as the benchmark in this arena, driving data orchestration forward since its early days. As we dive into the complexities of our current data-rich environment, where the sheer volume of information and its timely, accurate processing are crucial for AI and ML applications, the role of Airflow has never been more critical.
In my journey as the Senior Engineering Director and a pivotal member of Apache Airflow's Project Management Committee (PMC), I've witnessed Airflow transform data handling, making agility and insight the norm in an ever-evolving digital space. At Astronomer, our collaboration with leading AI & ML teams worldwide has not only tested but also proven Airflow's mettle in delivering data reliably and efficiently—data that now powers not just insights but core business functions.
This session is a deep dive into the essence of Airflow's success. We'll trace its evolution from a budding project to the backbone of data orchestration it is today, constantly adapting to meet the next wave of data challenges, including those brought on by Generative AI. It's this forward-thinking adaptability that keeps Airflow at the forefront of innovation, ready for whatever comes next.
The ever-growing demands of AI and ML applications have ushered in an era where sophisticated data management isn't a luxury—it's a necessity. Airflow's innate flexibility and scalability are what makes it indispensable in managing the intricate workflows of today, especially those involving Large Language Models (LLMs).
This talk isn't just a rundown of Airflow's features; it's about harnessing these capabilities to turn your data workflows into a strategic asset. Together, we'll explore how Airflow remains at the cutting edge of data orchestration, ensuring your organization is not just keeping pace but setting the pace in a data-driven future.
Session in https://budapestdata.hu/2024/04/kaxil-naik-astronomer-io/ | https://dataml24.sessionize.com/session/667627
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You... (Aggregage)
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
3. Why Evaluations
• FACT: No model is perfect - they all make mistakes
• Your data has mistakes
• Models are “approximations”
• Today you have seen models that predict:
  • Churn: How many people will churn that we didn’t predict?
  • Diabetes: How many patients might have diabetes that we said were fine?
  • Home Prices: How accurate are the predicted prices?
• You have also seen several different kinds of models
  • Decision Trees / Ensembles / Logistic Regression / Deepnets
  • Which one works the best for your data?
6. What Just Happened?
• We started with the churn Datasource
• Created a Dataset
• Built a Model to predict churn
• We used the Model to predict churn for each customer in the Dataset using a Batch Prediction
• Downloaded the Batch Prediction as a CSV and looked for errors, that is, rows where the Prediction did not match the known true value for churn
• The comparison was tedious!
  • Examining one line at a time
  • Hard to understand - need some metrics!!!
7. Evaluation Metrics
• Imagine we have a model that can predict a person’s dominant hand, that is, for any individual it predicts left / right
• Define the positive class
  • This selection is arbitrary
  • It is the class you are interested in!
  • The negative class is the “other” class (or others)
  • For this example, we choose: left
8. Evaluation Metrics
• We choose the positive class: left
• True Positive (TP): we predicted left and the correct answer was left
• True Negative (TN): we predicted right and the correct answer was right
• False Positive (FP): we predicted left but the correct answer was right
• False Negative (FN): we predicted right but the correct answer was left
9. Evaluation Metrics
Remember…
• True Positive: correctly predicted the positive class
• True Negative: correctly predicted the negative class
• False Positive: incorrectly predicted the positive class
• False Negative: incorrectly predicted the negative class
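As a rough illustration of how these four counts come out of predicted vs. actual labels, here is a minimal Python sketch; the handedness labels and the `confusion_counts` helper are illustrative, not part of the slides:

```python
# Tally TP, TN, FP, FN for a chosen positive class.
def confusion_counts(actual, predicted, positive_class="left"):
    tp = sum(a == positive_class and p == positive_class for a, p in zip(actual, predicted))
    tn = sum(a != positive_class and p != positive_class for a, p in zip(actual, predicted))
    fp = sum(a != positive_class and p == positive_class for a, p in zip(actual, predicted))
    fn = sum(a == positive_class and p != positive_class for a, p in zip(actual, predicted))
    return tp, tn, fp, fn

actual    = ["left", "right", "right", "left", "right"]
predicted = ["left", "right", "left",  "right", "right"]
print(confusion_counts(actual, predicted))  # (1, 2, 1, 1)
```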
10. Accuracy
Accuracy = (TP + TN) / Total
• “Percentage correct” - like an exam
• If Accuracy = 1 then no mistakes
• If Accuracy = 0 then all mistakes
• Intuitive but not always useful
• Watch out for unbalanced classes!
  • Ex: 90% of people are right-handed and 10% are left-handed
  • A silly model which always predicts right-handed is 90% accurate
11. Accuracy (example)
• Positive class = left-handed, negative class = right-handed
• Counts: TP = 0, FP = 0, TN = 7, FN = 3 (every individual is classified as right-handed)
• Accuracy = (TP + TN) / Total = (0 + 7) / 10 = 70%
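A quick sanity check of that 70% figure, assuming the counts above:

```python
tp, tn, fp, fn = 0, 7, 0, 3
accuracy = (tp + tn) / (tp + tn + fp + fn)   # (TP + TN) / Total
print(accuracy)  # 0.7 — the model never predicts "left", yet still looks 70% accurate
```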
12. Precision
Precision = TP / (TP + FP)
• The “accuracy” or “purity” of the positive class
• How well you did separating the positive class from the negative class
• If Precision = 1 then no FP
  • You may have missed some left-handers, but of the ones you identified, all are left-handed. No mistakes.
• If Precision = 0 then no TP
  • None of the left-handers you identified are actually left-handed. All mistakes.
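A minimal sketch of precision computed from raw counts; the `precision` helper and the example counts are illustrative:

```python
def precision(tp, fp):
    # Of everything predicted positive, how much really is positive?
    return tp / (tp + fp) if (tp + fp) else 0.0

print(precision(tp=2, fp=2))  # 0.5 — half of the predicted left-handers really are left-handed
```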
14. Recall
Recall = TP / (TP + FN)
• Percentage of the positive class correctly identified
• A measure of how well you identified all of the positive class examples
• If Recall = 1 then no FN → all left-handers identified
  • There may be FP, so Precision could be < 1
• If Recall = 0 then no TP → no left-handers identified
15. Recall (example)
• Positive class = left-handed, negative class = right-handed
• Counts: TP = 2, FP = 2, TN = 5, FN = 1
• Recall = TP / (TP + FN) = 2 / 3 ≈ 66%
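And the matching recall sketch, using the counts from this example:

```python
def recall(tp, fn):
    # Of all actual positives, how many did the model find?
    return tp / (tp + fn) if (tp + fn) else 0.0

print(recall(tp=2, fn=1))  # ≈ 0.667, i.e. the 66% on the slide
```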
16. f-Measure
f-measure = (2 × Recall × Precision) / (Recall + Precision)
• Harmonic mean of Recall & Precision
• If f-measure = 1 then Recall == Precision == 1
• If Precision OR Recall is small then the f-measure is small
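Combining the two previous sketches, the f-measure for the same illustrative counts (precision 0.5, recall 2/3) would be:

```python
def f_measure(precision, recall):
    # Harmonic mean: dragged toward the smaller of the two values
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f_measure(0.5, 2 / 3))  # ≈ 0.571
```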
18. Phi Coefficient
Phi = (TP × TN − FP × FN) / SQRT[(TP + FP)(TP + FN)(TN + FP)(TN + FN)]
• Returns a value between -1 and 1
  • = -1: predictions are the opposite of reality
  • = 0: no correlation between predictions and reality
  • = 1: predictions are always correct
19. Phi Coefficient (example)
• Same counts as the Recall example: TP = 2, FP = 2, TN = 5, FN = 1
• Phi = (2 × 5 − 2 × 1) / SQRT[(2 + 2)(2 + 1)(5 + 2)(5 + 1)] ≈ 0.356
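A sketch of the Phi coefficient from raw counts, checking the 0.356 on the slide; the helper is illustrative (scikit-learn's `matthews_corrcoef` computes the same quantity from label vectors):

```python
from math import sqrt

def phi_coefficient(tp, tn, fp, fn):
    # (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(round(phi_coefficient(tp=2, tn=5, fp=2, fn=1), 3))  # 0.356
```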
21. What Just Happened?
• Starting with the Diabetes Source, we created a Dataset and then a Model.
• Using both the Model and the original Dataset, we created an Evaluation.
• We reviewed the metrics provided by the Evaluation:
  • Confusion Matrix
  • Accuracy, Precision, Recall, f-measure, and Phi
• This Model seemed to perform really, really well…
Question: Can we trust this model?
22. Evaluation Danger!
• Never evaluate with the training data!
  • Many models are able to “memorize” the training data
  • This will result in overly optimistic evaluations!
23. “Memorizing” Training Data

Training data (plasma glucose | bmi | diabetes pedigree | age | diabetes):
148 | 33,6 | 0,627 | 50 | TRUE
85  | 26,6 | 0,351 | 31 | FALSE
183 | 23,3 | 0,672 | 32 | TRUE
89  | 28,1 | 0,167 | 21 | FALSE
137 | 43,1 | 2,288 | 33 | TRUE
116 | 25,6 | 0,201 | 30 | FALSE
78  | 31   | 0,248 | 26 | TRUE
115 | 35,3 | 0,134 | 29 | FALSE
197 | 30,5 | 0,158 | 53 | TRUE

Evaluating with (plasma glucose | bmi | diabetes pedigree | age | diabetes):
148 | 33,6 | 0,627 | 50 | ?
85  | 26,6 | 0,351 | 31 | ?

• Exactly the same values!
• Who needs a model?
• What we want to know is how the model performs with values never seen at training:
124 | 22 | 0,107 | 46 | ?
24. Evaluation Danger!
• Never evaluate with the training data!
  • Many models are able to “memorize” the training data
  • This will result in overly optimistic evaluations!
• If you only have one Dataset, use a train/test split
25. Train / Test Split

Train (plasma glucose | bmi | diabetes pedigree | age | diabetes):
148 | 33,6 | 0,627 | 50 | TRUE
183 | 23,3 | 0,672 | 32 | TRUE
89  | 28,1 | 0,167 | 21 | FALSE
78  | 31   | 0,248 | 26 | TRUE
115 | 35,3 | 0,134 | 29 | FALSE
197 | 30,5 | 0,158 | 53 | TRUE

Test (plasma glucose | bmi | diabetes pedigree | age | diabetes):
85  | 26,6 | 0,351 | 31 | FALSE
137 | 43,1 | 2,288 | 33 | TRUE
116 | 25,6 | 0,201 | 30 | FALSE

• These instances were never seen at training time.
• Better evaluation of how the model will perform with “new” data
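Outside of the BigML Dashboard, the same train/test discipline looks roughly like this in scikit-learn; the file name, column names, and model choice are placeholders, not from the slides:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

data = pd.read_csv("diabetes.csv")      # placeholder file name
X = data.drop(columns=["diabetes"])     # input features
y = data["diabetes"]                    # objective field

# Hold out 20% of the rows; the model never sees them during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = DecisionTreeClassifier().fit(X_train, y_train)
print("train accuracy:", accuracy_score(y_train, model.predict(X_train)))  # optimistic
print("test accuracy: ", accuracy_score(y_test, model.predict(X_test)))    # realistic
```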
27. Evaluation Danger!
• Never evaluate with the training data!
  • Many models are able to “memorize” the training data
  • This will result in overly optimistic evaluations!
• If you only have one Dataset, use a train/test split
• Even a train/test split may not be enough!
  • Might get a “lucky” split
  • Solution is to repeat several times (formally, to cross-validate)
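Repeating the split several times is exactly what k-fold cross-validation does; a sketch, reusing the hypothetical X and y from the previous snippet:

```python
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# 5 different train/test splits, 5 scores: a single "lucky" split stands out immediately.
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=5)
print(scores)          # one score per fold
print(scores.mean())   # the average over folds is a more stable estimate
```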
29. What Just Happened?
• Starting with the Diabetes Dataset, we created a train/test split
• We built a Model using the train set and evaluated it with the test set
• The scores were much worse than before, showing the danger of evaluating with training data.
• Then we launched several other types of models and used the evaluation comparison tool to see which model algorithm performed the best.
Question: Couldn’t we search for the best Model? STAY TUNED
30. Evaluation
• Never evaluate with the training data!
  • Many models are able to “memorize” the training data
  • This will result in overly optimistic evaluations!
• If you only have one Dataset, use a train/test split
• Even a train/test split may not be enough!
  • Might get a “lucky” split
  • Solution is to repeat several times (formally, to cross-validate)
• Don’t forget that accuracy can be misleading!
  • Mostly useless with unbalanced classes (left/right?)
  • Use weighting, operating points, other tricks…
31. Weighting

Instance | Rate | Payment | Outcome | Predict | Confidence
1        | 23 % | 134     | Paid    | Paid    | 20 %
2        | 23 % | 134     | Paid    | Paid    | 25 %
3        | 23 % | 134     | Paid    | Paid    | 30 %
...      | ...  | ...     | ...     | ...     | ...
1000     | 23 % | 134     | Paid    | Paid    | 99,5 %
1001     | 23 % | 134     | Default | Paid    | 99,4 %

Problem: Default is “more important”, but occurs less often than Paid
Solution: Weights tell the model to treat instances of a specific class (in this case Default) with more importance
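In scikit-learn terms the same idea is usually expressed as class weights; a sketch with illustrative weights (the 1000:1 ratio echoes the class-weight experiment mentioned later in the deck), assuming a hypothetical loans train split:

```python
from sklearn.tree import DecisionTreeClassifier

# Make mistakes on the rare "Default" class count far more than mistakes on "Paid".
model = DecisionTreeClassifier(class_weight={"Default": 1000, "Paid": 1})
model.fit(X_train, y_train)   # hypothetical loans training data
```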
32. Operating Points
• The default probability threshold is 50%
• Changing the threshold can change the outcome for a specific class

Rate   | Payment | … | Actual Outcome | Probability PAID | Threshold @ 50% | Threshold @ 60% | Threshold @ 90%
8,4 %  | $456    | … | PAID           | 95 %             | PAID            | PAID            | PAID
9,6 %  | $134    | … | PAID           | 87 %             | PAID            | PAID            | DEFAULT
18 %   | $937    | … | DEFAULT        | 36 %             | DEFAULT         | DEFAULT         | DEFAULT
21 %   | $35     | … | PAID           | 88 %             | PAID            | PAID            | DEFAULT
17,5 % | $1.044  | … | DEFAULT        | 55 %             | PAID            | DEFAULT         | DEFAULT
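The threshold logic itself is just a comparison against the predicted probability; a sketch reproducing the table above (the probabilities are taken from its PAID column):

```python
import numpy as np

prob_paid = np.array([0.95, 0.87, 0.36, 0.88, 0.55])   # P(PAID) for the five loans

def apply_threshold(p, threshold):
    # Predict PAID only when the model is at least `threshold` confident.
    return np.where(p >= threshold, "PAID", "DEFAULT")

for t in (0.50, 0.60, 0.90):
    print(t, apply_threshold(prob_paid, t))   # reproduces the three threshold columns
```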
33. Lending Club Dataset
• Peer-to-peer lending service
• As an investor, we want a way to identify loans that are a lower risk
• Fortunately, the data for the outcome (paid or default) of past loans is available from Lending Club.
• Using this data, we can build a model to predict which loans are good or bad

Instance | Rate  | Payment | Outcome
1        | 8,4 % | 456     | Paid
2        | 9,6 % | 134     | Paid
3        | 18 %  | 937     | Default

(Diagram: past-loan data trains a MODEL; NEW LOANS are passed through the MODEL and labeled GOOD / BAD)
35. What just happened?
• We split the Lending Club data into training and test Datasets
• We created a Model and an Evaluation
• Looking at the Accuracy, we saw that the Model seemed to perform well, but only because of the unbalanced classes
  • The resulting Model did well at predicting good loans
  • But bad loans are "more important"
• We tried different weights to increase the Recall of bad loans:
  • objective balancing: equal consideration for each class
  • class weights: bad = 1000, good = 1
• Finally, we explored the impact of changing the probability threshold
Wait - What about regressions?
39. MSE versus MAE
• For both MAE & MSE: smaller is better, but values are unbounded
• MSE weights large errors more heavily than MAE; its square root (RMSE) is always greater than or equal to MAE
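A small sketch of both regression metrics on toy numbers; the prices are illustrative, not from the RedFin data:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([200_000, 310_000, 150_000, 425_000])   # actual prices (toy values)
y_pred = np.array([195_000, 330_000, 160_000, 400_000])   # predicted prices

mae = mean_absolute_error(y_true, y_pred)   # average absolute error, in the target's units
mse = mean_squared_error(y_true, y_pred)    # squares each error, so large misses dominate
print(mae, mse)
```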
43. R-Squared Error
RSE = 1 − MSEmodel / MSEmean
• RSE: a measure of how much better the model is than always predicting the mean
• < 0: model is worse than the mean
  • MSEmodel > MSEmean
• = 0: model is no better than the mean
  • MSEmodel = MSEmean
• → 1: model fits the data “perfectly”
  • MSEmodel = 0 (or MSEmean >> MSEmodel)
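The same definition in a few lines of Python, reusing the toy prices from the previous sketch:

```python
import numpy as np

def r_squared_error(y_true, y_pred):
    # RSE = 1 - MSE_model / MSE_mean
    mse_model = np.mean((y_true - y_pred) ** 2)
    mse_mean = np.mean((y_true - y_true.mean()) ** 2)
    return 1 - mse_model / mse_mean

y_true = np.array([200_000, 310_000, 150_000, 425_000], dtype=float)
y_pred = np.array([195_000, 330_000, 160_000, 400_000], dtype=float)
print(r_squared_error(y_true, y_pred))                                # close to 1
print(r_squared_error(y_true, np.full_like(y_true, y_true.mean())))   # exactly 0: predicting the mean
```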
45. What just happened?
• We split the RedFin data into training and test Datasets
• We created a Model and an Evaluation
• We examined the Evaluation metrics
Wait - What about Time Series?
46. Data Transformations: Independent Data

Color  | Mass | PPAP
red    | 11   | pen
green  | 45   | apple
red    | 53   | apple
yellow | 0    | pen
blue   | 2    | pen
green  | 422  | pineapple
yellow | 555  | pineapple
blue   | 7    | pen

Discovering patterns:
• Color = “red” → Mass < 100
• PPAP = “pineapple” → Color ≠ “blue”
• Color = “blue” → PPAP = “pen”
47. Data Transformations: Independent Data

Color  | Mass | PPAP
green  | 45   | apple
blue   | 2    | pen
green  | 422  | pineapple
blue   | 7    | pen
yellow | 0    | pen
yellow | 9    | pineapple
red    | 555  | apple
red    | 11   | pen

Patterns still hold when the rows are re-arranged:
• Color = “red” → Mass < 100
• PPAP = “pineapple” → Color ≠ “blue”
• Color = “blue” → PPAP = “pen”