This document discusses techniques for evaluating and improving classifiers. It begins by explaining how to measure a classifier's performance using metrics such as accuracy, precision, recall, and F-measure, introduces the confusion matrix, and shows how its entries relate to these metrics. The document then discusses overfitting, underfitting, bias, and variance, and explains that the goal is to balance bias and variance to minimize total error and achieve optimal classification.
IME 672 - Classifier Evaluation I.pptx
1. Data Mining & Knowledge Discovery
IME 672
Dr. Faiz Hamid
Department of Industrial & Management Engineering
Indian Institute of Technology Kanpur
Email: fhamid@iitk.ac.in
3. Classifier Evaluation
• Estimate how accurately the classifier can predict on future data on which it has not been trained
• Compare the performance of classifiers when there is more than one
• How to estimate accuracy?
• Are some measures of a classifier's accuracy more appropriate than others?
4. Classifier Evaluation Metrics
Confusion Matrix (rows: actual class; columns: predicted class):

                Predicted C1            Predicted ¬C1
Actual C1       True Positives (TP)     False Negatives (FN)
Actual ¬C1      False Positives (FP)    True Negatives (TN)

• Positive tuples - tuples of the main class of interest
• Negative tuples - all other tuples
• Confusion matrix - a tool for analysing how well a classifier can recognize tuples of different classes
• True positives (TP) - positive tuples correctly labeled by the classifier
• True negatives (TN) - negative tuples correctly labeled by the classifier
• False positives (FP) - negative tuples incorrectly labeled as positive
• False negatives (FN) - positive tuples mislabeled as negative
• The off-diagonal entries (FP and FN) capture the confusion between the positive and negative classes
• Confusion matrices can be easily drawn for multiple classes
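As a quick illustration of this layout, here is a minimal sketch using scikit-learn's confusion_matrix; the label vectors are made-up illustration data, not from the slides.

```python
# A minimal sketch of building a binary confusion matrix with scikit-learn.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]  # actual classes (1 = positive)
y_pred = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]  # classifier's predictions

# Rows are actual classes, columns are predicted classes.
# With labels=[1, 0] the layout matches the slide: [[TP, FN], [FP, TN]].
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
tp, fn = cm[0]
fp, tn = cm[1]
print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")  # TP=4, FN=1, FP=1, TN=4
```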
5. Classifier Evaluation Metrics
• Classifier Accuracy, or recognition rate: percentage of test set tuples that are correctly classified
  Accuracy = (TP + TN)/All
• Error rate: 1 − accuracy, or
  Error rate = (FP + FN)/All
• Sensitivity (Recall): True Positive recognition rate
  Sensitivity = TP/P
• Specificity: True Negative recognition rate
  Specificity = TN/N

Actual \ Predicted      C      ¬C     Total
C                       TP     FN     P
¬C                      FP     TN     N
Total                   P'     N'     All
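The four rates can be sketched directly from the confusion-matrix counts; the function name and the counts below are our own illustrative choices, not from the slides.

```python
# A sketch of accuracy, error rate, sensitivity, and specificity.
def basic_rates(tp, fn, fp, tn):
    p, n = tp + fn, fp + tn          # actual positives P and negatives N
    total = p + n                    # "All" in the slide's notation
    return {
        "accuracy":    (tp + tn) / total,
        "error_rate":  (fp + fn) / total,   # equals 1 - accuracy
        "sensitivity": tp / p,              # TP recognition rate (recall)
        "specificity": tn / n,              # TN recognition rate
    }

# Illustrative counts: 100 positives, 100 negatives.
print(basic_rates(tp=70, fn=30, fp=10, tn=90))
# {'accuracy': 0.8, 'error_rate': 0.2, 'sensitivity': 0.7, 'specificity': 0.9}
```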
6. Classifier Evaluation Metrics
• Precision: exactness - what % of tuples that the classifier labeled as positive are actually positive
• Recall: completeness - what % of positive tuples did the classifier label as positive?

Precision = (# positive tuples retrieved) / (# tuples retrieved) = TP / (TP + FP)

Recall = (# positive tuples retrieved) / (# positive tuples) = TP / (TP + FN) = TP / P

• Precision = 1 means every tuple labeled positive is truly positive (no false positives); Recall = 1 means every positive tuple was retrieved (no false negatives)
7. Classifier Evaluation Metrics
• F measure (F1 or F-score): harmonic mean of precision and recall
  F1 = (2 × Precision × Recall) / (Precision + Recall)
• Fβ: weighted measure of precision and recall - assigns β times as much weight to recall as to precision
  Fβ = ((1 + β²) × Precision × Recall) / (β² × Precision + Recall)
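A minimal sketch of the two formulas above; the precision and recall values are arbitrary illustration numbers.

```python
# F1 and F-beta from precision and recall.
def f1_score(precision, recall):
    return 2 * precision * recall / (precision + recall)

def f_beta(precision, recall, beta):
    # beta > 1 weights recall more heavily; beta < 1 favors precision.
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

p, r = 0.8, 0.5                    # illustrative scores: weak recall
print(f1_score(p, r))              # ~0.615
print(f_beta(p, r, beta=2))        # ~0.541, dragged toward the weaker recall
print(f_beta(p, r, beta=0.5))      # ~0.714, leaning on the stronger precision
```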
8. Classifier Evaluation Metrics
Example of Confusion Matrix (rows: actual class; columns: predicted class):

                          buy_computer = yes   buy_computer = no   Total
buy_computer = yes               6954                  46           7000
buy_computer = no                 412                2588           3000
Total                            7366                2634          10000
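As a sketch, the table above can be pushed through the earlier formulas; the counts are read directly off the matrix.

```python
# Metrics for the buy_computer example.
tp, fn = 6954, 46    # actual "yes" row
fp, tn = 412, 2588   # actual "no" row

accuracy  = (tp + tn) / (tp + fn + fp + tn)  # 9542 / 10000 = 0.9542
precision = tp / (tp + fp)                   # 6954 / 7366  ~ 0.9441
recall    = tp / (tp + fn)                   # 6954 / 7000  ~ 0.9934
f1        = 2 * precision * recall / (precision + recall)  # ~ 0.968
print(accuracy, precision, recall, f1)
```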
9. Classifier Evaluation Metrics
• Classify medical data tuples: positive tuples (cancer = yes), negative tuples (cancer = no)
• The classifier seems quite accurate: 96.5% accuracy
• Sensitivity = TP/P = 90/300 × 100 = 30% (accuracy on the cancer tuples)
• Specificity = TN/N = 9560/9700 × 100 = 98.56% (accuracy on noncancer tuples)
• The classifier is correctly labeling only the noncancer tuples and misclassifying most of the cancer tuples!
• An accuracy rate of 96.5% is therefore not acceptable
• Only 3% of the training set are cancer tuples

Confusion matrix (rows: actual class; columns: predicted class; FN and FP derived from P = 300 and N = 9700):

                     cancer = yes   cancer = no   Total
cancer = yes               90            210        300
cancer = no               140           9560       9700
Total                     230           9770      10000
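To see why plain accuracy misleads on such imbalanced data, here is a sketch of a degenerate classifier that always predicts cancer = no; using the slide's class counts, it beats the 96.5% classifier on accuracy while detecting nothing.

```python
# A degenerate "always negative" classifier on the same 10,000 tuples.
p, n = 300, 9700                 # 3% positives, as on the slide
tp, fn, fp, tn = 0, p, 0, n      # it never predicts positive

accuracy    = (tp + tn) / (p + n)   # 0.97 -- higher than 0.965!
sensitivity = tp / p                # 0.0  -- misses every cancer case
print(accuracy, sensitivity)
```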
10. Overfitting and Underfitting
• The overall goal in machine learning is to obtain a model/hypothesis that generalizes well to new, unseen data
  – The goal is not to memorize the training data (there are far more efficient ways to store data than inside a random forest)
• A good model has "high generalization accuracy" or "low generalization error"
• Assumptions we generally make:
  – i.i.d. assumption: examples are independent and identically distributed, i.e., training and test examples are drawn from the same probability distribution
  – For a random model that has not been fit to the training set, we expect the training and test error to be equal
  – Training error or accuracy therefore provides an (optimistically) biased estimate of generalization performance
11. Overfitting and Underfitting
• In statistics, a fit refers to how well a target function is approximated
• Overfitting refers to a model that models the training data too well
  – The model learns the detail and noise/random fluctuations in the training data as concepts
  – These concepts do not apply to new data, which negatively impacts performance
  – More likely with nonparametric and nonlinear models that have more flexibility when learning a target function (example: decision trees)
  – Techniques to reduce overfitting (see the sketch after this list):
    • Reduce model complexity
    • Regularization, early stopping during the training phase
    • Cross-validation
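As a sketch of the cross-validation point, the snippet below compares an unconstrained decision tree against a depth-limited one on synthetic data; the dataset and depth settings are our illustrative choices.

```python
# Spotting overfitting via the gap between training and cross-validated accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

for depth in (None, 3):   # None = grow the tree until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    train_acc = tree.fit(X, y).score(X, y)
    cv_acc = cross_val_score(tree, X, y, cv=5).mean()
    print(f"max_depth={depth}: train={train_acc:.3f}, cv={cv_acc:.3f}")
# A large train/cv gap for the unconstrained tree signals overfitting;
# limiting depth (reducing model complexity) should narrow it.
```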
12. Overfitting and Underfitting
• Underfitting refers to a model that can neither model the training data nor generalize to new data
• The model cannot capture the underlying trend of the data
• Usually happens when:
  – we have too little data to build an accurate model
  – we try to fit a linear model to non-linear data
• Techniques to reduce underfitting (see the sketch after this list):
  – Increase training data
  – Increase model complexity
  – Increase the number of features / perform feature engineering
  – Increase the number of epochs / the duration of training
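A sketch of the "increase the number of features" remedy: a straight line underfits quadratic data, while the same linear model with added polynomial features captures the trend. The synthetic data is our illustrative choice.

```python
# Fixing underfitting with feature engineering (polynomial features).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.3, size=200)   # non-linear target

linear = LinearRegression().fit(X, y)
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)
print(linear.score(X, y))  # low R^2: the straight line underfits
print(poly.score(X, y))    # near 1.0: the added feature captures the trend
```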
15. Bias and Variance
• Bias
  – Assumptions made by a model to make the target function easier to learn
  – The error that arises when the approximating function is too simple for a very complex problem, ignoring the structural relationship between the predictors and the target
  – High bias results in underfitting and a higher training error
  – Can be reduced by adding features that better describe the association with the target variable
• Variance
  – The extent to which the function learned by a model differs across different training sets
  – High variance results in overfitting
  – Regularization methods are commonly used to control variance (see the sketch after this list)
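As a sketch of that last point, the snippet below varies the strength of ridge regression's penalty on a small synthetic problem; all settings here are illustrative assumptions, not from the slides.

```python
# Regularization strength as a variance control knob.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=60)

for alpha in (1e-6, 1.0, 100.0):  # ~none, moderate, heavy regularization
    model = make_pipeline(PolynomialFeatures(degree=10), Ridge(alpha=alpha))
    print(alpha, cross_val_score(model, X, y, cv=5).mean())
# Near-zero alpha leaves the flexible model free to overfit (high variance);
# a very large alpha over-smooths (high bias); a moderate alpha tends to
# give the best cross-validated score.
```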
16. Bias and Variance
• Suppose there is an unknown target function or "true function" that we want to approximate
• Suppose we have different training sets drawn from an unknown distribution defined as "true function + noise"

(Figure: two panels plotting fitted models against the true function f(x); left, linear regression fits; right, unpruned decision tree fits.)

• The left plot shows different linear regression models, each fit to a different training set
  – None of these models approximates the true function well, except at two points (around x = −10 and x = 6)
  – The bias is large because the difference between the true value and the predicted value is, on average, large
• The right plot shows different unpruned decision tree models, each fit to a different training set
  – These models fit the training data very closely
  – In expectation over training sets, the average hypothesis would fit the true function perfectly (given that the noise is unbiased with an expected value of 0)
  – However, the variance is very high, since a prediction differs, on average, a lot from the expected value of the prediction
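The experiment the plots describe can be sketched in code: repeatedly draw training sets from "true function + noise", fit a linear model and an unpruned tree on each, and compare the average squared bias and the variance of the predictions. The true function and noise level below are stand-in assumptions.

```python
# Estimating bias^2 and variance over many resampled training sets.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def true_f(x):                                     # stand-in "true function"
    return 0.05 * x ** 2 + np.sin(x)

x_test = np.linspace(-10, 10, 50).reshape(-1, 1)
preds = {"linear": [], "tree": []}

for _ in range(100):                               # 100 different training sets
    X = rng.uniform(-10, 10, size=(30, 1))
    y = true_f(X[:, 0]) + rng.normal(0, 0.5, size=30)
    preds["linear"].append(LinearRegression().fit(X, y).predict(x_test))
    preds["tree"].append(DecisionTreeRegressor().fit(X, y).predict(x_test))

for name, p in preds.items():
    p = np.array(p)
    bias2 = ((p.mean(axis=0) - true_f(x_test[:, 0])) ** 2).mean()
    variance = p.var(axis=0).mean()
    print(f"{name}: bias^2={bias2:.3f}, variance={variance:.3f}")
# Expected pattern: the linear model shows the larger bias, the unpruned
# tree the larger variance -- matching the two plots described above.
```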
17. Bias and Variance
(Figure contrasting the overfitting and underfitting regimes. Source: https://sebastianraschka.com/pdf/lecture-notes/stat479fs18/08_eval-intro_notes.pdf)
18. Bias-Variance Tradeoff
• Find a balance between bias and variance that minimizes the total error
• Ensembles and cross-validation are frequently used methods to minimize the total error
• Scenario #1: High Bias, Low Variance - underfitting
• Scenario #2: Low Bias, High Variance - overfitting
• Scenario #3: Low Bias, Low Variance - optimal state
• Scenario #4: High Bias, High Variance - something wrong with the data (training and validation distribution mismatch, noisy data, etc.)
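A sketch of hunting for Scenario #3 in practice: sweep a complexity parameter and keep the setting with the best cross-validated score. The dataset and depth range are illustrative choices.

```python
# Balancing bias and variance by tuning model complexity.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
depths = list(range(1, 15))
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5)

best = depths[int(np.argmax(val_scores.mean(axis=1)))]
print("depth with the best validation accuracy:", best)
# Small depths: both scores low (high bias). Large depths: training score
# near 1 while validation stalls or drops (high variance). Pick the middle.
```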