Nearest Neighbor, Decision Tree
Danna Gurari
University of Texas at Austin
Spring 2021
https://www.ischool.utexas.edu/~dannag/Courses/IntroToMachineLearning/CourseContent.html
Review
• Last week:
• Binary classification applications
• Evaluating classification models
• Biological neurons: inspiration
• Artificial neurons: Perceptron & Adaline
• Gradient descent
• Assignments (Canvas)
• Lab assignment 1 due tonight
• Problem set 3 due next week
• Questions?
Today’s Topics
• Multiclass classification applications and evaluating models
• Motivation for new ML era: need non-linear models
• Nearest neighbor classification
• Decision tree classification
• Parametric versus non-parametric models
Today’s Topics
• Multiclass classification applications and evaluating models
• Motivation for new ML era: need non-linear models
• Nearest neighbor classification
• Decision tree classification
• Parametric versus non-parametric models
Today’s Focus: Multiclass Classification
Predict 3+ classes
Multiclass Classification: Cancer Diagnosis
https://news.stanford.edu/2017/01/25/artificial-intelligence-used-identify-skin-cancer/
https://www.nature.com/articles/nature21056
Doctor-level performance in recognizing 2,032 diseases
Multiclass Classification: Face Recognition
Security
https://www.anyvision.co/
Visual assistance for people
with vision impairments
Social media
Multiclass Classification: Shopping
Camfind (https://venturebeat.com/2014/09/24/camfind-app-brings-accurate-visual-search-to-google-glass-exclusive/)
Multiclass Classification: Song Recognition
https://gbksoft.com/blog/how-to-make-a-shazam-like-app/
Goal: Design Models that Generalize Well to
New, Previously Unseen Examples
Input:
Label: Cat Dog Person
1. Split data into a “training set” and “test set”
Goal: Design Models that Generalize Well to
New, Previously Unseen Examples
Training Data Test Data
Input:
Label: Cat Dog Person
Training Data Test Data
Goal: Design Models that Generalize Well to
New, Previously Unseen Examples
Input:
Label: Cat Dog
2. Train model on “training set” to try to minimize prediction error on it
Person
Prediction Model
Goal: Design Models that Generalize Well to
New, Previously Unseen Examples
3. Apply trained model on “test set” to measure generalization error
? ? ?
Input:
Label:
Prediction Model
Goal: Design Models that Generalize Well to
New, Previously Unseen Examples
3. Apply trained model on “test set” to measure generalization error
Cat ? ?
Input:
Label:
Prediction Model
Goal: Design Models that Generalize Well to
New, Previously Unseen Examples
3. Apply trained model on “test set” to measure generalization error
Cat Giraffe ?
Input:
Label:
Prediction Model
Goal: Design Models that Generalize Well to
New, Previously Unseen Examples
3. Apply trained model on “test set” to measure generalization error
Cat Giraffe Person
Input:
Label:
Prediction Model
Goal: Design Models that Generalize Well to
New, Previously Unseen Examples
3. Apply trained model on “test set” to measure generalization error
Cat Giraffe Person
Input:
Label:
Cat Pumpkin
Human Annotated Label: Person
Evaluation Methods
• Accuracy: percentage correct
• Precision: of the examples predicted to be a class, the fraction that truly belong to that class
• Recall: of the examples that truly belong to a class, the fraction predicted to be that class
• Confusion matrix; e.g., http://gabrielelanaro.github.io/blog/2016/02/03/multiclass-evaluation-measures.html
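To make these metrics concrete, here is a minimal scikit-learn sketch on a handful of hypothetical multiclass predictions (the label strings below are made up for illustration). For 3+ classes, precision and recall are computed per class and then averaged; macro averaging is shown.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

# Hypothetical human-annotated labels and model predictions
y_true = ["cat", "dog", "person", "cat", "dog", "person"]
y_pred = ["cat", "giraffe", "person", "cat", "dog", "cat"]

print(accuracy_score(y_true, y_pred))                                     # percentage correct
print(precision_score(y_true, y_pred, average="macro", zero_division=0))  # per-class precision, averaged
print(recall_score(y_true, y_pred, average="macro", zero_division=0))     # per-class recall, averaged
print(confusion_matrix(y_true, y_pred, labels=["cat", "dog", "giraffe", "person"]))
```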
Today’s Topics
• Multiclass classification applications and evaluating models
• Motivation for new ML era: need non-linear models
• Nearest neighbor classification
• Decision tree classification
• Parametric versus non-parametric models
Recall: Historical Context of ML Models
[Timeline figure: 1613 Human "Computers"; early 1800s Linear regression; 1945 First programmable machine; 1950 Turing Test; 1956 AI; 1957 Perceptron; 1959 Machine Learning; 1974-1980 1st AI Winter; 1987-1993 2nd AI Winter]
Recall: Vision for Perceptron
Frank Rosenblatt
(Psychologist)
“[The perceptron is] the embryo of an
electronic computer that [the Navy] expects
will be able to walk, talk, see, write,
reproduce itself and be conscious of its
existence…. [It] is expected to be finished in
about a year at a cost of $100,000.”
1958 New York Times article: https://www.nytimes.com/1958/07/08/archives/new-navy-device-learns-by-doing-psychologist-shows-embryo-of.html
https://en.wikipedia.org/wiki/Frank_Rosenblatt
“Perceptrons” Book: Instigator for “AI Winter”
[Timeline figure: 1613 Human "Computers"; early 1800s Linear regression; 1945 First programmable machine; 1950 Turing Test; 1956 AI; 1957 Perceptron; 1959 Machine Learning; 1974-1980 1st AI Winter; 1987-1993 2nd AI Winter]
1969: Minsky & Papert publish a book called “Perceptrons” to discuss its limitations
Perceptron Limitation: XOR Problem
XOR = “Exclusive Or”
- Input: two binary values x1 and x2
- Output:
- 1, when exactly one input equals 1
- 0, otherwise
x1  x2  |  x1 XOR x2
0   0   |  ?
0   1   |  ?
1   0   |  ?
1   1   |  ?
Perceptron Limitation: XOR Problem
XOR = “Exclusive Or”
- Input: two binary values x1 and x2
- Output:
- 1, when exactly one input equals 1
- 0, otherwise
x1  x2  |  x1 XOR x2
0   0   |  0
0   1   |  ?
1   0   |  ?
1   1   |  ?
Perceptron Limitation: XOR Problem
XOR = “Exclusive Or”
- Input: two binary values x1 and x2
- Output:
- 1, when exactly one input equals 1
- 0, otherwise
x1  x2  |  x1 XOR x2
0   0   |  0
0   1   |  1
1   0   |  ?
1   1   |  ?
Perceptron Limitation: XOR Problem
XOR = “Exclusive Or”
- Input: two binary values x1 and x2
- Output:
- 1, when exactly one input equals 1
- 0, otherwise
x1  x2  |  x1 XOR x2
0   0   |  0
0   1   |  1
1   0   |  1
1   1   |  ?
Perceptron Limitation: XOR Problem
XOR = “Exclusive Or”
- Input: two binary values x1 and x2
- Output:
- 1, when exactly one input equals 1
- 0, otherwise
x1  x2  |  x1 XOR x2
0   0   |  0
0   1   |  1
1   0   |  1
1   1   |  0
Perceptron Limitation: XOR Problem
x1  x2  |  x1 XOR x2
0   0   |  0
0   1   |  1
1   0   |  1
1   1   |  0
How to separate 1s from 0s with a perceptron (linear function)?
Perceptron Limitation: XOR Problem
Frank Rosenblatt
(Psychologist)
“[The perceptron is] the embryo of an
electronic computer that [the Navy] expects
will be able to walk, talk, see, write,
reproduce itself and be conscious of its
existence…. [It] is expected to be finished in
about a year at a cost of $100,000.”
1958 New York Times article: https://www.nytimes.com/1958/07/08/archives/new-navy-device-learns-by-doing-psychologist-shows-embryo-of.html
How can a machine be “conscious”
when it can’t solve the XOR problem?
How to Overcome Limitation?
Non-linear models: e.g.,
• Linear regression: perform non-linear transformation of input features
• K-nearest neighbors
• Decision trees
• And many more to follow…
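As a quick sanity check of this motivation, the sketch below (assuming scikit-learn is available) fits a linear perceptron and two of the non-linear models above to the four XOR points; the perceptron cannot classify all four correctly, while 1-nearest-neighbor and a decision tree can.

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # x1 XOR x2

# A linear model cannot reach 100% accuracy on XOR
print(Perceptron(max_iter=1000).fit(X, y).score(X, y))
# Non-linear models separate the classes perfectly
print(KNeighborsClassifier(n_neighbors=1).fit(X, y).score(X, y))
print(DecisionTreeClassifier(random_state=0).fit(X, y).score(X, y))
```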
Today’s Topics
• Multiclass classification applications and evaluating models
• Motivation for new ML era: need non-linear models
• Nearest neighbor classification
• Decision tree classification
• Parametric versus non-parametric models
Historical Context of ML Models
[Timeline figure: 1613 Human "Computers"; early 1800s Linear regression; 1945 First programmable machine; 1950 Turing Test; 1951 K-Nearest Neighbors; 1956 AI; 1957 Perceptron; 1959 Machine Learning; 1974-1980 1st AI Winter; 1987-1993 2nd AI Winter]
E. Fix and J.L. Hodges. Discriminatory Analysis, Nonparametric Discrimination: Consistency Properties. Technical Report 4, USAF School of Aviation Medicine, Randolph Field, 1951.
Recall: Instance-Based Learning
Memorizes examples and uses a similarity
measure to those examples to make predictions
Figure Source: Hands-on Machine Learning with Scikit-Learn & TensorFlow, Aurelien Geron
e.g., Predict What Scene The Image Shows
Input Output
Kitchen
Database Search
Input:
Label:
1. Create Large Database
e.g., Predict What Scene The Image Shows
Kitchen Coast
Store
e.g., Predict What Scene The Image Shows
2. Organize Database so Visually Similar Examples Neighbor Each Other
Adapted from slides by Antonio Torralba
3. Predict Class Using Label of Most Similar Example(s) in the Database
Input:
Label: ?
Adapted from slides by Antonio Torralba
e.g., Predict What Scene The Image Shows
3. Predict Class Using Label of Most Similar Example(s) in the Database
e.g., Predict What Scene The Image Shows
Input:
Label: Cathedral
Adapted from slides by Antonio Torralba
3. Predict Class Using Label of Most Similar Example(s) in the Database
e.g., Predict What Scene The Image Shows
Input:
Label: ?
Adapted from slides by Antonio Torralba
3. Predict Class Using Label of Most Similar Example(s) in the Database
e.g., Predict What Scene The Image Shows
Input:
Label: Highway
Adapted from slides by Antonio Torralba
K-Nearest Neighbor Classification
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
Training Data:
• Given:
• x = {10.5, 3}
• Predict:
• When k = 1:
Novel Examples:
K-Nearest Neighbor Classification
Training Data:
• Given:
• x = {10.5, 3}
• Predict:
• When k = 1:
• Class 1
• When k = 6:
• How to avoid ties?
• Set “k” to odd value
for binary problems
• Prefer “closer”
neighbors
Novel Examples:
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
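A minimal scikit-learn sketch of this prediction is below. The two-feature training points are hypothetical stand-ins for the plotted dataset (not the actual values behind the figure). With k = 6 the vote in this toy data ties 3-3, so weights='distance' is used to prefer closer neighbors, one of the tie-breaking strategies mentioned above.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical two-feature training data (class 1 vs. class 0)
X_train = np.array([[9.0, 3.0], [9.5, 2.0], [10.0, 4.5], [11.0, 0.5],
                    [8.5, 4.0], [11.5, 2.5], [10.8, 3.8], [9.2, 1.5]])
y_train = np.array([1, 0, 1, 0, 1, 0, 1, 0])

x_new = np.array([[10.5, 3.0]])  # the novel example

knn1 = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
print(knn1.predict(x_new))  # label of the single closest training point

knn6 = KNeighborsClassifier(n_neighbors=6, weights="distance").fit(X_train, y_train)
print(knn6.predict(x_new))  # 6 neighbors vote; closer neighbors count for more
```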
K-Nearest Neighbors: Measuring Distance
How to measure distance between a novel (test) example and a training example?
• Commonly used: Minkowski distance:
• When p = 2, Euclidean distance:
• When p = 1, Manhattan distance:
https://en.wikipedia.org/wiki/Taxicab_geometry
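A small numpy sketch of these distances; the two points below are made up for illustration.

```python
import numpy as np

def minkowski(a, b, p):
    """Minkowski distance: (sum_i |a_i - b_i|^p)^(1/p)."""
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))

a = np.array([10.5, 3.0])  # novel example (hypothetical)
b = np.array([9.0, 4.0])   # one training example (hypothetical)

print(minkowski(a, b, p=2))  # Euclidean: sqrt(1.5^2 + 1.0^2) ≈ 1.80
print(minkowski(a, b, p=1))  # Manhattan: 1.5 + 1.0 = 2.5
```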
K-Nearest Neighbors: Measuring Distance
• Given:
• x = {10.5, 3}
• k =1:
Euclidean Distance
Training Data:
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
K-Nearest Neighbors: Measuring Distance
• Given:
• x = {10.5, 3}
• k =1:
Euclidean Distance
Training Data:
Note: May want to
scale the data to the
same range first
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
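One common way to apply that note in practice is to put a scaler in front of the classifier, as in this scikit-learn sketch (the tiny dataset is hypothetical); each feature is rescaled to [0, 1] so a feature with large units does not dominate the distance.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical data where feature 1 has a much larger range than feature 2
X_train = np.array([[1000.0, 1.0], [2000.0, 2.0], [1500.0, 8.0], [2500.0, 9.0]])
y_train = np.array([0, 0, 1, 1])

# Scaling happens inside the pipeline before distances are computed
model = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=1))
model.fit(X_train, y_train)
print(model.predict(np.array([[1800.0, 7.5]])))
```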
K-Nearest Neighbors: Measuring Distance
• Given:
• x = {10.5, 3}
• k =1:
Manhattan Distance
Training Data:
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
K-Nearest Neighbors: Measuring Distance
• Given:
• x = {10.5, 3}
• k =1:
Manhattan Distance
Training Data:
Note: May want to
scale the data to the
same range first
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
K-Nearest Neighbors: Measuring Distance
How to measure distance when the data is categorical?
• e.g., Train = blue
• e.g., Test = blue; identical values so assign distance 0
• e.g., Test = white; different values so assign distance 1
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
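A minimal sketch of that mismatch rule, counting one unit of distance per categorical feature whose values differ:

```python
def categorical_distance(a, b):
    """0 for each matching categorical value, 1 for each mismatch."""
    return sum(int(ai != bi) for ai, bi in zip(a, b))

print(categorical_distance(["blue"], ["blue"]))                      # identical values -> 0
print(categorical_distance(["blue"], ["white"]))                     # different values -> 1
print(categorical_distance(["blue", "round"], ["white", "round"]))   # one mismatch -> 1
```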
K-Nearest Neighbor Classification: What “K”?
When K=1: When K=3:
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
K-Nearest Neighbor Classification: What “K”?
When K=1: When K=3:
Given that predictions can change with different
values of “k”, what value of “k” should one use?
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
K-Nearest Neighbor Classification: What “K”?
What happens to the decision boundary as “k” grows?
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
K-Nearest Neighbor Classification: What “K”?
What happens when “k” equals the training data size?
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
K-Nearest Neighbor Classification: What “K”?
Small “k” (Higher Model Complexity) versus large “k” (Lower Model Complexity)
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
K-Nearest Neighbor Classification: What “K”?
At what value
for “k” is model
overfitting the
most?
k = 1
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
K-Nearest Neighbor Classification: What “K”?
What is the best
value for “k”?
k = 6
https://github.com/amueller/introduction_to_ml_with_python/blob/master/02-supervised-learning.ipynb
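The usual way to answer “what k?” is to compare training and test accuracy across candidate values. The sketch below uses a synthetic scikit-learn dataset (make_moons) rather than the dataset in the figures, so the best k here will not necessarily be 6.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_moons(n_samples=200, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in (1, 3, 6, 10, 25, 50):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    print(k, round(knn.score(X_tr, y_tr), 2), round(knn.score(X_te, y_te), 2))
# k = 1 fits the training set perfectly (overfitting); very large k underfits;
# choose the k where test accuracy peaks.
```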
K-Nearest Neighbor: How to Use to Predict
More than Two Classes?
• Tally the number of neighbors belonging to each class and again choose
the majority-vote winner
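The tally itself is just a vote count over the k retrieved labels, e.g.:

```python
from collections import Counter

# Hypothetical labels of the k = 5 nearest neighbors
neighbor_labels = ["cat", "dog", "cat", "person", "cat"]
prediction = Counter(neighbor_labels).most_common(1)[0][0]
print(prediction)  # "cat" wins the vote
```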
What are Strengths of KNN?
• Adapts as new data is added
• Training is relatively fast
• Easy to understand
What are Weaknesses of KNN?
• For large datasets, requires large storage space
• For large datasets, this approach can be very slow or infeasible
• Note: can improve speed with efficient data structures such as KD-trees
• Vulnerable to noisy/irrelevant examples
• Sensitive to imbalanced datasets where more frequent class will
dominate majority voting
Today’s Topics
• Multiclass classification applications and evaluating models
• Motivation for new ML era: need non-linear models
• Nearest neighbor classification
• Decision tree classification
• Parametric versus non-parametric models
Historical Context of ML Models
[Timeline figure: 1613 Human "Computers"; early 1800s Linear regression; 1945 First programmable machine; 1950 Turing Test; 1951 K-nearest neighbors; 1956 AI; 1957 Perceptron; 1959 Machine Learning; 1962 Concept Learning; 1974-1980 1st AI Winter; 1979 ID3; 1987-1993 2nd AI Winter]
Hunt, E.B. (1962). Concept learning: An information processing problem. New York: Wiley.
Quinlan, J.R. (1979). Discovering rules by induction from large collections of examples. In D. Michie (Ed.), Expert systems in the micro electronic age. Edinburgh University Press.
Quinlan, J.R. (1986). Induction of decision trees. Machine Learning, 1(1), 81-106.
Example: Decision Tree
http://dataaspirant.com/2017/01/30/how-decision-tree-algorithm-works/
Example: Decision Tree
http://dataaspirant.com/2017/01/30/how-decision-tree-algorithm-works/
Test Example
Salary: $44,869
Commute: 35 min
Free Coffee: Yes
Example: Decision Tree
http://dataaspirant.com/2017/01/30/how-decision-tree-algorithm-works/
Test Example
Salary: $62,200
Commute: 45 min
Free Coffee: Yes
Example: Decision Tree
http://dataaspirant.com/2017/01/30/how-decision-tree-algorithm-works/
Machine must
learn to create a
decision tree
Decision Tree: Generic Structure
• Goal: predict class label (aka: class, ground truth)
• Representation: Tree
• Internal (non-leaf) nodes = tests an attribute
• Branches = attribute value
• Leaf = classification label
Decision Tree: Generic Structure
• Goal: predict class label
• Representation: Tree
• Internal (non-leaf) nodes = tests an attribute
• Branches = attribute value
• Leaf = classification label
How can a machine learn a decision tree?
Decision Tree: Generic Learning Algorithm
• Greedy approach (NP-complete problem)
• Inputs: examples (aka “data”) and attributes (aka “feature names”)
Decision Tree: Generic Learning Algorithm
• Greedy approach (NP-complete problem)
Decision Tree: Generic Learning Algorithm
• Greedy approach (NP-complete problem)
Key Decision
Next “Best” Attribute: Use Entropy
Entropy = − Σ_i p_i · log2(p_i), summed over the c classes, where p_i is the fraction of examples belonging to class i; the base-2 log encodes the result in bits
In a binary setting,
• Entropy is 0 when fraction of examples belonging to a class is 0 or 1
• Entropy is 1 when fraction of examples belonging to each class is 0.5
https://www.logcalculator.net/log-2
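A small sketch of that formula in Python, showing the two binary extremes noted above:

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Entropy in bits: -sum_i p_i * log2(p_i) over the classes present."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

print(entropy(["Yes", "Yes", "No", "No"]))    # 1.0: classes split 50/50
print(entropy(["Yes", "Yes", "Yes", "Yes"]))  # 0.0: all one class
print(entropy(["Yes", "No", "No"]))           # ~0.92: a 1/3 vs 2/3 split
```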
Next “Best” Attribute: Use Entropy
e.g., Will you like a movie?
• Let C1 = “Yes” and C2 = “No”
• Current entropy?
Next “Best” Attribute: Use Entropy
e.g., Will you like a movie?
• Let C1 = “Yes” and C2 = “No”
• Entropy if we split on “Type”?
• Left tree: “Comedy” = ?
Next “Best” Attribute: Use Entropy
e.g., Will you like a movie?
• Let C1 = “Yes” and C2 = “No”
• Entropy if we split on “Type”?
• Left tree: “Comedy” = 0.92
• Right tree: “Drama” = ?
Next “Best” Attribute: Use Entropy
e.g., Will you like a movie?
• Let C1 = “Yes” and C2 = “No”
• Entropy if we split on “Type”?
• Left tree: “Comedy” = 0.92
• Right tree: “Drama” = 0.97
• Information gain by split on “Type”?
Next “Best” Attribute: Use Entropy
e.g., Will you like a movie?
• Let C1 = “Yes” and C2 = “No”
• Entropy if we split on “Length”?
• Left tree: “Short” = ?
Next “Best” Attribute: Use Entropy
e.g., Will you like a movie?
• Let C1 = “Yes” and C2 = “No”
• Entropy if we split on “Length”?
• Left tree: “Short” = 0.92
• Middle tree: “Medium” = ?
Next “Best” Attribute: Use Entropy
e.g., Will you like a movie?
• Let C1 = “Yes” and C2 = “No”
• Entropy if we split on “Length”?
• Left tree: “Short” = 0.92
• Middle tree: “Medium” = 0.82
• Right tree: “Long” = ?
Next “Best” Attribute: Use Entropy
e.g., Will you like a movie?
• Let C1 = “Yes” and C2 = “No”
• Entropy if we split on “Length”?
• Left tree: “Short” = 0.92
• Middle tree: “Medium” = 0.82
• Right tree: “Long” = 0
• Information gain by split on “Length”?
Next “Best” Attribute: Use Entropy
e.g., Will you like a movie?
• Let C1 = “Yes” and C2 = “No”
• Entropy if we split on “IMDb Rating”?
• Order attribute values:
{4.5, 5.1, 6.9, 7.2, 7.5, 8.0, 8.3, 9.3}
Split at midpoint and measure entropy
Decision Tree: What is Our First Split?
• Greedy approach (NP-complete problem)
IG(“Type”) = 0, IG(“Length”) = 0.19, IG(“IMDb Rating”) = 0.95
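Those IG numbers come from “parent entropy minus the size-weighted entropy of the children.” The sketch below shows that computation on hypothetical Yes/No labels (not the actual movie table, whose exact rows are not reproduced here).

```python
import numpy as np
from collections import Counter

def entropy(labels):
    p = np.array(list(Counter(labels).values()), dtype=float)
    p /= p.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent_labels, child_label_groups):
    n = len(parent_labels)
    weighted = sum(len(c) / n * entropy(c) for c in child_label_groups)
    return entropy(parent_labels) - weighted

# Hypothetical 8-example parent split into two branches by some attribute
parent = ["Y", "Y", "Y", "N", "N", "N", "N", "N"]
left   = ["Y", "Y", "N"]            # one attribute value
right  = ["Y", "N", "N", "N", "N"]  # the other attribute value
print(round(information_gain(parent, [left, right]), 3))  # ~0.16 for this toy split
# A split that barely changes the class mix gives IG near 0;
# a split that makes both children pure gives IG equal to the parent entropy.
```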
Decision Tree: What Tree Results?
IMDb Rating
  <= 7.05
  > 7.05
Recurse on this tree?
Decision Tree: What Tree Results?
IMDb Rating
  <= 7.05: No
  > 7.05
Recurse on this tree?
Decision Tree: What Tree Results?
IMDb Rating
  <= 7.05: No
  > 7.05: Yes
Decision Tree: Generic Learning Algorithm
• Greedy approach (NP-complete problem)
Key Decision
• Entropy (maximize information gain)
• Gini Index (used in CART algorithm)
• Gain ratio (used in C4.5 algorithm)
• Mean squared error
• …
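In scikit-learn, the splitting criterion is just a constructor argument; a minimal sketch on the built-in iris data (my example, not from the slides) is below. export_text prints the learned tree: internal nodes test an attribute, branches are attribute-value ranges, leaves carry class labels.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X, y = data.data, data.target

for criterion in ("entropy", "gini"):  # information gain vs. Gini impurity
    tree = DecisionTreeClassifier(criterion=criterion, max_depth=2, random_state=0)
    tree.fit(X, y)
    print(criterion, round(tree.score(X, y), 3))

print(export_text(tree, feature_names=list(data.feature_names)))
```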
Overfitting
• At what tree size does overfitting begin?
http://www.alanfielding.co.uk/multivar/crt/dt_example_04.htm
Evaluate on training data
Evaluate on test data
Regularization to Avoid Overfitting
• Pruning
• Pre-pruning: stop tree growth earlier
• Post-pruning: prune tree afterwards
https://www.clariba.com/blog/tech-20140811-decision-trees-with-sap-predictive-analytics-and-sap-hana-emilio-nieto
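Both flavors of pruning can be sketched with scikit-learn (dataset and hyperparameter values below are illustrative, not tuned): pre-pruning via growth limits such as max_depth and min_samples_leaf, post-pruning via cost-complexity pruning (ccp_alpha).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Pre-pruning: stop tree growth early
pre = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5, random_state=0).fit(X_tr, y_tr)

# Post-pruning: grow a full tree, then prune via cost-complexity pruning
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # one candidate; pick via validation in practice
post = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_tr, y_tr)

print(round(pre.score(X_te, y_te), 3), round(post.score(X_te, y_te), 3))
```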
Today’s Topics
• Multiclass classification applications and evaluating models
• Motivation for new ML era: need non-linear models
• Nearest neighbor classification
• Decision tree classification
• Parametric versus non-parametric models
Machine Learning Goal
• Learn function that maps input features (X) to an output prediction (Y)
Y = f(x)
Machine Learning Goal
• Learn function that maps input features (X) to an output prediction (Y)
• Parametric model: has a fixed number of parameters
Y = f(x)
Machine Learning Goal
• Learn function that maps input features (X) to an output prediction (Y)
• Parametric model: has a fixed number of parameters
• Non-parametric model: does not specify the number of parameters
Y = f(x)
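One way to see the difference: a parametric model's learned parameters stay the same size no matter how much training data it sees, while a non-parametric model such as k-nearest neighbors must keep every training example around. A small sketch (synthetic data, illustrative only):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
for n in (50, 5000):
    X = rng.normal(size=(n, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)
    model = LinearRegression().fit(X, y)
    # Parametric: always 3 coefficients + 1 intercept, regardless of n
    print(n, model.coef_.shape, "coefficients")
# A k-nearest-neighbors model, by contrast, stores all n training examples,
# so the "model" grows with the data.
```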
Class Discussion
• For each model, is it parametric or non-parametric?
• Linear regression
• K-nearest neighbors
• What are advantages and disadvantages of parametric models?
• What are advantages and disadvantages of non-parametric models?
Today’s Topics
• Multiclass classification applications and evaluating models
• Motivation for new ML era: need non-linear models
• Nearest neighbor classification
• Decision tree classification
• Parametric versus non-parametric models
References Used for Today’s Material
• Chapter 3 of Deep Learning book by Goodfellow et al.
• http://www.cs.utoronto.ca/~fidler/teaching/2015/slides/CSC411/06_trees.pdf
• http://www.cs.utoronto.ca/~fidler/teaching/2015/slides/CSC411/tutorial3_CrossVal-DTs.pdf
• http://www.cs.cmu.edu/~epxing/Class/10701/slides/classification15.pdf