INTRO TO MACHINE LEARNING
Justin Sebok
CONTENTS
What is machine learning?
Types of machine learning
Supervised learning and examples
Unsupervised learning and examples
WHAT IS MACHINE LEARNING?
Wikipedia: Machine Learning is a subfield of computer science which
gives computers the ability to learn without being explicitly
programmed.
WTF does that mean?!
Basically, Machine Learning uses “algorithms” that learn from data:
they find patterns in the data and use them to improve their
predictions.
Data → Algorithm → Predictions
WHAT IS MACHINE LEARNING?
“… without being explicitly programmed”
This is what makes machine learning so powerful. Rather than
requiring specific instructions like in traditional computing, machine
learning allows the computers to improve their predictions just using
the data inputs.
TWO MAIN TYPES OF MACHINE LEARNING ALGORITHM
Supervised Learning: We know what we are trying to predict. We
use some examples that we (and the model) know the answer to, to
“train” our model. It can then generate predictions for examples we
don’t know the answer to.
Examples: Predict the price a house will sell at. Identify the gender of
someone based on a photograph.
Unsupervised Learning: We don’t know what we are trying to
predict. We are trying to identify some naturally occurring patterns in
the data which may be informative.
Examples: Try to identify “clusters” of customers based on data we
have on them.
TYPES OF SUPERVISED LEARNING
Supervised learning can be further broken down based on the two
types of problem it may be trying to solve.
Classification Problems: These are problems where the label takes one
of a finite number of possible values. These are categories or
classes. There may be as few as 2 or as many as 1000+ possible
classes; as long as we can identify and count them all, the number
doesn’t matter.
Examples: Identify plant species.
Regression Problems: These are problems where the label we
are trying to predict is a number on a continuous scale.
Examples: Predict someone’s height.
INTRO TO A FEW SUPERVISED LEARNING MODELS
Nearest Neighbours (Classification and Regression)
Decision Trees (Classification and Regression)
Linear Regression (Regression)
QUICK TERMINOLOGY
Observation: One of the “things” we are looking at. Could be a
person, a time, or a place.
Feature: Some aspect of the observation that we know. Could be a
person’s hair colour, the latitude and longitude of a city, or the
number of rooms a house has. May be denoted as x.
Label: The feature of an observation which we are trying to predict.
For labelled observations, we already know the answer. May be
denoted as y.
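As a rough illustration of these terms, the sketch below stores a few observations as plain Python lists. The houses, their feature values, and the prices are all made up for the example; only the x and y naming comes from the slide.

```python
# Hypothetical toy data: each observation is one house.
# Features (x): number of rooms and floor area in square metres.
X = [
    [3, 70.0],
    [5, 120.0],
    [2, 45.0],
]

# Labels (y): the sale price we want to predict.
# These are known only for labelled observations.
y = [210_000, 390_000, 150_000]

# An unlabelled observation has features but no label yet.
x_new = [4, 95.0]

n_observations = len(X)
n_features = len(X[0])
print(n_observations, "observations with", n_features, "features each")
```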
NEAREST NEIGHBOURS
Conceptually one of the simplest Machine Learning algorithms.
Uses the proximity or similarity of observations to make predictions
about them
Method:
For the 1-Nearest Neighbour algorithm, find the closest labelled
observation to the unlabelled observation and apply the same label.
While it may seem very simple, it is often very effective!
It can be used for classification or regression
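The method above can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation: it assumes Euclidean distance as the measure of proximity (the slides do not name one), the points and labels are invented, and the function name `nearest_neighbour_predict` is hypothetical.

```python
import math


def nearest_neighbour_predict(X, y, x_new):
    """Return the label of the labelled observation closest to x_new.

    Assumes Euclidean distance. If two points are equally close, the
    first one in the list wins here, rather than a coin flip.
    """
    distances = [math.dist(x, x_new) for x in X]
    closest = min(range(len(X)), key=distances.__getitem__)
    return y[closest]


# Made-up labelled observations with two features each.
X = [[1.0, 1.0], [2.0, 1.5], [6.0, 5.0], [7.0, 6.5]]
classes = ["blue", "blue", "red", "red"]   # labels for classification
values = [3.0, 1.5, 8.0, 6.0]              # labels for regression

x_new = [6.5, 5.5]
print(nearest_neighbour_predict(X, classes, x_new))
print(nearest_neighbour_predict(X, values, x_new))
```

The same function serves classification and regression because 1-NN simply copies whatever label the closest observation carries.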
1 NEAREST NEIGHBOUR PREDICTIONS
[Figures: 1-nearest-neighbour classification of an unlabelled point (?).]
Here there is some ambiguity. We are an equal distance from both
classes. In this case, for 1-NN we would just flip a coin to choose a
class at random.
1 NEAREST NEIGHBOUR PREDICTIONS
[Figure: an unlabelled point (?) among labelled points with values
6, 3, 0, 8, 6, 1.5 and 5. The 1-NN prediction is 8.]
K-NEAREST NEIGHBOURS
The problem with 1-Nearest Neighbours is that outliers may result in
incorrect predictions.
What is an outlier?
An outlier is a point which is distant or very different from other
observations.
This may be a legitimate datapoint, or may be an example of “noise”
in the data.
ANY OUTLIERS HERE?
K-NEAREST NEIGHBOURS
The problem with 1-Nearest Neighbours is that outliers may result in
incorrect predictions.
How could we attempt to counteract this problem?
Why not try 2-Nearest Neighbours? Simply look at the 2 nearest
labelled examples and apply the label that they share.
What happens when we have a tie?
Flip a coin…
Or we could use 3-Nearest Neighbours – No ties if we only have 2
classes.
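A small sketch of the k-nearest-neighbours vote, under a few assumptions: Euclidean distance, a simple majority vote, and a random choice among tied classes to mirror the coin flip. The data is invented so that one "red" outlier sits inside a "blue" cluster, and the name `knn_classify` is hypothetical.

```python
import math
import random
from collections import Counter


def knn_classify(X, y, x_new, k, rng=random):
    """Majority vote among the k labelled points closest to x_new."""
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], x_new))
    votes = Counter(y[i] for i in order[:k])
    top = max(votes.values())
    tied = sorted(label for label, count in votes.items() if count == top)
    # With a single winner this just returns it; with a tie it flips a coin.
    return rng.choice(tied)


# Made-up data: the last point is a "red" outlier inside the blue cluster.
X = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [5.0, 5.0], [5.2, 4.8], [1.5, 1.5]]
y = ["blue", "blue", "blue", "red", "red", "red"]
x_new = [1.6, 1.6]

print(knn_classify(X, y, x_new, k=1))  # follows the outlier
print(knn_classify(X, y, x_new, k=3))  # the outlier is outvoted
```

With k=2 the two nearest points here disagree, so the result depends on the coin flip, which is the tie problem raised above.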
3-NEAREST NEIGHBOUR PREDICTIONS
[Figures: 3-nearest-neighbour classification of an unlabelled point (?);
then an unlabelled point (?) among labelled points with values
6, 3, 0, 8, 6, 1.5 and 5.]
How can we use the 3-nearest neighbour approach in regression?
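3 NEAREST NEIGHBOUR PREDICTIONS
[Figure: the same labelled points with values 6, 3, 0, 8, 6, 1.5 and 5.
The 3-NN prediction is 4.67, the average of the three nearest labelled
values.]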
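For regression, one common choice is to average the labels of the k nearest neighbours, which is consistent with the 4.67 shown on the slide. In the sketch below the label values are the ones from the slide, but the positions are invented (the figure's coordinates are not recoverable) and chosen so that the three nearest values are 8, 6 and 0. The function name `knn_regress` is hypothetical.

```python
import math
from statistics import mean


def knn_regress(X, y, x_new, k):
    """Average the labels of the k labelled points closest to x_new."""
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], x_new))
    return mean(y[i] for i in order[:k])


# Label values from the slide; the 1-D positions are made up.
X = [[4.5], [1.0], [5.8], [5.2], [8.0], [2.0], [9.0]]
y = [6.0, 3.0, 0.0, 8.0, 6.0, 1.5, 5.0]
x_new = [5.0]

print(knn_regress(X, y, x_new, k=1))            # label of the single closest point
print(round(knn_regress(X, y, x_new, k=3), 2))  # mean of the three closest labels
```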
SO WHAT K-VALUE DO I USE?
Choice of how many neighbours to use illustrates one of the main
trade-offs seen in machine learning:
Variance vs Bias
Variance is the error in prediction we get from following our training
data too closely. We end up basing our predictions on “random noise”
in the data. If we choose too small a k-value, we may have a high
level of variance.
Bias is the error in prediction we get from using a simplified model to
predict very complex real-world things. If we choose too large a
k-value, we may have a high level of bias.
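The two extremes of k can be seen on a tiny made-up dataset. With k = 1 the model reproduces every training label, including the noisy one (high variance). With k equal to the number of observations it predicts the same overall average everywhere (high bias). This is a sketch assuming one feature and averaging for regression; `knn_regress_1d` is a hypothetical helper.

```python
from statistics import mean


def knn_regress_1d(xs, ys, x_new, k):
    """Average the labels of the k training points closest to x_new."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x_new))
    return mean(ys[i] for i in order[:k])


# Made-up data following a rough upward trend; 9.0 is a noisy label.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 9.0, 5.1, 5.9]

# k = 1: every training point is its own nearest neighbour,
# so the noise at x = 4 is copied straight into the prediction.
preds_k1 = [knn_regress_1d(xs, ys, x, 1) for x in xs]

# k = n: every prediction is the mean of all labels, ignoring x entirely.
preds_kn = [knn_regress_1d(xs, ys, x, len(xs)) for x in xs]

print(preds_k1)
print(preds_kn)
```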
VARIANCE VS BIAS
One big part of machine learning is striking the right balance
between these two types of errors.
PROBLEM OF DIMENSIONALITY
1 Dimension: 5 observations to fill the space
2 Dimensions: 25 observations to fill the space
3 Dimensions: 125 observations to fill the space
As dimensionality increases, the number of observations required
to “fill the space” increases exponentially.
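The counts above follow from placing 5 observations along each axis, which gives 5 to the power of the number of dimensions. A quick sketch (the function name is made up for the example):

```python
def observations_to_fill(points_per_axis, dimensions):
    """Number of grid points needed to cover every axis at the same density."""
    return points_per_axis ** dimensions


for d in (1, 2, 3, 10):
    print(d, "dimensions:", observations_to_fill(5, d), "observations")
```

At 10 dimensions the same density already needs close to ten million observations.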
DECISION TREES
Another quite simple Machine Learning technique.
We attempt to “cut” the space where our observations exist and
predict labels based on the sections our observations end up in.
We can display these cuts in the form of a tree, hence the name.
[Figure: an example of such a tree, used for predicting height.]
DECISION TREE – “CUTTING THE SPACE”
[Figure: an example of “cutting the space”.]
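DECISION TREES
Once we have cut our space into chunks, how do we generate
predictions in that area?
[Figure: a region containing labelled points with values 6, 8 and 5.
The prediction for the region is 6.33, their average.]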
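The 6.33 on the slide is the average of the training labels 6, 8 and 5 that fall in the region. The sketch below shows that, plus the usual classification counterpart of taking the most common class in the region. The majority rule and both function names are assumptions for illustration; the slides only show the regression case.

```python
from collections import Counter
from statistics import mean


def predict_region_regression(values):
    """Predict the mean of the training labels that fall in the region."""
    return mean(values)


def predict_region_classification(labels):
    """Predict the most common training class in the region (assumed rule)."""
    return Counter(labels).most_common(1)[0][0]


print(round(predict_region_regression([6, 8, 5]), 2))
print(predict_region_classification(["red", "blue", "red"]))
```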
DECISION TREE – WHERE DO WE CUT?
Each cut should improve the prediction accuracy by as much as
possible.
HOW COULD WE CUT A
“REGRESSION” DECISION TREE?
Very similar to the way classification trees are cut.
Each cut should reduce, by as much as possible, the difference between
the predicted output in an area and the actual training output.
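One way to make "reduce the difference" concrete is to try every possible cut on a feature and keep the one with the smallest total squared error, where each side predicts its own mean. Squared error is an assumption here (the slides do not name a measure), the data is invented, and `best_cut` is a hypothetical helper handling a single feature.

```python
from statistics import mean


def sse(values):
    """Sum of squared differences between the values and their mean."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_cut(xs, ys):
    """Return (threshold, error) for the single cut with the lowest error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut can separate identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        error = sse(left) + sse(right)
        if best is None or error < best[1]:
            best = (threshold, error)
    return best


# Made-up data with an obvious jump between x = 3 and x = 10.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [5.0, 6.0, 5.5, 20.0, 21.0, 19.0]

threshold, error = best_cut(xs, ys)
print(threshold, error)
```

A full tree would repeat this search inside each resulting region and across every feature.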
BIAS AND VARIANCE IN DECISION TREES
What would a decision tree with a high degree of bias look like?
What would a decision tree with a high degree of variance look like?
LINEAR REGRESSION
I will assume everyone knows the basics of linear regression.
While I won’t go into any of the maths, it is very useful to look at this
with the other models.
What is a very basic definition of linear regression?
LINEAR REGRESSION
What would a linear regression line with a high degree of bias look
like?
What would a linear regression line with a high degree of variance
look like?
SPECTRUM OF SUPERVISED LEARNING TECHNIQUES
[Figure: a spectrum running from “No assumptions about data / Not
computationally efficient” at one end to “Lots of assumptions about
data / Very computationally efficient” at the other.]
Where do the techniques we have discussed fall on this spectrum?
The more assumptions we can make about our data, the more
computationally efficient we can make the model.
From fewest to most assumptions: K-Nearest Neighbours, Decision
Trees, Linear Regression.

Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
zsjl4mimo
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 

Recently uploaded (20)

在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 

Intro to modelling-supervised learning

  • 2. CONTENTS What is machine learning? Types of machine learning Supervised learning and examples Unsupervised learning and examples
  • 3. WHAT IS MACHINE LEARNING? Wikipedia: Machine Learning is a subfield of computer science which gives computers the ability to learn without being explicitly programmed.
  • 4. WHAT IS MACHINE LEARNING? Wikipedia: Machine Learning is a subfield of computer science which gives computers the ability to learn without being explicitly programmed. WTF does that mean?!
  • 5. WHAT IS MACHINE LEARNING? Wikipedia: Machine Learning is a subfield of computer science which gives computers the ability to learn without being explicitly programmed. WTF does that mean?! Basically, Machine Learning uses “algorithms” that learn from patterns in data to improve their predictions of something. Data → Algorithm → Predictions
  • 6. WHAT IS MACHINE LEARNING? “… without being explicitly programmed” This is what makes machine learning so powerful. Rather than requiring specific instructions like in traditional computing, machine learning allows the computers to improve their predictions just using the data inputs.
  • 7. TWO MAIN TYPES OF MACHINE LEARNING ALGORITHM Supervised Learning: We know what we are trying to predict. We use some examples that we (and the model) know the answer to, to “train” our model. It can then generate predictions for examples we don’t know the answer to. Examples: Predict the price a house will sell at. Identify the gender of someone based on a photograph. Unsupervised Learning: We don’t know what we are trying to predict. We are trying to identify some naturally occurring patterns in the data which may be informative. Examples: Try to identify “clusters” of customers based on data we have on them.
  • 10. TYPES OF SUPERVISED LEARNING Supervised learning can be further broken down based on the two types of problem it may be trying to solve. Classification Problems: These are problems where there is a finite and countable number of possible solutions. There may be as few as 2 or as many as 1000+ possible solutions; the number doesn’t matter as long as we can identify and count them all. Examples: Identify the colour seen in a picture. Regression Problems: These are problems where the feature we are trying to predict is a number on a continuous scale. Examples: Predict someone’s height.
  • 11. TYPES OF SUPERVISED LEARNING Supervised learning can be further broken down based on the two types of problem it may be trying to solve. Classification Problems: These are problems where there is a finite and countable number of possible solutions. These are categories or classes. There may be as few as 2 or as many as 1000+ possible solutions; the number doesn’t matter as long as we can identify and count them all. Examples: Identify plant species. Regression Problems: These are problems where the feature we are trying to predict is a number on a continuous scale. Examples: Predict someone’s height.
  • 13. INTRO TO A FEW SUPERVISED LEARNING MODELS Nearest Neighbours (Classification and Regression) Decision Trees (Classification and Regression) Linear Regression (Regression)
  • 14. QUICK TERMINOLOGY Observation: One of the “things” we are looking at. Could be a person, a time, or a place. Feature: Some aspect of the observation that we know. Could be a person’s hair colour, the latitude and longitude of a city, or the number of rooms a house has. May be denoted as x Label: The feature of an observation which we are trying to predict. For labelled observations, we already know the answer. May be denoted as y
  • 15. NEAREST NEIGHBOURS Conceptually one of the simplest Machine Learning algorithms. Uses the proximity or similarity of observations to make predictions about them
  • 16. NEAREST NEIGHBOURS Conceptually one of the simplest Machine Learning algorithms. Uses the proximity or similarity of observations to make predictions about them Method: For the 1-Nearest Neighbour algorithm, find the closest labelled observation to the unlabelled observation and apply the same label. While it may seem very simple, it is often very effective! It can be used for classification or regression
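The 1-Nearest Neighbour method described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function name `nearest_neighbour_predict`, the use of Euclidean distance, and the toy points are all assumptions made for the example. On an exact distance tie it simply keeps the first closest point rather than choosing at random.

```python
import math

def nearest_neighbour_predict(train_points, train_labels, query):
    """Return the label of the labelled observation closest to `query`."""
    # Euclidean distance is an assumption; any similarity measure could be used.
    distances = [math.dist(point, query) for point in train_points]
    nearest_index = distances.index(min(distances))
    return train_labels[nearest_index]

# Made-up labelled observations: two features each, two classes.
points = [(1.0, 1.0), (2.0, 1.5), (6.0, 5.0), (7.0, 6.5)]
labels = ["red", "red", "blue", "blue"]

print(nearest_neighbour_predict(points, labels, (1.5, 1.2)))
print(nearest_neighbour_predict(points, labels, (6.5, 6.0)))
```

The same function works for regression if the labels are numbers, since it just copies the nearest observation's label.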
  • 23. 1 NEAREST NEIGHBOUR PREDICTIONS [Figure: an unlabelled point “?” between two classes] Here there is some ambiguity. We are equidistant from both classes. In this case, for 1-NN we would just flip a coin to choose a class at random.
  • 26. K-NEAREST NEIGHBOURS The problem with 1-Nearest Neighbours is that outliers may result in incorrect predictions. What is an outlier?
  • 27. K-NEAREST NEIGHBOURS The problem with 1-Nearest Neighbours is that outliers may result in incorrect predictions. What is an outlier? An outlier is a point which is distant or very different from other observations. This may be a legitimate datapoint, or may be an example of “noise” in the data.
  • 30. K-NEAREST NEIGHBOURS The problem with 1-Nearest Neighbours is that outliers may result in incorrect predictions. How could we attempt to counteract this problem?
  • 31. K-NEAREST NEIGHBOURS The problem with 1-Nearest Neighbours is that outliers may result in incorrect predictions. How could we attempt to counteract this problem? Why not try 2-Nearest Neighbours? Simply look at the 2 nearest labelled examples and apply the label that they have.
  • 32. K-NEAREST NEIGHBOURS The problem with 1-Nearest Neighbours is that outliers may result in incorrect predictions. How could we attempt to counteract this problem? Why not try 2-Nearest Neighbours? Simply look at the 2 nearest labelled examples and apply the label that they have. What happens when we have a tie?
  • 33. K-NEAREST NEIGHBOURS The problem with 1-Nearest Neighbours is that outliers may result in incorrect predictions. How could we attempt to counteract this problem? Why not try 2-Nearest Neighbours? Simply look at the 2 nearest labelled examples and apply the label that they have. What happens when we have a tie? Flip a coin… Or we could use 3-Nearest Neighbours – No ties if we only have 2 classes
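A sketch of k-Nearest Neighbours classification with a majority vote, under the same assumptions as before (Euclidean distance, made-up data, hypothetical function name `knn_classify`). Ties in the vote are broken at random, which mirrors the "flip a coin" rule. The toy data includes one "blue" outlier sitting inside the "red" cluster to show how a larger k can counteract it.

```python
import math
import random
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3, rng=random):
    """Predict the majority label among the k nearest labelled observations."""
    # Indices of training points, sorted from nearest to farthest.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    top_count = max(votes.values())
    tied = sorted(label for label, count in votes.items() if count == top_count)
    # With a single winner this just returns it; otherwise "flip a coin".
    return rng.choice(tied)

# Made-up data: a red cluster, a blue cluster, and a blue outlier at (1.1, 1.1).
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (1.1, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["red", "red", "red", "blue", "blue", "blue", "blue"]

query = (1.15, 1.1)
print(knn_classify(points, labels, query, k=1))  # decided by the outlier alone
print(knn_classify(points, labels, query, k=3))  # outlier is outvoted
```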
  • 36. 3-NEAREST NEIGHBOUR PREDICTIONS [Figure: points labelled 6, 3, 0, 8, 6, 1.5 and 5 around an unlabelled point “?”] How can we use the 3-nearest neighbour approach in regression?
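One common answer to this question, offered here as an assumption because the transcript does not state it, is to predict the average of the labels of the k nearest observations. The sketch below uses a hypothetical `knn_regress` function and invented coordinates; only the idea of numeric labels comes from the slide.

```python
import math

def knn_regress(train_points, train_values, query, k=3):
    """Predict the mean label of the k nearest labelled observations."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    nearest_values = [train_values[i] for i in order[:k]]
    return sum(nearest_values) / len(nearest_values)

# Invented positions; the three points near the origin carry labels 6, 8 and 5.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
values = [6.0, 8.0, 5.0, 0.0, 1.5]

prediction = knn_regress(points, values, (0.3, 0.3), k=3)
print(round(prediction, 2))  # mean of 6, 8 and 5
```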
  • 38. SO WHAT K-VALUE DO I USE? Choice of how many neighbours to use illustrates one of the main trade-offs seen in machine learning: Variance vs Bias
  • 39. SO WHAT K-VALUE DO I USE? Choice of how many neighbours to use illustrates one of the main trade-offs seen in machine learning: Variance vs Bias Variance is the error in prediction we get from following our training data too closely. We end up basing our predictions on “random noise” in the data. If we choose too small a k-value, we may have a high level of variance.
  • 40. SO WHAT K-VALUE DO I USE? Choice of how many neighbours to use illustrates one of the main trade-offs seen in machine learning: Variance vs Bias Variance is the error in prediction we get from following our training data too closely. We end up basing our predictions on “random noise” in the data. If we choose too small a k-value, we may have a high level of variance. Bias is the error in prediction we get from using a simplified model to predict very complex real-world things. If we choose too large a k-value, we may have a high level of bias.
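The two extremes can be seen on a tiny one-dimensional example. This is only an illustration with invented data: the labels roughly follow y = x, except for one noisy label at x = 2. The helper `knn_regress_1d` is hypothetical and averages the k nearest labels.

```python
def knn_regress_1d(xs, ys, query, k):
    """Average the labels of the k training points nearest to `query`."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - query))
    nearest = [ys[i] for i in order[:k]]
    return sum(nearest) / len(nearest)

xs = [0, 1, 2, 3, 4, 5]
ys = [0.0, 1.0, 9.0, 3.0, 4.0, 5.0]  # y is roughly x, with noise at x = 2

# k = 1 copies the noisy label exactly (high variance).
print(knn_regress_1d(xs, ys, 2, k=1))
# k = 3 averages the noisy label with its two neighbours.
print(knn_regress_1d(xs, ys, 2, k=3))
# k = 6 uses every point, so every query gets the same global mean (high bias).
print(knn_regress_1d(xs, ys, 0, k=6), knn_regress_1d(xs, ys, 5, k=6))
```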
  • 41. VARIANCE VS BIAS One big part of machine learning is striking the right balance between these two types of errors.
  • 42. PROBLEM OF DIMENSIONALITY 1 Dimension: 5 observations to fill the space How many observations do we need to fill 2 dimensions?
  • 43. PROBLEM OF DIMENSIONALITY 1 Dimension: 5 observations to fill the space 2 Dimensions: 25 observations to fill the space How many observations do we need to fill 3 dimensions?
  • 44. PROBLEM OF DIMENSIONALITY 1 Dimension: 5 observations to fill the space 2 Dimensions: 25 observations to fill the space 3 Dimensions: 125 observations to fill the space As dimensionality increases, the number of observations required to “fill the space” increases exponentially
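The counts on this slide follow the pattern 5 to the power of the number of dimensions. A short sketch, assuming 5 observations per axis as in the slide (the function name is made up for the example):

```python
def observations_to_fill(dimensions, per_axis=5):
    """Observations needed to cover the space at `per_axis` points per axis."""
    return per_axis ** dimensions

for d in (1, 2, 3, 10):
    print(d, observations_to_fill(d))
```

With 10 features the same coverage already needs 9,765,625 observations.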
  • 45. DECISION TREES Another quite simple Machine Learning technique. We attempt to “cut” the space where our observations exist and predict labels based on the sections our observations end up in.
  • 46. DECISION TREES Another quite simple Machine Learning technique. We attempt to “cut” the space where our observations exist and predict labels based on the sections our observations end up in. We can display these cuts in the form of a tree, hence the name. Here is an example of such a tree used for predicting height.
  • 47. DECISION TREE – “CUTTING THE SPACE”
  • 48. DECISION TREE - “CUTTING THE SPACE” This is an example of “cutting the space”
  • 49. DECISION TREES Once we have cut our space into chunks, how do we generate predictions in that area? ?
  • 50. DECISION TREES Once we have cut our space into chunks, how do we generate predictions in that area?
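  • 51. DECISION TREES Once we have cut our space into chunks, how do we generate predictions in that area? [Figure: a region containing training points labelled 6, 8 and 5, and an unlabelled point “?”]
  • 52. DECISION TREES Once we have cut our space into chunks, how do we generate predictions in that area? [Figure: the unlabelled point is assigned 6.33, the mean of the labels 6, 8 and 5 in its region]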
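The 6.33 on this slide is the mean of the training labels 6, 8 and 5 that fall in the same region. A minimal sketch of that rule follows. The classification counterpart (take the most common label in the region) is a standard choice but an assumption here, since the transcript does not spell it out. Both function names are made up.

```python
from collections import Counter
from statistics import mean

def leaf_prediction_regression(values_in_region):
    """Regression: predict the mean of the training labels in the region."""
    return mean(values_in_region)

def leaf_prediction_classification(labels_in_region):
    """Classification (assumed rule): predict the most common label."""
    return Counter(labels_in_region).most_common(1)[0][0]

print(round(leaf_prediction_regression([6, 8, 5]), 2))
print(leaf_prediction_classification(["red", "blue", "red"]))
```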
  • 53. DECISION TREE – WHERE DO WE CUT? Each cut should improve the prediction accuracy by as much as possible
  • 54. DECISION TREE – WHERE DO WE CUT?
  • 55. DECISION TREE – WHERE DO WE CUT?
  • 56. HOW COULD WE CUT A “REGRESSION” DECISION TREE?
  • 57. HOW COULD WE CUT A “REGRESSION” DECISION TREE? Very similar to the way classification trees are cut. Each cut should reduce the difference between predicted output in an area and the actual training output
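One way to make "reduce the difference between predicted output and actual training output" concrete is to try every candidate cut on a feature and keep the one with the smallest total squared error. The squared-error measure, the midpoint thresholds, and the function names below are assumptions for illustration; the slides do not name a specific criterion.

```python
def sum_squared_error(values):
    """Squared error when every value is predicted by the group's mean."""
    if not values:
        return 0.0
    avg = sum(values) / len(values)
    return sum((v - avg) ** 2 for v in values)

def best_split(xs, ys):
    """Find the single cut on one feature that minimises total squared error."""
    pairs = sorted(zip(xs, ys))
    best_threshold, best_error = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut is possible between identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        error = sum_squared_error(left) + sum_squared_error(right)
        if error < best_error:
            best_threshold, best_error = threshold, error
    return best_threshold, best_error

# Made-up data with two obvious groups of labels.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 5.5, 20.0, 21.0, 19.0]
print(best_split(xs, ys))
```

A full tree would apply the same search again inside each resulting region.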
  • 58. BIAS AND VARIANCE IN DECISION TREES What would a decision tree with a high degree of bias look like? What would a decision tree with a high degree of variance look like?
  • 59. LINEAR REGRESSION I will assume everyone knows the basics of linear regression. While I won’t go into any of the maths, it is very useful to look at this with the other models.
  • 60. LINEAR REGRESSION I will assume everyone knows the basics of linear regression. What is a very basic definition of linear regression?
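As a very basic working definition: linear regression fits a straight line y = slope * x + intercept that minimises the squared differences between the line and the training labels. The sketch below uses the standard least-squares formulas for a single feature. It is an illustration with invented data, not material from the talk, which deliberately skips the maths.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
print(slope * 5 + intercept)  # prediction for an unseen x = 5
```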
  • 61. LINEAR REGRESSION What would a linear regression line with a high degree of bias look like? What would a linear regression line with a high degree of variance look like?
  • 62. SPECTRUM OF SUPERVISED LEARNING TECHNIQUES [Spectrum: No assumptions about data ↔ Lots of assumptions about data] Where do the techniques we have discussed fall on this spectrum?
  • 63. SPECTRUM OF SUPERVISED LEARNING TECHNIQUES [Spectrum: No assumptions about data ↔ Lots of assumptions about data; Not computationally efficient ↔ Very computationally efficient] The more assumptions we can make about our data, the more computationally efficient we can make the model.
  • 64. SPECTRUM OF SUPERVISED LEARNING TECHNIQUES [Spectrum: No assumptions about data ↔ Lots of assumptions about data; Not computationally efficient ↔ Very computationally efficient] Placed along the spectrum, left to right: K-Nearest Neighbours, Decision Trees, Linear Regression