Introduction to Machine Learning
Sudeshna Sarkar
IIT Kharagpur
Module 1: Introduction
Part A: Introduction
Overview of Course
1. Introduction
2. Linear Regression and Decision Trees
3. Instance-based learning and Feature Selection
4. Probability and Bayes Learning
5. Support Vector Machines
6. Neural Network
7. Introduction to Computational Learning Theory
8. Clustering
Module 1
1. Introduction
a) Introduction
b) Different types of learning
c) Hypothesis space, Inductive Bias
d) Evaluation, Training and test set, cross-validation
2. Linear Regression and Decision Trees
3. Instance-based learning and Feature Selection
4. Probability and Bayes Learning
5. Support Vector Machines
6. Neural Network
7. Introduction to Computational Learning Theory
8. Clustering
Machine Learning History
• 1950s:
– Samuel's checker-playing program
• 1960s:
– Neural network: Rosenblatt's perceptron
– Minsky & Papert prove limitations of Perceptron
• 1970s:
– Symbolic concept induction
– Expert systems and knowledge acquisition bottleneck
– Quinlan's ID3
– Natural language processing (symbolic)
Machine Learning History
• 1980s:
– Advanced decision tree and rule learning
– Learning, planning, and problem solving
– Resurgence of neural networks
– Valiant's PAC learning theory
– Focus on experimental methodology
• 1990s: ML and Statistics
– Data Mining
– Adaptive agents and web applications
– Text learning
– Reinforcement learning
– Ensembles
– Bayes Net learning
• 1994: Self-driving car road test
• 1997: Deep Blue beats Garry Kasparov
Machine Learning History
• Popularity of this field in recent times, and the reasons behind it:
– New software/algorithms
• Neural networks
• Deep learning
– New hardware
• GPUs
– Cloud-enabled computing
– Availability of Big Data
• 2009: Google builds a self-driving car
• 2011: Watson wins Jeopardy!
• 2014: Human vision surpassed by ML systems
Programs vs learning algorithms
Algorithmic solution: Data + Program -> Computer -> Output
Machine learning solution: Data + Output -> Computer -> Program
Machine Learning: Definition
• Learning is the ability to improve one's behaviour based on
experience.
• Build computer systems that automatically improve with
experience
• What are the fundamental laws that govern all learning
processes?
• Machine Learning explores algorithms that can
– learn from data / build a model from data
– use the model for prediction, decision making or solving
some tasks
Machine Learning: Definition
• A computer program is said to learn from
experience E with respect to some class of
tasks T and performance measure P, if its
performance at tasks in T, as measured by P,
improves with experience E.
[Mitchell]
Components of a learning problem
• Task: The behaviour or task being improved.
– For example: classification, acting in an
environment
• Data: The experiences that are being used to
improve performance in the task.
• Measure of improvement:
– For example: increasing accuracy in prediction, acquiring new skills, improved speed and efficiency
Black-box Learner
[Diagram: Experiences/Data and Background knowledge/Bias flow into a Learner; given a Problem/Task, it produces an Answer/Performance]
Learner
[Diagram: the same inputs now feed a Learner that builds Models; a separate Reasoner applies the Models to the Problem/Task to produce the Answer/Performance]
Many domains and applications
Medicine:
• Diagnose a disease
– Input: symptoms, lab measurements, test results, DNA tests, ...
– Output: one of a set of possible diseases, or "none of the above"
• Data: historical medical records
• Learn: which future patients will respond best to
which treatments
Many domains and applications
Vision:
• say what objects appear in an image
• convert hand-written digits to characters 0..9
• detect where objects appear in an image
Many domains and applications
Robot control:
• Design autonomous mobile robots that learn
from experience to
– Play soccer
– Navigate from their own experience
Many domains and applications
NLP:
• detect where entities are mentioned in NL
• detect what facts are expressed in NL
• detect if a product/movie review is positive,
negative, or neutral
Speech recognition
Machine translation
Many domains and applications
Financial:
• predict if a stock will rise or fall
• predict if a user will click on an ad or not
Application in Business Intelligence
• Forecasting product sales quantities taking
seasonality and trend into account.
• Identifying cross-selling promotional opportunities
for consumer goods.
• …
Some other applications
• Fraud detection: credit card providers
• Determining whether or not someone will default on a home mortgage
• Understanding consumer sentiment based on unstructured text data
• Forecasting women's conviction rates based on external macroeconomic factors
Learner
[Diagram repeated: Experiences/Data and Background knowledge/Bias feed the Learner, which builds Models; a Reasoner applies them to the Problem/Task to produce the Answer/Performance]
Design a Learner
[Diagram as above: the Learner builds Models; the Reasoner applies them]
1. Choose the training experience
2. Choose the target function (that is to be learned)
3. Choose how to represent the target function
4. Choose a learning algorithm to infer the target function
Choosing a Model Representation
• The richer the representation, the more useful
it is for subsequent problem solving.
• The richer the representation, the more
difficult it is to learn.
• Components of Representation
– Features
– Function class / hypothesis language
Foundations of Machine Learning
Sudeshna Sarkar
IIT Kharagpur
Module 1: Introduction
Part B: Different types of learning
Module 1
1. Introduction
a) Introduction
b) Different types of learning
c) Hypothesis space, Inductive Bias
d) Evaluation, Training and test set, cross-validation
2. Linear Regression and Decision Trees
3. Instance-based learning and Feature Selection
4. Probability and Bayes Learning
5. Neural Network
6. Support Vector Machines
7. Introduction to Computational Learning Theory
8. Clustering
Broad types of machine learning
• Supervised Learning
– X,y (pre-classified training examples)
– Given an observation x, what is the best label for y?
• Unsupervised learning
– X
– Given a set of X's, cluster or summarize them
• Semi-supervised Learning
• Reinforcement Learning
– Determine what to do based on rewards and punishments.
Supervised Learning
[Diagram: labeled pairs (Input1, Output1) ... (Input-n, Output-n), i.e. the training data X, y, feed a Learning Algorithm that produces a Model; for a New Input x the Model emits Output y]
Unsupervised Learning
[Diagram: unlabeled inputs Input1 ... Input-n, i.e. X alone, feed a Learning Algorithm that produces Clusters]
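To make the contrast concrete, here is a minimal sketch of the unsupervised setting in Python (scikit-learn's KMeans and the toy points are illustrative assumptions, not part of the slides):

    # Unsupervised learning: only inputs X, no labels y.
    import numpy as np
    from sklearn.cluster import KMeans

    X = np.array([[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9]])
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_)   # e.g. [0 0 1 1]: similar inputs are grouped together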
Semi-supervised learning
Reinforcement Learning
[Diagram: an Agent sends action a_t to the Environment; the Environment returns state s_t and reward r_t, then s_{t+1} and r_{t+1}]
Reinforcement Learning
[Diagram: an RLearner interacts with the Environment (action a_t; state s_t, reward r_t, s_{t+1}, r_{t+1}) and updates Q-values; the learned Policy maps each state to the best action for the User]
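As a sketch of what the "update Q-values" box might do, here is the standard tabular Q-learning update; the learning rate, discount factor, and state/action types are assumptions for illustration, not taken from the slides:

    # Q[(state, action)] estimates the long-term reward of taking `action` in `state`.
    alpha, gamma = 0.1, 0.9   # learning rate and discount factor (assumed values)
    Q = {}

    def q_update(state, action, reward, next_state, actions):
        # Move Q(s,a) toward the observed reward plus the best discounted future value.
        best_next = max(Q.get((next_state, a), 0.0) for a in actions)
        old = Q.get((state, action), 0.0)
        Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)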
Supervised Learning
Given:
– a set of input features X1, ... , Xn
– a target feature Y
– a set of training examples where the values for the input
features and the target features are given for each
example
– a new example, where only the values for the input
features are given
Predict the values for the target features for the new
example.
– classification when Y is discrete
– regression when Y is continuous
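A minimal sketch of this fit-then-predict loop in Python (scikit-learn and the toy data are illustrative assumptions; any classifier would do):

    from sklearn.tree import DecisionTreeClassifier

    X_train = [[0, 1], [1, 1], [1, 0], [0, 0]]   # input features of 4 training examples
    y_train = ["yes", "yes", "no", "no"]         # discrete target -> classification

    model = DecisionTreeClassifier().fit(X_train, y_train)
    print(model.predict([[0, 1]]))               # predict the target for a new example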
Classification
Example: Credit scoring
Differentiating between
low-risk and high-risk
customers from their
income and savings
Regression
Example: Price of a used car
x: car attributes
y: price
y = g(x, θ), where g(·) is the model and θ its parameters
Linear model: y = w·x + w0 (e.g., x: mileage, y: price)
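For instance, the line y = w·x + w0 can be fit by least squares; in this sketch the mileage/price numbers are invented and numpy's polyfit stands in for the learning algorithm:

    import numpy as np

    mileage = np.array([20000.0, 50000.0, 80000.0, 110000.0])   # x
    price   = np.array([15000.0, 12500.0, 10200.0,   7800.0])   # y

    w, w0 = np.polyfit(mileage, price, deg=1)   # least-squares line: y = w*x + w0
    print(w * 60000 + w0)                        # predicted price of a 60,000-mile car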
Features
• Often, the individual observations are analyzed into a set of quantifiable properties called features. These may be:
– categorical (e.g. "A", "B", "AB" or "O", for blood type)
– ordinal (e.g. "large", "medium" or "small")
– integer-valued (e.g. the number of words in a text)
– real-valued (e.g. height)
Example Data

      Action  Author   Thread  Length  Where
  e1  skips   known    new     long    home
  e2  reads   unknown  new     short   work
  e3  skips   unknown  old     long    work
  e4  skips   known    old     long    home
  e5  reads   known    new     short   home
  e6  skips   known    old     long    work
  e7  ???     known    new     short   work
  e8  ???     unknown  new     short   work

Training examples: e1-e6. New examples: e7 and e8, whose Action is to be predicted (see the sketch below).
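One way to act on this table (a sketch; the one-hot encoding and the choice of a decision tree are mine, not prescribed by the slides):

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    cols = ["Author", "Thread", "Length", "Where"]
    rows = [["known", "new", "long", "home"], ["unknown", "new", "short", "work"],
            ["unknown", "old", "long", "work"], ["known", "old", "long", "home"],
            ["known", "new", "short", "home"], ["known", "old", "long", "work"],
            ["known", "new", "short", "work"], ["unknown", "new", "short", "work"]]
    y = ["skips", "reads", "skips", "skips", "reads", "skips"]   # labels for e1-e6

    X = pd.get_dummies(pd.DataFrame(rows, columns=cols))   # one-hot encode all rows
    clf = DecisionTreeClassifier().fit(X[:6], y)
    print(clf.predict(X[6:]))                               # predicted Action for e7, e8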
Testing
[Diagram: a Training Set (X) feeds the Learning Algorithm, which outputs a Hypothesis used to produce the Predicted y]
Training phase: Input -> feature extractor -> features; features + Label -> machine learning algorithm -> classifier model
Testing Phase: Input -> feature extractor -> features -> classifier model -> Label
Classification learning
• Task T:
– input:
– output:
• Performance metric P:
• Experience E:
Classification learning
• Task T:
– input: a set of instances d1, ..., dn
• an instance has a set of features
• we can represent an instance as a vector d = <x1, ..., xn>
– output: a set of predictions ŷ1, ..., ŷn
• one of a fixed set of constant values:
– {+1, -1} or {cancer, healthy}, or {rose, hibiscus, jasmine, ...}, or ...
• Performance metric P:
• Experience E:
Classification Learning

Task: medical diagnosis
  Instance: patient record (blood pressure diastolic, blood pressure systolic, age, sex (0 or 1), BMI, cholesterol)
  Labels: {-1, +1} = low vs. high risk of heart disease

Task: finding entity names in text
  Instance: a word in context (capitalized (0,1), word-after-this-equals-Inc, bigram-before-this-equals-acquired-by, ...)
  Labels: {first, later, outside} = first word in a name, second or later word in a name, not in a name

Task: image recognition
  Instance: image (1920x1080 pixels, each with a code for color)
  Labels: {0, 1} = no house vs. house
Classification learning
• Task T:
– input: a set of instances d1, ..., dn
– output: a set of predictions ŷ1, ..., ŷn
• Performance metric P:
– Prob(wrong prediction) on examples from D; we care about performance on the distribution, not the training data
• Experience E:
– a set of labeled examples (x, y) where y is the true label for x
– ideally, examples should be sampled from some fixed distribution D
Classification Learning

Task: medical diagnosis
  Instance: patient record (lab readings)
  Labels: risk of heart disease
  Getting data: wait and look for heart disease

Task: finding entity names in text
  Instance: a word in context (capitalized, nearby words, ...)
  Labels: {first, later, outside}
  Getting data: text with manually annotated entities

Task: image recognition
  Instance: image (pixels)
  Labels: no house / house
  Getting data: hand-labeled images
Representations
1. Decision Tree
   [Tree: Weekend? Yes -> EatOut; No -> Late? Yes -> EatOut; No -> Home]
2. Linear function
Representations
3. Multivariate linear
function
4. Single layer perceptron
Representations
5. Multi-layer neural
network
Hypothesis Space
• One way to think about a supervised learning
machine is as a device that explores a
"hypothesis space".
– Each setting of the parameters in the machine is a
different hypothesis about the function that maps
input vectors to output vectors.
Terminology
• Features: Distinct traits or properties that can be used to describe each item in a quantitative manner.
• Feature vector: n-dimensional vector of
numerical features that represent some object
• Instance Space X: Set of all possible objects
describable by features.
• Example (x,y): Instance x with label y=f(x).
Terminology
• Concept c: Subset of objects from X (c is
unknown).
• Target Function f: Maps each instance x ∈ X to
target label y ∈ Y
• Example (x,y): Instance x with label y=f(x).
• Training Data S: Collection of examples observed
by learning algorithm.
Used to discover potentially predictive relationships
Foundations of Machine Learning
Sudeshna Sarkar
IIT Kharagpur
Module 1: Introduction
Part C: Hypothesis Space and Inductive Bias
Inductive Learning or Prediction
• Given examples of a function (X, F(X))
– Predict function F(X) for new examples X
• Classification
F(X) = Discrete
• Regression
F(X) = Continuous
• Probability estimation
F(X) = Probability(X)
Features
• Features: Properties that describe each
instance in a quantitative manner.
• Feature vector: n-dimensional vector of
features that represent some object
Feature Space
[Scatter plot over two features (horizontal axis 0.0-6.0, vertical axis 0.0-3.0) with points labeled + and -]
Example: <0.5, 2.8, +>
Slide by Jesse Davis: University of Washington
Terminology
Hypothesis: Function for labeling examples
[The same scatter plot, now partitioned into regions marked "Label: +" and "Label: -"; points marked ? are new examples to be labeled]
Slide by Jesse Davis: University of Washington
Terminology
Hypothesis Space: Set of legal hypotheses
[The same scatter plot of + and - points]
Slide by Jesse Davis: University of Washington
Representations
1. Decision Tree
   [Tree: Weekend? Yes -> EatOut; No -> Late? Yes -> EatOut; No -> Home]
2. Linear function
Representations
3. Multivariate linear
function
4. Single layer perceptron
5. Multi-layer neural
networks
Hypothesis Space
• The space of all hypotheses that can, in
principle, be output by a learning algorithm.
• We can think about a supervised learning
machine as a device that explores a
"hypothesis space".
– Each setting of the parameters in the machine is a
different hypothesis about the function that maps
input vectors to output vectors.
Terminology
• Example (x,y): Instance x with label y.
• Training Data S: Collection of examples observed by
learning algorithm.
• Instance Space X: Set of all possible objects
describable by features.
• Concept c: Subset of objects from X (c is unknown).
• Target Function f: Maps each instance x ∈ X to target
label y ∈ Y
Classifier
• Hypothesis h: Function that approximates f.
• Hypothesis Space ℋ : Set of functions we allow for
approximating f.
• The set of hypotheses that can be produced, can be
restricted further by specifying a language bias.
• Input: Training set 𝒮 ⊆ 𝑋 × 𝑌
• Output: A hypothesis ℎ ∈ ℋ
Hypothesis Spaces
• If there are 4 (in general, N) input features, there are 2^16 (in general, 2^(2^N)) possible Boolean functions.
• We cannot figure out which one is correct unless we see every possible input-output pair: all 2^4 (in general, 2^N) of them.
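A quick check of these counts (plain Python; N = 4 as in the slide):

    N = 4
    num_inputs = 2 ** N              # 16 possible input vectors
    num_functions = 2 ** num_inputs  # 2^(2^N) = 65536 possible Boolean functions
    print(num_inputs, num_functions)

Each Boolean function is one way of assigning an output bit to every one of the 2^N inputs, hence 2^(2^N) functions in all.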
Example
Hypothesis language
1. may contain representations of all polynomial functions from X to Y if X = ℛ^n and Y = ℛ,
2. may be able to represent all conjunctive concepts over X when X = B^n and Y = B (with B the set of Booleans).
• Hypothesis language reflects an inductive bias
that the learner has
Inductive Bias
• Need to make assumptions
– Experience alone doesn't allow us to make conclusions about unseen data instances
• Two types of bias:
– Restriction: Limit the hypothesis space
(e.g., look at rules)
– Preference: Impose ordering on hypothesis space
(e.g., more general, consistent with data)
Inductive learning
• Inductive learning: Inducing a general function
from training examples
– Construct hypothesis h to agree with c on the training
examples.
– A hypothesis is consistent if it agrees with all training examples.
– A hypothesis is said to generalize well if it correctly predicts the value of y for novel examples.
• Inductive Learning is an Ill-Posed Problem: unless we see all possible examples, the data are not sufficient for an inductive learning algorithm to find a unique solution.
Inductive Learning Hypothesis
• Any hypothesis h found to approximate the
target function c well over a sufficiently large
set of training examples D will also
approximate the target function well over
other unobserved examples.
Learning as Refining the Hypothesis
Space
• Concept learning is a task of searching an hypotheses
space of possible representations looking for the
representation(s) that best fits the data, given the
bias.
• The tendency to prefer one hypothesis over another
is called a bias.
• Given a representation, data, and a bias, the problem
of learning can be reduced to one of search.
Occam's Razor
– A classical example of Inductive Bias
• the simplest consistent hypothesis about the
target function is actually the best
Some more Types of Inductive Bias
• Minimum description length: when forming a
hypothesis, attempt to minimize the length of the
description of the hypothesis.
• Maximum margin: when drawing a boundary
between two classes, attempt to maximize the width
of the boundary (SVM)
Important issues in Machine Learning
• What are good hypothesis spaces?
• Algorithms that work with the hypothesis spaces
• How to optimize accuracy over future data points
(overfitting)
• How can we have confidence in the result? (How much training data is needed? A statistical question)
• Are some learning problems computationally
intractable?
Generalization
• Components of generalization error
– Bias: how much does the average model over all training sets differ from the true model?
• Error due to inaccurate assumptions/simplifications made by the model
– Variance: how much do models estimated from different training sets differ from each other?
Underfitting and Overfitting
• Underfitting: model is "too simple" to represent all the relevant class characteristics
– High bias and low variance
– High training error and high test error
• Overfitting: model is "too complex" and fits irrelevant characteristics (noise) in the data
– Low bias and high variance
– Low training error and high test error
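A small sketch of both failure modes (the sine data, noise level, and polynomial degrees are invented for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 30)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
    x_tr, y_tr = x[::2], y[::2]      # alternate points for training ...
    x_te, y_te = x[1::2], y[1::2]    # ... and held-out testing

    for degree in (1, 3, 9):
        c = np.polyfit(x_tr, y_tr, degree)
        tr = np.mean((np.polyval(c, x_tr) - y_tr) ** 2)
        te = np.mean((np.polyval(c, x_te) - y_te) ** 2)
        print(degree, round(tr, 3), round(te, 3))
    # degree 1: both errors high (underfit); degree 9: training error low,
    # test error worse than the degree-3 fit (overfit).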
Foundations of Machine Learning
Sudeshna Sarkar
IIT Kharagpur
Module 1: Introduction
Part D: Evaluation and Cross validation
Experimental Evaluation of Learning
Algorithms
• Evaluating the performance of learning systems is
important because:
– Learning systems are usually designed to predict the class of future unlabeled data points.
• Typical choices for Performance Evaluation:
– Error
– Accuracy
– Precision/Recall
• Typical choices for Sampling Methods:
– Train/Test Sets
– K-Fold Cross-validation
Evaluating predictions
• Suppose we want to make a prediction of a
value for a target feature on example x:
– y is the observed value of target feature on
example x.
– ŷ is the predicted value of target feature on
example x.
– How is the error measured?
Measures of error
• Absolute error: (1/n) Σᵢ₌₁ⁿ |f(xᵢ) − yᵢ|
• Sum-of-squares error: (1/n) Σᵢ₌₁ⁿ (f(xᵢ) − yᵢ)²
• Number of misclassifications: (1/n) Σᵢ₌₁ⁿ δ(f(xᵢ), yᵢ)
• δ(f(x), y) is 1 if f(x) ≠ y, and 0 otherwise.
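These three measures translate directly into Python (the function names are mine):

    def absolute_error(preds, ys):
        return sum(abs(p - y) for p, y in zip(preds, ys)) / len(ys)

    def sum_of_squares_error(preds, ys):
        return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

    def misclassification_rate(preds, ys):
        # delta(f(x), y) = 1 when the prediction differs from the label
        return sum(1 for p, y in zip(preds, ys) if p != y) / len(ys)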
Confusion Matrix
                     True class
Hypothesized class   Pos           Neg
Yes                  TP            FP
No                   FN            TN
                     P = TP + FN   N = FP + TN

• Accuracy = (TP + TN) / (P + N)
• Precision = TP / (TP + FP)
• Recall / TP rate = TP / P
• FP rate = FP / N
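The same four formulas as a small helper (a sketch; the counts in the example call are made up):

    def confusion_metrics(tp, fp, fn, tn):
        p, n = tp + fn, fp + tn            # actual positives and negatives
        return {"accuracy": (tp + tn) / (p + n),
                "precision": tp / (tp + fp),
                "recall": tp / p,          # true-positive rate
                "fp_rate": fp / n}

    print(confusion_metrics(tp=40, fp=10, fn=5, tn=45))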
Sample Error and True Error
• The sample error of hypothesis f with respect to target function c and data sample S is:
  error_S(f) = (1/n) Σ_{x∈S} δ(f(x), c(x))
• The true error (denoted error_D(f)) of hypothesis f with respect to target function c and distribution D is the probability that f will misclassify an instance drawn at random according to D:
  error_D(f) = Pr_{x∼D}[f(x) ≠ c(x)]
Why Errors
• Errors in learning are caused by:
– Limited representation (representation bias)
– Limited search (search bias)
– Limited data (variance)
– Limited features (noise)
Difficulties in evaluating hypotheses
with limited data
• Bias in the estimate: The sample error is a poor
estimator of true error
– ==> test the hypothesis on an independent test set
• We divide the examples into:
– Training examples that are used to train the learner
– Test examples that are used to evaluate the learner
• Variance in the estimate: The smaller the test set,
the greater the expected variance.
Validation set
Holding out a fixed validation set fails to use all the available data for training.
k-fold cross-validation
1. Split the data into k equal subsets
2. Perform k rounds of learning; on each round
– 1/k of the data is held out as a test set and
– the remaining examples are used as training data.
3. Compute the average test set score of the k rounds (see the sketch below)
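A minimal sketch of the procedure (scikit-learn's KFold, the synthetic data, and the decision-tree model are illustrative assumptions):

    import numpy as np
    from sklearn.model_selection import KFold
    from sklearn.tree import DecisionTreeClassifier

    X = np.random.default_rng(0).normal(size=(100, 3))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)           # synthetic labels

    scores = []
    for tr, te in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        model = DecisionTreeClassifier().fit(X[tr], y[tr])
        scores.append(model.score(X[te], y[te]))      # accuracy on the held-out fold
    print(np.mean(scores))                            # average score over the k rounds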
K-fold cross validation [figure illustrating the k folds]
Trade-off
• In machine learning, there is always a trade-
off between
– complex hypotheses that fit the training data well
– simpler hypotheses that may generalise better.
• As the amount of training data increases, the
generalization error decreases.