This document provides an introduction to machine learning, including examples of applications in medical diagnosis, object recognition, and finance. It outlines the main types of machine learning: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning involves predicting target values based on labeled training data, and can be used for classification or regression problems. Unsupervised learning involves discovering hidden patterns in unlabeled data through clustering. Reinforcement learning involves agents learning policies from rewards and punishments. The document also discusses inductive learning, hypothesis spaces, evaluation methods like accuracy and cross-validation, and challenges in evaluating models with limited data.
2. Outline
Introduction to Machine Learning
Applications
Machine Learning Solution
Types of Machine Learning
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Inductive Learning and Inductive Bias
Experimental Evaluation
3. An Example Application
• An emergency room in a hospital measures 17
variables (e.g., blood pressure, age, etc.) of newly
admitted patients.
• A decision is needed: whether to put a new
patient in an intensive-care unit.
• Due to the high cost of ICU, those patients who
may survive less than a month are given higher
priority.
• Problem: to predict high-risk patients and
discriminate them from low-risk patients.
4. An Example Application (Contd..)
A credit card company receives thousands of applications
for new cards. Each application contains information about
an applicant:
• Age
• Marital status
• Annual salary
• Outstanding debt
• Credit rating
Problem: to decide whether an application should be approved,
or to classify applications into two categories, approved and
not approved.
6. Machine Learning Paradigm
Learning is the ability to improve one’s behavior
based on experience
Building computer systems that automatically
improve with experience
A computer program is said to learn from
experience E with respect to some class of tasks T
and performance measure P if its performance at
tasks in T, as measured by P, improves with experience E.
Prediction and classification are the tasks and
experience is the data.
10. Supervised Learning
Given:
1. A set of input features X1, X2, …, Xn
2. Target feature Y
3. A set of training examples where the values for
the input and target features are given for each
example
4. A new example where only the values for the
input features are given
11. Supervised Learning
Predict the values for the target feature for the
new example:
• Classification when Y is discrete
• Regression when Y is continuous
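The setup above can be sketched in a few lines of Python. The 1-nearest-neighbor rule used here is an assumption chosen for brevity (the slides do not name a learning algorithm), and the example data and the name `predict_nearest` are hypothetical.

```python
from math import dist

def predict_nearest(training_examples, new_inputs):
    """Predict the target of a new example by copying the target of the
    closest training example (1-nearest neighbor, an illustrative choice)."""
    _, target = min(training_examples, key=lambda ex: dist(ex[0], new_inputs))
    return target

# Hypothetical training examples: input features (X1, X2) paired with target Y.
classification_data = [((1.0, 2.0), "low-risk"), ((8.0, 9.0), "high-risk")]
regression_data = [((1.0, 2.0), 10.5), ((8.0, 9.0), 42.0)]

# New example: only the input features are given.
print(predict_nearest(classification_data, (2.0, 2.5)))  # discrete Y: classification
print(predict_nearest(regression_data, (7.0, 9.5)))      # continuous Y: regression
```

The same function serves both cases; only the type of the target feature Y changes.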
12. Classification
• Example: Credit scoring
• Differentiating between low-risk and high-risk
customers from their income and savings
Discriminant: IF income > θ1 AND savings > θ2
THEN low-risk ELSE high-risk
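The discriminant can be written directly as a function. This is a minimal sketch: the function name `credit_risk` and the threshold values in the usage lines are hypothetical, and in practice θ1 and θ2 would be learned from training data.

```python
def credit_risk(income, savings, theta1, theta2):
    """Apply the rule: low-risk only if income > theta1 AND savings > theta2."""
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"

# Hypothetical thresholds, for illustration only.
THETA1, THETA2 = 30_000, 10_000
print(credit_risk(50_000, 20_000, THETA1, THETA2))
print(credit_risk(50_000, 5_000, THETA1, THETA2))
```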
13. Regression
• Example: Price of a used car
• x : car attributes
  y : price
• Linear model: y = wx + w0
• General form: y = g(x | θ), where g( ) is the model and θ are its parameters
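One common way to choose w and w0 for the linear model y = wx + w0 is ordinary least squares. The sketch below assumes a single numeric attribute x (for example, the age of the car) and uses made-up data. The slides do not say how the parameters are fit, so the fitting method is an assumption.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope w and intercept w0 for y = w*x + w0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    w0 = mean_y - w * mean_x
    return w, w0

# Hypothetical data: x = one car attribute, y = price (arbitrary units).
w, w0 = fit_line([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
predicted_price = w * 4.0 + w0  # prediction for a new example with x = 4
```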
21. Unsupervised Learning (Clustering)
• Class Labels of the data are unknown
• Given a set of data, the task is to establish the
existence of classes or clusters in data
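As an illustration of finding clusters without class labels, here is a minimal sketch of k-means. The slides do not name a specific clustering algorithm, so k-means, the function name `k_means`, and the sample points are assumptions.

```python
from math import dist

def k_means(points, centers, iterations=10):
    """Alternate between assigning each point to its nearest center
    and moving each center to the mean of its assigned points."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else center  # keep an empty cluster's center unchanged
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters

# Hypothetical unlabeled data and initial guesses for two cluster centers.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, clusters = k_means(points, centers=[(0, 0), (10, 10)])
```

The result depends on the initial centers and on the chosen number of clusters, both of which must be supplied by the user in this sketch.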
25. Reinforcement Learning
• Topics:
– Policies: what actions should an agent take in a particular
situation
– Utility estimation: how good is a state (used by policy)
• No supervised output but delayed reward
• Credit assignment problem (what was responsible for the
outcome)
• Applications:
– Game playing
– Robot in a maze
– Multiple agents, partial observability, ...
27. Inductive Learning
• Inductive learning or “Prediction”:
– Given examples of a function (X, F(X))
– Predict function F(X) for new examples X
• Classification
F(X) = Discrete
• Regression
F(X) = Continuous
• Probability estimation
F(X) = Probability(X)
28. Terminology
Feature Space:
Properties that describe the problem
32. Inductive Bias
• Need to make assumptions
– Experience alone doesn’t allow us to draw
conclusions about unseen data instances
• Two types of bias:
– Restriction: Limit the hypothesis space
(e.g., look at rules)
– Preference: Impose ordering on hypothesis space
(e.g., more general, consistent with data)
33. Evaluation
• Evaluation is important because systems are
designed to predict the class of future, unlabeled
data points.
• Typical choices of performance evaluation are:
Error
Accuracy
Precision/Recall
• Typical choices of sampling methods for data:
Train/test set
K-fold cross validation
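K-fold cross validation can be sketched as follows: the data is cut into k folds, and each fold serves once as the test set while the remaining folds form the training set. The function name `k_fold_splits` is hypothetical, and this version does not shuffle the data, which a practical implementation usually would.

```python
def k_fold_splits(n_examples, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_examples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

for train, test in k_fold_splits(n_examples=6, k=3):
    print("train:", train, "test:", test)
```

The performance measure (error, accuracy, and so on) is computed on each test fold and then averaged over the k folds.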
34. Evaluation for Regression Problem
• Suppose
y : observed value of target feature on example x
ŷ : predicted value of target feature on example x
Absolute error: |y − ŷ| (for a single training example)
(for single training ex.)
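A small sketch of per-example error for regression. The absolute error follows the name given on the slide. The squared error is included as a commonly paired measure, which is an assumption, since the slide's second formula is not shown. All function names are hypothetical.

```python
def absolute_error(y, y_hat):
    """Absolute difference between observed y and predicted y_hat."""
    return abs(y - y_hat)

def squared_error(y, y_hat):
    """Squared difference (assumed companion measure, not named on the slide)."""
    return (y - y_hat) ** 2

def mean_absolute_error(ys, y_hats):
    """Average of the per-example absolute errors over a set of examples."""
    return sum(absolute_error(y, y_hat) for y, y_hat in zip(ys, y_hats)) / len(ys)

print(absolute_error(3.0, 2.5))
print(mean_absolute_error([3.0, 5.0], [2.5, 6.0]))
```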
38. Confusion Matrix
Precision: out of the examples that the learning algorithm (LA) marks as positive,
how many are actually positive.
Recall: out of the examples that are actually positive, how many the LA marks as positive.
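Precision and recall can be computed from the four confusion-matrix counts. This is a minimal sketch for a two-class problem. The function names and the sample labels are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FP, FN, TN) for a two-class problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # marked positive, actually positive
        elif p == positive:
            fp += 1  # marked positive, actually negative
        elif a == positive:
            fn += 1  # marked negative, actually positive
        else:
            tn += 1  # marked negative, actually negative
    return tp, fp, fn, tn

def precision(tp, fp):
    """Fraction of examples marked positive that are actually positive."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Fraction of actually positive examples that are marked positive."""
    return tp / (tp + fn) if tp + fn else 0.0

tp, fp, fn, tn = confusion_counts(actual=[1, 1, 0, 0, 1], predicted=[1, 0, 1, 0, 1])
print(precision(tp, fp), recall(tp, fn))
```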
39. Difficulty in Evaluating with Limited Data
If all the data is used for training: the estimate of the
error will be poor, because training and testing need
independent sets.
If part of the data is held out for testing: the size of the
training set decreases, which can result in overfitting.
CROSS VALIDATION
40. Cross Validation
Hold-Out Cross Validation:
The available data set D is divided into two disjoint
subsets:
the training set Dtrain (for learning a model)
the test set Dtest (for testing the model)
This method is mainly used when the data set D is large.
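A hold-out split can be sketched as below. The function name `hold_out_split`, the default test fraction, and the fixed random seed are assumptions made for illustration. The slides do not specify how D is divided.

```python
import random

def hold_out_split(data, test_fraction=0.25, seed=0):
    """Shuffle a copy of D and cut it into disjoint Dtrain and Dtest."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    d_test = shuffled[:n_test]
    d_train = shuffled[n_test:]
    return d_train, d_test

d_train, d_test = hold_out_split(range(10), test_fraction=0.5)
print(len(d_train), len(d_test))
```

The model is learned on Dtrain only, and the performance measure is then computed on Dtest.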