What is Machine Learning?
“Learning is any process by which a system improves
performance from experience.”
- Herbert Simon
Definition by Tom Mitchell (1998):
Machine Learning is the study of algorithms that
• improve their performance P
• at some task T
• with experience E.
A well-defined learning task is given by <P, T, E>.
Introduction
We distinguish two approaches:
• Knowledge-based: a computer program whose logic encodes a large
number of properties of the world, usually developed by a team of
experts over many years.
• Machine learning: extract information directly from historical data
and extrapolate to make predictions.
When Do We Use Machine Learning?
ML is used when:
• Human expertise does not exist (navigating on Mars)
• Humans can’t explain their expertise (speech recognition)
• Models must be customized (personalized medicine)
• Models are based on huge amounts of data (genomics)
Learning isn’t always useful:
• There is no need to “learn” to calculate payroll
Based on slide by E. Alpaydin
Slide credit: Geoffrey Hinton
Some more examples of tasks that are best
solved by using a learning algorithm
• Recognizing patterns:
– Facial identities or facial expressions
– Handwritten or spoken words
– Medical images
• Generating patterns:
– Generating images or motion sequences
• Recognizing anomalies:
– Unusual credit card transactions
– Unusual patterns of sensor readings in a nuclear power plant
• Prediction:
– Future stock prices or currency exchange rates
Slide credit: Pedro Domingos
Sample Applications
• Web search
• Computational biology
• Finance
• E-commerce
• Space exploration
• Robotics
• Information extraction
• Social networks
• Debugging software
• [Your favorite area]
Defining the Learning Task
Improve on task T, with respect to
performance metric P, based on experience E
T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself
T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of
handwritten words
T: Driving on four-lane highways using vision
sensors
P: Average distance traveled before a human-
judged error
E: A sequence of images and steering commands recorded while
observing a human driver.
T: Categorize email messages as spam or legitimate.
P: Percentage of email messages correctly classified.
Slide credit: Ray Mooney
Types of Learning
• Supervised (inductive) learning
– Given: training data + desired outputs (labels)
• Unsupervised learning
– Given: training data (without desired outputs)
• Semi-supervised learning
– Given: training data + a few desired outputs
• Reinforcement learning
– Rewards from sequence of actions
Based on slide by Pedro Domingos
Supervised Machine Learning
Given data like this, how can we learn to
predict the prices of other houses as a
function of the size of their living areas?
House price = F (living area)
What is an instance space?
To make our housing example more
interesting, let’s consider a slightly richer
dataset in which we also know the
number of bedrooms in each house:
House price = F (living area, # bedrooms)
What is an instance space?
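One way to make "House price = F (living area)" concrete is to assume F is a straight line and fit it by ordinary least squares. The sketch below is only an illustration: the four data points are made up (the slides do not list the housing data), and the names `fit_line` and `predict_price` are hypothetical.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for y ~ w0 + w1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w1 = sxy / sxx
    w0 = mean_y - w1 * mean_x
    return w0, w1


# Made-up training data (an assumption, not taken from the slides):
# living area in square feet, price in thousands of dollars.
living_area = [1000.0, 1500.0, 2000.0, 2500.0]
price = [150.0, 200.0, 250.0, 300.0]

w0, w1 = fit_line(living_area, price)


def predict_price(area):
    """Learned F: maps a living area to a predicted price."""
    return w0 + w1 * area


print(predict_price(1800.0))
```

Here the instance space is the set of possible living areas (one real-valued feature). Adding the number of bedrooms would make each instance a pair of numbers.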
Supervised Learning
We consider systems that apply a function f
to input items x and return an output y = f(x).
[Diagram: Input x (an item drawn from an input space) → System y = f(x) → Output y (an item drawn from an output space). Example: input “Dan Roth”, output +/-]
Supervised Learning
In (supervised) machine learning, we deal with systems
whose function f(x) is learned from examples.
[Diagram: Input (an item drawn from an input space) → System → Output (an item drawn from an output space)]
Why use learning?
We typically use machine learning when the function we want
the system to apply is unknown to us, and we cannot “think”
about it. The function could actually be simple.
Supervised Learning
[Diagram: Input (an item drawn from an instance space) → Learned Model → Output (an item drawn from a label space). Labels on the diagram: Target function, Learned Model]
The space of all functions our algorithm “considers” is called the hypothesis space.
Supervised learning: Training
Give the learner examples in the labeled training data.
The learner returns a model g(x).
[Diagram: Labeled Training Data → Learning Algorithm → Learned model]
g(x) is the model we’ll use in our application.
An input example: (Dan Roth, +). The input is an element of the instance space.
Example of a learned model: If (the … character of the … token is …) AND (the … is …) then Negative. Otherwise, Positive.
Supervised learning: Testing
Apply the model to the raw test data.
Evaluate by comparing predicted labels against the test labels.
[Diagram: Raw Test Data → Learned model → Predicted Labels, compared against Test Labels]
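The testing step can be sketched in a few lines: apply the learned model to each raw test item, then compare the predictions with the held-out test labels. Everything named below is hypothetical, including `toy_model` (a stand-in for a learned model) and the three test items with their labels.

```python
def apply_model(model, raw_test_data):
    """Run the learned model on every raw test item."""
    return [model(x) for x in raw_test_data]


def accuracy(predicted_labels, test_labels):
    """Fraction of test items whose predicted label equals the test label."""
    if len(predicted_labels) != len(test_labels):
        raise ValueError("need exactly one prediction per test label")
    if not test_labels:
        raise ValueError("test set is empty")
    correct = sum(p == t for p, t in zip(predicted_labels, test_labels))
    return correct / len(test_labels)


def toy_model(name):
    """Hypothetical stand-in for a learned model: '+' if the name has even length."""
    return "+" if len(name) % 2 == 0 else "-"


# Made-up test set (an assumption): names with +/- labels.
raw_test_data = ["Dan Roth", "Ada", "Grace"]
test_labels = ["+", "-", "+"]

predicted_labels = apply_model(toy_model, raw_test_data)
test_accuracy = accuracy(predicted_labels, test_labels)
print(predicted_labels, test_accuracy)
```

Accuracy is only one possible evaluation metric. It treats every mistake as equally costly, which may not suit every application.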
Data In, Model Out
DATA 𝒟 → MACHINE LEARNING → MODEL f(⋅)
acceleration a = F / m (force F divided by mass m)
We would like to recover a model like this!
Data In, Model Out
DATA 𝒟 → MACHINE LEARNING → MODEL f(⋅): acceleration a = F / m
ML Design (hypothesis class, loss function, optimizer, hyperparameters, features, …)
Example hypothesis class (for varying values of w_0, …, w_4):
a = w_0 + w_1 F + w_2 m + w_3 (F * m) + w_4 (F / m)
The model we want to recover is the member of this class with weights (0, 0, 0, 0, 1):
a = 0 + 0 F + 0 m + 0 (F * m) + 1 (F / m)
Learning = finding “good” values for the weights w_0, w_1, …, w_4
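To illustrate what "finding good values for the weights" means, the sketch below scores two candidate weight vectors from the five-term hypothesis class with a mean squared error loss. The choice of loss, the function names, and the data triples are assumptions made for illustration. The data values were simply chosen to satisfy a = F / m.

```python
def features(force, mass):
    """Feature map of the example hypothesis class: [1, F, m, F*m, F/m]."""
    return [1.0, force, mass, force * mass, force / mass]


def predict(weights, force, mass):
    """a = w0 + w1*F + w2*m + w3*(F*m) + w4*(F/m)."""
    return sum(w * phi for w, phi in zip(weights, features(force, mass)))


def mean_squared_error(weights, data):
    """Average squared gap between predicted and observed acceleration."""
    return sum((predict(weights, f, m) - a) ** 2 for f, m, a in data) / len(data)


# Illustrative data (assumption): (force in N, mass in kg, acceleration in m/s^2).
data = [
    (10.0, 2.5, 4.0),
    (10.0, 5.0, 2.0),
    (10.0, 20.0, 0.5),
    (100.0, 40.0, 2.5),
    (100.0, 20.0, 5.0),
]

candidates = {
    "all zeros": [0.0, 0.0, 0.0, 0.0, 0.0],
    "a = F / m": [0.0, 0.0, 0.0, 0.0, 1.0],
}

losses = {name: mean_squared_error(w, data) for name, w in candidates.items()}
best = min(losses, key=losses.get)
print(losses, best)
```

A real learner would search the weight space with an optimizer rather than compare a hand-picked list, but the criterion is the same: prefer weights with lower loss on the data.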
Key Issues in Machine Learning
Modeling
How to formulate application problems as machine learning problems? How to represent the data?
Learning protocols (where are the data and labels coming from?)
Representation
What functions should we learn (hypothesis spaces)?
How to map raw input to an instance space?
Any rigorous way to find these? Any general approach?
Algorithms
What are good algorithms?
How do we define success?
Generalization vs. overfitting
The computational problem
Using supervised learning
What is our instance space?
Gloss: What kind of features are we using?
What is our label space?
Gloss: What kind of learning task are we dealing with?
What is our hypothesis space?
Gloss: What kind of functions (models) are we learning?
What learning algorithm do we use?
Gloss: How do we learn the model from the labeled data?
What is our loss function/evaluation metric?
Gloss: How do we measure success? What drives learning?
1. The instance space
[Diagram: Input (an item drawn from an instance space X) → Learned Model → Output (an item drawn from a label space)]
Designing an appropriate instance space X
is crucial for how well we can predict the output y.
27
1. The instancespace
When we apply machine learning to a task, we first need to define the instance space
.
Instances are defined by features:
Boolean features:
Is there a folder named after the sender?
Does this email contain the word ‘class’?
Does this email contain the word ‘waiting’?
Does this email contain the word ‘class’ and the word ‘waiting’?
(Does it add anything if you already have the previous two features?)
Numerical features:
How often does ‘learning’ occur in this email?
How long is the email?
How many emails have I seen from this sender over the last day/week/month?
Bag of tokens:
Just list all the tokens in the input.
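The feature types listed above can be sketched as a small feature extractor for one email. This is a minimal illustration under stated assumptions: the tokenizer (lowercase alphanumeric runs), the function names, and the example email, sender, and folder names are all hypothetical. The feature that counts emails per sender over time is left out because it needs mailbox history.

```python
import re
from collections import Counter


def tokenize(text):
    """Assumed tokenizer: lowercase runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def extract_features(email_text, sender, folder_names):
    """Map one raw email to Boolean, numerical, and bag-of-tokens features."""
    tokens = tokenize(email_text)
    has_class = "class" in tokens
    has_waiting = "waiting" in tokens
    return {
        # Boolean features
        "sender_has_folder": sender in folder_names,
        "has_class": has_class,
        "has_waiting": has_waiting,
        "has_class_and_waiting": has_class and has_waiting,
        # Numerical features
        "count_learning": tokens.count("learning"),
        "length_in_tokens": len(tokens),
        # Bag of tokens: every token with its count, order discarded
        "bag_of_tokens": Counter(tokens),
    }


example = extract_features(
    "Machine learning class is waiting for you. Learning starts now.",
    sender="dan",
    folder_names={"dan", "family"},
)
print(example)
```

Whether the conjunction feature adds anything depends on the hypothesis space: a linear model over the two single-word features cannot represent their conjunction as a separate effect, while a decision tree can.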
X as a vector space
X is an N-dimensional vector space (e.g. {0,1}^N, ℝ^N).
Each dimension = one feature.
Each x ∈ X is a feature vector (hence the boldface x).
Think of x = [x_1 … x_N] as a point in X.
[Figure: a point plotted against axes x_1 and x_2]
Example features:
- Patient age
- Clump thickness
- Tumor color
- Distance from optic nerve
- Cell type
Good features are essential
The choice of features is crucial for how well a task can be learned.
In many application areas (language, vision, etc.), a lot of work goes into
designing suitable features.
This requires domain expertise.
We can’t teach you what specific features
to use for your task.
But we will touch on some general principles.
2. The label space
[Diagram: Input (an item drawn from an instance space) → Learned Model → Output (an item drawn from a label space)]
The label space determines what kind of supervised learning task
we are dealing with.
Supervised learning tasks I
Output labels are categorical:
Binary classification: two possible labels
Multiclass classification: k possible labels
Output labels are structured objects (sequences of labels, parse trees, graphs, etc.):
Structure learning: multiple labels that are related (thus constrained)
[Figure: three events. When classifying the temporal relations between them, we need to account for how those relations constrain one another.]
3. The model
[Diagram: Input (an item drawn from an instance space) → Learned Model → Output (an item drawn from a label space)]
We need to choose what kind of model
we want to learn.
Types of Learning
• Supervised learning
Input: Examples of inputs and desired outputs
Output: Model that predicts output given a new input
• Unsupervised learning
Input: Examples of some data (no “outputs”)
Output: Representation of structure in the data
• Reinforcement learning
Input: Sequence of agent interactions with an environment
Output: Policy that maps agent’s observations to actions
Unsupervised Learning
1. In unsupervised learning the training data is unlabeled. The system tries to learn
without a teacher.
2. Unsupervised methods must rely solely on the intrinsic structure of the data points
to learn a good hypothesis. Thus, unsupervised methods do not need a teacher or
domain expert who provides labels for data points.
3. Two large families of unsupervised methods:
a. Clustering
b. Feature learning. Two important applications of feature learning are dimensionality reduction and data visualization.
Notation:
In the clustering problem, we are given a training set {x^(1), x^(2), …, x^(n)} and want to
group the data into a few cohesive “clusters”; no labels y^(i) are given.
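As one concrete instance of the clustering problem, here is a minimal k-means sketch in plain Python. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The initialization (the first k points) and the six 2-D points are assumptions chosen to keep the example deterministic. Practical implementations usually use randomized initialization.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iterations=100):
    """Return (centroids, assignments) after alternating assign/update steps."""
    centroids = [tuple(p) for p in points[:k]]  # assumption: first k points as seeds
    assignments = None
    for _ in range(max_iterations):
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the procedure has converged
        assignments = new_assignments
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = mean_point(members)
    return centroids, assignments


# Made-up unlabeled data (assumption): two well-separated groups in 2-D.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, assignments = k_means(points, k=2)
print(centroids, assignments)
```

No labels y^(i) appear anywhere in the code: the grouping comes only from distances between the x^(i).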
Unsupervised Learning
• Given x1, x2, ..., xn (without labels)
• The goal is to find groups of similar observations.
• Output hidden structure behind the x’s
– E.g., clustering
Unsupervised Learning
• Organize computing clusters
• Social network analysis
• Astronomical data analysis
• Market segmentation
Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison)
Slide credit: Andrew Ng
Unsupervised Learning
• Independent component analysis – separate a
combined signal into its original sources
Image credit: statsoft.com Audio from http://www.ism.ac.jp/~shiro/research/blindsep.html
Unsupervised Learning
Here are some of the most important unsupervised learning algorithms:
• Clustering
– K-means
– DBSCAN
– Hierarchical Cluster Analysis (HCA)
• Visualization and dimensionality reduction
– Principal Component Analysis (PCA)
– Kernel PCA
• Association rule learning
– Apriori
– Eclat
• Anomaly detection and novelty detection
– One-class SVM
– Isolation Forest
Reinforcement Learning
• In supervised learning, we saw algorithms that tried to make their outputs mimic the labels y
given in the training set. In that setting, the labels gave an unambiguous “right answer” for each
of the inputs x.
• In contrast, for many sequential decision-making and control problems, it is very difficult to
provide this type of explicit supervision to a learning algorithm.
• Example: if we have just built a four-legged robot and are trying to program it to walk, then initially we
have no idea what the “correct” actions are to make it walk, and so we do not know how
to provide explicit supervision for a learning algorithm to try to mimic.
• In the reinforcement learning framework:
– The learning system, called an agent in this context, can observe the environment, select and
perform actions, and get rewards in return (or penalties in the form of negative rewards).
– It must then learn by itself the best strategy, called a policy, to get the most reward over
time.
– A policy defines what action the agent should choose when it is in a given situation.
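The vocabulary of agent, environment, action, reward, and policy can be made concrete with a toy example. The sketch below does not learn a policy. It only runs two fixed policies in a hypothetical five-state corridor and compares the total reward each collects, which is the quantity a reinforcement learning algorithm would try to maximize. The environment, the reward values, and all names are assumptions.

```python
GOAL = 4  # hypothetical corridor with states 0, 1, 2, 3, 4; state 4 is the goal


def step(state, action):
    """Environment dynamics: return (next_state, reward, done)."""
    move = 1 if action == "right" else -1
    next_state = min(max(state + move, 0), GOAL)  # walls at both ends
    if next_state == GOAL:
        return next_state, 10.0, True  # reward for reaching the goal
    return next_state, -1.0, False  # small penalty for every other step


def run_episode(policy, start_state=0, max_steps=20):
    """Follow a policy (a mapping from state to action) and sum the rewards."""
    state = start_state
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy[state]
        state, reward, done = step(state, action)
        total_reward += reward
        if done:
            break
    return total_reward


# Two hand-written policies: one action per non-goal state.
always_right = {s: "right" for s in range(GOAL)}
always_left = {s: "left" for s in range(GOAL)}

return_right = run_episode(always_right)
return_left = run_episode(always_left)
print(return_right, return_left)
```

Nobody tells the agent which action is "correct" in each state. The only feedback is the reward signal, which is why the second policy can be recognized as worse without any labeled examples.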
Machine learning is Programming 2.0
Traditional Programming | Machine learning (ML)
Revision:
Task specification in ML: programs → examples
Here is a program to implement Newton’s second law of motion:
def compute_force(m, a):
    """
    Return the force (in N) needed to move
    mass m (in kg) at acceleration a (in m/s^2).
    """
    F = m * a
    return F
Here are some examples. Try to imitate them.
Mass m (kg)   Acceleration a (m/s^2)   Force F (N)
2.5           4                        10
5             2                        10
20            0.5                      10
40            0.25                     10
40            2.5                      100
20            5                        100
50            2                        100
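To show what "imitating the examples" could look like, the sketch below recovers the force law from the seven table rows instead of being given the formula. It assumes a deliberately narrow hypothesis, F ≈ w * (m * a) with a single unknown weight w, and fits w by least squares. That hypothesis and the names `fit_weight` and `learned_force` are illustrative assumptions, not part of the slides.

```python
# The (mass, acceleration, force) rows from the table of examples.
examples = [
    (2.5, 4.0, 10.0),
    (5.0, 2.0, 10.0),
    (20.0, 0.5, 10.0),
    (40.0, 0.25, 10.0),
    (40.0, 2.5, 100.0),
    (20.0, 5.0, 100.0),
    (50.0, 2.0, 100.0),
]


def fit_weight(rows):
    """Least-squares estimate of w in F ~ w * (m * a)."""
    numerator = sum((m * a) * f for m, a, f in rows)
    denominator = sum((m * a) ** 2 for m, a, f in rows)
    return numerator / denominator


w = fit_weight(examples)


def learned_force(m, a):
    """Model learned from the examples rather than written by hand."""
    return w * m * a


print(w, learned_force(3.0, 2.0))
```

The fit only works this neatly because the assumed hypothesis already contains the product m * a. With a poorly chosen hypothesis class, no weight value would imitate the examples well.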
Task specification in ML: programs → examples
Here is a program to recognize an image as a cow or a turtle:
def cow_or_turtle(image):
    ???
Here are some examples. Try to imitate them.
[Images: examples labeled “cows” and “turtles”]
Putting a trained ML system to use
Here are some examples. Try to imitate them.
[Images: training examples labeled “cows” and “turtles”]
[Images: the trained system labels one new image “cow” and another “turtle”]
ML Workflow
Framing an ML problem (Mitchell’s P, T, E)
Data curation (sourcing, scraping, collection, labeling)
Data analysis / visualization
ML Design (hypothesis class, loss function, optimizer, hyperparameters, features)
Train model
Validate / Evaluate
Deploy (and generate new data)
Monitor performance on new data
Slide annotations: “Main focus of this class”, “Project”
ML for Social Good
Applying AI for social good | McKinsey
Elizabeth Bondi et al., SPOT poachers in action: Augmenting
conservation drones with automatic detection in near real time,
AAAI 2018
Designing a Learning System
• Choose the training experience
• Choose exactly what is to be learned
– i.e. the target function
• Choose how to represent the target function
• Choose a learning algorithm to infer the target
function from the experience
[Diagram components: Environment/Experience, Learner, Knowledge, Performance Element; data flows labeled “Training data” and “Testing data”]
Based on slide by Ray Mooney
Training vs. Test Distribution
• We generally assume that the training and
test examples are independently drawn from
the same overall distribution of data
– We call this “i.i.d.”, which stands for “independent
and identically distributed”
• If examples are not independent, this requires
collective classification
• If the test distribution is different, this requires
transfer learning
Slide credit: Ray Mooney
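In practice, the i.i.d. assumption is often approximated by shuffling one pool of examples and cutting it into a training part and a test part, so that both come from the same distribution. The helper below is a hypothetical sketch of that idea using only the standard library. The name `split_train_test`, the 25% test fraction, and the toy examples are assumptions.

```python
import random


def split_train_test(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the examples, then cut it into (train, test)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Toy labeled examples (assumption): (input, label) pairs.
examples = [(x, x % 2) for x in range(20)]
train, test = split_train_test(examples)
print(len(train), len(test))
```

A random split does not help when the deployed system will see data from a different distribution than the pool it was split from. That is the situation the slide associates with transfer learning.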
ML in a Nutshell
• Tens of thousands of machine learning
algorithms
– Hundreds of new ones every year
• Every ML algorithm has three
components:
– Representation
– Optimization
– Evaluation
Slide credit: Pedro Domingos
Slide credit: Ray Mooney
Various Function Representations
• Numerical functions
– Linear regression
– Neural networks
– Support vector machines
• Symbolic functions
– Decision trees
– Rules in propositional logic
– Rules in first-order predicate logic
• Instance-based functions
– Nearest-neighbor
– Case-based
• Probabilistic Graphical Models
– Naïve Bayes
– Bayesian networks
– Hidden Markov Models (HMMs)
– Probabilistic Context-Free Grammars (PCFGs)
– Markov networks
ML in Practice
• Understand domain, prior knowledge, and goals
• Data integration, selection, cleaning, pre-processing, etc.
• Learn models
• Interpret results
• Consolidate and deploy discovered knowledge
(These steps form a loop.)
Based on a slide by Pedro Domingos
Editor's Notes
#21 (Supervised learning: Testing) Could potentially check g(x) (exact learning).
Can the test data, even without the labels, be used to learn something? Maybe something about the distribution of the test examples?
#24 (Key Issues in Machine Learning) Badges game.
Don’t give me the answer.
Start thinking about how to write a program that will figure out whether my name has + or – next to it.
#25 (Using supervised learning) What is the real input the algorithm sees?
What are the labels? Folder names; maybe there is a hierarchy.
Set of rules: if the sender is… put it in their folder. Decision tree: if condition A is satisfied, check condition B; if not, check condition C. Neural network (NN).
The representation might dictate the learning algorithm.
Number of correct predictions? Maybe it’s a non-symmetric loss: I never want to file incorrectly, I prefer to abstain. Maybe there is a hierarchy, and I want a loss that is sensitive to the hierarchy.
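The non-symmetric loss mentioned in the last note can be sketched as a function that charges a full penalty for filing an email in the wrong folder and a smaller penalty for abstaining. The specific costs (1.0 and 0.2), the use of `None` to mean "abstain", and the folder names are assumptions chosen for illustration.

```python
ABSTAIN = None  # assumption: the model returns None when it declines to file


def filing_loss(predicted_folder, true_folder, wrong_cost=1.0, abstain_cost=0.2):
    """Loss for one email: 0 if correct, small if abstaining, large if wrong."""
    if predicted_folder is ABSTAIN:
        return abstain_cost
    return 0.0 if predicted_folder == true_folder else wrong_cost


def average_loss(predictions, truths):
    """Mean filing loss over a list of predictions and true folders."""
    losses = [filing_loss(p, t) for p, t in zip(predictions, truths)]
    return sum(losses) / len(losses)


# Made-up predictions and true folders (assumption).
predictions = ["work", ABSTAIN, "family"]
truths = ["work", "family", "work"]
mean_loss = average_loss(predictions, truths)
print(mean_loss)
```

With these costs, abstaining is worthwhile whenever the model is less than 80% sure of its best guess, because guessing then has an expected cost above 0.2. A hierarchy-sensitive loss would replace the flat `wrong_cost` with a distance between folders.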