Ramanujan College, University of Delhi
New Delhi-110019, India.
Topic: Machine Learning
What is Machine Learning?
“Learning is any process by which a system improves
performance from experience.”
- Herbert Simon
Definition by Tom Mitchell (1998):
Machine Learning is the study of algorithms that
• improve their performance P
• at some task T
• with experience E.
A well-defined learning task is given by <P, T, E>.
Traditional Programming
Data + Program → Computer → Output
Machine Learning
Data + Output → Computer → Program
Slide credit: Pedro Domingos
Traditional Programming vs. Machine Learning
Traditional programming: Program/Rules + Data → Output/Answer
Machine learning: Output/Answers + Data → Program/Rules/Model
• Sort these numbers in decreasing order: 2, 4, 18, 1, 77, 0, 85
– Does not make sense. Do not use machine learning (what are the risks if you do?)
• Rock paper scissors
– Makes sense. Use machine learning.
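The sorting task has an exact, fully specified rule, so an ordinary program already gives the right answer every time and there is nothing to learn. A minimal sketch (the variable names are illustrative, not from the slides):

```python
# Sorting is fully specified by a rule, so a plain program suffices.
numbers = [2, 4, 18, 1, 77, 0, 85]

# sorted() returns a new list; reverse=True gives decreasing order.
decreasing = sorted(numbers, reverse=True)
print(decreasing)
```

A learned model could at best approximate this rule and might be wrong on unseen inputs, which is the risk the slide hints at.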
Introduction
We distinguish two approaches:
• Knowledge-based: a computer program whose logic encodes a large
number of properties of the world, usually developed by a team of
experts over many years.
• Machine learning: extract information directly from historical data
and extrapolate to make predictions.
When Do We Use Machine Learning?
ML is used when:
• Human expertise does not exist (navigating on Mars)
• Humans can’t explain their expertise (speech recognition)
• Models must be customized (personalized medicine)
• Models are based on huge amounts of data (genomics)
Learning isn’t always useful:
• There is no need to “learn” to calculate payroll
Based on slide by E. Alpaydin
Slide credit: Geoffrey Hinton
Some more examples of tasks that are best
solved by using a learning algorithm
• Recognizing patterns:
– Facial identities or facial expressions
– Handwritten or spoken words
– Medical images
• Generating patterns:
– Generating images or motion sequences
• Recognizing anomalies:
– Unusual credit card transactions
– Unusual patterns of sensor readings in a nuclear power plant
• Prediction:
– Future stock prices or currency exchange rates
Slide credit: Pedro Domingos
Sample Applications
• Web search
• Computational biology
• Finance
• E-commerce
• Space exploration
• Robotics
• Information extraction
• Social networks
• Debugging software
• [Your favorite area]
Samuel’s Checkers-Player
“Machine Learning: Field of study that gives
computers the ability to learn without being
explicitly programmed.” -Arthur Samuel (1959)
Defining the Learning Task
Improve on task T, with respect to
performance metric P, based on experience E
T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself
T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of
handwritten words
T: Driving on four-lane highways using vision sensors
P: Average distance traveled before a human-judged error
E: A sequence of images and steering commands recorded while observing a human driver
T: Categorize email messages as spam or legitimate.
P: Percentage of email messages correctly classified.
Slide credit: Ray Mooney
Types of Learning
• Supervised (inductive) learning
– Given: training data + desired outputs (labels)
• Unsupervised learning
– Given: training data (without desired outputs)
• Semi-supervised learning
– Given: training data + a few desired outputs
• Reinforcement learning
– Rewards from sequence of actions
Based on slide by Pedro Domingos
Supervised Machine Learning
Given data like this, how can we learn to predict the prices
of other houses as a function of the size of their living areas?
House price = F (living area)
What is an instance space?
To make our housing example more
interesting, let’s consider a slightly richer
dataset in which we also know the
number of bedrooms in each house:
House price = F (living area, # bedrooms)
What is an instance space?
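As a minimal sketch of learning House price = F (living area), the snippet below fits a straight line by least squares. The data values, units, and function names are hypothetical and chosen only for illustration; a straight line is one possible choice of F, not necessarily the one the lecture uses.

```python
def fit_line(xs, ys):
    """Least-squares fit of y ≈ slope * x + intercept for one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data (not from the slides):
# living area in square feet, price in thousands.
living_area = [1000.0, 1500.0, 2000.0, 2500.0]
price = [200.0, 300.0, 400.0, 500.0]

slope, intercept = fit_line(living_area, price)


def predict_price(area):
    """Apply the learned function F to a new living area."""
    return slope * area + intercept


print(predict_price(1800.0))
```

Here the instance space is the set of possible living areas; adding the number of bedrooms would make each instance a pair of numbers.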
Supervised Learning
We consider systems that apply a function $f$ to input items $x$ and return an output $y = f(x)$.
Input (an item $x$ drawn from an input space) → System $y = f(x)$ → Output (an item $y$ drawn from an output space)
[Figure example: input “Dan Roth”, output +/-]
In (supervised) machine learning, we deal with systems whose $f(x)$ is learned from examples.
Why use learning?
We typically use machine learning when the function we want
the system to apply is unknown to us, and we cannot “think”
about it. The function could actually be simple.
Supervised Learning
Input (an item drawn from an instance space) → Learned Model → Output (an item drawn from a label space)
[Figure label: Target function]
The space of all functions our algorithm “considers” is called the hypothesis space.
Supervised learning: Training
Give the learner labeled training examples.
The learner returns a model, which is the model we’ll use in our application.
Labeled Training Data → Learning Algorithm → Learned model
[Figure: an input example such as (“Dan Roth”, +) is an element of the instance space. A learned model might read: If (the … character of the … token is …) AND (the … is …) then Negative. Otherwise, Positive.]
Supervised learning: Testing
Reserve some labeled data for testing.
Separate the labeled test data into raw test data and test labels.
Apply the model to the raw test data.
Evaluate by comparing predicted labels against the test labels.
Raw Test Data → Learned model → Predicted Labels (compared against Test Labels)
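The testing procedure can be sketched in a few lines: apply the model to the raw test data, then compare predicted labels with the held-out test labels. Everything below is hypothetical, including `toy_model`, which is a hand-written rule standing in for a learned model, and the four example emails.

```python
def evaluate(model, raw_test_data, test_labels):
    """Return the predicted labels and the fraction that match the test labels."""
    predicted = [model(x) for x in raw_test_data]
    correct = sum(1 for p, t in zip(predicted, test_labels) if p == t)
    return predicted, correct / len(test_labels)


def toy_model(email_text):
    # Stand-in for a learned model (assumption for illustration only).
    return "spam" if "prize" in email_text.lower() else "legitimate"


# Hypothetical reserved test set: raw inputs and their labels kept separately.
raw_test_data = [
    "Claim your PRIZE now",
    "Lecture moved to room 4",
    "You won a prize",
    "Lunch tomorrow?",
]
test_labels = ["spam", "legitimate", "spam", "spam"]

predicted_labels, accuracy = evaluate(toy_model, raw_test_data, test_labels)
print(predicted_labels, accuracy)
```

The model never sees `test_labels`; they are used only for the comparison.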
Data In, Model Out
DATA $\mathcal{D}$ → MACHINE LEARNING → MODEL $f(\cdot)$
We would like to recover a model like this: acceleration $a = F/m$ (force $F$, mass $m$).
ML Design (hypothesis class, loss function, optimizer, hyperparameters, features, …)
Example hypothesis class (for varying values of $w_0, \dots, w_4$):
$a = w_0 + w_1 F + w_2 m + w_3 (F \cdot m) + w_4 (F/m)$
The model we want is the member of this class with weights $(0, 0, 0, 0, 1)$:
$a = 0 + 0F + 0m + 0(F \cdot m) + 1(F/m)$
Learning = finding “good” values for the weights $w_0, w_1, \dots, w_4$
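To make "finding good values for the weights" concrete, the sketch below scores two candidate weight vectors from the example hypothesis class with a mean squared error loss. The loss choice, the function names, and the second ("poor") weight vector are assumptions for illustration; the (F, m, a) triples are the rows of the force table that appears later in these slides.

```python
def hypothesis(w, F, m):
    """a = w0 + w1*F + w2*m + w3*(F*m) + w4*(F/m)"""
    w0, w1, w2, w3, w4 = w
    return w0 + w1 * F + w2 * m + w3 * (F * m) + w4 * (F / m)


def mean_squared_error(w, data):
    """Average squared gap between predicted and observed acceleration."""
    return sum((hypothesis(w, F, m) - a) ** 2 for F, m, a in data) / len(data)


# (force F in N, mass m in kg, acceleration a in m/s^2)
data = [
    (10.0, 2.5, 4.0), (10.0, 5.0, 2.0), (10.0, 20.0, 0.5), (10.0, 40.0, 0.25),
    (100.0, 40.0, 2.5), (100.0, 20.0, 5.0), (100.0, 50.0, 2.0),
]

w_target = (0.0, 0.0, 0.0, 0.0, 1.0)  # a = F/m
w_poor = (0.0, 1.0, 0.0, 0.0, 0.0)    # a = F, a deliberately bad guess

loss_target = mean_squared_error(w_target, data)
loss_poor = mean_squared_error(w_poor, data)
print(loss_target, loss_poor)
```

An optimizer would search the weight space for a vector whose loss is as low as that of `w_target`.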
Key Issues in Machine Learning
Modeling
How to formulate application problems as machine learning problems? How
to represent the data?
Learning protocols (where are the data and labels coming from?)
Representation
What functions should we learn (hypothesis spaces)?
How to map raw input to an instance space?
Any rigorous way to find these? Any general approach?
Algorithms
What are good algorithms?
How do we define success?
Generalization vs. overfitting
The computational problem
Using supervised learning
What is our instance space?
Gloss: What kind of features are we using?
What is our label space?
Gloss: What kind of learning task are we dealing with?
What is our hypothesis space?
Gloss: What kind of functions (models) are we learning?
What learning algorithm do we use?
Gloss: How do we learn the model from the labeled data?
What is our loss function/evaluation metric?
Gloss: How do we measure success? What drives learning?
1. The instance space
Input (an item $x$ drawn from an instance space $\mathcal{X}$) → Learned Model → Output (an item $y$ drawn from a label space)
Designing an appropriate instance space $\mathcal{X}$
is crucial for how well we can predict $y$.
1. The instance space
When we apply machine learning to a task, we first need to define the instance space $\mathcal{X}$.
Instances are defined by features:
Boolean features:
Is there a folder named after the sender?
Does this email contain the word ‘class’?
Does this email contain the word ‘waiting’?
Does this email contain the word ‘class’ and the word ‘waiting’?
Numerical features:
How often does ‘learning’ occur in this email?
How long is the email?
How many emails have I seen from this sender over the last day/week/month?
Bag of tokens
Just list all the tokens in the input
Does it add anything if
you already have the
previous two features?
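The feature types listed above can be sketched as a small extraction function. The tokenization rule, the feature names, and the sample email are assumptions for illustration; features that need mailbox state (folders, sender history) are left out.

```python
import re


def extract_features(email_text):
    """Map one email to Boolean features, numerical features, and a bag of tokens."""
    # Simple tokenizer (assumption): lowercase runs of letters, digits, apostrophes.
    tokens = re.findall(r"[a-z0-9']+", email_text.lower())
    has_class = "class" in tokens
    has_waiting = "waiting" in tokens
    return {
        # Boolean features
        "contains_class": has_class,
        "contains_waiting": has_waiting,
        "contains_class_and_waiting": has_class and has_waiting,
        # Numerical features
        "count_learning": tokens.count("learning"),
        "length_in_tokens": len(tokens),
        # Bag of tokens: just list all the tokens in the input
        "bag_of_tokens": tokens,
    }


email = "Waiting for the class notes: machine learning, deep learning."
features = extract_features(email)
print(features)
```

Note that `contains_class_and_waiting` is computed entirely from the two features before it, which is the point of the slide's question about whether it adds anything.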
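$\mathcal{X}$ as a vector space
$\mathcal{X}$ is an N-dimensional vector space (e.g. $\{0,1\}^N$, $\mathbb{R}^N$)
Each dimension = one feature.
Each $\mathbf{x}$ is a feature vector (hence the boldface $\mathbf{x}$).
Think of $\mathbf{x} = [x_1 \dots x_N]$ as a point in $\mathcal{X}$.
[Figure: points plotted on axes $x_1$ and $x_2$]
- Patient Age
- Clump thickness
- Tumor Color
- Distance from optic nerve
- Cell type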
Good features are essential
The choice of features is crucial for how well a task can be learned.
In many application areas (language, vision, etc.), a lot of work goes into
designing suitable features.
This requires domain expertise.
We can’t teach you what specific features
to use for your task.
But we will touch on some general principles
2. The label space
Input (an item drawn from an instance space) → Learned Model → Output (an item drawn from a label space)
The label space determines what kind of supervised learning task
we are dealing with.
Supervised learning tasks I
Output labels are categorical:
Binary classification: Two possible labels
Multiclass classification: more than two possible labels
Output labels are structured objects (sequences of labels, parse trees, graphs,
etc.)
Structure learning: multiple labels that are related (thus constrained)
Three events. When
classifying the temporal
relations between them we
need to account for the
relations between them.
3. The model
Input (an item drawn from an instance space) → Learned Model → Output (an item drawn from a label space)
We need to choose what kind of model
we want to learn.
Types of Learning
• Supervised learning
– Input: Examples of inputs and desired outputs
– Output: Model that predicts output given a new input
• Unsupervised learning
– Input: Examples of some data (no “outputs”)
– Output: Representation of structure in the data
• Reinforcement learning
– Input: Sequence of agent interactions with an environment
– Output: Policy that maps agent’s observations to actions
Unsupervised Learning
1. In unsupervised learning the training data is unlabeled. The system tries to learn
without a teacher.
2. Unsupervised methods must rely solely on the intrinsic structure of the data points
to learn a good hypothesis. Thus, they do not need a teacher or
domain expert who provides labels for data points.
3. Two large families of unsupervised methods:
1. Clustering
2. Feature learning. Two important applications of feature learning are dimensionality reduction and data visualization.
Notation:
In the clustering problem, we are given a training set $\{x^{(1)}, x^{(2)}, \dots, x^{(n)}\}$ and want to
group the data into a few cohesive “clusters”, but no labels $y^{(i)}$ are given.
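One common way to approach the clustering problem is k-means, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below is a bare-bones version under stated assumptions: the six 2-D points are hypothetical, and the initial centroids are picked by hand rather than by a proper initialization scheme.

```python
def assign(points, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each point."""
    labels = []
    for p in points:
        distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


def kmeans(points, initial_centroids, iterations=10):
    """Alternate assignment and centroid-update steps a fixed number of times."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        labels = assign(points, centroids)
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assign(points, centroids)


# Hypothetical unlabeled data: two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, initial_centroids=[points[0], points[3]])
print(centroids, labels)
```

No labels $y^{(i)}$ are used anywhere; the cluster indices come only from the geometry of the points.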
Unsupervised Learning
• Given x1, x2, ..., xn (without labels)
• The goal is to find groups of similar observations.
• Output hidden structure behind the x’s
– E.g., clustering
Unsupervised Learning
• The goal is to find groups of similar observations (Clustering).
Unsupervised Learning
Genomics application: group individuals by genetic similarity
[Figure: matrix of genes by individuals. Source: Daphne Koller]
Unsupervised Learning
• Organize computing clusters
• Social network analysis
• Astronomical data analysis (image credit: NASA/JPL-Caltech/E. Churchwell, Univ. of Wisconsin, Madison)
• Market segmentation
Slide credit: Andrew Ng
Unsupervised Learning
• Independent component analysis – separate a
combined signal into its original sources
Image credit: statsoft.com Audio from http://www.ism.ac.jp/~shiro/research/blindsep.html
Unsupervised Learning
Here are some of the most important unsupervised learning algorithms:
• Clustering
– K-means
– DBSCAN
– Hierarchical Cluster Analysis (HCA)
• Visualization and dimensionality reduction
– Principal Component Analysis (PCA)
– Kernel PCA
• Association rule learning
– Apriori
– Eclat
• Anomaly detection and novelty detection
– One-class SVM
– Isolation Forest
Reinforcement Learning
• In supervised learning, we saw algorithms that tried to make their outputs mimic the labels y
given in the training set. In that setting, the labels gave an unambiguous “right answer” for each
of the inputs x.
• In contrast, for many sequential decision-making and control problems, it is very difficult to
provide this type of explicit supervision to a learning algorithm.
• Example: if we have just built a four-legged robot and are trying to program it to walk, then initially we
have no idea what the “correct” actions are to make it walk, and so we do not know how
to provide explicit supervision for a learning algorithm to try to mimic.
• In the reinforcement learning framework:
– The learning system, called an agent in this context, can observe the environment, select and
perform actions, and get rewards in return (or penalties in the form of negative rewards).
– It must then learn by itself what is the best strategy, called a policy, to get the most reward over
time.
– A policy defines what action the agent should choose when it is in a given situation.
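The agent, reward, and policy vocabulary can be illustrated with tabular Q-learning, one of several reinforcement learning methods and not necessarily the one this lecture covers. The corridor environment, the reward of 1 at the goal, and all parameter values below are hypothetical choices made for the sketch.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Environment: move within the corridor; reward 1.0 on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values from experience gathered with random actions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            a = rng.randrange(len(ACTIONS))  # explore uniformly at random
            next_state, reward = step(state, ACTIONS[a])
            target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q = q_learning()
# Policy: in each non-goal state, choose the action with the highest learned value.
policy = [ACTIONS[row.index(max(row))] for row in q[:-1]]
print(policy)
```

Nobody tells the agent which move is correct in each cell; it only sees rewards, and the policy is read off the learned values afterwards.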
Reinforcement Learning
• Examples:
– Credit assignment problem
– Game playing
– Robot in a maze
– Balance a pole on your hand
Machine learning is Programming 2.0
Traditional Programming Machine learning (ML)
Revision:
Task specification in ML: programs → examples
Traditional programming: Here is a program to implement Newton’s second law of motion.
def compute_force(m, a):
    """
    Returns force (in N) needed to move mass m (in kg)
    at acceleration a (in m/s^2).
    """
    F = m * a
    return F
Machine learning: Here are some examples. Try to imitate them.
Mass m (kg)   Acceleration a (m/s^2)   Force F (N)
2.5           4                        10
5             2                        10
20            0.5                      10
40            0.25                     10
40            2.5                      100
20            5                        100
50            2                        100
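A minimal sketch of "imitating the examples": assume a one-parameter model F ≈ w · (m · a) and pick w by least squares from the table rows. The model form and the function names are assumptions made for this illustration; a real learner would not be handed the product m · a as a ready-made feature.

```python
# Rows of the table: (mass m in kg, acceleration a in m/s^2, force F in N)
examples = [
    (2.5, 4.0, 10.0), (5.0, 2.0, 10.0), (20.0, 0.5, 10.0), (40.0, 0.25, 10.0),
    (40.0, 2.5, 100.0), (20.0, 5.0, 100.0), (50.0, 2.0, 100.0),
]


def fit_weight(examples):
    """Closed-form least-squares weight w for the model F ≈ w * (m * a)."""
    numerator = sum((m * a) * F for m, a, F in examples)
    denominator = sum((m * a) ** 2 for m, a, F in examples)
    return numerator / denominator


w = fit_weight(examples)


def learned_force(m, a):
    """The learned stand-in for compute_force."""
    return w * m * a


print(w, learned_force(3.0, 2.0))
```

The program was never told Newton's second law; the weight is recovered from the examples alone.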
Task specification in ML: programs → examples
Traditional programming: Here is a program to recognize an image as a cow or a turtle.
def cow_or_turtle(image):
    ???
Machine learning: Here are some examples. Try to imitate them.
[Figure: example images labeled “cows” and “turtles”]
Putting a trained ML system to use
[Figure: after training on the “cows” and “turtles” examples, the system labels one new image “cow” and another “turtle”]
ML Workflow
Framing an ML problem (Mitchell’s P, T, E)
Data curation (sourcing, scraping, collection, labeling)
Data analysis / visualization
ML Design (hypothesis class, loss function, optimizer, hyperparameters, features)
Train model
Validate / Evaluate
Deploy (and generate new data)
Monitor performance on new data
[Slide highlights: “Main focus of this class” and “Project”]
ML for Social Good
Applying AI for social good | McKinsey
Elizabeth Bondi et al., SPOT poachers in action: Augmenting
conservation drones with automatic detection in near real time,
AAAI 2018
Framing a Learning Problem
Designing a Learning System
• Choose the training experience
• Choose exactly what is to be learned
– i.e. the target function
• Choose how to represent the target function
• Choose a learning algorithm to infer the target
function from the experience
[Figure: Environment/Experience → Learner → Knowledge → Performance Element; training data goes to the Learner, testing data to the Performance Element]
Based on slide by Ray Mooney
Training vs. Test Distribution
• We generally assume that the training and
test examples are independently drawn from
the same overall distribution of data
– We call this “i.i.d.”, which stands for “independent
and identically distributed”
• If examples are not independent, this requires
collective classification
• If the test distribution is different, this requires
transfer learning
Slide credit: Ray Mooney
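In practice, a common way to make training and test sets look like draws from the same distribution is to shuffle the available examples before splitting them. The sketch below assumes this approach; the function name, the 25% test fraction, and the toy examples are hypothetical, and shuffling cannot repair data that was not i.i.d. to begin with.

```python
import random


def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the examples, then cut it into train and test parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical labeled examples: (input, label) pairs.
examples = [(x, x % 2) for x in range(20)]
train, test = train_test_split(examples)
print(len(train), len(test))
```

Each example lands in exactly one of the two sets, so the test set stays unseen during training.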
ML in a Nutshell
• Tens of thousands of machine learning
algorithms
– Hundreds new every year
• Every ML algorithm has three
components:
– Representation
– Optimization
– Evaluation
Slide credit: Pedro Domingos
Slide credit: Ray Mooney
Various Function Representations
• Numerical functions
– Linear regression
– Neural networks
– Support vector machines
• Symbolic functions
– Decision trees
– Rules in propositional logic
– Rules in first-order predicate logic
• Instance-based functions
– Nearest-neighbor
– Case-based
• Probabilistic Graphical Models
– Naïve Bayes
– Bayesian networks
– Hidden-Markov Models (HMMs)
– Probabilistic Context Free Grammars (PCFGs)
– Markov networks
Slide credit: Ray Mooney
Various Search/Optimization
Algorithms
• Gradient descent
– Perceptron
– Backpropagation
• Dynamic Programming
– HMM Learning
– PCFG Learning
• Divide and Conquer
– Decision tree induction
– Rule learning
• Evolutionary Computation
– Genetic Algorithms (GAs)
– Genetic Programming (GP)
– Neuro-evolution
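As an illustration of the first item, gradient descent, the sketch below fits a single weight w in the model y ≈ w · x by repeatedly stepping against the gradient of the mean squared error. The data, learning rate, and step count are hypothetical choices for this example.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=100):
    """Minimize mean squared error of y ≈ w * x over the single weight w."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # d/dw of (1/n) * sum((w*x - y)^2) is (2/n) * sum((w*x - y) * x)
        gradient = (2.0 / n) * sum((w * x - y) * x for x, y in zip(xs, ys))
        w -= learning_rate * gradient
    return w


# Hypothetical data generated by y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = gradient_descent(xs, ys)
print(w)
```

If the learning rate is too large the updates overshoot and diverge; too small and many more steps are needed.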
Slide credit: Pedro Domingos
Evaluation
• Accuracy
• Precision and recall
• Squared error
• Likelihood
• Posterior probability
• Cost / Utility
• Margin
• Entropy
• K-L divergence
• etc.
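Two of the metrics above, precision and recall, can be computed directly from predicted and true labels. The labels, the choice of "spam" as the positive class, and the function name below are hypothetical.

```python
def precision_recall(predicted, actual, positive="spam"):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical predictions and ground truth for five emails.
predicted = ["spam", "spam", "ham", "ham", "spam"]
actual = ["spam", "ham", "spam", "ham", "spam"]

precision, recall = precision_recall(predicted, actual)
print(precision, recall)
```

Precision asks how many flagged emails were really spam; recall asks how much of the real spam was flagged.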
ML in Practice
• Understand domain, prior knowledge, and goals
• Data integration, selection, cleaning, pre-processing, etc.
• Learn models
• Interpret results
• Consolidate and deploy discovered knowledge
Loop
Based on a slide by Pedro Domingos

Machine learning for beginners students.

  • 1.
    Ramanujan College, Universityof Delhi New Delhi-110019, India. Topic: Machine Learning
  • 2.
    What is MachineLearning? “Learning is any process by which a system improves performance from experience.” - Herbert Simon Definition by Tom Mitchell (1998): Machine Learning is the study of algorithms that • improve their performance P • at some task T • with experience E. A well-defined learning task is given by <P, T, E>.
  • 3.
  • 4.
    4 Machine Learning Traditional Programming Program/ Rules Data Output/ Answer Machine Learning Output/Answers Data Program/ Rules/model Makessense. Use Machine Learning Sort these numbers in decreasing order 2, 4, 18, 1, 77, 0, 85 Does not make sense. Do not use Machine Learning (what are the risks if you do?) Rock paper scissors
  • 5.
    Introduction distinguish two approaches •knowledge-based: a computer program whose logic encodes a large number of properties of the world, usually developed by a team of experts over many years. • machine learning: extract information directly from historical data and extrapolate to make predictions
  • 6.
    When Do WeUse Machine Learning? ML is used when: • Human expertise does not exist (navigating on Mars) • Humans can’t explain their expertise (speech recognition) • Models must be customized (personalized medicine) • Models are based on huge amounts of data (genomics) Learning isn’t always useful: • There is no need to “learn” to calculate payroll Based on slide by E. Alpaydin 5
  • 7.
    7 Slide credit: GeoffreyHinton Some more examples of tasks that are best solved by using a learning algorithm • Recognizing patterns: – Facial identities or facial expressions – Handwritten or spoken words – Medical images • Generating patterns: – Generating images or motion sequences • Recognizing anomalies: – Unusual credit card transactions – Unusual patterns of sensor readings in a nuclear power plant • Prediction: – Future stock prices or currency exchange rates
  • 8.
    8 Slide credit: PedroDomingos Sample Applications • Web search • Computational biology • Finance • E-commerce • Space exploration • Robotics • Information extraction • Social networks • Debugging software • [Your favorite area]
  • 9.
    Samuel’s Checkers-Player “Machine Learning:Field of study that gives computers the ability to learn without being explicitly programmed.” -Arthur Samuel (1959) 9
  • 10.
    Defining the LearningTask Improve on task T, with respect to performance metric P, based on experience E T: Playing checkers P: Percentage of games won against an arbitrary opponent E: Playing practice games against itself T: Recognizing hand-written words P: Percentage of words correctly classified E: Database of human-labeled images of handwritten words T: Driving on four-lane highways using vision sensors P: Average distance traveled before a human- judged error E: A sequence of images and steering commands recorded while observing a human driver. T: Categorize email messages as spam or legitimate. P: Percentage of email messages correctly classified. Slide credit: Ray Mooney 10
  • 11.
  • 12.
    Types of Learning •Supervised (inductive) learning – Given: training data + desired outputs (labels) • Unsupervised learning – Given: training data (without desired outputs) • Semi-supervised learning – Given: training data + a few desired outputs • Reinforcement learning – Rewards from sequence of actions Based on slide by Pedro Domingos 24
  • 13.
    23 Supervised Machine Learning Givendata like this, how can we learn to predict the prices of other houses? as a function of the size of their living areas? House price = F (living area) What is an instance space? To make our housing example more interesting, let’s consider a slightly richer dataset in which we also know the number of bedrooms in each house: House price = F (living area, # bedrooms) What is an instance space?
  • 14.
    14 Output An item drawn froman output space Input An item drawn from an input space System Supervised Learning We consider systems that apply a function to input items x and return an output . Dan Roth +/-
  • 15.
    15 Supervised Learning In (supervised)machine learning, we deal with systems whose is learned from examples. Output An item drawn from an output space Input An item drawn from an input space System
  • 16.
    16 Why use learning? Wetypically use machine learning when the function we want the system to apply is unknown to us, and we cannot “think” about it. The function could actually be simple.
  • 17.
    17 Output An item drawn froma label space Input An item drawn from an instance space Learned Model Supervised Learning Target function The space of all functions our algorithm “considers” is called the Hypothesis space.
  • 18.
    18 Supervised learning: Training Givethe learner examples in The learner returns a model Labeled Training Data … Learned model Learning Algorithm is the model we’ll use in our application ( Dan Roth, +) If (the…character of the …token is..) AND (the …. is…. ) then Negative. Otherwise, Positive. An input example An element in the instance Space
  • 19.
    19 Supervised learning: Testing Reservesome labeled data for testing Labeled Test Data …
  • 20.
    20 Supervised learning: Testing Labeled TestData Test Labels ... Raw Test Data
  • 21.
    21 Apply the modelto the raw test data Evaluate by comparing predicted labels against the test labels Test Labels Raw Test Data Supervised learning: Testing Learned model Predicted Labels
  • 22.
    Data In, ModelOut DATA 𝒟 MACHINE LEARNING MODEL 𝑓 ⋅ acceleration 𝑎 = Force 𝐹 mass 𝑚 We would like to recover a model like this!
  • 23.
    Data In, ModelOut MACHINE LEARNING MODEL 𝑓 ⋅ acceleration 𝑎 = Force 𝐹 mass 𝑚 ML Design (hypothesis class, loss function, optimizer, hyperparameters, features, …) 𝑚 𝑎 = 𝑤0 + 𝑤1𝐹 + 𝑤2𝑚 + 𝑤3 𝐹 ∗ 𝑚 + 𝑤4 ( 𝐹 ) Example hypothesis class: (for varying values of 𝑤0, … 𝑤4 ) 𝑚 𝑎 = 0 + 0𝐹 + 0𝑚 + 0 𝐹 ∗ 𝑚 + 1( 𝐹 ) Learning = finding “good” values for the weights 𝑤0 , 𝑤1 , … , 𝑤# DATA 𝒟
  • 24.
    24 Key Issues inMachine Learning Modeling How to formulate application problems as machine learning problems ? How to represent the data? Learning Protocols (where is the data & labels coming from?) Representation What functions should we learn (hypothesis spaces) ? How to map raw input to an instance space? Any rigorous way to find these? Any general approach? Algorithms What are good algorithms? How do we define success? Generalization vs. over fitting The computational problem
  • 25.
    25 Using supervised learning Whatis our instance space? Gloss: What kind of features are we using? What is our label space? Gloss: What kind of learning task are we dealing with? What is our hypothesis space? Gloss: What kind of functions (models) are we learning? What learning algorithm do we use? Gloss: How do we learn the model from the labeled data? What is our loss function/evaluation metric? Gloss: How do we measure success? What drives learning?
  • 26.
    26 Output An item drawn froma label space Input An item drawn from an instance space X Learned Model 1. The instance space Designing an appropriate instance space is crucial for how well we can predict .
  • 27.
    27 1. The instancespace When we apply machine learning to a task, we first need to define the instance space . Instances are defined by features: Boolean features: Is there a folder named after the sender? Does this email contain the word ‘class’? Does this email contain the word ‘waiting’? Does this email contain the word ‘class’ and the word ‘waiting’? Numerical features: How often does ‘learning’ occur in this email? How long is the email? How many emails have I seen from this sender over the last day/week/month? Bag of tokens Just list all the tokens in the input Does it add anything if you already have the previous two features?
  • 28.
    28 as a vectorspace is an N-dimensional vector space (e.g. {0,1}N , ) Each dimension = one feature. Each is a feature vector (hence the boldface ). Think of = [ … ] as a point in: 𝑥1 𝑥2 - Patient Age - Clump thickness - Tumor Color - Distance from optic nerv - Cell type
  • 29.
    29 Good features areessential The choice of features is crucial for how well a task can be learned. In many application areas (language, vision, etc.), a lot of work goes into designing suitable features. This requires domain expertise. We can’t teach you what specific features to use for your task. But we will touch on some general principles
  • 30.
    30 Output An item drawn froma label space Input An item drawn from an instance space Learned Model 2. The label space The label space determines what kind of supervised learning task we are dealing with
  • 31.
    31 Supervised learning tasksI Output labels are categorical: Binary classification: Two possible labels Multiclass classification: possible labels Output labels are structured objects (sequences of labels, parse trees, graphs, etc.) Structure learning: multiple labels that are related (thus constrained) Three events. When classifying the temporal relations between them we need to account for the relations between them.
  • 32.
  • 33.
    34 Output An item drawn froma label space Input An item drawn from an instance space Learned Model 3. The model We need to choose what kind of model we want to learn
  • 34.
    Types of Learning •Supervised learning  Input: Examples of inputs and desired outputs  Output: Model that predicts output given a new input • Unsupervised learning  Input: Examples of some data (no “outputs”)  Output: Representation of structure in the data • Reinforcement learning  Input: Sequence of agent interactions with an environment  Output: Policy that maps agent’s observations to actions
  • 35.
  • 36.
  • 37.
    38 Unsupervised learning:?! 1. Inunsupervised learning the training data is unlabeled. The system tries to learn without a teacher. 2. Unsupervised methods must rely solely on the intrinsic structure of data points to learn a good hypothesis. Thus, unsupervised methods do not need a teacher or domain expert who provides labels for data points. 3. Two large families of unsupervised methods 1. Clustering 2. Feature Learning 1. Two important applications of feature learning are dimensionality reduction and data visualization. Notation: In the clustering problem, we are given a training set, {x(1) , x(2) , ……. X(n) }, and want to group the data into a few cohesive “clusters.” but no labels y(i) are given. Unsupervised Learning
  • 38.
    Unsupervised Learning • Givenx1, x2, ..., xn (without labels) • The goal is to find groups of similar observations. • Output hidden structure behind the x’s – E.g., clustering 31
  • 39.
    23 Unsupervised Learning • Thegoal is to find groups of similar observations (Clustering).
  • 40.
    [Source: Daphne Koller] Genes Individuals UnsupervisedLearning Genomics application: group individuals by genetic similarity 32
  • 41.
    Organize computing clustersSocial network analysis Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison) Astronomical data analysis Market segmentation Slide credit: Andrew Ng Unsupervised Learning 33
  • 42.
    Unsupervised Learning • Independentcomponent analysis – separate a combined signal into its original sources 34 Image credit: statsoft.com Audio from http://www.ism.ac.jp/~shiro/research/blindsep.html
  • 43.
    Unsupervised Learning • Independentcomponent analysis – separate a combined signal into its original sources 35 Image credit: statsoft.com Audio from http://www.ism.ac.jp/~shiro/research/blindsep.html
  • 44.
    • Here aresome of the most important Unsupervised learning algorithms. Clustering K-means DBSCAN Hierarchical Cluster Analysis (HCA) Visualization and dimensionality reduction Principal Component Analysis (PCA) Kernel PCA Association rule learning Apriori Eclat Anomaly detection and novelty detection One-class SVM Isolation Forest Unsupervised Learning
  • 45.
    • In supervisedlearning, we saw algorithms that tried to make their outputs mimic the labels y given in the training set. In that setting, the labels gave an unambiguous “right answer” for each of the inputs x. • In contrast, for many sequential decision-making and control problems, it is very difficult to provide this type of explicit supervision to a learning algorithm. • Example: • if we have just built a four-legged robot and are trying to program it to walk, then initially, we have no idea what the “correct” actions to take are to make it walk, and so we do not know how to provide explicit supervision for a learning algorithm to try to mimic. • In the reinforcement learning framework, • The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards). • It must then learn by itself what is the best strategy, called a policy, to get the most reward over time • A policy defines what action the agent should choose when it is in a given situation. Reinforcement Learning
  • 46.
  • 47.
    Reinforcement Learning • Examples: –Credit assignment problem – Game playing – Robot in a maze – Balance a pole on your hand 36
  • 48.
    Machine learning isProgramming 2.0 Traditional Programming Machine learning (ML) Revision:
  • 49.
    Task specification inML: programs → examples def compute_force(m, a): ‘’’ returns force (in N) needed to move mass m (in kg) at acceleration a (in m/s^2) ‘’’ F = m * a return F Mass m (kg) Acceleration a (m/s^2) Force F (N) 2.5 4 10 5 2 10 20 0.5 10 40 0.25 10 40 2.5 100 20 5 100 50 2 100 Here is a program to implement Newton’s second law of motion Here are some examples. Try to imitate them.
  • 50.
    Task specification in ML: programs → examples
    Traditional programming: “Here is a program to recognize an image as a cow or a turtle.”
        def cow_or_turtle(image):
            ???
    Machine learning: “Here are some examples. Try to imitate them.”
    [Images: examples labeled “cows” and examples labeled “turtles”]
  • 51.
    Putting a trained ML system to use
    Here are some examples. Try to imitate them.
    [Images: examples labeled “cows” and “turtles”; output for a new image: “cow”]
  • 52.
    Putting a trained ML system to use
    Here are some examples. Try to imitate them.
    [Images: examples labeled “cows” and “turtles”; output for a new image: “turtle”]
  • 53.
    ML Workflow
    • Framing an ML problem (Mitchell’s P, T, E)
    • Data curation (sourcing, scraping, collection, labeling)
    • Data analysis / visualization
    • ML Design (hypothesis class, loss function, optimizer, hyperparameters, features)
    • Train model
    • Validate / Evaluate
    • Deploy (and generate new data)
    • Monitor performance on new data
  • 55.
    ML Workflow
    • Framing an ML problem (Mitchell’s P, T, E)
    • Data curation (sourcing, scraping, collection, labeling)
    • Data analysis / visualization
    • ML Design (hypothesis class, loss function, optimizer, hyperparameters, features)
    • Train model
    • Validate / Evaluate
    • Deploy (and generate new data)
    • Monitor performance on new data
    Main focus of this class
  • 57.
    ML Workflow
    • Framing an ML problem (Mitchell’s P, T, E)
    • Data curation (sourcing, scraping, collection, labeling)
    • Data analysis / visualization
    • ML Design (hypothesis class, loss function, optimizer, hyperparameters, features)
    • Train model
    • Validate / Evaluate
    • Deploy (and generate new data)
    • Monitor performance on new data
    Main focus of this class
    Project
  • 59.
  • 60.
    ML for Social Good
    • Applying AI for social good | McKinsey
    • Elizabeth Bondi et al., “SPOT poachers in action: Augmenting conservation drones with automatic detection in near real time,” AAAI 2018
  • 61.
  • 62.
    Designing a Learning System
    • Choose the training experience
    • Choose exactly what is to be learned
      – i.e. the target function
    • Choose how to represent the target function
    • Choose a learning algorithm to infer the target function from the experience
    [Diagram labels: Environment/Experience, Learner, Knowledge, Performance Element, Training data, Testing data]
    Based on slide by Ray Mooney
  • 63.
    Training vs. Test Distribution
    • We generally assume that the training and test examples are independently drawn from the same overall distribution of data
      – We call this “i.i.d.”, which stands for “independent and identically distributed”
    • If the examples are not independent, collective classification is required
    • If the test distribution is different, transfer learning is required
    Slide credit: Ray Mooney
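In practice, the assumption above is usually approximated by shuffling one pool of examples and cutting it into a training part and a test part. The sketch below assumes a hypothetical helper `train_test_split` written for this illustration (it is not a library call) and a toy dataset of (input, label) pairs. A random split only makes both parts come from the same pool; it cannot repair examples that depend on each other.

```python
import random


def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the examples, then cut it into (train, test)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy data: inputs 0..19, label = parity of the input.
data = [(x, x % 2) for x in range(20)]
train, test = train_test_split(data)
```

Every example lands in exactly one of the two parts, so the test part stays unseen during training.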
  • 64.
    ML in a Nutshell
    • Tens of thousands of machine learning algorithms
      – Hundreds new every year
    • Every ML algorithm has three components:
      – Representation
      – Optimization
      – Evaluation
    Slide credit: Pedro Domingos
  • 65.
    Various Function Representations
    • Numerical functions
      – Linear regression
      – Neural networks
      – Support vector machines
    • Symbolic functions
      – Decision trees
      – Rules in propositional logic
      – Rules in first-order predicate logic
    • Instance-based functions
      – Nearest-neighbor
      – Case-based
    • Probabilistic Graphical Models
      – Naïve Bayes
      – Bayesian networks
      – Hidden-Markov Models (HMMs)
      – Probabilistic Context Free Grammars (PCFGs)
      – Markov networks
    Slide credit: Ray Mooney
  • 66.
    Various Search/Optimization Algorithms
    • Gradient descent
      – Perceptron
      – Backpropagation
    • Dynamic Programming
      – HMM Learning
      – PCFG Learning
    • Divide and Conquer
      – Decision tree induction
      – Rule learning
    • Evolutionary Computation
      – Genetic Algorithms (GAs)
      – Genetic Programming (GP)
      – Neuro-evolution
    Slide credit: Ray Mooney
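As an illustration of the first family on this slide, here is a minimal gradient descent sketch. It assumes a hypothetical setting chosen for brevity, a one-variable linear model y = w·x + b trained on mean squared error, rather than the perceptron or backpropagation named on the slide; the function name and the toy data are illustrative.

```python
def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Fit y = w*x + b by repeatedly stepping against the MSE gradient."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of mean((w*x + b - y)^2) w.r.t. w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
```

The learning rate `lr` is an assumption tuned to this toy data; a value that is too large makes the updates diverge instead of settling near w = 2, b = 1.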
  • 67.
    Evaluation
    • Accuracy
    • Precision and recall
    • Squared error
    • Likelihood
    • Posterior probability
    • Cost / Utility
    • Margin
    • Entropy
    • K-L divergence
    • etc.
    Slide credit: Pedro Domingos
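The first three measures on this slide can be written directly from their definitions. This is a minimal sketch with illustrative function names and made-up labels; returning 0.0 when a denominator is zero is a convention assumed here, not something the slide specifies.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences, used for numeric targets."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up binary labels: 2 true positives, 1 false positive, 1 false negative.
labels = [1, 0, 1, 1, 0, 0]
preds = [1, 0, 0, 1, 1, 0]
acc = accuracy(labels, preds)
prec, rec = precision_recall(labels, preds)
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Accuracy treats all mistakes alike, while precision and recall separate false alarms from misses, which matters when one kind of error is more costly than the other.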
  • 68.
  • 69.
    ML in Practice
    • Understand domain, prior knowledge, and goals
    • Data integration, selection, cleaning, pre-processing, etc.
    • Learn models
    • Interpret results
    • Consolidate and deploy discovered knowledge
    Loop: these steps are repeated.
    Based on a slide by Pedro Domingos

Editor's Notes

  • #21 Could potentially check g(x) (Exact learning); Can the test data, even without the labels, be used to learn something? Maybe something about the distribution of the tests?
  • #24 Badges game. Don’t give me the answer. Start thinking about how to write a program that will figure out whether my name has + or – next to it.
  • #25 What is the real input the algorithm sees? What are the labels? Folder names; maybe there is a hierarchy. Set of rules: if the sender is…, put it in their folder. Decision tree: if condition A is satisfied, check condition B; if not, check condition C. NN. The representation might dictate the learning algorithm. Number of correct predictions? Maybe it’s a non-symmetric loss: I never want to file incorrectly, I prefer to abstain. Maybe there is a hierarchy, and I want a loss that is sensitive to the hierarchy.