Machine learning for beginners students.

Ramanujan College, University of Delhi
New Delhi-110019, India.
Topic: Machine Learning

What is Machine Learning?
“Learning is any process by which a system improves
performance from experience.”
- Herbert Simon
Definition by Tom Mitchell (1998):
Machine Learning is the study of algorithms that
• improve their performance P
• at some task T
• with experience E.
A well-defined learning task is given by <P, T, E>.

Traditional Programming
Machine Learning
Computer
Data
Program
Output
Computer
Data
Output
Program
Slide credit: Pedro Domingos
4

4
Machine Learning
Traditional
Programming
Program/
Rules
Data
Output/
Answer
Machine Learning
Output/Answers
Data
Program/
Rules/model
Makes sense.
Use Machine
Learning
Sort these numbers in
decreasing order
2, 4, 18, 1, 77, 0, 85
Does not make sense.
Do not use Machine
Learning
(what are the risks if you
do?)
Rock paper scissors

Introduction
distinguish two approaches
• knowledge-based: a computer program whose logic encodes a large
number of properties of the world, usually developed by a team of
experts over many years.
• machine learning: extract information directly from historical data
and extrapolate to make predictions

When Do We Use Machine Learning?
ML is used when:
• Human expertise does not exist (navigating on Mars)
• Humans can’t explain their expertise (speech recognition)
• Models must be customized (personalized medicine)
• Models are based on huge amounts of data (genomics)
Learning isn’t always useful:
• There is no need to “learn” to calculate payroll
Based on slide by E. Alpaydin
5

7
Slide credit: Geoffrey Hinton
Some more examples of tasks that are best
solved by using a learning algorithm
• Recognizing patterns:
– Facial identities or facial expressions
– Handwritten or spoken words
– Medical images
• Generating patterns:
– Generating images or motion sequences
• Recognizing anomalies:
– Unusual credit card transactions
– Unusual patterns of sensor readings in a nuclear power plant
• Prediction:
– Future stock prices or currency exchange rates

8
Sample Applications
• Web search
• Computational biology
• Finance
• E-commerce
• Space exploration
• Robotics
• Information extraction
• Social networks
• Debugging software
• [Your favorite area]

Samuel’s Checkers-Player
“Machine Learning: Field of study that gives
computers the ability to learn without being
explicitly programmed.” -Arthur Samuel (1959)
9

Defining the Learning Task
Improve on task T, with respect to
performance metric P, based on experience E
T: Playing checkers
P: Percentage of games won against an arbitrary
opponent E: Playing practice games against itself
T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of
handwritten words
T: Driving on four-lane highways using vision
sensors
P: Average distance traveled before a human-
judged error
E: A sequence of images and steering commands recorded while
observing a human driver.
T: Categorize email messages as spam or legitimate.
P: Percentage of email messages correctly classified.
Slide credit: Ray Mooney
10

Types of Learning
• Supervised (inductive) learning
– Given: training data + desired outputs (labels)
• Unsupervised learning
– Given: training data (without desired outputs)
• Semi-supervised learning
– Given: training data + a few desired outputs
• Reinforcement learning
– Rewards from sequence of actions
Based on slide by Pedro Domingos
24

23
Supervised Machine Learning
Given data like this, how can we learn to
predict the prices of other houses? as a
function of the size of their living areas?
House price = F (living area)
What is an instance space?
To make our housing example more
interesting, let’s consider a slightly richer
dataset in which we also know the
number of bedrooms in each house:
House price = F (living area, # bedrooms)
What is an instance space?

14
Output
An item
drawn from an
output space
Input
An item
drawn from an
input space
System
Supervised Learning
We consider systems that apply a function
to input items x and return an output .
Dan
Roth
+/-

15
Supervised Learning
In (supervised) machine learning, we deal with systems
whose is learned from examples.
Output
An item
drawn from an
output space
Input
An item
drawn from an
input space
System

16
Why use learning?
We typically use machine learning when the function we want
the system to apply is unknown to us, and we cannot “think”
about it. The function could actually be simple.

17
Output
An item
drawn from a
label space
Input
An item
drawn from an
instance space
Learned Model
Supervised Learning
Target function
The space of all
functions our
algorithm
“considers” is called
the Hypothesis
space.

18
Supervised learning: Training
Give the learner examples in
The learner returns a model
Labeled Training
Data
…
Learned
model
Learning
Algorithm
is the model we’ll use in
our application
( Dan
Roth,
+)
If
(the…character
of the …token
is..) AND
(the …. is…. )
then Negative.
Otherwise,
Positive.
An input
example
An element in the
instance Space

19
Supervised learning: Testing
Reserve some labeled data for testing
Labeled
Test Data
…

20
Labeled
Test Data
Test
Labels
...
Raw Test
Data

21
Apply the model to the raw test data
Evaluate by comparing predicted labels against the test labels
Test
Labels
Raw Test
Data
Learned
model
Predicted
Labels

Data In, Model Out
DATA
𝒟
MACHINE LEARNING MODEL 𝑓
⋅
acceleration 𝑎 =
Force 𝐹
mass
𝑚
We would like to recover a model like this!

Data In, Model Out
MACHINE LEARNING MODEL 𝑓
⋅
acceleration 𝑎 =
Force 𝐹 mass
𝑚
ML Design (hypothesis class, loss function, optimizer, hyperparameters, features, …)
𝑚
𝑎 = 𝑤0 + 𝑤1𝐹 + 𝑤2𝑚 + 𝑤3 𝐹 ∗ 𝑚
+ 𝑤4
( 𝐹
)
Example hypothesis
class:
(for varying values
of 𝑤0,
… 𝑤4 )
𝑚
𝑎 = 0 + 0𝐹 + 0𝑚 + 0 𝐹 ∗ 𝑚
+
1( 𝐹
)
Learning = finding “good” values for the weights 𝑤0 , 𝑤1 ,
… , 𝑤#
DATA 𝒟

24
Key Issues in Machine Learning
Modeling
How to formulate application problems as machine learning problems ? How
to represent the data?
Learning Protocols (where is the data & labels coming from?)
Representation
What functions should we learn (hypothesis spaces) ?
How to map raw input to an instance space?
Any rigorous way to find these? Any general approach?
Algorithms
What are good algorithms?
How do we define success?
Generalization vs. over fitting
The computational problem

25
Using supervised learning
What is our instance space?
Gloss: What kind of features are we using?
What is our label space?
Gloss: What kind of learning task are we dealing with?
What is our hypothesis space?
Gloss: What kind of functions (models) are we learning?
What learning algorithm do we use?
Gloss: How do we learn the model from the labeled data?
What is our loss function/evaluation metric?
Gloss: How do we measure success? What drives learning?

26
Output
An item
drawn from a label
space
Input
An item
drawn from an
instance space X
Learned
Model
1. The instance space
Designing an appropriate instance space
is crucial for how well we can predict .

27
1. The instance space
When we apply machine learning to a task, we first need to define the instance space
.
Instances are defined by features:
Boolean features:
Is there a folder named after the sender?
Does this email contain the word ‘class’?
Does this email contain the word ‘waiting’?
Does this email contain the word ‘class’ and the word ‘waiting’?
Numerical features:
How often does ‘learning’ occur in this email?
How long is the email?
How many emails have I seen from this sender over the last day/week/month?
Bag of tokens
Just list all the tokens in the input
Does it add anything if
you already have the
previous two features?

28
as a vector space
is an N-dimensional vector space (e.g. {0,1}N
, )
Each dimension = one feature.
Each is a feature vector (hence the boldface ).
Think of = [ … ] as a point in:
𝑥1
𝑥2
- Patient Age
- Clump thickness
- Tumor Color
- Distance from optic nerv
- Cell type

29
Good features are essential
The choice of features is crucial for how well a task can be learned.
In many application areas (language, vision, etc.), a lot of work goes into
designing suitable features.
This requires domain expertise.
We can’t teach you what specific features
to use for your task.
But we will touch on some general principles

30
Output
An item
drawn from a label
space
Input
An item
drawn from an
instance space
Learned
Model
2. The label space
The label space determines what kind of supervised learning task
we are dealing with

31
Supervised learning tasks I
Output labels are categorical:
Binary classification: Two possible labels
Multiclass classification: possible labels
Output labels are structured objects (sequences of labels, parse trees, graphs,
etc.)
Structure learning: multiple labels that are related (thus constrained)
Three events. When
classifying the temporal
relations between them we
need to account for the
relations between them.

34
Output
An item
drawn from a
label space
Input
An item
drawn from an
instance space
Learned
Model
3. The model
We need to choose what kind of model
we want to learn

Types of Learning
• Supervised learning
 Input: Examples of inputs and desired outputs
 Output: Model that predicts output given a new input
• Unsupervised learning
 Input: Examples of some data (no “outputs”)
 Output: Representation of structure in the data
• Reinforcement learning
 Input: Sequence of agent interactions with an environment
 Output: Policy that maps agent’s observations to actions

38
Unsupervised learning:?!
1. In unsupervised learning the training data is unlabeled. The system tries to learn
without a teacher.
2. Unsupervised methods must rely solely on the intrinsic structure of data points
to learn a good hypothesis. Thus, unsupervised methods do not need a teacher or
domain expert who provides labels for data points.
3. Two large families of unsupervised methods
1. Clustering
2. Feature Learning
1. Two important applications of feature learning are dimensionality reduction and data visualization.
Notation:
In the clustering problem, we are given a training set, {x(1)
, x(2)
, ……. X(n)
}, and want to
group the data into a few cohesive “clusters.” but no labels y(i)
are given.
Unsupervised Learning

• Given x1, x2, ..., xn (without labels)
• The goal is to find groups of similar observations.
• Output hidden structure behind the x’s
– E.g., clustering
31

23
• The goal is to find groups of similar observations (Clustering).

[Source: Daphne Koller]
Genes
Individuals
Genomics application: group individuals by genetic similarity
32

Organize computing clusters Social network analysis
Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison)
Astronomical data analysis
Market segmentation
Slide credit: Andrew Ng
33

• Independent component analysis – separate a
combined signal into its original sources
34
Image credit: statsoft.com Audio from http://www.ism.ac.jp/~shiro/research/blindsep.html

• Independent component analysis – separate a
combined signal into its original sources
35
Image credit: statsoft.com Audio from http://www.ism.ac.jp/~shiro/research/blindsep.html

• Here are some of the most important Unsupervised learning algorithms.
Clustering
K-means
DBSCAN
Hierarchical Cluster Analysis (HCA)
Visualization and dimensionality reduction
Principal Component Analysis (PCA)
Kernel PCA
Association rule learning
Apriori
Eclat
Anomaly detection and novelty detection
One-class SVM
Isolation Forest

• In supervised learning, we saw algorithms that tried to make their outputs mimic the labels y
given in the training set. In that setting, the labels gave an unambiguous “right answer” for each
of the inputs x.
• In contrast, for many sequential decision-making and control problems, it is very difficult to
provide this type of explicit supervision to a learning algorithm.
• Example:
• if we have just built a four-legged robot and are trying to program it to walk, then initially, we
have no idea what the “correct” actions to take are to make it walk, and so we do not know how
to provide explicit supervision for a learning algorithm to try to mimic.
• In the reinforcement learning framework,
• The learning system, called an agent in this context, can observe the environment, select and
perform actions, and get rewards in return (or penalties in the form of negative rewards).
• It must then learn by itself what is the best strategy, called a policy, to get the most reward over
time
• A policy defines what action the agent should choose when it is in a given situation.
Reinforcement Learning

Reinforcement Learning
• Examples:
– Credit assignment problem
– Game playing
– Robot in a maze
– Balance a pole on your hand
36

Machine learning is Programming 2.0
Traditional Programming Machine learning (ML)
Revision:

Task specification in ML: programs →
examples
def compute_force(m, a):
‘’’
returns force (in N) needed to move
mass m (in kg) at
acceleration a (in m/s^2)
‘’’
F = m * a
return F
Mass m (kg) Acceleration a (m/s^2) Force F (N)
2.5 4 10
5 2 10
20 0.5 10
40 0.25 10
40 2.5 100
20 5 100
50 2 100
Here is a program to
implement Newton’s
second law of motion
Here are some
examples. Try to
imitate them.

Task specification in ML: programs →
examples
def cow_or_turtle(image):
???
Here is a program to
recognize an image as
a cow or a turtle
Here are some
examples. Try to
imitate them.
“cows
”
“turtles”

Putting a trained ML system to use
Here are some
examples. Try to
imitate them.
“cows
”
“turtles”
“cow
”

Putting a trained ML system to use
Here are some
examples. Try to
imitate them.
“cows
”
“turtles”
“turtle”

Framing an ML problem (Mitchell’s P, T, E)
Data curation (sourcing, scraping, collection, labeling)
Data analysis / visualization
ML Design (hypothesis class, loss function, optimizer, hyperparameters, features)
Train model
Validate / Evaluate
Deploy (and generate new data)
Monitor performance on new data
ML
Workflow

Framing an ML problem (Mitchell’s P, T, E)
ML Design (hypothesis class, loss function, optimizer, hyperparameters,
features)
Train model
Validate / Evaluate
ML
Workflow
Main focus
of this class

Framing an ML problem (Mitchell’s P, T, D)
Train model
Validate / Evaluate
ML
Workflow
Main focus
of this class

Framing an ML problem (Mitchell’s P, T, D)
Train model
Validate / Evaluate
ML
Workflow
Main focus
of this class
Project

Framing an ML problem (Mitchell’s P, T,
D)
Train model
Validate / Evaluate
ML
Workflow
Main focus
of this class
Project

ML for Social Good
Applying AI for social good | McKinsey
Elizabeth Bondi et al., SPOT poachers in action: Augmenting
conservation drones with automatic detection in near real time,
AAAI 2018

Designing a Learning System
• Choose the training experience
• Choose exactly what is to be learned
– i.e. the target function
• Choose how to represent the target function
• Choose a learning algorithm to infer the target
function from the experience
Environment/
Experience
Learner
Knowledge
Performance
Element
Based on slide by Ray Mooney
Training data
Testing data
41

Training vs. Test Distribution
• We generally assume that the training and
test examples are independently drawn from
the same overall distribution of data
– We call this “i.i.d” which stands for “independent
and identically distributed”
• If examples are not independent, requires
collective classification
• If test distribution is different, requires
transfer learning
42

ML in a Nutshell
• Tens of thousands of machine learning
algorithms
– Hundreds new every year
• Every ML algorithm has three
components:
– Representation
– Optimization
– Evaluation
43

44
Various Function Representations
• Numerical functions
– Linear regression
– Neural networks
– Support vector machines
• Symbolic functions
– Decision trees
– Rules in propositional logic
– Rules in first-order predicate logic
• Instance-based functions
– Nearest-neighbor
– Case-based
• Probabilistic Graphical Models
– Naïve Bayes
– Bayesian networks
– Hidden-Markov Models (HMMs)
– Probabilistic Context Free Grammars (PCFGs)
– Markov networks

45
Various Search/Optimization
Algorithms
• Gradient descent
– Perceptron
– Backpropagation
• Dynamic Programming
– HMM Learning
– PCFG Learning
• Divide and Conquer
– Decision tree induction
– Rule learning
• Evolutionary Computation
– Genetic Algorithms (GAs)
– Genetic Programming (GP)
– Neuro-evolution

47
Evaluation
• Accuracy
• Precision and recall
• Squared error
• Likelihood
• Posterior probability
• Cost / Utility
• Margin
• Entropy
• K-L divergence
• etc.

ML in Practice
• Understand domain, prior knowledge, and goals
• Data integration, selection, cleaning, pre-processing, etc.
• Learn models
• Interpret results
• Consolidate and deploy discovered knowledge
Loop
48
Based on a slide by Pedro Domingos

Machine learning for beginners students.

More Related Content

Similar to Machine learning for beginners students.

Recently uploaded

Machine learning for beginners students.

Editor's Notes