Course: Machine Vision
Recognition (1)
Session 10
D5627 – I Gede Putra Kusuma Negara, B.Eng., PhD
Outline
• Pattern Recognition
• Supervised Learning
• Classification
Pattern Recognition
What is Pattern Recognition?
• A pattern is an entity, vaguely defined, that could be given a
name, e.g.,
– Fingerprint image,
– Handwritten word,
– Human face,
– Etc.
• Pattern recognition is the study of how machines can
– Observe the environment,
– Learn to distinguish patterns of interest,
– Make sound and reasonable decisions about the categories of the
patterns.
Pattern Recognition Application
(example)
Problem Domain                | Application                       | Input Pattern                    | Pattern Classes
Document image analysis       | Optical character recognition     | Document image                   | Characters, words
Document classification       | Internet search                   | Text document                    | Semantic categories
Document classification       | Junk mail filtering               | Email                            | Junk/non-junk
Multimedia database retrieval | Internet search                   | Video clip                       | Video genres
Speech recognition            | Telephone directory assistance    | Speech waveform                  | Spoken words
Natural language processing   | Information extraction            | Sentences                        | Parts of speech
Biometric recognition         | Personal identification           | Face, iris, fingerprint          | Authorized users for access control
Medical                       | Computer-aided diagnosis          | Microscopic image                | Cancerous/healthy cell
Military                      | Automatic target recognition      | Optical or infrared image        | Target type
Industrial automation         | Printed circuit board inspection  | Intensity or range image         | Defective/non-defective product
Industrial automation         | Fruit sorting                     | Images taken on a conveyor belt  | Grade of quality
Remote sensing                | Forecasting crop yield            | Multispectral image              | Land use categories
Bioinformatics                | Sequence analysis                 | DNA sequence                     | Known types of genes
Data mining                   | Searching for meaningful patterns | Points in multidimensional space | Compact and well-separated clusters
Example
• Problem: Sorting incoming fish on a conveyor belt according
to species.
• Assume that we have only two kinds of fish:
– Sea bass
– Salmon
Decision Process
• What kind of information can distinguish one species from
the other?
– length, width, weight, number and shape of fins, tail shape, etc.
• What can cause problems during sensing?
– lighting conditions, position of fish on the conveyor belt, camera
noise, etc.
• What are the steps in the process?
– capture image → isolate fish → take measurements → make
decision
Selecting Features
• Assume a fisherman told us that a sea bass is generally longer than a
salmon.
• We can use length as a feature and decide between sea bass and
salmon according to a threshold on length.
• How can we choose this threshold?
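One natural answer is to pick the threshold that makes the fewest mistakes on the training examples. A minimal sketch in Python (the fish lengths below are hypothetical, invented for illustration, not measured data):

```python
import numpy as np

# Hypothetical training lengths in cm (not from the slides' data).
salmon_lengths = np.array([45, 50, 55, 60, 62, 65])
bass_lengths = np.array([58, 63, 68, 72, 75, 80])

lengths = np.concatenate([salmon_lengths, bass_lengths])
labels = np.array(["salmon"] * len(salmon_lengths) + ["bass"] * len(bass_lengths))

def training_error(threshold):
    # Decision rule: classify as sea bass when length > threshold.
    predicted = np.where(lengths > threshold, "bass", "salmon")
    return np.mean(predicted != labels)

# Choose the candidate threshold l* with the lowest training error.
candidates = np.unique(lengths)
best = min(candidates, key=training_error)
print(best, training_error(best))
```

Because the two length distributions overlap, even the best threshold still misclassifies some fish.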
Selecting Features
• How can we choose the
threshold l∗ to make a reliable
decision?
• Even though sea bass are longer
than salmon on average,
there are many examples of
fish where this observation
does not hold.
Selecting Features
• Try another feature: average
lightness of the fish scales.
• It looks easier to choose the
threshold x∗, but we still
cannot make a perfect
decision.
Cost of Error
• We should also consider costs of different errors we make in
our decisions.
• For example, if the fish packing company knows that:
– Customers who buy salmon will object vigorously if they see sea
bass in their cans.
– Customers who buy sea bass will not be unhappy if they
occasionally see some expensive salmon in their cans.
• How does this knowledge affect our decision?
Multiple Features
• Assume we also observed that sea bass are typically wider
than salmon.
• We can use two features in our decision:
– lightness: x1
– width: x2
• Each fish image is now represented as a point (feature
vector ) in a two-dimensional feature space.
X = (x1, x2)^T
Multiple Features
• We can draw a decision
boundary to divide the
feature space into two
regions. Does it look better
than using only lightness?
• Does adding more features
always improve the results?
– Avoid unreliable features.
– Be careful about correlations
with existing features.
– Be careful about measurement
costs.
– Be careful about noise in the
measurements.
• Is there some curse to working
in very high dimensions? (Yes:
the curse of dimensionality.)
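As a sketch, each fish can be represented as a 2-D feature vector x = (lightness, width) and classified by which side of a linear decision boundary w·x + b = 0 it falls on. The feature values and the boundary weights below are made up for illustration:

```python
import numpy as np

# Each fish is a 2-D feature vector x = (x1, x2) = (lightness, width).
# Hypothetical values; the slides' actual measurements are not given.
fish = np.array([
    [3.0, 10.0],   # a salmon-like point: dark, narrow
    [7.5, 14.0],   # a bass-like point: light, wide
])

# A hand-chosen linear decision boundary w . x + b = 0 (illustrative only).
w = np.array([1.0, 0.5])
b = -11.0

def classify(x):
    # Positive side of the boundary -> sea bass, negative side -> salmon.
    return "sea bass" if np.dot(w, x) + b > 0 else "salmon"

print([classify(x) for x in fish])
```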
Decision Boundaries
• Can we do better with
another decision rule?
• More complex models result
in more complex boundaries.
• We may distinguish training
samples perfectly but how
can we predict how well we
can generalize to unknown
samples?
Decision Boundaries
• How can we manage the
tradeoff between the complexity
of decision rules and their
performance on unknown
samples?
• Different criteria lead to
different decision boundaries
Pattern Recognition System
Real world → Sensor → Pre-processing and enhancement →
Feature extraction/selection → one of three tasks:
• Supervised learning (classification) → class assignment
• Unsupervised learning (clustering) → cluster assignment
• Regression → predicted variables
Post-processing: feedback/adaptation into the earlier stages
Supervised Learning
Supervised Learning
• Supervised learning is the machine learning task of inferring a
function from labeled training data. The training data consist of a set
of training examples.
• In supervised learning, each example is a pair consisting of an input
object (typically a vector) and a desired output value (also called the
supervisory signal).
• A supervised learning algorithm analyzes the training data and
produces an inferred function, which can be used for mapping new
examples.
• An optimal scenario will allow the algorithm to correctly
determine the class labels for unseen instances.
• This requires the learning algorithm to generalize from the training
data to unseen situations in a "reasonable" way.
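A toy illustration of inferring a function from labeled (input, output) pairs; the data are hypothetical, and a 1-nearest-neighbor rule stands in for the learned function:

```python
import numpy as np

# Labeled training pairs (x, y): the "supervisory signal" y is given per example.
X_train = np.array([[0.0], [1.0], [4.0], [5.0]])
y_train = np.array(["a", "a", "b", "b"])

def infer_function(X, y):
    # The "inferred function" here is a 1-nearest-neighbor rule:
    # map a new input to the label of its closest training example.
    def f(x_new):
        distances = np.linalg.norm(X - x_new, axis=1)
        return y[np.argmin(distances)]
    return f

f = infer_function(X_train, y_train)
print(f(np.array([0.4])), f(np.array([4.6])))
```

The inferred f generalizes to inputs that never appeared in the training set, which is exactly what the bullet points above ask for.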
Supervised Learning Steps
• Data acquisition and sensing:
– Measurements of physical variables.
– Important issues: bandwidth, resolution, sensitivity, distortion,
SNR, latency, etc.
• Pre-processing:
– Removal of noise in data.
– Isolation of patterns of interest from the background.
– Color-to-grayscale conversion.
• Feature extraction:
– Finding a new representation in terms of features.
Supervised Learning Steps
• Model learning and estimation:
– Learning a mapping between features and pattern groups and
categories
• Classification:
– Using features and learned models to assign a pattern to a
category.
• Post-processing:
– Evaluation of confidence in decisions.
– Exploitation of context to improve performance.
– Combination of experts.
Classification
Classification Problem
• Apply a prediction function to a feature representation of the image
to get the desired output:
f( ) = “Brick”
f( ) = “Fabric”
f( ) = “Grass”
Classification Steps
• Training stage: Training data → Data acquisition → Pre-processing →
Feature extraction/selection → Model learning/estimation → Model
• Testing stage: Real world → Pre-processing → Feature extraction →
Classification (using the learned model) → Post-processing → Decision
Texture Classification (example)
• We are going to classify texture images.
• Image source: 40 texture images from the MIT VisTex database
– Number of classes: 5 (brick, fabric, grass, sand, stone)
– Train set: 30 images (6 images from each class)
– Test set: 10 images (2 images from each class)
• Features:
– Gray-level co-occurrence matrix (GLCM)
– Fourier power spectrum (#ring: 4, #sector: 8)
• Classification method:
– K-nearest neighbor
– Naïve Bayes
• Evaluation: classification accuracy
Design Cycle
Data collection:
• Collecting training and testing data.
• How can we know when we have adequately large and representative
set of samples?
Collect data → Select features → Select model → Train classifier → Evaluate classifier
Data Collection
(Figure: training and testing images for each class.)
Classes: Brick, Fabric, Grass, Sand, Stone
Design Cycle
Feature selection:
• Domain dependence and prior information.
• Computational cost and feasibility.
• Discriminative features:
– Similar values for similar patterns.
– Different values for different patterns.
• Invariant features with respect to translation, rotation and scale.
• Robust features with respect to occlusion, distortion, deformation, and
variations in environment.
Feature Selection
Gray-level co-occurrence matrix
(GLCM)
Fourier power spectrum
Features             | Formula
Energy               | Σ_ij P_ij²
Inertia              | Σ_ij (i − j)² P_ij
Inverse diff. moment | Σ_ij P_ij / (1 + (i − j)²)
Entropy              | −Σ_ij P_ij ln P_ij
#ring = 4
#sector = 8
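These four GLCM features can be computed directly from a normalized co-occurrence matrix P. The matrix below is a toy example, not one computed from the VisTex images:

```python
import numpy as np

# A toy normalized gray-level co-occurrence matrix P (entries sum to 1).
# A real GLCM would be built by counting gray-level pairs in an image.
P = np.array([
    [0.2, 0.1, 0.0],
    [0.1, 0.3, 0.1],
    [0.0, 0.1, 0.1],
])

i, j = np.indices(P.shape)

energy = np.sum(P ** 2)
inertia = np.sum((i - j) ** 2 * P)              # also called contrast
idm = np.sum(P / (1 + (i - j) ** 2))            # inverse difference moment
entropy = -np.sum(P[P > 0] * np.log(P[P > 0]))  # skip zero entries

print(energy, inertia, idm, entropy)
```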
Design Cycle
Model selection:
• Domain dependence and prior information.
• Definition of design criteria.
• Parametric vs. non-parametric models.
• Handling of missing features.
• Computational complexity.
• Types of models: templates, decision-theoretic or statistical, syntactic or
structural, neural, and hybrid.
• How can we know how close we are to the true model underlying the
patterns?
Model Selection
k-Nearest neighbor
• Assign label of nearest training data
point to each test data point
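A k-nearest-neighbor sketch with k = 3, using hypothetical 2-D training points in the spirit of the slide's x/o figure (the figure's actual coordinates are not given):

```python
import numpy as np
from collections import Counter

# Toy 2-D training points from two classes, "o" and "x" (invented values).
X_train = np.array([[1, 1], [1, 2], [2, 1],    # class "o"
                    [5, 5], [5, 6], [6, 5]])   # class "x"
y_train = np.array(["o", "o", "o", "x", "x", "x"])

def knn_predict(x_new, k=3):
    # Label a test point by majority vote among its k nearest training points.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = y_train[np.argsort(distances)[:k]]
    return Counter(nearest).most_common(1)[0][0]

print(knn_predict(np.array([1.5, 1.5])), knn_predict(np.array([5.5, 5.5])))
```

With k = 1 this reduces to the "assign the label of the nearest training point" rule described above.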
Naïve Bayes (algorithm)
• For each value yk
– Estimate P(Y = yk) from the data.
– For each value xij of each attribute
Xi
• Estimate P(Xi=xij | Y = yk)
• Classify a new point via:
• In practice, the independence
assumption often does not hold,
but Naïve Bayes performs very well
despite it.
(Figure: a 2-D feature space (x1, x2) with training points from two
classes, "x" and "o"; the test points "+" are labeled by the
1-nearest-neighbor and 3-nearest-neighbor rules.)

Naïve Bayes decision rule:
y_new = argmax_{y_k} P(Y = y_k) ∏_i P(X_i | Y = y_k)
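The Naïve Bayes rule, y_new = argmax over y_k of P(Y = y_k) ∏_i P(X_i | Y = y_k), can be sketched on toy categorical data. The counts below are invented, and the probabilities are estimated by simple counting, without smoothing:

```python
# Toy training data: two binary attributes (X1, X2) and a label Y.
# Hypothetical examples, purely for illustration.
data = [
    ((0, 0), "a"), ((0, 1), "a"), ((0, 0), "a"),
    ((1, 1), "b"), ((1, 0), "b"), ((1, 1), "b"),
]

labels = [y for _, y in data]
classes = sorted(set(labels))

def prior(y):
    # P(Y = y), estimated from label frequencies.
    return labels.count(y) / len(labels)

def likelihood(i, value, y):
    # P(Xi = value | Y = y), estimated by counting (no smoothing).
    rows = [x for x, lab in data if lab == y]
    return sum(1 for x in rows if x[i] == value) / len(rows)

def nb_classify(x_new):
    # y_new = argmax_y P(Y = y) * prod_i P(Xi = xi | Y = y)
    def score(y):
        p = prior(y)
        for i, value in enumerate(x_new):
            p *= likelihood(i, value, y)
        return p
    return max(classes, key=score)

print(nb_classify((0, 0)), nb_classify((1, 1)))
```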
Design Cycle
Training:
• How can we learn the rule from data?
• Supervised learning: a teacher provides a category label or cost for each
pattern in the training set.
• Unsupervised learning: the system forms clusters or natural groupings of
the input patterns.
• Reinforcement learning: no desired category is given, but the teacher
provides feedback to the system, such as whether the decision is right or wrong.
Design Cycle
Evaluation:
• How can we estimate the performance with training samples?
• How can we predict the performance with future data?
• Problems of over-fitting and generalization.
Evaluation
Feature                | KNN Accuracy | KNN F score | NB Accuracy | NB F score
GLCM                   |              |             |             |
Fourier power spectrum |              |             |             |
(KNN = K-nearest neighbor; NB = Naïve Bayes.)
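Accuracy and a macro-averaged F score can be computed from predicted vs. true labels as follows. The label lists below are invented for illustration, since the actual classifier outputs are not given:

```python
# Hypothetical true and predicted labels for a 10-image test set.
y_true = ["brick", "brick", "fabric", "fabric", "grass",
          "grass", "sand", "sand", "stone", "stone"]
y_pred = ["brick", "brick", "fabric", "grass", "grass",
          "grass", "sand", "stone", "stone", "stone"]

def accuracy(t, p):
    # Fraction of test examples whose predicted label matches the true label.
    return sum(a == b for a, b in zip(t, p)) / len(t)

def macro_f1(t, p):
    # Per-class F1 from precision and recall, averaged over all classes.
    classes = sorted(set(t) | set(p))
    f1s = []
    for c in classes:
        tp = sum(a == c and b == c for a, b in zip(t, p))
        fp = sum(a != c and b == c for a, b in zip(t, p))
        fn = sum(a == c and b != c for a, b in zip(t, p))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```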
Acknowledgment
Some of the slides in this presentation are adapted from
various sources; many thanks to:
1. Selim Aksoy, Department of Computer Engineering, Bilkent
University (http://www.cs.bilkent.edu.tr/~saksoy/)
2. James Hays, Computer Science Department, Brown University,
(http://cs.brown.edu/~hays/)
Thank You