Pattern Recognition
Chapter 1
Introduction
What is Pattern Recognition (PR)?
• It is the study of how machines can
– observe the environment
– learn to distinguish patterns of interest from their
background
– make sound and reasonable decisions about the
categories of the patterns.
What is a pattern?
• Watanabe [163] defines a pattern as
“the opposite of a chaos; it is an entity, vaguely defined,
that could be given a name”
Other Patterns
• Insurance, credit card applications -
applicants are characterized by
– # of accidents, make of car, year of model
– Income, # of dependents, credit worthiness, mortgage
amount
• Web documents
– Keyword-based descriptions (e.g., documents
containing “terrorism”, “Osama” are different from
those containing “football”, “NFL”).
• Housing market
– Location, size, year, school district, etc.
Pattern Class
• A collection of “similar” (not necessarily identical) objects
– Intra-class variability
– Inter-class variability
[Figure: characters that look similar]
[Figure: the letter “T” in different typefaces]
Pattern Class Model
• A description, typically mathematical in form, for each
class/population (e.g., a probability density such as a
Gaussian)
Pattern Recognition
• Key Objectives:
– Process the sensed data to eliminate noise
– Hypothesize the models that describe each class
population (e.g., recover the process that generated the
patterns).
– Given a sensed pattern, choose the best-fitting model
for it and then assign it to the class associated with the
model.
Classification vs Clustering
– Classification (known categories)
– Clustering (creation of new categories)
[Figure: the same set of points assigned to categories “A” and “B” by
classification/recognition (supervised classification) and grouped by
clustering (unsupervised classification)]
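The contrast can be sketched in code: a minimal nearest-centroid classifier (supervised, categories known from labels) next to a minimal two-means clusterer (unsupervised, two categories created from unlabeled data). All data values are invented for illustration.

```python
# Supervised vs unsupervised on the same 1-D toy data (values invented).

def nearest_centroid_classify(train, labels, x):
    """Supervised: class centroids come from labeled training data."""
    cats = set(labels)
    centroids = {c: sum(v for v, l in zip(train, labels) if l == c) /
                    labels.count(c) for c in cats}
    return min(centroids, key=lambda c: abs(x - centroids[c]))

def two_means_cluster(data, iters=10):
    """Unsupervised: two categories are *created* from unlabeled data."""
    c0, c1 = min(data), max(data)          # crude initial centers
    for _ in range(iters):
        g0 = [v for v in data if abs(v - c0) <= abs(v - c1)]
        g1 = [v for v in data if abs(v - c0) > abs(v - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return sorted((c0, c1))

train = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["A", "A", "A", "B", "B", "B"]
print(nearest_centroid_classify(train, labels, 1.1))  # nearest known category
print(two_means_cluster(train))                       # two discovered centers
```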
Emerging PR Applications
Problem | Input | Output
Speech recognition | Speech waveforms | Spoken words, speaker identity
Non-destructive testing | Ultrasound, eddy current, acoustic emission waveforms | Presence/absence of flaw, type of flaw
Detection and diagnosis of disease | EKG, EEG waveforms | Types of cardiac conditions, classes of brain conditions
Natural resource identification | Multispectral images | Terrain forms, vegetation cover
Aerial reconnaissance | Visual, infrared, radar images | Tanks, airfields
Character recognition (page readers, zip code, license plate) | Optical scanned image | Alphanumeric characters
Emerging PR Applications (cont’d)
Problem | Input | Output
Identification and counting of cells | Slides of blood samples, micro-sections of tissues | Type of cells
Inspection (PC boards, IC masks, textiles) | Scanned image (visible, infrared) | Acceptable/unacceptable
Manufacturing | 3-D images (structured light, laser, stereo) | Identify objects, pose, assembly
Web search | Key words specified by a user | Text relevant to the user
Fingerprint identification | Input image from fingerprint sensors | Owner of the fingerprint, fingerprint classes
Online handwriting retrieval | Query word written by a user | Occurrence of the word in the database
Main PR Areas
• Template matching
– The pattern to be recognized is matched against a stored template
while taking into account all allowable pose (translation and
rotation) and scale changes.
• Statistical pattern recognition
– Focuses on the statistical properties of the patterns (i.e., probability
densities).
• Structural Pattern Recognition
– Describe complicated objects in terms of simple primitives and
structural relationships.
• Syntactic pattern recognition
– Decisions consist of logical rules or grammars.
• Artificial Neural Networks
– Inspired by biological neural network models.
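The first of these areas, template matching over translations only, can be sketched as follows. The tiny scene and template intensities are invented, and the pose/scale search the slide mentions is omitted for brevity.

```python
# Minimal template-matching sketch: slide a small template over a larger
# "scene" (both 2-D lists of intensities) and return the translation with
# the smallest sum-of-squared-differences (SSD). Rotation and scale search
# are omitted; data values are illustrative.

def match_template(scene, tmpl):
    th, tw = len(tmpl), len(tmpl[0])
    best, best_pos = float("inf"), None
    for r in range(len(scene) - th + 1):
        for c in range(len(scene[0]) - tw + 1):
            ssd = sum((scene[r + i][c + j] - tmpl[i][j]) ** 2
                      for i in range(th) for j in range(tw))
            if ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos, best

scene = [[0, 0, 0, 0],
         [0, 9, 8, 0],
         [0, 7, 9, 0],
         [0, 0, 0, 0]]
tmpl = [[9, 8],
        [7, 9]]
print(match_template(scene, tmpl))  # best offset and its SSD
```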
Template Matching
[Figure: a template matched against an input scene]
Deformable Template:
Corpus Callosum Segmentation
[Figure: deformable-template pipeline: prototype and variation learning
from a shape training set, prototype registration to the low-level
segmented image, and prototype warping]
Statistical Pattern Recognition
• Patterns represented in a feature space
• Statistical model for pattern generation in
feature space
Statistical Pattern Recognition
[Figure: statistical pattern recognition system.
Recognition path: pattern → preprocessing → feature extraction → classification.
Training path: patterns + class labels → preprocessing → feature selection → learning.]
Structural Pattern Recognition
• Describe complicated objects in terms of simple
primitives and structural relationships.
• Decision-making when features are non-numeric or
structural
[Figure: hierarchical description of a scene: the Scene is decomposed into
Object and Background, which are in turn described by primitives such as
D, E, L, T, X, Y, Z, M, and N]
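To make the idea concrete, a structural description can be stored as a nested hierarchy and traversed, rather than computed as a numeric feature vector. The labels below are one plausible reading of the slide's figure and are purely illustrative.

```python
# A structural description as a nested dict of primitive labels
# (hierarchy and labels are illustrative, echoing the slide's figure).

scene = {
    "Scene": {
        "Object":     {"D": ["L", "T"], "E": ["X", "Y", "Z"]},
        "Background": {"M": [], "N": []},
    }
}

def primitives(node):
    """Collect the leaf primitives of a structural description."""
    if isinstance(node, list):
        return list(node)
    out = []
    for key, child in node.items():
        leaves = primitives(child)
        out.extend(leaves if leaves else [key])
    return out

print(sorted(primitives(scene)))  # the simple primitives in the scene
```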
Syntactic Pattern Recognition
[Figure: syntactic pattern recognition system.
Recognition path: pattern → preprocessing → primitive/relation extraction → syntax/structural analysis.
Training path: patterns + class labels → preprocessing → primitive selection → grammatical/structural inference.]
Patterns are described using deterministic grammars or formal languages.
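A small sketch of the syntactic approach: describe each class by a grammar and classify a string of primitives by whether the grammar generates it. Here regular expressions stand in for the deterministic grammars on the slide, and the primitive alphabet {a, b} is invented.

```python
import re

# Each class is defined by a grammar (here a regex as a stand-in for a
# regular grammar); a pattern is a string of primitives from an invented
# alphabet {a, b}.

CLASS_GRAMMARS = {
    "class_1": re.compile(r"(ab)+"),  # alternating "ab" pairs
    "class_2": re.compile(r"a+b+"),   # a run of a's then a run of b's
}

def recognize(primitive_string):
    """Return every class whose grammar generates the whole string."""
    return [c for c, g in CLASS_GRAMMARS.items()
            if g.fullmatch(primitive_string)]

print(recognize("ababab"))  # generated by class_1's grammar only
print(recognize("aaabbb"))  # generated by class_2's grammar only
print(recognize("ab"))      # ambiguous: both grammars generate it
```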
Chromosome Grammars
Image of human chromosomes
Hierarchical-structure description of a submedian chromosome
Artificial Neural Networks
• Massive parallelism is essential for complex
pattern recognition tasks (e.g., speech and
image recognition)
– Humans take only a few hundred milliseconds for most
cognitive tasks, which suggests parallel computation
• Biological networks attempt to achieve good
performance via dense interconnection of
simple computational elements (neurons)
– Number of neurons ≈ 10^10 – 10^12
– Number of interconnections/neuron ≈ 10^3 – 10^4
– Total number of interconnections ≈ 10^14
Artificial Neural Nodes
• Nodes in neural networks are nonlinear, typically
analog:
Y = f(Σ_i w_i x_i - θ), where θ is an internal threshold and f is the node's nonlinearity
[Figure: a single node with inputs x1, …, xd, weights w1, …, wd, and output Y]
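A single such node can be sketched as a weighted sum of inputs minus an internal threshold, passed through a nonlinearity. The weights and threshold below are illustrative values, not learned ones.

```python
import math

# One artificial node: activation = sum_i(w_i * x_i) - theta, squashed by a
# sigmoid nonlinearity (a hard step would also fit the slide's description).
# Weights and threshold are illustrative, not learned.

def neuron(x, w, theta):
    activation = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return 1.0 / (1.0 + math.exp(-activation))   # sigmoid output in (0, 1)

y = neuron(x=[1.0, 0.5], w=[0.8, -0.4], theta=0.1)
print(round(y, 3))
```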
• Feed-forward nets with one or more layers (hidden)
between the input and output nodes
• A three-layer net can generate arbitrarily complex decision
regions
• These nets can be trained by the back-propagation training
algorithm
Multilayer Perceptron
[Figure: a feed-forward net with d inputs, a first hidden layer of NH1
units, a second hidden layer of NH2 units, and c outputs]
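To see how hidden layers buy more complex decision regions, here is a hand-wired two-layer threshold net computing XOR, a decision region no single threshold unit can realize. The weights and thresholds are chosen by hand for illustration, not obtained by back-propagation.

```python
# Hand-wired net of step units computing XOR. A single threshold unit can
# only cut the plane with one line; the hidden layer composes two cuts.
# All weights/thresholds are illustrative, set by hand (no training).

def step(a):
    return 1 if a > 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)        # OR-like hidden unit
    h2 = step(x1 + x2 - 1.5)        # AND-like hidden unit
    return step(h1 - 2 * h2 - 0.5)  # fires for OR but not AND

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```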
Comparing Pattern Recognition
Models
• Template Matching
– Learning is difficult for deformable templates
• Structural / Syntactic
– Primitive extraction is sensitive to noise
– Describing a pattern in terms of primitives is difficult
• Statistical
– Assumption of density model for each class
• Artificial Neural Network
– Parameter tuning and local minima in learning
Main Components of a PR system
Complexity of PR – An Example
Fish Classification:
Sea Bass / Salmon
Preprocessing involves:
(1) image enhancement
(2) separating touching
or occluding fish
(3) finding the boundary
of the fish
How to separate
sea bass from salmon?
• Possible features to be used:
– Length
– Lightness
– Width
– Number and shape of fins
– Position of the mouth
– Etc …
Decision Using Length
• Choose the optimal threshold using a number of
training examples.
Decision Using Average Lightness
• Choose the optimal threshold using a number of
training examples.
The overlap between the lightness histograms is smaller than for the length feature
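Choosing the threshold from training examples can be sketched as a brute-force search over candidate thresholds, keeping the one with the fewest training errors. The lightness values below are fabricated for illustration.

```python
# Hedged sketch: brute-force threshold selection from labeled training
# examples. Decision rule: lightness < t -> salmon, else sea bass.
# All lightness values are fabricated for illustration.

def best_threshold(salmon, sea_bass):
    candidates = sorted(salmon + sea_bass)
    def errors(t):
        return sum(v >= t for v in salmon) + sum(v < t for v in sea_bass)
    return min(candidates, key=errors)

salmon   = [2.1, 2.8, 3.0, 3.4, 4.1]   # darker on average
sea_bass = [3.9, 4.6, 5.0, 5.5, 6.2]   # lighter on average
print(best_threshold(salmon, sea_bass))
```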
Cost of Misclassification
• There are two possible classification errors.
(1) deciding the fish was a sea bass when it was a
salmon.
(2) deciding the fish was a salmon when it was a sea
bass.
• Which error is more important ?
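The question above can be made concrete by weighting the two error types differently and re-running the threshold search: making one error costlier shifts the chosen threshold. The costs and lightness values below are invented for illustration.

```python
# Hedged sketch: when the two error types carry different costs, choose the
# threshold that minimizes total cost instead of raw error count.
# Decision rule: lightness < t -> salmon, else sea bass.
# Costs and lightness values are invented for illustration.

def best_threshold_with_costs(salmon, sea_bass, c_bass_as_salmon, c_salmon_as_bass):
    candidates = sorted(salmon + sea_bass)
    def cost(t):
        salmon_as_bass = sum(v >= t for v in salmon)   # error type (1)
        bass_as_salmon = sum(v < t for v in sea_bass)  # error type (2)
        return c_salmon_as_bass * salmon_as_bass + c_bass_as_salmon * bass_as_salmon
    return min(candidates, key=cost)

salmon   = [2.1, 2.8, 3.4, 4.1, 4.4]   # darker on average
sea_bass = [3.0, 4.6, 5.0, 5.5, 6.2]   # lighter on average
print(best_threshold_with_costs(salmon, sea_bass, 1, 1))  # equal costs
print(best_threshold_with_costs(salmon, sea_bass, 5, 1))  # bass-as-salmon 5x worse
```

With equal costs the search settles on a higher threshold; when calling a sea bass "salmon" is five times costlier, the threshold drops so fewer fish are labeled salmon.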
Decision Using Multiple Features
• To improve recognition, we might have to use
more than one feature at a time.
– Single features might not yield the best performance.
– Combinations of features might yield better
performance.
x = [x1, x2]^T, where x1: lightness, x2: width
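Combining the two features amounts to a decision rule on the feature vector x = (lightness, width), for example a linear one. The weights and sample points below are hand-chosen for illustration, not learned from data.

```python
# Hedged sketch: a linear decision rule on a two-feature vector
# x = (lightness, width). Weights and sample points are invented.

def classify(x, w, b):
    # decision boundary: w[0]*x[0] + w[1]*x[1] + b = 0
    score = w[0] * x[0] + w[1] * x[1] + b
    return "sea bass" if score > 0 else "salmon"

w, b = (1.0, 1.0), -8.0   # hand-chosen boundary: lightness + width = 8
print(classify((4.5, 4.5), w, b))  # above the line
print(classify((3.0, 4.0), w, b))  # below the line
```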
Decision Boundary
• Partition the feature space into two regions by
finding the decision boundary that minimizes the
error.
How Many Features and Which?
• Issues with feature extraction:
– Correlated features do not improve performance.
– It might be difficult to extract certain features.
– It might be computationally expensive to extract many
features.
– “Curse” of dimensionality …
Curse of Dimensionality
• Adding too many features can, paradoxically, lead to a
worsening of performance.
– Divide each of the input features into a number of intervals, so that the
value of a feature can be specified approximately by saying in which
interval it lies.
– If each input feature is divided into M intervals, the total number of
cells is M^d (d = number of features), which grows exponentially with d.
– Since each cell must contain at least one point, the amount of training
data required also grows exponentially!
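The exponential cell count behind the "curse" is easy to tabulate: with M intervals per feature and d features, the grid partition has M^d cells, each of which needs at least one training sample.

```python
# Cell count of a grid partition: M intervals per feature, d features.
# Illustrates why required training data explodes with dimensionality.

def num_cells(M, d):
    return M ** d

for d in (1, 2, 3, 10):
    print(d, num_cells(10, d))   # M = 10 intervals per feature
```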
Model Complexity
• We can get perfect classification performance on the
training data by choosing complex models.
• Complex models are tuned to the particular training
samples rather than to the characteristics of the true
model.
This raises the issue of generalization.
Generalization
• The ability of the classifier to produce correct results on
novel patterns.
• How can we improve generalization performance ?
– More training examples (i.e., better pdf estimates).
– Simpler models (i.e., simpler classification boundaries) usually yield
better performance.
Simplify the decision boundary!
Key Questions in PR
• How should we quantify and favor simpler
classifiers ?
• Can we predict how well the system will
generalize to novel patterns ?
The Design Cycle
Overview of Important Issues
• Noise / Segmentation
• Data Collection / Feature Extraction
• Pattern Representation / Missing Features
• Model Selection / Overfitting
• Prior Knowledge / Context
• Classifier Combination
• Costs and Risks
• Computational Complexity
Issue: Noise
• Various types of noise (e.g., shadows, conveyor
belt might shake, etc.)
• Noise can reduce the reliability of the feature
values measured.
• Knowledge of the noise process can help improve
performance.
Issue: Segmentation
• Individual patterns have to be segmented.
– How can we segment without having categorized them
first ?
– How can we categorize them without having segmented
them first ?
• How do we "group" together the proper number of
elements ?
Issue: Data Collection
• How do we know that we have collected an
adequately large and representative set of
examples for training/testing the system?
Issue: Feature Extraction
• Feature extraction is a domain-specific problem that
influences the classifier's performance.
• Which features are most promising ?
• Are there ways to automatically learn which
features are best ?
• How many should we use ?
• Choose features that are robust to noise.
• Favor features that lead to simpler decision
regions.
Issue: Pattern Representation
• Similar patterns should have similar
representations.
• Patterns from different classes should have
dissimilar representations.
• Pattern representations should be invariant
to transformations such as:
– translations, rotations, size changes, reflections, and non-rigid deformations
Issue: Missing Features
• Certain features might be missing (e.g., due to
occlusion).
• How should the classifier make the best decision
with missing features ?
• How should we train the classifier with missing
features ?
Issue: Model Selection
• How do we know when to reject a class of models
and try another one ?
• Is the model selection process just a trial and error
process ?
• Can we automate this process ?
Issue: Overfitting
• Models more complex than necessary lead to overfitting
(i.e., good performance on the training data but
poor performance on novel data).
• How can we adjust the complexity of the model ?
(neither too complex nor too simple)
• Are there principled methods for finding the best
complexity ?
Issue: Classifier Combination
• Performance can be improved using a "pool" of
classifiers.
• How should we combine multiple classifiers ?
Issue: Costs and Risks
• Each classification is associated with a cost or risk
(e.g., classification error).
• How can we incorporate knowledge about such
risks ?
• Can we estimate the lowest possible risk of any
classifier ?
Issue: Computational Complexity
• How does an algorithm scale with
– the number of feature dimensions
– number of patterns
– number of categories
• What is the tradeoff between computational
ease and performance ?
General Purpose PR Systems?
• Humans have the ability to switch rapidly and
seamlessly between different pattern recognition
tasks
• It is very difficult to design a device that is
capable of performing a variety of classification
tasks
– Different decision tasks may require different features.
– Different features might yield different solutions.
– Different tradeoffs (e.g., classification error) exist for
different tasks.
