This document provides an overview of the course "Statistical Learning Theory and Applications" being taught at MIT in the spring of 2003. The course will cover supervised learning theory and algorithms including regularization networks and support vector machines. It will explore applications of learning from examples in various domains including bioinformatics, computer vision, and text classification. The course will take a multidisciplinary approach, exploring learning from the perspectives of mathematics, algorithms, and neuroscience. Students will complete problem sets and a final project, and participation will be part of the grading.
How can we apply machine learning techniques on graphs to obtain predictions in a variety of domains? Learn more from Sami Abu-El-Haija, an AI Scientist with experience from both industry (Google Research) and academia (University of Southern California).
Shearlet Frames and Optimally Sparse Approximations (Jakob Lemvig)
Shearlet theory has become a central tool in analyzing and representing 2D and 3D data with anisotropic features. Shearlet systems are systems of functions generated by one single generator with parabolic scaling, shearing, and translation operators applied to it, in much the same way wavelet systems are dyadic scalings and translations of a single function, but including a directional parameter. The success of shearlets owes to an extensive list of desirable properties: Shearlet systems can be generated by one function, they provide precise resolution of wavefront sets, they allow compactly supported analyzing elements, they are associated with fast decomposition algorithms, and they provide a unified treatment of the continuum and the digital world.
This talk gives an introduction to shearlet theory with a focus on separable and compactly supported shearlets in 2D and 3D. We will consider constructions of band-limited and compactly supported shearlet frames in those dimensions. Finally, we will show that compactly supported shearlet frames satisfying weak decay, smoothness, and directional moment conditions provide optimally sparse approximations of a generalized model of cartoon-like images, comprising piecewise C² functions that are smooth apart from piecewise C² discontinuity edges.
This talk is based on joint work with G. Kutyniok and W.-Q Lim (TU Berlin).
My invited talk at the 2018 Annual Meeting of SIAM (Society of Industrial and... (Anirbit Mukherjee)
This is a slightly expanded version of the talk I gave at the 2018 ISMP (International Symposium on Mathematical Programming). This SIAM talk has some more introductory material than the ISMP talk.
My 2hr+ survey talk at the Vector Institute, on our deep learning theorems. (Anirbit Mukherjee)
This survey talk at the Vector Institute is a much more extended version of my overview talks at the ISMP 2018 and the SIAM Annual Meeting 2018. This gives a lot more details about background concepts and proof strategies.
Classification with decision trees from a nonparametric predictive inference p... (NTNU)
An application of nonparametric predictive inference for multinomial data (NPI) to classification tasks is presented. This model is applied to an established procedure for building classification trees using imprecise probabilities and uncertainty measures, thus far used only with the imprecise Dirichlet model (IDM), which is defined through the use of a parameter expressing previous knowledge. The accuracy of that classification procedure depends significantly on the value of the parameter used when the IDM is applied. A detailed study involving 40 data sets shows that the procedure using the NPI model (which has no parameter dependence) obtains a better trade-off between accuracy and tree size than the procedure using the IDM, whatever the choice of parameter. In a bias-variance study of the errors, it is shown that the procedure with the NPI model has a lower variance than the one with the IDM, implying a lower level of over-fitting.
https://telecombcn-dl.github.io/idl-2020/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
https://telecombcn-dl.github.io/dlai-2019/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
Cognition, Information and Subjective Computation (Hector Zenil)
One of the most important contending theories deeply connects consciousness to information theory. We keep connecting properties of the mind to computation. Turing did it with human intelligence and computation. John Searle (unintentionally, I will claim) connected understanding (and consciousness) to program complexity (and soft AI). More recently, Giulio Tononi formally connected internal experience and consciousness to computation and information. Can understanding computation, then, shed light on intelligence and consciousness? I claim it does. So what is computation? I aim at finding a graded metric of computation (such as Tononi's phi), weakly observer dependent (following some ideas of Searle) and with consideration of resource complexity, to give sense to the Turing test (as Scott Aaronson would agree).
Adaptive Neuro-Fuzzy Inference System based Fractal Image Compression (IDES Editor)
This paper presents an Adaptive Neuro-Fuzzy Inference System (ANFIS) model for fractal image compression. Fractal Image Compression (FIC) is an image compression technique in the spatial domain, but its main drawback with a traditional exhaustive search is the computational time the global search requires. To improve the computational time and compression ratio, an artificial intelligence technique such as ANFIS is used. Feature extraction reduces the dimensionality of the problem and enables the ANFIS network to be trained on an image separate from the test image, thus reducing the computational time; lowering the dimensionality of the problem also reduces the computations required during the search. The main advantage of an ANFIS network is that it can adapt itself from the training data and produce a fuzzy inference system, adjusting to the distribution of the feature space observed during training. Computer simulations reveal that the network has been properly trained and that the fuzzy system thus evolved classifies the domains correctly with minimum deviation, which helps in encoding the image using FIC.
https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B... (NTNU)
The introduction of expert knowledge when learning Bayesian networks from data is known to be an excellent way to boost the performance of automatic learning methods, especially when data is scarce. Previous approaches to this problem based on Bayesian statistics introduce the expert knowledge by modifying the prior probability distributions. In this study, we propose a new methodology based on Monte Carlo simulation which starts with non-informative priors and requires knowledge from the expert a posteriori, when the simulation ends. We also explore a new importance sampling method for Monte Carlo simulation and the definition of new non-informative priors for the structure of the network. All these approaches are experimentally validated with five standard Bayesian networks.
Read more:
http://link.springer.com/chapter/10.1007%2F978-3-642-14049-5_70
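As a generic illustration of the Monte Carlo machinery the abstract above relies on, here is a minimal self-normalized importance sampling sketch. The target, proposal, and test integrand are toy assumptions for illustration, not the paper's Bayesian-network setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def importance_estimate(f, log_p, sample_q, log_q, n=20_000):
    """Self-normalized importance sampling estimate of E_p[f(x)],
    where the target p is known only up to a normalizing constant."""
    x = sample_q(n)                      # draw from the tractable proposal q
    log_w = log_p(x) - log_q(x)          # unnormalized log importance weights
    w = np.exp(log_w - log_w.max())      # shift by the max for stability
    w /= w.sum()                         # self-normalize the weights
    return float(np.sum(w * f(x)))

# Toy check: target p = N(1, 1), proposal q = N(0, 2); E_p[x] = 1.
log_p = lambda x: -0.5 * (x - 1.0) ** 2          # up to a constant
log_q = lambda x: -0.25 * x ** 2                 # up to a constant
sample_q = lambda n: rng.normal(0.0, np.sqrt(2.0), n)

mean_est = importance_estimate(lambda x: x, log_p, sample_q, log_q)
```

Self-normalization is what allows the unnormalized log-densities: the normalizing constants cancel when the weights are divided by their sum.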
In large-scale visual pattern recognition applications, when the subject set is large, traditional linear models like PCA/LDA/LPP become inadequate in capturing the non-linearity and local variations of the visual appearance manifold. Kernelized solutions can alleviate the problem to a certain degree, but face a computational complexity challenge: solving eigen or QP problems of size n x n for n training samples. In this work, we developed a novel solution to this problem: we first partition the data to obtain a rich set of local data patch models, and then compute the hierarchical structure of this set of models with subspace clustering on the Grassmannian manifold, via a VQ-like algorithm with a data-partition locality constraint. At query time, a probe image is first projected to the data-space partition to obtain the probe model, and the optimal local model is computed by traversing the model hierarchy tree. Simulation results demonstrate the effectiveness of this solution in computational efficiency and recognition accuracy, with applications in large-subject-set face recognition and image retrieval.
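The partition / local-model / query-routing pipeline described above can be sketched in a few lines. This toy version substitutes plain k-means for the Grassmannian subspace clustering and omits the model tree, so it illustrates the idea rather than the talk's system; the data and all parameter values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=25):
    """Plain k-means partition (stand-in for the paper's clustering)."""
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        C = np.array([X[labels == j].mean(0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return C, labels

def local_pca(Xp, dim=3):
    """Fit a local affine subspace model (mean + top principal directions)."""
    mu = Xp.mean(0)
    _, _, Vt = np.linalg.svd(Xp - mu, full_matrices=False)
    return mu, Vt[:dim]

X = rng.normal(size=(300, 10))                 # toy "appearance" vectors
C, labels = kmeans(X, k=4)
models = [local_pca(X[labels == j]) for j in range(4)]

# Query time: route the probe to its partition, then use that local model.
probe = rng.normal(size=10)
j = int(np.argmin(((probe - C) ** 2).sum(-1)))
mu, V = models[j]
projection = mu + (probe - mu) @ V.T @ V       # reconstruction by local model
```

The point of the partition is that each local PCA only sees a small patch of the manifold, so a linear model suffices there even when the global manifold is non-linear.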
[Bio]
Zhu Li is currently a Senior Staff Researcher and Media Analytics & Processing Group Lead with the Media Networking Lab, Core Networks Research, FutureWei (Huawei) Technology USA, at Bridgewater, New Jersey. He received his PhD in Electrical & Computer Engineering from Northwestern University, Evanston, in 2004. He was an Assistant Professor with the Dept of Computing, The Hong Kong Polytechnic University from 2008 to 2010, and a Senior Research Engineer, Senior Staff Research Engineer, and then Principal Staff Research Engineer with the Multimedia Research Lab (MRL), Motorola Labs, Schaumburg, Illinois, from 2000 to 2008. His research interests include audio-visual analytics and machine learning with applications in large-scale video repository annotation, search and recommendation, as well as video adaptation, source-channel coding, and distributed optimization issues of wireless video networks. He has 21 issued or pending patents and 70+ publications in book chapters, journals, conference proceedings, and standards contributions in these areas. He is an IEEE Senior Member, elected Vice Chair of the IEEE Multimedia Communication Technical Committee (MMTC) 2008~2010, and co-editor of the Springer-Verlag book "Intelligent Video Communication: Techniques and Applications". He served on numerous conference and workshop TPCs, was symposium co-chair at IEEE ICC'2008, and served on the Best Paper Award Committee for IEEE ICME 2010. He received the Best Poster Paper Award from the IEEE Int'l Conf on Multimedia & Expo (ICME) at Toronto, 2006, and the Best Paper Award from the IEEE Int'l Conf on Image Processing (ICIP) at San Antonio, 2007.
Camp IT: Making the World More Efficient Using AI & Machine Learning (Krzysztof Kowalczyk)
Slides from the introductory lecture I gave for students at Camp IT 2019. I tried to cover artificial intelligence, machine learning, the most popular algorithms, and their applications to business as broadly as possible; for in-depth materials on the given topics, see the links and references in the presentation.
TMS workshop on machine learning in materials science: Intro to deep learning... (Brian DeCost)
This presentation is intended as a high-level introduction to deep learning and its applications in materials science. The intended audience is materials scientists and engineers.
Disclaimers: the second half of this presentation is intended as a broad overview of deep learning applications in materials science; due to time limitations it is not intended to be comprehensive. As a review of the field, this necessarily includes work that is not my own. If my own name is not included explicitly in the reference at the bottom of a slide, I was not involved in that work.
Any mention of commercial products in this presentation is for information only; it does not imply recommendation or endorsement by NIST.
In this article we consider macrocanonical models for texture synthesis. In these models samples are generated given an input texture image and a set of features which should be matched in expectation. It is known that if the images are quantized, macrocanonical models are given by Gibbs measures, using the maximum entropy principle. We study conditions under which this result extends to real-valued images. If these conditions hold, finding a macrocanonical model amounts to minimizing a convex function and sampling from an associated Gibbs measure. We analyze an algorithm which alternates between sampling and minimizing. We present experiments with neural network features and study the drawbacks and advantages of using this sampling scheme.
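The alternating scheme in the abstract (sample from the current Gibbs measure, then take a gradient step on the convex dual so the feature expectations move toward their targets) can be illustrated in one dimension. Everything here is an assumption for illustration: the moment features F(x) = (x, x²), the particle count, the step sizes, and the discretized Langevin sampler standing in for the paper's sampling scheme:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical feature map F(x) = (x, x^2); the Gibbs measure
# p(x) ∝ exp(-theta . F(x)) is then Gaussian, so the toy is easy to check.
def features(x):
    return np.stack([x, x ** 2], axis=-1)

target_moments = np.array([1.0, 2.0])    # feature expectations to match

theta = np.array([0.0, 0.5])             # start from a proper density
x = rng.normal(size=2000)                # particle approximation of p_theta
step_x, step_theta = 0.01, 0.02

for _ in range(1000):
    # Sampling step: a few Langevin updates targeting exp(-theta . F(x)).
    for _ in range(5):
        grad = theta[0] + 2.0 * theta[1] * x          # d/dx of theta . F(x)
        x = x - step_x * grad + np.sqrt(2.0 * step_x) * rng.normal(size=x.size)
    # Minimizing step: descend the convex dual, whose gradient is
    # target_moments - E_theta[F(x)], estimated on the particles.
    theta = theta - step_theta * (target_moments - features(x).mean(axis=0))
```

At the fixed point the empirical feature means match the targets, which is exactly the macrocanonical constraint "matched in expectation".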
1. 9.520
CBCL MIT
Statistical Learning Theory and Applications
Sayan Mukherjee and Ryan Rifkin
+ Tomaso Poggio
+ Alex Rakhlin
9.520, spring 2003
2. Learning: Brains and Machines
Learning is the gateway to understanding the brain and to making intelligent machines.
Problem of learning: a focus for
o modern math
o computer algorithms
o neuroscience
3. Multidisciplinary Approach to Learning
Learning theory + algorithms:

min_{f∈H} (1/l) ∑_{i=1}^{l} V(y_i, f(x_i)) + µ ‖f‖²_K
ENGINEERING APPLICATIONS:
• Information extraction (text, Web…)
• Computer vision and graphics
• Man-machine interfaces
• Bioinformatics (DNA arrays)
• Artificial Markets (society of learning agents)
COMPUTATIONAL NEUROSCIENCE (models + experiments): learning to recognize objects in visual cortex
4. Class
Rules of the game: problem sets (2 + 1 optional)
final project
grading
participation!
Web site:
www.ai.mit.edu/projects/cbcl/courses/course9.520/index.html
5. Overview of overview
o Supervised learning: the problem and how to frame it within classical math
o Examples of in-house applications
o Learning and the brain
6. Learning from Examples
INPUT → f → OUTPUT
EXAMPLES: (INPUT_1, OUTPUT_1), (INPUT_2, OUTPUT_2), …, (INPUT_n, OUTPUT_n)
7. Learning from Examples: formal setting
Given a set of l examples (past data)
{(x_1, y_1), (x_2, y_2), …, (x_l, y_l)}
Question: find a function f such that
ŷ = f(x)
is a good predictor of y for a future input x
8. Classical equivalent view: supervised learning as
problem of multivariate function approximation
[Plot: data sampled from f, the function f itself, and an approximation of f, shown on x and y axes]
Generalization: estimating the value of the function where there are no data
Regression: function is real valued
Classification: function is binary
Problem is ill-posed!
9. Well-posed problems
A problem is well-posed (J. S. Hadamard, 1865-1963) if its solution
• exists
• is unique
• is stable, i.e. depends continuously on the data
Inverse problems (tomography, radar, scattering…)
are typically ill-posed
10. Two key requirements to solve the problem of
learning from examples:
well-posedness and consistency.
A standard way to learn from examples is empirical risk minimization (ERM).
The problem is in general ill-posed and does not have a
predictive (or consistent) solution. By choosing an appropriate
hypothesis space H it can be made well-posed. Independently,
it is also known that an appropriate hypothesis space can
guarantee consistency.
We have recently proved a surprising theorem: consistency
and well-posedness are equivalent, i.e., one implies the other.
Thus a stable solution is also predictive and vice versa.
Mukherjee, Niyogi, Poggio, 2002
11. A simple, “magic” algorithm ensures consistency
and well-posedness
min_{f∈H} (1/l) ∑_{i=1}^{l} V(f(x_i) − y_i) + λ ‖f‖²_K

implies

f(x) = ∑_{i=1}^{l} α_i K(x, x_i) + p(x)
Equation includes Regularization Networks (examples are
Radial Basis Functions and Support Vector Machines)
For a review, see Poggio and Smale, Notices of the AMS, 2003
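With the square loss V(f(x_i) − y_i) = (f(x_i) − y_i)², the regularization functional above reduces to a linear system for the coefficients α: (K + λ l I) α = y, i.e. kernel ridge regression, one instance of a Regularization Network. A minimal numpy sketch, where the 1-D dataset, Gaussian kernel, bandwidth, and λ are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Gaussian kernel; the bandwidth sigma is an illustrative choice.
def gaussian_kernel(a, b, sigma=0.1):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

# Toy 1-D data: noisy samples of a smooth function.
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=x.size)

# Square loss: the minimizer is f(x) = sum_i alpha_i K(x, x_i), where
# the coefficients solve (K + lambda * l * I) alpha = y.
l, lam = x.size, 1e-3
K = gaussian_kernel(x, x)
alpha = np.linalg.solve(K + lam * l * np.eye(l), y)

def f(x_new):
    """Evaluate the learned function at new inputs."""
    return gaussian_kernel(np.atleast_1d(np.asarray(x_new, dtype=float)), x) @ alpha
```

The λ l I term is exactly the stabilizer from the functional: it makes the linear system well-conditioned (well-posedness) while keeping the fit predictive.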
12. Classical framework but with more general
loss function
The algorithm uses a quite general space of functions or "hypotheses": RKHSs. An extension of the classical framework can provide a better measure of "loss" (for instance for classification)…
min_{f∈H} (1/l) ∑_{i=1}^{l} V(f(x_i) − y_i) + λ ‖f‖²_K

Girosi, Caprile, Poggio, 1990
14. Equivalence to networks
Many different V lead to the same solution…
f(x) = ∑_{i=1}^{l} α_i K(x, x_i) + b

…and can be "written" as the same type of network:
[Network diagram: inputs x_1 … x_l feed kernel units K; a weighted sum (+) gives the output f]
15. Unified framework: RN, SVMR and SVMC
min_{f∈H} (1/l) ∑_{i=1}^{l} V(f(x_i) − y_i) + λ ‖f‖²_K
The equation includes Regularization Networks (e.g., Radial Basis Functions), Support Vector Machines (classification and regression), and some multilayer perceptrons.
Review by Evgeniou, Pontil and Poggio, Advances in Computational Mathematics, 2000
16. The theoretical foundations of statistical learning
are becoming part of mainstream mathematics
17. Theory summary
In the course we will introduce
• Stability (well-posedness)
• Consistency (predictivity)
• RKHS hypothesis spaces
• Regularization techniques leading to RN and SVMs
• Generalization bounds based on stability
• Alternative bounds (VC and V_γ dimensions)
• Related topics, extensions beyond SVMs
• Applications
• A new key result
18. Overview of overview
o Supervised learning: real math
o Examples of recent and ongoing in-house research on
applications
o Learning and the brain
19. Learning from Examples: engineering
applications
INPUT OUTPUT
Bioinformatics
Artificial Markets
Object categorization
Object identification
Image analysis
Graphics
Text Classification
…..
20. Bioinformatics application: predicting type of
cancer from DNA chips signals
Learning-from-examples paradigm: training examples are fed to a statistical learning algorithm; given a new sample, the trained algorithm outputs a prediction.
21. Bioinformatics application: predicting type of
cancer from DNA chips
New feature-selection SVM: only 38 training examples, 7,100 features.
AML vs. ALL: with 40 genes, 34/34 correct, 0 rejects; with 5 genes, 31/31 correct, 3 rejects of which 1 is an error.
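To convey the flavor of selecting a few informative genes out of thousands (this is a toy sketch, not the course's actual feature-selection SVM): rank genes by a signal-to-noise score between the two classes and keep the top few. The data below are synthetic, with signal planted in the first 5 genes:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for DNA-chip data: 38 samples x 7100 genes,
# labels +1 (AML) / -1 (ALL); only the first 5 genes carry signal.
n, p = 38, 7100
y = np.where(np.arange(n) < 19, 1.0, -1.0)
X = rng.normal(size=(n, p))
X[:, :5] += 3.0 * y[:, None]

# Rank genes by a signal-to-noise score (class-mean difference / pooled std)
mu_pos, mu_neg = X[y > 0].mean(axis=0), X[y < 0].mean(axis=0)
sd = X[y > 0].std(axis=0) + X[y < 0].std(axis=0) + 1e-12
score = np.abs(mu_pos - mu_neg) / sd
top = np.argsort(score)[::-1][:5]  # keep the 5 most informative genes
```

A classifier trained on only the selected genes is what makes 38 examples workable against 7,100 raw features.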
22. Learning from Examples: engineering
applications
INPUT OUTPUT
Bioinformatics
Artificial Markets
Object categorization
Object identification
Image analysis
Graphics
Text Classification
…..
23. Face identification: example
An old view-based system: 15 views
Performance: 98% on 68 person database
Beymer, 1995
24. Face identification
[Diagram: training data and a new face image pass through feature extraction into an SVM classifier, which outputs the identification result.]
Bernd Heisele, Jennifer Huang, 2002
25. Real time detection and identification
Ported to Oxygen’s H21 (handheld device)
Identification of rotated faces up to about 45º
Robust against changes in illumination and
background
Frame rate of 15 Hz
Ho, Heisele, Poggio, 2000
Weinstein, Ho, Heisele, Poggio, Steele, Agarwal, 2002
26. Learning from Examples: engineering
applications
INPUT OUTPUT
Bioinformatics
Artificial Markets
Object categorization
Object identification
Image analysis
Graphics
Text Classification
…..
27. Learning Object Detection:
Finding Frontal Faces ...
Training database: 1,000+ real faces, 3,000+ virtual faces, 50,000+ non-face patterns
Sung, Poggio 1995
28. Recent work on face detection
• Detection of faces in images
• Robustness against slight rotations in depth and in the image plane
• Full-face vs. component-based classifier
Heisele, Pontil, Poggio, 2000
29. The best existing system for face detection?
Test: CBCL set / 1,154 real faces / 14,579 non-faces / 132,365 shifted windows
Training: 2,457 synthetic faces (58x58) / 13,654 non-faces
[ROC curves: correct detection rate vs. false positives per number of scanned windows (0 to 0.0001), for the component classifier and the whole-face classifier.]
Heisele, Serre, Poggio et al., 2000
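Curves like the ones above come from sweeping an acceptance threshold over the classifier outputs. A minimal sketch, with invented toy scores (the course's systems report false positives per scanned window rather than per non-face example):

```python
import numpy as np

def roc_points(scores_pos, scores_neg, n_windows):
    """Sweep a threshold over classifier outputs and return
    (false positives per scanned window, correct detection rate) pairs."""
    thresholds = np.sort(np.concatenate([scores_pos, scores_neg]))[::-1]
    pts = []
    for t in thresholds:
        tpr = np.mean(scores_pos >= t)                  # fraction of faces kept
        fp_rate = np.sum(scores_neg >= t) / n_windows   # false pos. per window
        pts.append((float(fp_rate), float(tpr)))
    return pts

scores_pos = np.array([2.0, 3.0])   # outputs on true faces (toy values)
scores_neg = np.array([0.0, 1.0])   # outputs on non-faces (toy values)
pts = roc_points(scores_pos, scores_neg, n_windows=10)
```

Lowering the threshold moves along the curve: more detections kept, but more false alarms per window.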
30. Trainable System for Object Detection:
Pedestrian detection - Results
Papageorgiou and Poggio, 1998
31. Trainable System for Object Detection:
Pedestrian detection - Training
Papageorgiou and Poggio, 1998
32. The system was tested in a test car
(Mercedes)
34. System installed in experimental Mercedes
A fast version, integrated
with a real-time obstacle
detection system
MPEG
Constantine Papageorgiou
36. People classification/detection: training
the system
Training data: 1,848 pedestrian patterns and 7,189 non-pedestrian patterns.
Representation: overcomplete dictionary of Haar wavelets; high-dimensional feature space (>1,300 features)
Core learning algorithm: Support Vector Machine classifier, at the heart of the pedestrian detection system.
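To give the flavor of the representation (a toy version, not the actual >1,300-feature dictionary): Haar wavelet responses are differences of sums over adjacent rectangles, computed cheaply from an integral image. A sketch at a single scale:

```python
import numpy as np

def haar_features(img):
    """Three Haar-wavelet-like responses on a grayscale patch:
    vertical, horizontal, and diagonal rectangle differences,
    computed via the integral image."""
    ii = img.cumsum(axis=0).cumsum(axis=1)   # integral image
    def rect(r0, c0, r1, c1):                # sum over [r0, r1) x [c0, c1)
        s = ii[r1 - 1, c1 - 1]
        if r0 > 0: s -= ii[r0 - 1, c1 - 1]
        if c0 > 0: s -= ii[r1 - 1, c0 - 1]
        if r0 > 0 and c0 > 0: s += ii[r0 - 1, c0 - 1]
        return s
    h, w = img.shape
    return np.array([
        rect(0, 0, h, w // 2) - rect(0, w // 2, h, w),         # vertical edge
        rect(0, 0, h // 2, w) - rect(h // 2, 0, h, w),         # horizontal edge
        rect(0, 0, h // 2, w // 2) + rect(h // 2, w // 2, h, w)
        - rect(0, w // 2, h // 2, w) - rect(h // 2, 0, h, w // 2),  # diagonal
    ])

patch = np.zeros((4, 4)); patch[:, :2] = 1.0   # toy patch, bright on the left
f = haar_features(patch)
```

The real dictionary repeats such templates at many positions and scales, producing the high-dimensional feature vector that the SVM classifies.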
37. An improved approach: combining component
detectors
Mohan, Papageorgiou and Poggio, 1999
38. Results
The system is capable of
detecting partially
occluded people
39. System Performance
Combination systems (ACC) perform best. All component-based systems perform better than full-body person detectors.
A. Mohan, C. Papageorgiou, T. Poggio
40. Learning from Examples: Applications
INPUT OUTPUT
Object identification
Object categorization
Image analysis
Graphics
Finance
Bioinformatics
…
41. Image Analysis
IMAGE ANALYSIS: OBJECT RECOGNITION AND POSE
ESTIMATION
⇒ Bear (0° view)
⇒ Bear (45° view)
42. Computer vision: analysis of facial expressions
The main goal is to estimate basic facial parameters, e.g.
degree of mouth openness, through learning. One of the main
applications is video-speech fusion to improve speech
recognition systems.
NTT, Atsugi, January 2002
Kumar, Poggio, 2001
43. Combining Top-Down Constraints and Bottom-Up Data
Morphable Model: face = a1 · (prototype 1) + a2 · (prototype 2) + a3 · (prototype 3) + …
Learning engine: SVM, RBF, etc.
Kumar, Poggio, 2001
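The morphable-model idea can be sketched in a few lines: represent a new face as a linear combination of prototype faces and recover the coefficients a_i by least squares; the coefficients form a compact description that can be fed to a learning engine. The prototypes and "pixels" below are invented toy values:

```python
import numpy as np

# Toy prototype faces (flattened "images"); a new face is modeled as
# face = a1*P1 + a2*P2 + a3*P3, and the coefficients a_i are recovered
# by least squares.
P = np.array([[1.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 1.0]])   # 3 prototypes, 4 "pixels" each
face = 0.2 * P[0] + 0.5 * P[1] + 0.3 * P[2]
coeffs, *_ = np.linalg.lstsq(P.T, face, rcond=None)
```

Real morphable models fit shape and texture separately and after dense correspondence, but the top-down constraint is the same: the fit is restricted to the span of the prototypes.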
44. The Three Stages
Face detection → localization of facial features → analysis of facial parts
For more details, see Appendix 2.
45. Learning from Examples: engineering
applications
INPUT OUTPUT
Bioinformatics
Artificial Markets
Object categorization
Object identification
Image analysis
Graphics
Text Classification
…..
46. Image Synthesis
UNCONVENTIONAL GRAPHICS
Θ = 0° view ⇒
Θ = 45° view ⇒
47. Supermodels
Cc-cs.mpg
MPEG (Steve Lines)
48. A trainable system for TTVS
Input: Text
Output:
photorealistic talking
face uttering text
Tony Ezzat
49. From 2D to a better 2D and from 2D to 3D
Two extensions of our text-to-visual-speech
(TTVS) system:
• morphing of 3D models of faces to output a 3D model
of a speaking face (Blanz)
• Learn facial appearance, dynamics and coarticulation
(Ezzat)
50. Towards 3D
3D face scans were collected. The neutral face shape was reconstructed from a single image, and the animation transferred to a new face.
Blanz, Ezzat, Poggio
51. Using the same basic learning techniques: Trainable Videorealistic Face Animation (voice is real, video is synthetic)
Ezzat, Geiger, Poggio, SigGraph 2002
52. Trainable Videorealistic Face Animation
1. Learning: the system learns from 4 minutes of video the face appearance (Morphable Model) and the speech dynamics of the person.
2. Run time: for any speech input, the system provides as output a synthetic video stream. [Diagram: phone stream (/SIL/ /B/ /AE/ /JH/ …) → phonetic models {µ_p, Σ_p} → trajectory synthesis → MMM with image prototypes {I_i, C_i}.]
Tony Ezzat, Geiger, Poggio, SigGraph 2002
53. Novel !!!: Trainable Videorealistic Face Animation
Let us look at video!!!
Ezzat, Poggio, 2002
54. Reconstructed 3D Face Models from 1 image
Blanz and Vetter,
MPI
SigGraph ‘99
56. Learning from Examples: Applications
INPUT OUTPUT
Object identification
Object categorization
Image analysis
Graphics
Finance
Bioinformatics
…
57. Artificial Agents – learning algorithms – buy and sell stocks
Example of a subproject: the Electronic Market Maker. [Diagram: public limit/market orders and all available market information feed the learning module, which outputs bid/ask prices, bid/ask sizes, and buy/sell decisions; user control/calibration, inventory control, and a feedback loop close the system.]
Nicholas Chang
58. Learning from Examples: engineering
applications
INPUT OUTPUT
Bioinformatics
Artificial Markets
Object categorization
Object identification
Image analysis
Graphics
Text Classification
…..
59. Overview of overview
o Supervised learning: the problem and how to frame
it within classical math
o Examples of in-house applications
o Learning and the brain
60. The Ventral Visual Stream: From V1 to IT
modified from Ungerleider and Haxby, 1994
Hubel & Wiesel, 1959
Desimone, 1991
61. Summary of “basic facts”
Accumulated evidence points to three (mostly accepted) properties of the ventral visual stream architecture:
• Hierarchical build-up of invariances (first to translation and scale, then to viewpoint, etc.), of the size of the receptive fields, and of the complexity of the preferred stimuli
• Basic feed-forward processing of information (for “immediate” recognition tasks)
• Learning of an individual object generalizes to scale and position
62. The standard model: following Hubel and Wiesel…
[Diagram: hierarchical model with supervised learning at the top stage and unsupervised learning in the earlier stages.]
Riesenhuber & Poggio, Nature Neuroscience, 2000
63. The standard model
Interprets or predicts many existing data in microcircuits and system physiology,
and also in cognitive science:
• What some complex cells and V4 cells do and how: MAX…
• View-tuning of IT cells (Logothetis)
• Response to pseudomirror views
• Effect of scrambling
• Multiple objects
• Robustness to clutter
• Consistent with K. Tanaka’s simplification procedure
• Categorization tasks (cats vs dogs)
• Invariance to translation, scale etc…
• Gender classification
• Face inversion effect : experience, viewpoint, other-race, configural
vs. featural representation
• Transfer of generalization
• No binding problem, no need for oscillations
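The MAX operation mentioned in the list above can be sketched directly: a C-unit responds with the maximum of its afferent S-unit responses over a local pool, so a strong response survives small shifts of the stimulus. A toy numpy version (pool geometry and values invented for illustration):

```python
import numpy as np

def max_pool(responses, size=2):
    """C-unit response: the MAX over a local pool of afferent S-unit
    responses, which buys invariance to small translations."""
    h, w = responses.shape
    h2, w2 = h // size, w // size
    return (responses[:h2 * size, :w2 * size]
            .reshape(h2, size, w2, size).max(axis=(1, 3)))

r = np.zeros((4, 4)); r[1, 2] = 5.0              # strong S-unit response
shifted = np.zeros((4, 4)); shifted[0, 3] = 5.0  # same feature, shifted
pooled = max_pool(r)
```

Because the two stimuli fall in the same pool, the pooled responses are identical; stacking such layers, alternated with template-matching ("and"-like) layers, gives the hierarchical build-up of invariance and selectivity.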
64. Model’s early predictions: neurons become view-tuned during recognition
Logothetis, Pauls, and Poggio, 1995; Logothetis, Pauls, 1995
Poggio, Edelman, Riesenhuber (1990, 2000)
65. Recording Sites in Anterior IT
[Diagram: recording sites in anterior IT (landmarks: LUN, LAT, STS, IOS, AMTS); …neurons tuned to faces are close by….]
Logothetis, Pauls, and Poggio, 1995;
Logothetis, Pauls, 1995
66. The Cortex: Neurons Tuned to Object Views, as Predicted by the Model
Logothetis, Pauls, Poggio, 1995
67. View-tuned cells: scale invariance (one training view only)!
[Figure: scale-invariant responses of an IT neuron; spike-rate histograms for the preferred stimulus presented at scales ×0.4 to ×2.5, after training at a single view.]
Logothetis, Pauls, and Poggio, 1995; Logothetis, Pauls, 1995
68. Joint Project (Max Riesenhuber) with Earl Miller & David Freedman (MIT): Neural Correlate of Categorization (NCC)
Define categories in morph space: prototypes (100% cat and 100% dog) at the ends, with 80% and 60% cat morphs on one side of the category boundary and 80% and 60% dog morphs on the other.
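The categorization setup can be mimicked with a toy nearest-prototype classifier in a morph space (the prototype vectors below are random, purely illustrative): every blend that is more than 50% cat falls on the cat side of the boundary.

```python
import numpy as np

rng = np.random.default_rng(1)
cat_proto = rng.normal(size=8)   # invented "cat" prototype vector
dog_proto = rng.normal(size=8)   # invented "dog" prototype vector

def morph(c):
    """A c% cat / (100 - c)% dog blend along the morph line."""
    return (c / 100.0) * cat_proto + (1.0 - c / 100.0) * dog_proto

def classify(stim):
    """Nearest prototype; the 50% blend sits on the category boundary."""
    return "cat" if (np.linalg.norm(stim - cat_proto)
                     < np.linalg.norm(stim - dog_proto)) else "dog"
```

In the experiment the interesting question is not the rule itself but where the neural responses place the boundary: the 60% morphs are physically closer to the other category's 60% morphs than to their own prototype, yet get the categorical label.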
69. Categorization task
Train the monkey on a delayed match-to-category task: fixation (500 ms), sample (600 ms), delay (1000 ms), then test. If the test stimulus is a nonmatch, a further delay and a second test (match) follow.
After training, record from neurons in IT & PFC
70. Single-cell example: a PFC neuron that responds more strongly to DOGS than to CATS
[Figure: firing rate (Hz) over fixation, sample, delay, and choice periods, aligned to sample onset; responses to dog morphs (100%, 80%, 60%) are consistently stronger than to cat morphs (100%, 80%, 60%).]
D. Freedman, E. Miller, M. Riesenhuber, T. Poggio (Science, 2001)
71. The model suggests same computations for
different recognition tasks (and objects!)
[Diagram: task-specific units (e.g., in PFC) read out a general representation (in IT), modeled by HMAX.]
Is this what’s going on in cortex?
Editor's Notes
Desimone Fig: cells are tuned to views of complex objects. Here: face cell
pretty amazing
FEEDFORWARD, inspired by the ventral stream. HIERARCHY: from simple to complex, small to big. <show> Two objectives: invariance and specificity, with separate mechanisms (blue: feature specificity; green: invariance). Feature specificity is easy: a template, an “and” operation. But how to do the pooling for invariance?
WHY MAX? Next!
This is also borne out in the physiology data.
Use the same model to also investigate novel questions, not a different model for each problem!
Christian Shelton
Can we find split into task-independent rep & task-dependent units also in cortex?