This document provides an overview of the course "Statistical Learning Theory and Applications" being taught at MIT in the spring of 2003. The course will cover supervised learning theory and algorithms including regularization networks and support vector machines. It will explore applications of learning from examples in various domains including bioinformatics, computer vision, and text classification. The course will take a multidisciplinary approach, exploring learning from the perspectives of mathematics, algorithms, and neuroscience. Students will complete problem sets and a final project, and participation will be part of the grading.
How can we apply machine learning techniques on graphs to obtain predictions in a variety of domains? Learn more from Sami Abu-El-Haija, an AI Scientist with experience from both industry (Google Research) and academia (University of Southern California).
Shearlet Frames and Optimally Sparse Approximations (Jakob Lemvig)
Shearlet theory has become a central tool in analyzing and representing 2D and 3D data with anisotropic features. Shearlet systems are systems of functions generated by one single generator with parabolic scaling, shearing, and translation operators applied to it, in much the same way wavelet systems are dyadic scalings and translations of a single function, but including a directional parameter. The success of shearlets owes to an extensive list of desirable properties: Shearlet systems can be generated by one function, they provide precise resolution of wavefront sets, they allow compactly supported analyzing elements, they are associated with fast decomposition algorithms, and they provide a unified treatment of the continuum and the digital world.
This talk gives an introduction to shearlet theory with a focus on separable and compactly supported shearlets in 2D and 3D. We will consider constructions of band-limited and compactly supported shearlet frames in those dimensions. Finally, we will show that compactly supported shearlet frames satisfying weak decay, smoothness, and directional moment conditions provide optimally sparse approximations of a generalized model of cartoon-like images, comprising piecewise C² functions that are smooth apart from piecewise C² discontinuity edges.
This talk is based on joint work with G. Kutyniok and W.-Q Lim (TU Berlin).
My invited talk at the 2018 Annual Meeting of SIAM (Society of Industrial and... (Anirbit Mukherjee)
This is a slightly expanded version of the talk I gave at the 2018 ISMP (International Symposium on Mathematical Programming). This SIAM talk has some more introductory material than the ISMP talk.
My 2hr+ survey talk at the Vector Institute, on our deep learning theorems. (Anirbit Mukherjee)
This survey talk at the Vector Institute is a much more extended version of my overview talks at the ISMP 2018 and the SIAM Annual Meeting 2018. This gives a lot more details about background concepts and proof strategies.
Classification with decision trees from a nonparametric predictive inference p... (NTNU)
An application of nonparametric predictive inference for multinomial data (NPI) to classification tasks is presented. This model is applied to an established procedure for building classification trees using imprecise probabilities and uncertainty measures, thus far used only with the imprecise Dirichlet model (IDM), which is defined through the use of a parameter expressing previous knowledge. The accuracy of that classification procedure depends significantly on the value of the parameter used when the IDM is applied. A detailed study involving 40 data sets shows that the procedure using the NPI model (which has no parameter dependence) obtains a better trade-off between accuracy and tree size than the procedure using the IDM, whatever the choice of parameter. In a bias-variance study of the errors, it is shown that the procedure with the NPI model has a lower variance than the one with the IDM, implying a lower level of over-fitting.
https://telecombcn-dl.github.io/idl-2020/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
https://telecombcn-dl.github.io/dlai-2019/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
Cognition, Information and Subjective Computation (Hector Zenil)
One of the most important contending theories deeply connects consciousness to information theory. We keep connecting properties of the mind to computation. Turing did it with human intelligence and computation. John Searle (unintentionally, I will claim) connected understanding (and consciousness) to program complexity (and soft AI). More recently, Giulio Tononi formally connected internal experience and consciousness to computation and information. Can understanding computation, then, shed light on intelligence and consciousness? I claim it does. So what is computation? I aim at finding a graded metric of computation (such as Tononi's phi), weakly observer dependent (following some ideas of Searle) and with consideration of resource complexity, to give sense to the Turing test (as Scott Aaronson would agree).
Adaptive Neuro-Fuzzy Inference System based Fractal Image Compression (IDES Editor)
This paper presents an Adaptive Neuro-Fuzzy Inference System (ANFIS) model for fractal image compression. Fractal Image Compression (FIC) is an image compression technique in the spatial domain, but its main drawback with a traditional exhaustive search is the computational time the global search requires. To improve the computational time and compression ratio, an artificial intelligence technique such as ANFIS is used. Feature extraction reduces the dimensionality of the problem and enables the ANFIS network to be trained on an image separate from the test image, thus reducing the computational time; lowering the dimensionality of the problem also reduces the computations required during the search. The main advantage of an ANFIS network is that it can adapt itself from the training data and produce a fuzzy inference system, adjusting to the distribution of the feature space observed during training. Computer simulations reveal that the network has been properly trained and that the fuzzy system thus evolved classifies the domains correctly with minimum deviation, which helps in encoding the image using FIC.
https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B... (NTNU)
The introduction of expert knowledge when learning Bayesian networks from data is known to be an excellent way to boost the performance of automatic learning methods, especially when data is scarce. Previous approaches to this problem based on Bayesian statistics introduce the expert knowledge by modifying the prior probability distributions. In this study, we propose a new methodology based on Monte Carlo simulation which starts with non-informative priors and requires knowledge from the expert a posteriori, when the simulation ends. We also explore a new importance sampling method for Monte Carlo simulation and the definition of new non-informative priors for the structure of the network. All these approaches are experimentally validated with five standard Bayesian networks.
Read more:
http://link.springer.com/chapter/10.1007%2F978-3-642-14049-5_70
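As a generic illustration of the Monte Carlo machinery the abstract above relies on, here is a minimal self-normalized importance sampling sketch. The target, proposal, and test integrand are toy assumptions for illustration, not the paper's Bayesian-network setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def importance_estimate(f, log_p, sample_q, log_q, n=20_000):
    """Self-normalized importance sampling estimate of E_p[f(x)],
    where the target p is known only up to a normalizing constant."""
    x = sample_q(n)                      # draw from the tractable proposal q
    log_w = log_p(x) - log_q(x)          # unnormalized log importance weights
    w = np.exp(log_w - log_w.max())      # shift by the max for stability
    w /= w.sum()                         # self-normalize the weights
    return float(np.sum(w * f(x)))

# Toy check: target p = N(1, 1), proposal q = N(0, 2); E_p[x] = 1.
log_p = lambda x: -0.5 * (x - 1.0) ** 2          # up to a constant
log_q = lambda x: -0.25 * x ** 2                 # up to a constant
sample_q = lambda n: rng.normal(0.0, np.sqrt(2.0), n)

mean_est = importance_estimate(lambda x: x, log_p, sample_q, log_q)
```

Self-normalization is what allows the unnormalized log-densities: the normalizing constants cancel when the weights are divided by their sum.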
In large-scale visual pattern recognition applications, when the subject set is large, traditional linear models like PCA/LDA/LPP become inadequate in capturing the non-linearity and local variations of the visual appearance manifold. Kernelized solutions can alleviate the problem to a certain degree, but face a computational complexity challenge: solving eigen or QP problems of size n x n for n training samples. In this work, we developed a novel solution to this problem: we first partition the data to obtain a rich set of local data patch models, and then compute the hierarchical structure of this set of models with subspace clustering on the Grassmannian manifold, via a VQ-like algorithm with a data-partition locality constraint. At query time, a probe image is first projected to the data-space partition to obtain the probe model, and the optimal local model is computed by traversing the model hierarchy tree. Simulation results demonstrate the effectiveness of this solution in computational efficiency and recognition accuracy, with applications in large-subject-set face recognition and image retrieval.
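The partition / local-model / query-routing pipeline described above can be sketched in a few lines. This toy version substitutes plain k-means for the Grassmannian subspace clustering and omits the model tree, so it illustrates the idea rather than the talk's system; the data and all parameter values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=25):
    """Plain k-means partition (stand-in for the paper's clustering)."""
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        C = np.array([X[labels == j].mean(0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return C, labels

def local_pca(Xp, dim=3):
    """Fit a local affine subspace model (mean + top principal directions)."""
    mu = Xp.mean(0)
    _, _, Vt = np.linalg.svd(Xp - mu, full_matrices=False)
    return mu, Vt[:dim]

X = rng.normal(size=(300, 10))                 # toy "appearance" vectors
C, labels = kmeans(X, k=4)
models = [local_pca(X[labels == j]) for j in range(4)]

# Query time: route the probe to its partition, then use that local model.
probe = rng.normal(size=10)
j = int(np.argmin(((probe - C) ** 2).sum(-1)))
mu, V = models[j]
projection = mu + (probe - mu) @ V.T @ V       # reconstruction by local model
```

The point of the partition is that each local PCA only sees a small patch of the manifold, so a linear model suffices there even when the global manifold is non-linear.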
[Bio]
Zhu Li is currently a Senior Staff Researcher and Media Analytics & Processing Group Lead with the Media Networking Lab, Core Networks Research, FutureWei (Huawei) Technology USA, at Bridgewater, New Jersey. He received his PhD in Electrical & Computer Engineering from Northwestern University, Evanston, in 2004. He was an Assistant Professor with the Dept of Computing, The Hong Kong Polytechnic University from 2008 to 2010, and a Senior Research Engineer, Senior Staff Research Engineer, and then Principal Staff Research Engineer with the Multimedia Research Lab (MRL), Motorola Labs, Schaumburg, Illinois, from 2000 to 2008. His research interests include audio-visual analytics and machine learning with applications in large-scale video repository annotation, search and recommendation, as well as video adaptation, source-channel coding, and distributed optimization issues of wireless video networks. He has 21 issued or pending patents and 70+ publications in book chapters, journals, conference proceedings, and standards contributions in these areas. He is an IEEE Senior Member, elected Vice Chair of the IEEE Multimedia Communication Technical Committee (MMTC) 2008~2010, and co-editor of the Springer-Verlag book "Intelligent Video Communication: Techniques and Applications". He served on numerous conference and workshop TPCs, was symposium co-chair at IEEE ICC'2008, and served on the Best Paper Award Committee for IEEE ICME 2010. He received the Best Poster Paper Award from the IEEE Int'l Conf on Multimedia & Expo (ICME) at Toronto, 2006, and the Best Paper Award from the IEEE Int'l Conf on Image Processing (ICIP) at San Antonio, 2007.
Camp IT: Making the World More Efficient Using AI & Machine Learning (Krzysztof Kowalczyk)
Slides from the introductory lecture I gave for students at Camp IT 2019. I tried to cover artificial intelligence, machine learning, the most popular algorithms, and their applications to business as broadly as possible; for in-depth materials on the given topics, see the links and references in the presentation.
TMS workshop on machine learning in materials science: Intro to deep learning... (Brian DeCost)
This presentation is intended as a high-level introduction to deep learning and its applications in materials science. The intended audience is materials scientists and engineers.
Disclaimers: the second half of this presentation is intended as a broad overview of deep learning applications in materials science; due to time limitations it is not intended to be comprehensive. As a review of the field, this necessarily includes work that is not my own. If my own name is not included explicitly in the reference at the bottom of a slide, I was not involved in that work.
Any mention of commercial products in this presentation is for information only; it does not imply recommendation or endorsement by NIST.
In this article we consider macrocanonical models for texture synthesis. In these models samples are generated given an input texture image and a set of features which should be matched in expectation. It is known that if the images are quantized, macrocanonical models are given by Gibbs measures, using the maximum entropy principle. We study conditions under which this result extends to real-valued images. If these conditions hold, finding a macrocanonical model amounts to minimizing a convex function and sampling from an associated Gibbs measure. We analyze an algorithm which alternates between sampling and minimizing. We present experiments with neural network features and study the drawbacks and advantages of using this sampling scheme.
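The alternating scheme in the abstract (sample from the current Gibbs measure, then take a gradient step on the convex dual so the feature expectations move toward their targets) can be illustrated in one dimension. Everything here is an assumption for illustration: the moment features F(x) = (x, x²), the particle count, the step sizes, and the discretized Langevin sampler standing in for the paper's sampling scheme:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical feature map F(x) = (x, x^2); the Gibbs measure
# p(x) ∝ exp(-theta . F(x)) is then Gaussian, so the toy is easy to check.
def features(x):
    return np.stack([x, x ** 2], axis=-1)

target_moments = np.array([1.0, 2.0])    # feature expectations to match

theta = np.array([0.0, 0.5])             # start from a proper density
x = rng.normal(size=2000)                # particle approximation of p_theta
step_x, step_theta = 0.01, 0.02

for _ in range(1000):
    # Sampling step: a few Langevin updates targeting exp(-theta . F(x)).
    for _ in range(5):
        grad = theta[0] + 2.0 * theta[1] * x          # d/dx of theta . F(x)
        x = x - step_x * grad + np.sqrt(2.0 * step_x) * rng.normal(size=x.size)
    # Minimizing step: descend the convex dual, whose gradient is
    # target_moments - E_theta[F(x)], estimated on the particles.
    theta = theta - step_theta * (target_moments - features(x).mean(axis=0))
```

At the fixed point the empirical feature means match the targets, which is exactly the macrocanonical constraint "matched in expectation".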
1. 9.520
CBCL MIT
Statistical Learning Theory and Applications
Sayan Mukherjee and Ryan Rifkin
+ Tomaso Poggio
+ Alex Rakhlin
9.520, spring 2003
2. Learning: Brains and Machines
Learning is the gateway to understanding the brain and to making intelligent machines.
Problem of learning: a focus for
o modern math
o computer algorithms
o neuroscience
3. Multidisciplinary Approach to Learning
Learning theory + algorithms:

min_{f∈H} (1/l) ∑_{i=1}^{l} V(y_i, f(x_i)) + µ ‖f‖²_K
ENGINEERING APPLICATIONS:
• Information extraction (text, Web…)
• Computer vision and graphics
• Man-machine interfaces
• Bioinformatics (DNA arrays)
• Artificial Markets (society of learning agents)
COMPUTATIONAL NEUROSCIENCE (models + experiments): learning to recognize objects in visual cortex
4. Class
Rules of the game: problem sets (2 + 1 optional)
final project
grading
participation!
Web site:
www.ai.mit.edu/projects/cbcl/courses/course9.520/index.html
5. Overview of overview
o Supervised learning: the problem and how to frame it within classical math
o Examples of in-house applications
o Learning and the brain
6. Learning from Examples
INPUT → f → OUTPUT
EXAMPLES: (INPUT_1, OUTPUT_1), (INPUT_2, OUTPUT_2), …, (INPUT_n, OUTPUT_n)
7. Learning from Examples: formal setting
Given a set of l examples (past data)
{(x_1, y_1), (x_2, y_2), …, (x_l, y_l)}
Question: find a function f such that
ŷ = f(x)
is a good predictor of y for a future input x
8. Classical equivalent view: supervised learning as
problem of multivariate function approximation
[Plot: data sampled from f, the function f itself, and an approximation of f, shown on x and y axes]
Generalization: estimating the value of the function where there are no data
Regression: function is real valued
Classification: function is binary
Problem is ill-posed!
9. Well-posed problems
A problem is well-posed (J. S. Hadamard, 1865-1963) if its solution
• exists
• is unique
• is stable, i.e. depends continuously on the data
Inverse problems (tomography, radar, scattering…)
are typically ill-posed
10. Two key requirements to solve the problem of
learning from examples:
well-posedness and consistency.
A standard way to learn from examples is empirical risk minimization (ERM).
The problem is in general ill-posed and does not have a
predictive (or consistent) solution. By choosing an appropriate
hypothesis space H it can be made well-posed. Independently,
it is also known that an appropriate hypothesis space can
guarantee consistency.
We have recently proved a surprising theorem: consistency
and well-posedness are equivalent, i.e., one implies the other.
Thus a stable solution is also predictive and vice versa.
Mukherjee, Niyogi, Poggio, 2002
11. A simple, “magic” algorithm ensures consistency
and well-posedness
min_{f∈H} (1/l) ∑_{i=1}^{l} V(f(x_i) − y_i) + λ ‖f‖²_K

implies

f(x) = ∑_{i=1}^{l} α_i K(x, x_i) + p(x)
Equation includes Regularization Networks (examples are
Radial Basis Functions and Support Vector Machines)
For a review, see Poggio and Smale, Notices of the AMS, 2003
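With the square loss V(f(x_i) − y_i) = (f(x_i) − y_i)², the regularization functional above reduces to a linear system for the coefficients α: (K + λ l I) α = y, i.e. kernel ridge regression, one instance of a Regularization Network. A minimal numpy sketch, where the 1-D dataset, Gaussian kernel, bandwidth, and λ are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Gaussian kernel; the bandwidth sigma is an illustrative choice.
def gaussian_kernel(a, b, sigma=0.1):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

# Toy 1-D data: noisy samples of a smooth function.
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=x.size)

# Square loss: the minimizer is f(x) = sum_i alpha_i K(x, x_i), where
# the coefficients solve (K + lambda * l * I) alpha = y.
l, lam = x.size, 1e-3
K = gaussian_kernel(x, x)
alpha = np.linalg.solve(K + lam * l * np.eye(l), y)

def f(x_new):
    """Evaluate the learned function at new inputs."""
    return gaussian_kernel(np.atleast_1d(np.asarray(x_new, dtype=float)), x) @ alpha
```

The λ l I term is exactly the stabilizer from the functional: it makes the linear system well-conditioned (well-posedness) while keeping the fit predictive.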
12. Classical framework but with more general
loss function
The algorithm uses a quite general space of functions or "hypotheses": RKHSs. An extension of the classical framework can provide a better measure of "loss" (for instance for classification)…
min_{f∈H} (1/l) ∑_{i=1}^{l} V(f(x_i) − y_i) + λ ‖f‖²_K

Girosi, Caprile, Poggio, 1990
14. Equivalence to networks
Many different V lead to the same solution…
f(x) = ∑_{i=1}^{l} α_i K(x, x_i) + b

…and can be "written" as the same type of network:
[Network diagram: inputs x_1 … x_l feed kernel units K; a weighted sum (+) gives the output f]
15. Unified framework: RN, SVMR and SVMC
min_{f∈H} (1/l) ∑_{i=1}^{l} V(f(x_i) − y_i) + λ ‖f‖²_K
The equation includes Regularization Networks (e.g., Radial Basis Functions), Support Vector Machines (classification and regression), and some multilayer perceptrons.
Review by Evgeniou, Pontil and Poggio, Advances in Computational Mathematics, 2000
16. The theoretical foundations of statistical learning
are becoming part of mainstream mathematics
17. Theory summary
In the course we will introduce
• Stability (well-posedness)
• Consistency (predictivity)
• RKHS hypothesis spaces
• Regularization techniques leading to RN and SVMs
• Generalization bounds based on stability
• Alternative bounds (VC and V_γ dimensions)
• Related topics, extensions beyond SVMs
• Applications
• A new key result
18. Overview of overview
o Supervised learning: real math
o Examples of recent and ongoing in-house research on
applications
o Learning and the brain
19. Learning from Examples: engineering
applications
INPUT OUTPUT
Bioinformatics
Artificial Markets
Object categorization
Object identification
Image analysis
Graphics
Text Classification
…..
20. Bioinformatics application: predicting type of
cancer from DNA chips signals
Learning-from-examples paradigm: training examples are fed to a statistical learning algorithm; given a new sample, the trained algorithm outputs a prediction.
21. Bioinformatics application: predicting type of
cancer from DNA chips
New feature-selection SVM: only 38 training examples, 7,100 features.
AML vs. ALL: with 40 genes, 34/34 correct, 0 rejects; with 5 genes, 31/31 correct, 3 rejects of which 1 is an error.
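To convey the flavor of selecting a few informative genes out of thousands (this is a toy sketch, not the course's actual feature-selection SVM): rank genes by a signal-to-noise score between the two classes and keep the top few. The data below are synthetic, with signal planted in the first 5 genes:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for DNA-chip data: 38 samples x 7100 genes,
# labels +1 (AML) / -1 (ALL); only the first 5 genes carry signal.
n, p = 38, 7100
y = np.where(np.arange(n) < 19, 1.0, -1.0)
X = rng.normal(size=(n, p))
X[:, :5] += 3.0 * y[:, None]

# Rank genes by a signal-to-noise score (class-mean difference / pooled std)
mu_pos, mu_neg = X[y > 0].mean(axis=0), X[y < 0].mean(axis=0)
sd = X[y > 0].std(axis=0) + X[y < 0].std(axis=0) + 1e-12
score = np.abs(mu_pos - mu_neg) / sd
top = np.argsort(score)[::-1][:5]  # keep the 5 most informative genes
```

A classifier trained on only the selected genes is what makes 38 examples workable against 7,100 raw features.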
22. Learning from Examples: engineering
applications
INPUT OUTPUT
Bioinformatics
Artificial Markets
Object categorization
Object identification
Image analysis
Graphics
Text Classification
…..
23. Face identification: example
An old view-based system: 15 views
Performance: 98% on 68 person database
Beymer, 1995
24. Face identification
[Diagram: training data and a new face image pass through feature extraction into an SVM classifier, which outputs the identification result.]
Bernd Heisele, Jennifer Huang, 2002
25. Real time detection and identification
Ported to Oxygen’s H21 (handheld device)
Identification of rotated faces up to about 45º
Robust against changes in illumination and
background
Frame rate of 15 Hz
Ho, Heisele, Poggio, 2000
Weinstein, Ho, Heisele, Poggio, Steele, Agarwal, 2002
26. Learning from Examples: engineering
applications
INPUT OUTPUT
Bioinformatics
Artificial Markets
Object categorization
Object identification
Image analysis
Graphics
Text Classification
…..
27. Learning Object Detection:
Finding Frontal Faces ...
Training database: 1,000+ real faces, 3,000+ virtual faces, 50,000+ non-face patterns
Sung, Poggio 1995
28. Recent work on face detection
• Detection of faces in images
• Robustness against slight rotations in depth and in the image plane
• Full-face vs. component-based classifier
Heisele, Pontil, Poggio, 2000
29. The best existing system for face detection?
Test: CBCL set / 1,154 real faces / 14,579 non-faces / 132,365 shifted windows
Training: 2,457 synthetic faces (58x58) / 13,654 non-faces
[ROC curves: correct detection rate vs. false positives per number of scanned windows (0 to 0.0001), for the component classifier and the whole-face classifier.]
Heisele, Serre, Poggio et al., 2000
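Curves like the ones above come from sweeping an acceptance threshold over the classifier outputs. A minimal sketch, with invented toy scores (the course's systems report false positives per scanned window rather than per non-face example):

```python
import numpy as np

def roc_points(scores_pos, scores_neg, n_windows):
    """Sweep a threshold over classifier outputs and return
    (false positives per scanned window, correct detection rate) pairs."""
    thresholds = np.sort(np.concatenate([scores_pos, scores_neg]))[::-1]
    pts = []
    for t in thresholds:
        tpr = np.mean(scores_pos >= t)                  # fraction of faces kept
        fp_rate = np.sum(scores_neg >= t) / n_windows   # false pos. per window
        pts.append((float(fp_rate), float(tpr)))
    return pts

scores_pos = np.array([2.0, 3.0])   # outputs on true faces (toy values)
scores_neg = np.array([0.0, 1.0])   # outputs on non-faces (toy values)
pts = roc_points(scores_pos, scores_neg, n_windows=10)
```

Lowering the threshold moves along the curve: more detections kept, but more false alarms per window.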
30. Trainable System for Object Detection:
Pedestrian detection - Results
Papageorgiou and Poggio, 1998
31. Trainable System for Object Detection:
Pedestrian detection - Training
Papageorgiou and Poggio, 1998
32. The system was tested in a test car
(Mercedes)
34. System installed in experimental Mercedes
A fast version, integrated
with a real-time obstacle
detection system
MPEG
Constantine Papageorgiou
36. People classification/detection: training
the system
Training data: 1,848 pedestrian patterns and 7,189 non-pedestrian patterns.
Representation: overcomplete dictionary of Haar wavelets; high-dimensional feature space (>1,300 features)
Core learning algorithm: Support Vector Machine classifier, at the heart of the pedestrian detection system.
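To give the flavor of the representation (a toy version, not the actual >1,300-feature dictionary): Haar wavelet responses are differences of sums over adjacent rectangles, computed cheaply from an integral image. A sketch at a single scale:

```python
import numpy as np

def haar_features(img):
    """Three Haar-wavelet-like responses on a grayscale patch:
    vertical, horizontal, and diagonal rectangle differences,
    computed via the integral image."""
    ii = img.cumsum(axis=0).cumsum(axis=1)   # integral image
    def rect(r0, c0, r1, c1):                # sum over [r0, r1) x [c0, c1)
        s = ii[r1 - 1, c1 - 1]
        if r0 > 0: s -= ii[r0 - 1, c1 - 1]
        if c0 > 0: s -= ii[r1 - 1, c0 - 1]
        if r0 > 0 and c0 > 0: s += ii[r0 - 1, c0 - 1]
        return s
    h, w = img.shape
    return np.array([
        rect(0, 0, h, w // 2) - rect(0, w // 2, h, w),         # vertical edge
        rect(0, 0, h // 2, w) - rect(h // 2, 0, h, w),         # horizontal edge
        rect(0, 0, h // 2, w // 2) + rect(h // 2, w // 2, h, w)
        - rect(0, w // 2, h // 2, w) - rect(h // 2, 0, h, w // 2),  # diagonal
    ])

patch = np.zeros((4, 4)); patch[:, :2] = 1.0   # toy patch, bright on the left
f = haar_features(patch)
```

The real dictionary repeats such templates at many positions and scales, producing the high-dimensional feature vector that the SVM classifies.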
37. An improved approach: combining component
detectors
Mohan, Papageorgiou and Poggio, 1999
38. Results
The system is capable of
detecting partially
occluded people
39. System Performance
Combination systems (ACC) perform best. All component-based systems perform better than full-body person detectors.
A. Mohan, C. Papageorgiou, T. Poggio
40. Learning from Examples: Applications
INPUT OUTPUT
Object identification
Object categorization
Image analysis
Graphics
Finance
Bioinformatics
…
41. Image Analysis
IMAGE ANALYSIS: OBJECT RECOGNITION AND POSE
ESTIMATION
⇒ Bear (0° view)
⇒ Bear (45° view)
42. Computer vision: analysis of facial expressions
The main goal is to estimate basic facial parameters, e.g.
degree of mouth openness, through learning. One of the main
applications is video-speech fusion to improve speech
recognition systems.
NTT, Atsugi, January 2002
Kumar, Poggio, 2001
43. Combining Top-Down Constraints and Bottom-Up Data
Morphable Model: face = a1 · (prototype 1) + a2 · (prototype 2) + a3 · (prototype 3) + …
Learning engine: SVM, RBF, etc.
Kumar, Poggio, 2001
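The morphable-model idea can be sketched in a few lines: represent a new face as a linear combination of prototype faces and recover the coefficients a_i by least squares; the coefficients form a compact description that can be fed to a learning engine. The prototypes and "pixels" below are invented toy values:

```python
import numpy as np

# Toy prototype faces (flattened "images"); a new face is modeled as
# face = a1*P1 + a2*P2 + a3*P3, and the coefficients a_i are recovered
# by least squares.
P = np.array([[1.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 1.0]])   # 3 prototypes, 4 "pixels" each
face = 0.2 * P[0] + 0.5 * P[1] + 0.3 * P[2]
coeffs, *_ = np.linalg.lstsq(P.T, face, rcond=None)
```

Real morphable models fit shape and texture separately and after dense correspondence, but the top-down constraint is the same: the fit is restricted to the span of the prototypes.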
44. The Three Stages
Face detection → localization of facial features → analysis of facial parts
For more details, see Appendix 2.
45. Learning from Examples: engineering
applications
INPUT OUTPUT
Bioinformatics
Artificial Markets
Object categorization
Object identification
Image analysis
Graphics
Text Classification
…..
46. Image Synthesis
UNCONVENTIONAL GRAPHICS
Θ = 0° view ⇒
Θ = 45° view ⇒
47. Supermodels
Cc-cs.mpg
MPEG (Steve Lines)
48. A trainable system for TTVS
Input: Text
Output:
photorealistic talking
face uttering text
Tony Ezzat
49. From 2D to a better 2D and from 2D to 3D
Two extensions of our text-to-visual-speech
(TTVS) system:
• morphing of 3D models of faces to output a 3D model
of a speaking face (Blanz)
• Learn facial appearance, dynamics and coarticulation
(Ezzat)
50. Towards 3D
3D face scans were collected. The neutral face shape was reconstructed from a single image, and the animation transferred to a new face.
Blanz, Ezzat, Poggio
51. Using the same basic learning techniques: Trainable Videorealistic Face Animation (voice is real, video is synthetic)
Ezzat, Geiger, Poggio, SigGraph 2002
52. Trainable Videorealistic Face Animation
1. Learning: the system learns from 4 minutes of video the face appearance (Morphable Model) and the speech dynamics of the person.
2. Run time: for any speech input, the system provides as output a synthetic video stream. [Diagram: phone stream (/SIL/ /B/ /AE/ /JH/ …) → phonetic models {µ_p, Σ_p} → trajectory synthesis → MMM with image prototypes {I_i, C_i}.]
Tony Ezzat, Geiger, Poggio, SigGraph 2002
53. Novel !!!: Trainable Videorealistic Face Animation
Let us look at video!!!
Ezzat, Poggio, 2002
54. Reconstructed 3D Face Models from 1 image
Blanz and Vetter,
MPI
SigGraph ‘99
56. Learning from Examples: Applications
INPUT OUTPUT
Object identification
Object categorization
Image analysis
Graphics
Finance
Bioinformatics
…
57. Artificial Agents – learning algorithms – buy and sell stocks
Example of a subproject: the Electronic Market Maker. [Diagram: public limit/market orders and all available market information feed the learning module, which outputs bid/ask prices, bid/ask sizes, and buy/sell decisions; user control/calibration, inventory control, and a feedback loop close the system.]
Nicholas Chang
58. Learning from Examples: engineering
applications
INPUT OUTPUT
Bioinformatics
Artificial Markets
Object categorization
Object identification
Image analysis
Graphics
Text Classification
…..
59. Overview of overview
o Supervised learning: the problem and how to frame
it within classical math
o Examples of in-house applications
o Learning and the brain
60. The Ventral Visual Stream: From V1 to IT
modified from Ungerleider and Haxby, 1994
Hubel & Wiesel, 1959
Desimone, 1991
61. Summary of “basic facts”
Accumulated evidence points to three (mostly accepted) properties of the ventral visual stream architecture:
• Hierarchical build-up of invariances (first to translation and scale, then to viewpoint, etc.), of the size of the receptive fields, and of the complexity of the preferred stimuli
• Basic feed-forward processing of information (for “immediate” recognition tasks)
• Learning of an individual object generalizes to scale and position
62. The standard model: following Hubel and Wiesel…
[Diagram: hierarchical model with supervised learning at the top stage and unsupervised learning in the earlier stages.]
Riesenhuber & Poggio, Nature Neuroscience, 2000
63. The standard model
Interprets or predicts many existing data in microcircuits and system physiology,
and also in cognitive science:
• What some complex cells and V4 cells do and how: MAX…
• View-tuning of IT cells (Logothetis)
• Response to pseudomirror views
• Effect of scrambling
• Multiple objects
• Robustness to clutter
• Consistent with K. Tanaka’s simplification procedure
• Categorization tasks (cats vs dogs)
• Invariance to translation, scale etc…
• Gender classification
• Face inversion effect : experience, viewpoint, other-race, configural
vs. featural representation
• Transfer of generalization
• No binding problem, no need for oscillations
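The MAX operation mentioned in the list above can be sketched directly: a C-unit responds with the maximum of its afferent S-unit responses over a local pool, so a strong response survives small shifts of the stimulus. A toy numpy version (pool geometry and values invented for illustration):

```python
import numpy as np

def max_pool(responses, size=2):
    """C-unit response: the MAX over a local pool of afferent S-unit
    responses, which buys invariance to small translations."""
    h, w = responses.shape
    h2, w2 = h // size, w // size
    return (responses[:h2 * size, :w2 * size]
            .reshape(h2, size, w2, size).max(axis=(1, 3)))

r = np.zeros((4, 4)); r[1, 2] = 5.0              # strong S-unit response
shifted = np.zeros((4, 4)); shifted[0, 3] = 5.0  # same feature, shifted
pooled = max_pool(r)
```

Because the two stimuli fall in the same pool, the pooled responses are identical; stacking such layers, alternated with template-matching ("and"-like) layers, gives the hierarchical build-up of invariance and selectivity.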
64. Model’s early predictions: neurons become view-tuned during recognition
Logothetis, Pauls, and Poggio, 1995; Logothetis, Pauls, 1995
Poggio, Edelman, Riesenhuber (1990, 2000)
65. Recording Sites in Anterior IT
[Diagram: recording sites in anterior IT (landmarks: LUN, LAT, STS, IOS, AMTS); …neurons tuned to faces are close by….]
Logothetis, Pauls, and Poggio, 1995;
Logothetis, Pauls, 1995
66. The Cortex: Neurons Tuned to Object Views, as Predicted by the Model
Logothetis, Pauls, Poggio, 1995
67. View-tuned cells: scale invariance (one training view only)!
[Figure: scale-invariant responses of an IT neuron; spike-rate histograms for the preferred stimulus presented at scales ×0.4 to ×2.5, after training at a single view.]
Logothetis, Pauls, and Poggio, 1995; Logothetis, Pauls, 1995
68. Joint Project (Max Riesenhuber) with Earl Miller & David Freedman (MIT): Neural Correlate of Categorization (NCC)
Define categories in morph space: prototypes (100% cat and 100% dog) at the ends, with 80% and 60% cat morphs on one side of the category boundary and 80% and 60% dog morphs on the other.
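The categorization setup can be mimicked with a toy nearest-prototype classifier in a morph space (the prototype vectors below are random, purely illustrative): every blend that is more than 50% cat falls on the cat side of the boundary.

```python
import numpy as np

rng = np.random.default_rng(1)
cat_proto = rng.normal(size=8)   # invented "cat" prototype vector
dog_proto = rng.normal(size=8)   # invented "dog" prototype vector

def morph(c):
    """A c% cat / (100 - c)% dog blend along the morph line."""
    return (c / 100.0) * cat_proto + (1.0 - c / 100.0) * dog_proto

def classify(stim):
    """Nearest prototype; the 50% blend sits on the category boundary."""
    return "cat" if (np.linalg.norm(stim - cat_proto)
                     < np.linalg.norm(stim - dog_proto)) else "dog"
```

In the experiment the interesting question is not the rule itself but where the neural responses place the boundary: the 60% morphs are physically closer to the other category's 60% morphs than to their own prototype, yet get the categorical label.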
69. Categorization task
Train the monkey on a delayed match-to-category task: fixation (500 ms), sample (600 ms), delay (1000 ms), then test. If the test stimulus is a nonmatch, a further delay and a second test (match) follow.
After training, record from neurons in IT & PFC
70. Single-cell example: a PFC neuron that responds more strongly to DOGS than to CATS
[Figure: firing rate (Hz) over fixation, sample, delay, and choice periods, aligned to sample onset; responses to dog morphs (100%, 80%, 60%) are consistently stronger than to cat morphs (100%, 80%, 60%).]
D. Freedman, E. Miller, M. Riesenhuber, T. Poggio (Science, 2001)
71. The model suggests same computations for
different recognition tasks (and objects!)
[Diagram: task-specific units (e.g., in PFC) read out a general representation (in IT), modeled by HMAX.]
Is this what’s going on in cortex?
Editor's Notes
Desimone Fig: cells are tuned to views of complex objects. Here: face cell
pretty amazing
FEEDFORWARD, inspired by the ventral stream. HIERARCHY: from simple to complex, small to big. <show> Two objectives: invariance and specificity, with separate mechanisms (blue: feature specificity; green: invariance). Feature specificity is easy: a template, an “and” operation. But how to do the pooling for invariance?
WHY MAX? Next!
This is also borne out in the physiology data.
Use the same model to also investigate novel questions, not a different model for each problem!
Christian Shelton
Can we find split into task-independent rep & task-dependent units also in cortex?