Technical Area: Machine Learning and Pattern Recognition
Examiner: Alex “Sandy” Pentland
Toshiba Professor in Media Arts and Sciences
MIT Media Laboratory
Description
The objective of this area is to familiarize myself with the main techniques and
algorithms of machine learning and pattern recognition. The goal is to understand the
high-level advantages and disadvantages of the various approaches in order to see how
they can be used, combined, and improved.
Supervised learning
• Graphical Models (Bayes Nets*, HMMs, decision theory)
• Instance based learning (NN, KNN)
• Decision trees* (ID3, C4.5)
• Sequential Learning
o Dynamic Bayesian Networks* (HMMs)
o Sliding window*/Recurrent Sliding window (RNN, RDT)
o Conditional Random fields
• Linear/non-linear regression and classification
• Generalized linear discriminant
• Neural Networks (Perceptron, MLPs, RBFs)
• Support Vector Machines*
Unsupervised learning
• Partitional* (generative: Mixture of Gaussians; reconstructive: K-means)
• Hierarchical clustering* (single, complete, average link)
• Spectral clustering
Semi-supervised learning* (Cluster-classify, transductive SVMs)
Reinforcement learning (Q-learning, TD learning)
Ensemble Methods* (Weighted majority, Bagging, Boosting)
Density estimation (NN, Kernel based, Bayesian approaches)
Feature Extraction techniques (Fisher, PCA, ICA)
Feature Selection techniques (Filtering, wrapping, Bayesian)
Parameter estimation techniques (ML, MAP, MC, EM, BP, CV)
* Covered in more depth
Written Requirement
The written requirement for this area will consist of a 24-hour take-home exam.
Signature: ______________________________ Date: _____________
Reading list
The reading list is structured as follows:
Fundamentals
Graphical Models
R. Duda, P. Hart, and D. Stork, "Chapter 2. Bayesian Decision Theory," in
Pattern Classification, 2nd ed: John Wiley & Sons, 2000.
S. Russell and P. Norvig, "Probabilistic Reasoning Systems," in Artificial
Intelligence: A Modern Approach: Prentice Hall, 1995, pp. 436-467.
M. Jordan and C. Bishop, An Introduction to Graphical Models: to be published,
2001. CH 1, 5-7
K. Murphy, An Introduction to Graphical Models, Technical report, Intel
Research, May 2001.
R. Cowell, "Introduction to Inference for Bayesian Networks," in Learning in
Graphical Models: MIT Press, 1998, pp. 9-26.
R. Cowell, "Advanced Inference in Bayesian Networks," in Learning in Graphical
Models: MIT Press, 1998, pp. 27-49.
J. S. Yedidia and W. T. Freeman, "Understanding belief propagation and its
generalizations," in Exploring Artificial Intelligence in the New Millennium,
Chap. 8, S. a. T. Books, Ed., 2003, pp. 236-239.
D. Heckerman, "A Tutorial on Learning with Bayesian Networks," in Learning in
Graphical Models: MIT Press, 1998, pp. 301-354.
M. I. Jordan, Z. Ghahramani, T. Jaakkola, and L. K. Saul, "An Introduction to
Variational Methods for Graphical Models," in Learning in Graphical Models: MIT
Press, 1998, pp. 105-162.
D. J. C. MacKay, "Introduction to Monte Carlo Methods," in Learning in Graphical
Models: MIT Press, 1998, pp. 175-204.
C. Andrieu, N. d. Freitas, A. Doucet, and M. Jordan, "An Introduction to MCMC
for Machine Learning," Machine Learning, vol. 50, pp. 5-43, 2003.
H. Guo and W. Luo, "Implementation and Evaluation of Exact and Approximate
Dynamic Bayesian Network Inference Algorithms."
The Influence Model
S. Basu, T. Choudhury, and B. Clarkson, Learning Human Interactions with the
Influence Model, Technical report 539, Massachusetts Institute of Technology,
Media Lab, 2001.
A. K. Jammalamadaka, Aspects of Inference for the Influence Model and Related
Graphical Models, Master's Thesis, Electrical Engineering and Computer
Science, Massachusetts Institute of Technology.
Parameter estimation techniques (ML, MAP, MC, EM, BP, CV)
R. Duda, P. Hart, and D. Stork, "Chapter 3. Maximum Likelihood and Bayesian
Parameter Estimation," in Pattern Classification, 2nd ed: John Wiley & Sons,
2000.
C. M. Bishop, "Parameter Optimization Algorithms," in Neural Networks for
Pattern Recognition: Oxford University Press, 1995, pp. 253-292.
Instance based learning (NN, KNN)
R. Duda, P. Hart, and D. Stork, "Chapter 4. Non-parametric Techniques," in
Pattern Classification, 2nd ed: John Wiley & Sons, 2000.
T. Mitchell, "Instance-Based Learning," in Machine Learning: McGraw Hill, 1997.
C. Atkeson, A. Moore, and S. Schaal, "Locally Weighted Learning," in AI Review,
vol. 11: Kluwer, 1997, pp. 11-73.
Decision trees* (ID3, C4.5)
R. Duda, P. Hart, and D. Stork, "Chapter 8. Non-metric Methods," in Pattern
Classification, 2nd ed: John Wiley & Sons, 2000.
T. Mitchell, "Decision Tree Learning," in Machine Learning: McGraw Hill, 1997.
S. R. Safavian and D. Landgrebe, "A Survey of Decision Tree Classifier
Methodology," in IEEE Trans. Systems, Man, Cybernetics, vol. 21, 1991, pp.
660-674.
P. Domingos and F. Provost, Well-trained {PETs}: Improving Probability
Estimation Trees, CDER Working Paper #00-04-IS, Stern School of Business,
New York University, NY, NY 2000.
G. Bakiri and T. G. Dietterich, "Achieving High-Accuracy Text-to-Speech with
Machine Learning," in Data Mining in Speech Synthesis, B. Damper, Ed., 1997.
Sequential learning
R. Duda, P. Hart, and D. Stork, "Chapter 3.10 Hidden Markov Models," in Pattern
Classification, 2nd ed: John Wiley & Sons, 2000.
L. R. Rabiner and B.-H. Juang, "A Theory and Implementation of Hidden Markov
Models," in Fundamentals of Speech Recognition: Prentice Hall, 1993, pp. 321-
389.
K. P. Murphy, Hidden Semi-Markov Models (HSMMs), Technical report, 2002.
K. Murphy, Dynamic Bayesian Networks: Representation, Inference and
Learning, Ph.D. thesis, Computer Science Division, UC Berkeley, 2002.
T. G. Dietterich, "Machine Learning for Sequential Data: A Review," in Structural,
Syntactic, and Statistical Pattern Recognition, vol. 2396, Lecture Notes in
Computer Science, Ed.: Springer-Verlag, 2002, pp. 15-30.
H. M. Wallach, Conditional Random Fields: An Introduction, Technical Report
MS-CIS-04-21, Department of Computer and Information Science, University of
Pennsylvania, 2004.
J. Lafferty, A. McCallum, and F. Pereira, "Conditional Random Fields:
Probabilistic Models for Segmenting and Labeling Sequential Data," in
Proceedings of the Eighteenth International Conference on Machine Learning
(ICML-2001), 2001.
A. McCallum, "Efficiently Inducing Features of Conditional Random Fields," in
Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence (UAI
'03), 2003.
B. Pearlmutter, Dynamic Recurrent Neural Networks, Technical Report CMU-CS-
90-196, Carnegie Mellon University School of Computer Science, Pittsburgh, PA
1990.
Linear/non-linear regression and classification
R. Duda, P. Hart, and D. Stork, "Chapter 5. Linear Discriminant Functions," in
Pattern Classification, 2nd ed: John Wiley & Sons, 2000.
A. K. Jain, R. P. W. Duin, and J. C. Mao, "Statistical Pattern Recognition: A
Review," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.
22, 1, pp. 4-37, 2000.
Neural Networks
R. Duda, P. Hart, and D. Stork, "Chapter 6. Multi-layer Neural Networks," in
Pattern Classification, 2nd ed: John Wiley & Sons, 2000.
A. K. Jain, J. Mao, and K. M. Mohiuddin, "Artificial Neural Networks: A Tutorial,"
in IEEE Computer, vol. 29, 1996, pp. 31-44.
J. Sima and P. Orponen, "General-Purpose Computation with Neural Networks:
A Survey of Complexity Theoretic Results," in Neural Computation, vol. 15, 2003,
pp. 2727-2778.
C. M. Bishop, "Single Layer Networks," in Neural Networks for Pattern
Recognition: Oxford University Press, 1995, pp. 77-112.
C. M. Bishop, "The Multi-Layer Perceptron," in Neural Networks for Pattern
Recognition: Oxford University Press, 1995, pp. 116-137.
Support Vector Machines
C. J. C. Burges, "A Tutorial on Support Vector Machines for Pattern
Recognition," Data Mining and Knowledge Discovery, vol. 2, pp. 121-167, 1998.
R. Herbrich, "Kernel Classifiers from a Machine Learning Perspective," in
Learning Kernel Classifiers: The MIT Press, pp. 49-66.
C. Hsu, C. Chang, and C. Lin, A Practical Guide to Support Vector Classification,
Technical Report, Department of Computer Science and Information Engineering,
National Taiwan University, 2003.
Density estimation
R. Duda, P. Hart, and D. Stork, "Chapter 4. Non-parametric Techniques," in
Pattern Classification, 2nd ed: John Wiley & Sons, 2000.
C. M. Bishop, "Probabilistic Density Estimation," in Neural Networks for Pattern
Recognition: Oxford University Press, 1995, pp. 33-73.
Unsupervised learning
R. Duda, P. Hart, and D. Stork, "Chapter 10. Unsupervised Learning and
Clustering," in Pattern Classification, 2nd ed: John Wiley & Sons, 2000.
P. Berkhin, Survey of Clustering Data Mining Techniques, Technical Report,
Accrue Software, San Jose, CA, 2002.
G. Fung, A Comprehensive Overview of Basic Clustering Algorithms, Technical
Report, University of Wisconsin, Madison, 2001.
A. K. Jain, M. N. Murty, and P. J. Flynn, "Data Clustering: A Review," ACM
Computing Surveys, vol. 31, 3, pp. 264-323, 1999.
A. Ng, M. Jordan, and Y. Weiss, "On Spectral Clustering: Analysis and an
Algorithm," in Advances in Neural Information Processing Systems, 2001.
S. Zhong, Probabilistic Model-Based Clustering of Time Series, Ph.D. Qualifying
Proposal, University of Texas at Austin, Austin, Texas, May 2002.
Semi-supervised learning* (Cluster-classify, transductive SVMs)
X. Zhu, Semi-Supervised Learning with Graphs, Technical Report CMU-LTI-05-192,
Carnegie Mellon University, 2005.
A. Blum and T. Mitchell, "Combining Labeled and Unlabeled Data with Co-
Training," in Proceedings of the 11th Conference on Computational Learning
Theory. Madison, Wisconsin, United States: Morgan Kaufmann Publishers, 1998,
pp. 92-100.
O. Chapelle, J. Weston, and B. Schölkopf, "Cluster Kernels for Semi-supervised
Learning," in Advances in Neural Information Processing Systems, vol. 15.
Cambridge, MA: The MIT Press, 2003, pp. 585-592.
K. Bennett and A. Demiriz, "Semi-Supervised Support Vector Machines," in
Proceedings of Advances in Neural Information Processing Systems, M. S.
Kearns, S. A. Solla, and D. A. Cohn, Eds.: The MIT Press, 1998, pp. 368-374.
K. P. Bennett and A. Demiriz, "Optimization Approaches to Semi-Supervised
Learning," to appear in Applications and Algorithms of Complementarity, L.
Mangasarian and J. S. Pang, Eds.: Kluwer Academic Publishers, 2000.
T. Joachims, "Transductive Inference for Text Classification using Support Vector
Machines," in Proceedings of The 16th International Conference on Machine
Learning (ICML' 99). San Francisco, CA: Morgan Kaufmann Publishers, 1999,
pp. 200-209.
Reinforcement learning (Q-learning, TD learning)
L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement Learning: a
Survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996.
R. S. Sutton and A. G. Barto, "The Reinforcement Learning Problem," in
Reinforcement Learning: The MIT Press, pp. 51-81.
R. S. Sutton and A. G. Barto, "Temporal-Difference Learning," in Reinforcement
Learning: The MIT Press, pp. 133-157.
R. S. Sutton, "Learning to Predict by the Methods of Temporal Differences,"
Machine Learning, vol. 3, pp. 9-44, 1988.
Feature Extraction techniques (Fisher, PCA, ICA)
R. Duda, P. Hart, and D. Stork, "Chapter 3.8 Component Analysis and
Discriminants," in Pattern Classification, 2nd ed: John Wiley & Sons, 2000.
C. J. C. Burges, Geometric Methods for Feature Extraction and Dimensional
Reduction: A Guided Tour, Technical Report MSR-TR-2004-55, Microsoft
Research, Redmond, WA 2003.
Ensemble Methods
T. G. Dietterich, "Ensemble Methods in Machine Learning," in Proceedings of the
First International Workshop on Multiple Classifier Systems, vol. 1857, 2000, pp.
1-15.
Y. Freund and R. Schapire, "A short introduction to boosting," in Journal of
Japanese Society for Artificial Intelligence, vol. 11, 1999, pp. 771-780.
E. Bauer and R. Kohavi, "An Empirical Comparison of Voting Classification
Algorithms: Bagging, Boosting, and Variants," in Machine Learning, vol. 36,
1999, pp. 105-139.
T. G. Dietterich, "Machine-Learning Research: Four Current Directions," in The
AI Magazine, vol. 18, 1998, pp. 97-136.
Applications to activity recognition
M. Philipose, K. Fishkin, D. Fox, H. Kautz, D. Patterson, and M. Perkowitz,
"Guide: Towards Understanding Daily Life via Auto-Identification and Statistical
Analysis," in Proceedings of The 2nd International Workshop on Ubiquitous
Computing for Pervasive Healthcare Applications (UbiHealth '03). Seattle, WA,
2003.
D. Wilson, "Simultaneous Tracking & Activity Recognition (STAR) Using Many
Anonymous, Binary Sensors," to be published in Proceedings of The 3rd
International Conference on Pervasive Computing (Pervasive '05). Munich,
Germany, 2005.
D. Patterson, D. Fox, H. Kautz, and M. Philipose, "Expressive, Tractable and
Scalable Techniques for Modeling Activities of Daily Living," in Proceedings of
The 2nd International Workshop on Ubiquitous Computing for Pervasive
Healthcare Applications (UbiHealth '03). Seattle, WA, 2003.
D. Patterson, L. Liao, D. Fox, and H. Kautz, "Inferring High Level Behavior from
Low Level Sensors," in Fifth Annual Conference on Ubiquitous Computing
(UBICOMP 2003). Seattle, WA, 2003.
M. Philipose, K. P. Fishkin, M. Perkowitz, D. J. Patterson, D. Hahnel, D. Fox, and
H. Kautz, "Inferring Activities from Interactions with Objects," IEEE Pervasive
Computing Magazine, vol. 3, 4, 2004.
M. Perkowitz, M. Philipose, D. J. Patterson, and K. Fishkin, "Mining Models of
Human Activities from the Web," in Proceedings of The Thirteenth International
World Wide Web Conference (WWW '04). New York, USA, 2004.
D. Patterson, D. Fox, H. Kautz, and M. Philipose, Sporadic State Estimation for
General Activity Inference, Technical report irs_tr_04_003_a, Intel Research
Seattle and the University of Washington, June 2004.
D. Patterson, Modeling Details of the Activity Tracker, Technical report
irs_tr_04_003_a, Intel Research Seattle and the University of Washington,
Seattle, WA, July 2004.
N. M. Oliver, B. Rosario, and A. Pentland, "A Bayesian Computer Vision System
for Modeling Human Interactions," IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 22, 8, pp. 831-843, 2000.
N. Oliver, E. Horvitz, and A. Garg, "Hierarchical Representations for Learning
and Inferring Office Activity from Multimodal Information," in Proceedings of the
4th International Conference on Multimodal Interfaces, 2002.
S. Luhr, H. Bui, S. Venkatesh, and G. West, "Recognition of Human Activity
Through Hierarchical Stochastic Learning," in Proceedings of The IEEE
International Conference on Pervasive Computing and Communications: IEEE
Press, 2003.