Technical Area: Machine Learning and Pattern Recognition

Examiner: Alex “Sandy” Pentland, Toshiba Professor in Media Arts and Sciences, MIT Media Laboratory

Description

The objective of this area is to familiarize myself with the main techniques and algorithms of machine learning and pattern recognition. The goal is to understand the high-level advantages and disadvantages of various approaches in order to better understand how they can be used, combined, and improved.

Supervised learning
  • Graphical models (Bayes nets*, HMMs, decision theory)
  • Instance-based learning (NN, KNN)
  • Decision trees* (ID3, C4.5)
  • Sequential learning
      o Dynamic Bayesian networks* (HMMs)
      o Sliding window* / recurrent sliding window (RNN, RDT)
      o Conditional random fields
  • Linear/non-linear regression and classification
  • Generalized linear discriminant
  • Neural networks (perceptron, MLPs, RBFs)
  • Support vector machines*

Unsupervised learning
  • Partitional* (generative: mixture of Gaussians; reconstructive: k-means)
  • Hierarchical clustering* (single, complete, average link)
  • Spectral clustering

Semi-supervised learning* (cluster-classify, transductive SVMs)
Reinforcement learning (Q-learning, TD learning)
Ensemble methods* (weighted majority, bagging, boosting)
Density estimation (NN, kernel-based, Bayesian approaches)
Feature extraction techniques (Fisher, PCA, ICA)
Feature selection techniques (filtering, wrapping, Bayesian)
Parameter estimation techniques (ML, MAP, MC, EM, BP, CV)

* Covered in more depth.

Written Requirement

The written requirement for this area will consist of a 24-hour take-home exam.

Signature: ______________________________  Date: _____________
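Several of the methods in the outline above are simple enough to illustrate directly. As a concrete example of the partitional family (reconstructive k-means), here is a minimal, dependency-free Python sketch of the standard alternating assign/update iteration; the toy data points and the parameter choices (`k=2`, 20 iterations, fixed seed) are illustrative only, not part of the exam material.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize centroids at k distinct data points
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = tuple(sum(xs) / len(cl) for xs in zip(*cl))
    return centroids, clusters

# Two well-separated blobs; k-means should recover the split.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, clusters = kmeans(points, k=2)
```

On well-separated data like this the iteration converges to the natural two-cluster split regardless of which two points seed the centroids; on harder data, k-means is sensitive to initialization, which is one reason the reading list pairs it with generative alternatives such as mixtures of Gaussians.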
Reading List

The reading list is structured as follows.

Fundamentals

Graphical Models

R. Duda, P. Hart, and D. Stork, "Chapter 2. Bayesian Decision Theory," in Pattern Classification, 2nd ed.: John Wiley & Sons, 2000.

S. Russell and P. Norvig, "Probabilistic Reasoning Systems," in Artificial Intelligence: A Modern Approach: Prentice Hall, 1995, pp. 436-467.

M. Jordan and C. Bishop, An Introduction to Graphical Models: to be published, 2001. Chapters 1, 5-7.

K. Murphy, An Introduction to Graphical Models, technical report, Intel Research, May 2001.

R. Cowell, "Introduction to Inference for Bayesian Networks," in Learning in Graphical Models: MIT Press, 1998, pp. 9-26.

R. Cowell, "Advanced Inference in Bayesian Networks," in Learning in Graphical Models: MIT Press, 1998, pp. 27-49.

J. S. Yedidia and W. T. Freeman, "Understanding Belief Propagation and Its Generalizations," in Exploring Artificial Intelligence in the New Millennium, ch. 8, S. a. T. Books, Ed., 2003, pp. 236-239.

D. Heckerman, "A Tutorial on Learning with Bayesian Networks," in Learning in Graphical Models: MIT Press, 1998, pp. 301-354.

M. I. Jordan, Z. Ghahramani, T. Jaakkola, and L. K. Saul, "An Introduction to Variational Methods for Graphical Models," in Learning in Graphical Models: MIT Press, 1998, pp. 105-162.

D. J. C. MacKay, "Introduction to Monte Carlo Methods," in Learning in Graphical Models: MIT Press, 1998, pp. 175-204.

C. Andrieu, N. de Freitas, A. Doucet, and M. Jordan, "An Introduction to MCMC for Machine Learning," Machine Learning, vol. 50, pp. 5-43, 2003.

H. Guo and W. Luo, "Implementation and Evaluation of Exact and Approximate Dynamic Bayesian Network Inference Algorithms."
The Influence Model

S. Basu, T. Choudhury, and B. Clarkson, Learning Human Interactions with the Influence Model, Technical Report 539, Massachusetts Institute of Technology, Media Lab, 2001.

A. K. Jammalamadaka, Aspects of Inference for the Influence Model and Related Graphical Models, master's thesis, Electrical Engineering and Computer Science, Massachusetts Institute of Technology.

Parameter estimation techniques (ML, MAP, MC, EM, BP, CV)

R. Duda, P. Hart, and D. Stork, "Chapter 3. Maximum Likelihood and Bayesian Parameter Estimation," in Pattern Classification, 2nd ed.: John Wiley & Sons, 2000.

C. M. Bishop, "Parameter Optimization Algorithms," in Neural Networks for Pattern Recognition: Oxford University Press, 1995, pp. 253-292.

Instance-based learning (NN, KNN)

R. Duda, P. Hart, and D. Stork, "Chapter 4. Non-parametric Techniques," in Pattern Classification, 2nd ed.: John Wiley & Sons, 2000.

T. Mitchell, "Instance-Based Learning," in Machine Learning: McGraw Hill, 1997.

C. Atkeson, A. Moore, and S. Schaal, "Locally Weighted Learning," AI Review, vol. 11: Kluwer, 1997, pp. 11-73.

Decision trees* (ID3, C4.5)

R. Duda, P. Hart, and D. Stork, "Chapter 8. Non-metric Methods," in Pattern Classification, 2nd ed.: John Wiley & Sons, 2000.

T. Mitchell, "Decision Tree Learning," in Machine Learning: McGraw Hill, 1997.

S. R. Safavian and D. Landgrebe, "A Survey of Decision Tree Classifier Methodology," IEEE Trans. Systems, Man, and Cybernetics, vol. 21, 1991, pp. 660-674.

P. Domingos and F. Provost, Well-Trained PETs: Improving Probability Estimation Trees, CeDER Working Paper #00-04-IS, Stern School of Business, New York University, New York, NY, 2000.

G. Bakiri and T. G. Dietterich, "Achieving High-Accuracy Text-to-Speech with Machine Learning," in Data Mining in Speech Synthesis, B. Damper, Ed., 1997.
Sequential learning

R. Duda, P. Hart, and D. Stork, "Chapter 3.10. Hidden Markov Models," in Pattern Classification, 2nd ed.: John Wiley & Sons, 2000.

L. R. Rabiner and B.-H. Juang, "A Theory and Implementation of Hidden Markov Models," in Fundamentals of Speech Recognition: Prentice Hall, 1993, pp. 321-389.

K. P. Murphy, Hidden Semi-Markov Models (HSMMs), technical report, 2002.

K. Murphy, Dynamic Bayesian Networks: Representation, Inference and Learning, Ph.D. thesis, Computer Science Division, UC Berkeley, 2002.

T. G. Dietterich, "Machine Learning for Sequential Data: A Review," in Structural, Syntactic, and Statistical Pattern Recognition, Lecture Notes in Computer Science, vol. 2396: Springer-Verlag, 2002, pp. 15-30.

H. M. Wallach, Conditional Random Fields: An Introduction, Technical Report MS-CIS-04-21, Department of Computer and Information Science, University of Pennsylvania, 2004.

J. Lafferty, A. McCallum, and F. Pereira, "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data," in Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), 2001.

A. McCallum, "Efficiently Inducing Features of Conditional Random Fields," in Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence (UAI '03), 2003.

B. Pearlmutter, Dynamic Recurrent Neural Networks, Technical Report CMU-CS-90-196, Carnegie Mellon University School of Computer Science, Pittsburgh, PA, 1990.

Linear/non-linear regression and classification

R. Duda, P. Hart, and D. Stork, "Chapter 5. Linear Discriminant Functions," in Pattern Classification, 2nd ed.: John Wiley & Sons, 2000.

A. K. Jain, R. P. W. Duin, and J. C. Mao, "Statistical Pattern Recognition: A Review," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 4-37, 2000.
Neural Networks

R. Duda, P. Hart, and D. Stork, "Chapter 6. Multi-layer Neural Networks," in Pattern Classification, 2nd ed.: John Wiley & Sons, 2000.

A. K. Jain, J. Mao, and K. M. Mohiuddin, "Artificial Neural Networks: A Tutorial," IEEE Computer, vol. 29, 1996, pp. 31-44.

J. Sima and P. Orponen, "General-Purpose Computation with Neural Networks: A Survey of Complexity Theoretic Results," Neural Computation, vol. 15, 2003, pp. 2727-2778.

C. M. Bishop, "Single Layer Networks," in Neural Networks for Pattern Recognition: Oxford University Press, 1995, pp. 77-112.

C. M. Bishop, "The Multi-Layer Perceptron," in Neural Networks for Pattern Recognition: Oxford University Press, 1995, pp. 116-137.

Support Vector Machines

C. J. C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition," Data Mining and Knowledge Discovery, vol. 2, pp. 121-167, 1998.

R. Herbrich, "Kernel Classifiers from a Machine Learning Perspective," in Learning Kernel Classifiers: The MIT Press, pp. 49-66.

C. Hsu, C. Chang, and C. Lin, A Practical Guide to Support Vector Classification, technical report, Department of Computer Science and Information Engineering, National Taiwan University, 2003.

Density estimation

R. Duda, P. Hart, and D. Stork, "Chapter 4. Non-parametric Techniques," in Pattern Classification, 2nd ed.: John Wiley & Sons, 2000.

C. M. Bishop, "Probability Density Estimation," in Neural Networks for Pattern Recognition: Oxford University Press, 1995, pp. 33-73.

Unsupervised learning

R. Duda, P. Hart, and D. Stork, "Chapter 10. Unsupervised Learning and Clustering," in Pattern Classification, 2nd ed.: John Wiley & Sons, 2000.

P. Berkhin, Survey of Clustering Data Mining Techniques, technical report, Accrue Software, San Jose, CA, 2002.

G. Fung, A Comprehensive Overview of Basic Clustering Algorithms, technical report, University of Wisconsin, Madison, 2001.
A. K. Jain, M. N. Murty, and P. J. Flynn, "Data Clustering: A Review," ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, 1999.

A. Ng, M. Jordan, and Y. Weiss, "On Spectral Clustering: Analysis and an Algorithm," in Advances in Neural Information Processing Systems, 2001.

S. Zhong, Probabilistic Model-Based Clustering of Time Series, Ph.D. qualifying proposal, University of Texas at Austin, Austin, TX, May 2002.

Semi-supervised learning* (Cluster-classify, transductive SVMs)

X. Zhu, Semi-Supervised Learning with Graphs, Technical Report CMU-LTI-05-192, Carnegie Mellon University, 2005.

A. Blum and T. Mitchell, "Combining Labeled and Unlabeled Data with Co-Training," in Proceedings of the 11th Conference on Computational Learning Theory, Madison, Wisconsin: Morgan Kaufmann Publishers, 1998, pp. 92-100.

O. Chapelle, J. Weston, and B. Schölkopf, "Cluster Kernels for Semi-Supervised Learning," in Advances in Neural Information Processing Systems, vol. 15, Cambridge, MA: The MIT Press, 2003, pp. 585-592.

K. Bennett and A. Demiriz, "Semi-Supervised Support Vector Machines," in Proceedings of Advances in Neural Information Processing Systems, M. S. Kearns, S. A. Solla, and D. A. Cohn, Eds.: The MIT Press, 1998, pp. 368-374.

K. P. Bennett and A. Demiriz, "Optimization Approaches to Semi-Supervised Learning," in Applications and Algorithms of Complementarity (to appear), L. Mangasarian and J. S. Pang, Eds.: Kluwer Academic Publishers, 2000.

T. Joachims, "Transductive Inference for Text Classification Using Support Vector Machines," in Proceedings of the 16th International Conference on Machine Learning (ICML '99), San Francisco, CA: Morgan Kaufmann Publishers, 1999, pp. 200-209.

Reinforcement learning (Q-learning, TD learning)

L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement Learning: A Survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996.

R. S. Sutton and A. G. Barto, "The Reinforcement Learning Problem," in Reinforcement Learning: The MIT Press, pp. 51-81.
R. S. Sutton and A. G. Barto, "Temporal-Difference Learning," in Reinforcement Learning: The MIT Press, pp. 133-157.

R. S. Sutton, "Learning to Predict by the Methods of Temporal Differences," Machine Learning, vol. 3, pp. 9-44, 1988.

Feature Extraction techniques (Fisher, PCA, ICA)

R. Duda, P. Hart, and D. Stork, "Chapter 3.8. Component Analysis and Discriminants," in Pattern Classification, 2nd ed.: John Wiley & Sons, 2000.

C. J. C. Burges, Geometric Methods for Feature Extraction and Dimensional Reduction: A Guided Tour, Technical Report MSR-TR-2004-55, Microsoft Research, Redmond, WA, 2004.

Ensemble Methods

T. G. Dietterich, "Ensemble Methods in Machine Learning," in Proceedings of the First International Workshop on Multiple Classifier Systems, vol. 1857, 2000, pp. 1-15.

Y. Freund and R. Schapire, "A Short Introduction to Boosting," Journal of Japanese Society for Artificial Intelligence, vol. 14, 1999, pp. 771-780.

E. Bauer and R. Kohavi, "An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants," Machine Learning, vol. 36, 1999, pp. 105-139.

T. G. Dietterich, "Machine-Learning Research: Four Current Directions," The AI Magazine, vol. 18, 1998, pp. 97-136.

Applications to activity recognition

M. Philipose, K. Fishkin, D. Fox, H. Kautz, D. Patterson, and M. Perkowitz, "Guide: Towards Understanding Daily Life via Auto-Identification and Statistical Analysis," in Proceedings of the 2nd International Workshop on Ubiquitous Computing for Pervasive Healthcare Applications (UbiHealth '03), Seattle, WA, 2003.

D. Wilson, "Simultaneous Tracking & Activity Recognition (STAR) Using Many Anonymous, Binary Sensors," in Proceedings of the 3rd International Conference on Pervasive Computing (Pervasive '05), Munich, Germany, 2005.
D. Patterson, D. Fox, H. Kautz, and M. Philipose, "Expressive, Tractable and Scalable Techniques for Modeling Activities of Daily Living," in Proceedings of the 2nd International Workshop on Ubiquitous Computing for Pervasive Healthcare Applications (UbiHealth '03), Seattle, WA, 2003.

D. Patterson, L. Liao, D. Fox, and H. Kautz, "Inferring High-Level Behavior from Low-Level Sensors," in Fifth Annual Conference on Ubiquitous Computing (UbiComp 2003), Seattle, WA, 2003.

M. Philipose, K. P. Fishkin, M. Perkowitz, D. J. Patterson, D. Hahnel, D. Fox, and H. Kautz, "Inferring Activities from Interactions with Objects," IEEE Pervasive Computing Magazine, vol. 3, no. 4, 2004.

M. Perkowitz, M. Philipose, D. J. Patterson, and K. Fishkin, "Mining Models of Human Activities from the Web," in Proceedings of the Thirteenth International World Wide Web Conference (WWW '04), New York, NY, 2004.

D. Patterson, D. Fox, H. Kautz, and M. Philipose, Sporadic State Estimation for General Activity Inference, Technical Report irs_tr_04_003_a, Intel Research Seattle and the University of Washington, June 2004.

D. Patterson, Modeling Details of the Activity Tracker, Technical Report irs_tr_04_003_a, Intel Research Seattle and the University of Washington, Seattle, WA, July 2004.

N. M. Oliver, B. Rosario, and A. Pentland, "A Bayesian Computer Vision System for Modeling Human Interactions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 831-843, 2000.

N. Oliver, E. Horvitz, and A. Garg, "Hierarchical Representations for Learning and Inferring Office Activity from Multimodal Information," in Proceedings of the 4th International Conference on Multimodal Interfaces, 2002.

S. Luhr, H. Bui, S. Venkatesh, and G. West, "Recognition of Human Activity Through Hierarchical Stochastic Learning," in Proceedings of the IEEE International Conference on Pervasive Computing and Communications: IEEE Press, 2003.