Lecture Notes (last updated: 2009-9-25)

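Machine Learning. Chen Yu, Information Security Engineering Research Center, Institute of Computer Science and Technology, Peking University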

Course Information Instructor: Chen Yu [email_address] Tel: 82529680 Teaching assistant: Wang Hongyan [email_address] Course web page: http://www.icst.pku.edu.cn/course/mlearning/index.htm

Ch1 Introduction What is machine learning (ML)? Design a learning system: an example ML applications Miscellaneous issues

A Brief History of Machine Learning ML as a scientific discipline was born in the mid-1970s. The first ML workshop was held in 1980 at CMU, with some two dozen participants and photocopied preprints. The first ML journal, “Machine Learning”, started in 1986.

Some Early Seminal Works The perceptron model proposed by Rosenblatt (1958), the so-called “connectionist” approach, a seminal work in neural networks. A system that learns to play checkers (Samuel, 1959 & 1963). The Meta-DENDRAL program, which generates rules that explain mass spectroscopy data for use by the expert system DENDRAL (Buchanan, 1978), an example of symbolic learning.

What is Machine Learning? The central problem it studies: How can we build computer systems that automatically improve with experience, and what are the laws that govern all learning processes? We state a learning problem as: a machine learns with respect to (w.r.t.) a particular task T , performance metric P , and type of experience E .

What is Machine Learning (2) More precisely, a computer program is said to learn from experience E , w.r.t. some class of tasks T and performance measure P , if its performance at tasks in T , as measured by P , improves with experience E .

Alternative Views Machine learning as an attempt to automate parts of the scientific method (Wikipedia) Scientific method refers to a body of techniques for investigating phenomena, acquiring new knowledge, or correcting and integrating previous knowledge. Machine learning as computational approaches to learning

Example of Learning Problem Handwriting Recognition Task T : recognizing and classifying handwritten words within images Performance measure P : percent of words correctly classified Training experience E : a database of handwritten words with given classification

Place within Computer Science Think about the niche within the space of all software applications where ML plays a special role: Software applications that we can’t program by hand (too complicated) Self-customizing programs

Relation with other Disciplines Human and animal learning (Psychology, Neuroscience …) Biology, economics, control theory (adaptiveness, optimization) Computer Science Machine Learning Statistics

Design a Learning System Consider the example of learning how to play checkers T : playing checkers P : the percent of games it wins in the world tournament E ?

starting position of a checkers game, from Wikipedia

a checkers board state, from http://www.5025488.net/bbs/thread-49430-1-1.html


Choose the Training Experience Type of feedback provided by training examples (to improve P ) Direct: individual checkers board states plus the correct move for each state Indirect: move sequences plus final outcome for each game Need to assign each move credit or blame for the final outcome Easy for learning!

Choose the Training Experience (2) How much control does the learner have over the training examples? The learner may rely completely on a teacher to select board states and provide the correct move for each state; have complete control over board states and final game outcomes (indirect feedback), as in the case of playing against itself; or propose confusing board states to a teacher and ask for the correct move.


Choose the Training Experience (3) How well do the training examples resemble the cases on which the final performance P is measured? Theoretical assumption vs. reality Related topics: Concept drift Incremental learning Transfer learning (current research hotspot!)

Update Summary A checkers learning problem T : playing checkers P : the percent of games it wins in the world tournament E : games played against itself

Remaining Issues What knowledge is to be learned? How should this knowledge be represented? What algorithm should be used to learn it (the learning mechanism)?

Choose the Target Function Think of a checkers learning program as an optimization problem: at every board state the program chooses the best move among all the legal moves. Reformulate what is to be learned as a function ChooseMove : B → M (from board states to moves), or, as a better representation, V : B → ℝ (from board states to real-valued scores)

How to Define Target Function V ? If b is a final board state, then it is simple: If b is won, V ( b )=100 (or other big number) If b is lost, V ( b )=-100 If b is draw, V ( b )=0

How to Define Target Function V ? (2) Otherwise, it is tough! We might define V ( b ) = V ( b’ ), where b’ is the best final state that can be reached starting from b and playing optimally until the end of the game. However, such a definition is not operational: evaluating it requires searching the game all the way to its end.
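To make the "not operational" point concrete, here is a minimal sketch of the same recursive definition on a toy game rather than checkers. The game (a pile of stones, each player removes 1 or 2, whoever takes the last stone wins) and the function name `v` are assumptions chosen for illustration; they are not from the lecture. The exact value is only computable here because the toy game is tiny.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def v(stones, learner_to_move):
    """Exact target value for a toy take-away game (illustrative only).

    Players alternately remove 1 or 2 stones; whoever takes the last
    stone wins. Returns 100 if the learner wins under optimal play by
    both sides, and -100 if it loses. This toy game has no draws.
    """
    if stones == 0:
        # The player who just moved took the last stone and won.
        return -100 if learner_to_move else 100
    # Values of all states reachable by one legal move.
    values = [v(stones - k, not learner_to_move) for k in (1, 2) if k <= stones]
    # The learner picks its best option; the opponent picks the learner's worst.
    return max(values) if learner_to_move else min(values)


print(v(4, True))  # 100: the learner can force a win from 4 stones
print(v(3, True))  # -100: every move from 3 stones loses to optimal play
```

The same recursion applied to checkers would have to visit an astronomically large number of board states, which is why the lecture turns to an approximation of V instead.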

Choose a Representation of V A tradeoff between the expressiveness of V and the demand for training data Let us consider a simple representation Ṽ of V : a linear combination of the following board features, Ṽ( b ) = w 0 + w 1 x 1 + … + w 6 x 6 , where x 1 : #black pieces on the board x 2 : #red pieces x 3 : #black kings x 4 : #red kings x 5 : #black pieces threatened by red (i.e. which can be captured on red’s next move) x 6 : #red pieces threatened by black (i.e. which can be captured on black’s next move)
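The linear representation can be sketched in a few lines. The class name `BoardFeatures`, the function name `v_hat`, and the example weights are assumptions made for illustration; only the six features x1..x6 come from the slide, and a constant bias weight w0 is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoardFeatures:
    black_pieces: int      # x1
    red_pieces: int        # x2
    black_kings: int       # x3
    red_kings: int         # x4
    black_threatened: int  # x5: black pieces red can capture next move
    red_threatened: int    # x6: red pieces black can capture next move

    def as_vector(self):
        # The leading 1.0 pairs with the bias weight w0.
        return [1.0, self.black_pieces, self.red_pieces, self.black_kings,
                self.red_kings, self.black_threatened, self.red_threatened]


def v_hat(weights, features):
    """Linear evaluation: w0 + w1*x1 + ... + w6*x6."""
    return sum(w * x for w, x in zip(weights, features.as_vector()))


# Hypothetical hand-picked weights from black's point of view.
weights = [0.0, 1.0, -1.0, 2.0, -2.0, -0.5, 0.5]
opening = BoardFeatures(12, 12, 0, 0, 0, 0)
print(v_hat(weights, opening))  # 0.0: the material is balanced at the start
```

With only seven weights, this representation needs little training data, at the cost of being unable to express anything the six features do not capture.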

Choose an Approximation Algorithm Choose a set of training examples ( b , V train ( b )) Estimate V train ( b ) For some board states, it is obvious, e.g. V train ( b )=100 if x 2 =0 (assuming the learning program plays black). For the others, only indirect training examples are available. One common approach is via iteration, such as V train ( b ) ← Ṽ( Successor ( b )).
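The iterative estimate V train ( b ) ← Ṽ( Successor ( b )) can be sketched as follows. This assumes that Successor ( b ) is the next state at which the learner is again to move, and that a game is recorded as an ordered list of feature tuples; the names `v_hat` and `training_values` and this input format are hypothetical.

```python
def v_hat(weights, x):
    """Linear evaluation: w0 + w1*x1 + ... + w6*x6."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))


def training_values(trace, weights, final_value):
    """Pair each state in one game trace with an estimated training value.

    trace: feature tuples (x1..x6) for the states at which the learner
    moves, in order of play.
    final_value: the known value of the last state, e.g. 100, -100 or 0.
    """
    examples = []
    for i, x in enumerate(trace):
        if i + 1 < len(trace):
            # V_train(b) <- V_hat(Successor(b)), using the current weights.
            target = v_hat(weights, trace[i + 1])
        else:
            # End of the game: the outcome is known directly.
            target = final_value
        examples.append((x, target))
    return examples


weights = [0.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0]
trace = [(3, 1, 0, 0, 0, 0), (3, 0, 0, 0, 0, 0)]  # black captures the last red piece
for x, target in training_values(trace, weights, final_value=100):
    print(x, target)
```

The estimates for early states are only as good as the current weights, so in practice the targets are recomputed as the weights improve.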

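Adjust the Weights A common approach to obtaining the weights is to minimize the sum of squared errors over the training examples: E = Σ ( V train ( b ) − Ṽ( b ))², summed over all training examples ( b , V train ( b )).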

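An Algorithm for Finding Weights Least mean squares (LMS) weight update rule: For each training example ( b , V train ( b )) Use the current weights to calculate Ṽ( b ). For each weight w i , update it as w i ← w i + η ( V train ( b ) − Ṽ( b )) x i , where η is a small constant that moderates the size of the update.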
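A minimal sketch of one LMS step, assuming the standard form of the rule, w_i ← w_i + η (V_train(b) − Ṽ(b)) x_i, with a constant input of 1 for the bias weight w0. The names `v_hat` and `lms_update` and the default learning rate are assumptions made for illustration.

```python
def v_hat(weights, x):
    """Linear evaluation: w0 + w1*x1 + ... + w6*x6."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))


def lms_update(weights, x, v_train, eta=0.01):
    """Return new weights after one LMS step on the example (x, v_train)."""
    error = v_train - v_hat(weights, x)
    inputs = (1.0, *x)  # constant 1.0 is the input for the bias weight w0
    return [w + eta * error * xi for w, xi in zip(weights, inputs)]


weights = [0.0] * 7
x = (1, 0, 0, 0, 0, 0)  # one black piece, nothing else on the board
before = abs(10 - v_hat(weights, x))
weights = lms_update(weights, x, v_train=10, eta=0.1)
after = abs(10 - v_hat(weights, x))
print(before, after)  # the error on this example shrinks after the step
```

Weights whose feature value is zero are left unchanged, so each example only adjusts the weights of the features actually present on that board.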

Summary of the Whole Design Process

Issues in Machine Learning What algorithms exist for learning general target functions from training examples? Do the algorithms converge given sufficient examples? Which algorithms work best for which kinds of target functions? How does the number of examples influence the accuracy of learned functions? How does the character of the hypothesis space impact accuracy? How can the learner's prior knowledge help?

Issues in Machine Learning (2) What specific functions should the learner attempt to learn? Can this process be automated? How can the learner automatically alter its representation to improve its ability to represent and learn the target function?

Machine Learning* Speech Recognition Automated Control learning Reinforcement learning Predictive modeling Pattern discovery Hidden Markov models Convex optimization Explanation-based learning .... Extracting facts from text Object recognition Data Mining

Example: Self-Learning Robot iCub iCub is a humanoid robot the size of a 3.5-year-old child. It has been under development for five years under the RobotCub project, funded by the European Commission for studying human cognition. RobotCub is an open source project.

Application Successes Speech recognition Two training stages: speaker-independent and speaker-dependent Computer vision Face recognition, sorting of letters with hand-written addresses by the US Postal Service Bio-surveillance Detecting and tracking outbreaks of disease Robot control Robots that drive autonomously

Research on ML Current research questions Long-term questions For the above two items, see “The Discipline of Machine Learning” by Tom Mitchell for a sample of questions. Machine learning for tough problems: relevant novelty detection, structural learning, active learning.* *: from a slide by Jaime Carbonell et al. in April 2007.

Ethical Questions When and where should ML technology be applied? For example, when data is collected for security, law enforcement, or marketing purposes, what about our privacy? Privacy-preserving data mining. Borrow something from Secure Multiparty Computation (SMC)?

Major Conferences and Journals International Conference on Machine Learning (ICML) Conference on Neural Information Processing Systems (NIPS) Annual Conference on Learning Theory (COLT) Journal of Machine Learning Research (JMLR) Machine Learning

Some Interesting References Pattern Recognition in Industry, by Phiroz Bhagat, Elsevier, 2005. UCI Machine Learning Repository “Machine learning” entry on Wikipedia

HW Read “The Discipline of Machine Learning” by Tom Mitchell 1.2 (10pt, due Sept 22) Bonus problem: pick one challenge from Jaime’s paper written in 1992, and write a detailed report on the progress made since then. (10pt)
