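Machine Learning Chen Yu, Information Security Engineering Research Center, Institute of Computer Science and Technology, Peking University
Course Information Instructor: Chen Yu [email_address] Tel: 82529680 Teaching assistant: Wang Hongyan [email_address] Course web page: http://www.icst.pku.edu.cn/course/mlearning/index.htm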
Ch1 Introduction What is machine learning (ML)? Design a learning system: an example ML applications Miscellaneous issues
A Brief History of Machine Learning ML as a scientific discipline was born in the mid-1970s. The first ML workshop was held in 1980 at CMU, with some two dozen participants and photocopied preprints. The first ML journal, “Machine Learning”, started publication in 1986.
Some Early Seminal Works The perceptron model proposed by Rosenblatt (1958), the so-called “connectionist” approach, a seminal work in neural networks. A system that learns to play checkers (Samuel, 1959 & 1963). The Meta-DENDRAL program, which generates rules that explain mass spectroscopy data for use by the expert system DENDRAL (Buchanan, 1978), an example of symbolic learning.
What is Machine Learning? The central problem it studies:  How can we build computer systems that  automatically  improve with experience, and what are the laws that govern all learning processes? We state a learning problem as: a machine learns with respect to (w.r.t.) a particular task  T , performance metric  P , and type of experience  E .
What is Machine Learning (2) More precisely, a computer program is said to learn from experience E, w.r.t. some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Alternative Views Machine learning as an attempt to automate parts of the scientific method (Wikipedia) Scientific method refers to a body of techniques for investigating phenomena, acquiring new knowledge, or correcting and integrating previous knowledge. Machine learning as computational approaches to learning
Example of Learning Problem Handwriting Recognition Task  T : recognizing and classifying handwritten words within images Performance measure  P : percent of words correctly classified Training experience  E : a database of handwritten words with given classification
Place within Computer Science Think about the niche within the space of all software applications where ML plays a special role: Software applications that we can’t program by hand (too complicated) Self-customizing programs
Relation with other Disciplines  Human and animal learning (Psychology, Neuroscience …) Biology, economics, control theory (adaptiveness, optimization) Computer Science Machine Learning Statistics
Ch1 Introduction What is machine learning (ML)? Design a learning system: an example ML applications Miscellaneous issues
Design a Learning System Consider the example of learning how to play checkers T : playing checkers P : the percent of games it wins in the world tournament E ?
starting position of  a checkers game, from Wikipedia
a checkers board  state, from http://www.5025488.net/bbs/thread-49430-1-1.html
Choose the Training Experience Type of feedback provided by training examples (to improve P) Direct: individual checkers board states plus the correct move for each state Indirect: move sequences plus the final outcome of each game Need to assign each move credit or blame for the final outcome Direct feedback is the easier case for learning!
Choose the Training Experience (2) How much can the learner control the training examples? It may rely completely on a teacher to select board states and provide the correct move for each state; have complete control over board states and final game outcomes (indirect feedback), as when playing against itself; or propose confusing board states to a teacher and ask for the correct move.
Choose the Training Experience (3) How well do the training examples resemble the cases on which the final performance P is measured? Theoretical assumption vs. reality Related topics: Concept drift Incremental learning Transfer learning (current research hotspot!)
Updated Summary A checkers learning problem T : playing checkers P : the percent of games it wins in the world tournament E : games played against itself
Remaining Issues What knowledge is to be learned? How should this knowledge be represented? What algorithm is used to learn the knowledge (learning mechanism)?
Choose the Target Function Think of a checkers learning program as an optimization problem: at every board state the program chooses the best move among all the legal moves. Reformulate what is to be learned as a function ChooseMove : B -> M (from board states to moves), or, as a better representation, V : B -> R (from board states to the real numbers)
How to Define Target Function V? If b is a final board state, then it is simple: If b is won, V(b) = 100 (or another large number) If b is lost, V(b) = -100 If b is a draw, V(b) = 0
How to Define Target Function V? (2) Otherwise, it is tough! We might define V(b) = V(b’), where b’ is the best final state that can be achieved starting from b and playing optimally until the end of the game. However, such a definition is not operational: evaluating it would require searching the game to its end!
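The final-state cases of V can be written down directly. The following is a minimal Python sketch; the `Outcome` enum and the `terminal_value` function are illustrative names, not part of the lecture, and the values 100, -100 and 0 are the ones given above. Non-final states are deliberately left out, since the recursive definition is not operational.

```python
from enum import Enum


class Outcome(Enum):
    """Result of a finished game from the learner's point of view (hypothetical type)."""
    WON = "won"
    LOST = "lost"
    DRAW = "draw"


# Target values for final board states, as given on the slide.
TERMINAL_VALUES = {
    Outcome.WON: 100.0,
    Outcome.LOST: -100.0,
    Outcome.DRAW: 0.0,
}


def terminal_value(outcome: Outcome) -> float:
    """Return V(b) for a final board state b with the given outcome."""
    return TERMINAL_VALUES[outcome]
```

For example, `terminal_value(Outcome.DRAW)` gives the value assigned to a drawn game.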
Remaining Issues What knowledge is to be learned? How should this knowledge be represented? What algorithm is used to learn the knowledge (learning mechanism)?
Choose a Representation of V A tradeoff between the expressiveness of V and the demand for training data Let us consider a simple representation Ṽ of V : a linear combination of the following board features: x 1 : #black pieces on the board x 2 : #red pieces x 3 : #black kings x 4 : #red kings x 5 : #black pieces threatened by red (i.e. which can be captured on red’s next move) x 6 : #red pieces threatened by black (i.e. which can be captured on black’s next move)
A Simple Representation of V $\tilde{V}(b) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + w_5 x_5 + w_6 x_6$, where $w_0, \dots, w_6$ are the weights to be learned.
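A linear representation over the six board features can be sketched in a few lines of Python. This is only an illustration: the function name `v_hat`, the constant term `w0`, and the example weights are assumptions, not values from the lecture.

```python
from typing import Sequence


def v_hat(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear evaluation of a board.

    weights  = (w0, w1, ..., w6), with w0 an assumed constant term
    features = (x1, ..., x6), the board features listed on the slide
    """
    if len(weights) != len(features) + 1:
        raise ValueError("expected one more weight than features")
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


# Hypothetical weights: own pieces count positively, opposing pieces negatively.
example_weights = (0.0, 1.0, -1.0, 2.0, -2.0, -0.5, 0.5)
# Starting position: 12 black and 12 red pieces, no kings, nothing threatened.
start_features = (12, 12, 0, 0, 0, 0)
start_value = v_hat(example_weights, start_features)
```

With these symmetric example weights the starting position evaluates to zero, which matches the intuition that neither side is ahead.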
Remaining Issues What knowledge is to be learned? How should this knowledge be represented? What algorithm is used to learn the knowledge (learning mechanism)?
Choose an Approximation Algorithm Choose a set of training examples (b, V_train(b)) Estimate V_train(b) For some board states the value is obvious, e.g. V_train(b) = 100 if x 2 = 0 (assuming the learning program plays black). For the others, only indirect training examples are available. One common approach is iterative: V_train(b) ← Ṽ(Successor(b)), i.e. the current estimate for the successor state serves as the training value for b.
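The rule V_train(b) ← Ṽ(Successor(b)) can be sketched as follows. This is a hedged illustration, not the lecture's implementation: a game is assumed to be recorded as a list of feature vectors, one per board state at which the learner is to move, so that the successor of a state is simply the next entry in the list. The names `v_hat` and `training_values` are hypothetical.

```python
from typing import List, Sequence, Tuple

Features = Tuple[float, ...]


def v_hat(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear evaluation w0 + w1*x1 + ... + w6*x6 (w0 is an assumed constant term)."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def training_values(
    trace: Sequence[Features],
    weights: Sequence[float],
    final_value: float,
) -> List[Tuple[Features, float]]:
    """Pair each board state of one game with an estimated training value.

    Non-final states get the current estimate of their successor;
    the final state gets the known outcome value (e.g. 100, -100 or 0).
    """
    examples = []
    for i, features in enumerate(trace):
        is_last = i == len(trace) - 1
        target = final_value if is_last else v_hat(weights, trace[i + 1])
        examples.append((features, target))
    return examples


# A tiny made-up game in which black captures all red pieces and wins.
weights = (0.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0)
trace = [(3, 3, 0, 0, 0, 0), (3, 2, 0, 0, 0, 0), (3, 0, 0, 0, 0, 0)]
examples = training_values(trace, weights, final_value=100.0)
```

Only the last state receives a value that is known for certain; every earlier value is an estimate that depends on the current weights.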
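Adjust the Weights A common approach to obtaining the weights is to minimize the sum of squared errors over the training examples: $E = \sum_{(b,\, V_{train}(b))} \left( V_{train}(b) - \tilde{V}(b) \right)^2$
An Algorithm for Finding Weights Least mean squares (LMS) weight update rule: For each training example (b, V_train(b)) Use the current weights to calculate Ṽ(b). For each weight w i , update it as $w_i \leftarrow w_i + \eta \left( V_{train}(b) - \tilde{V}(b) \right) x_i$, where $\eta$ is a small constant learning rate.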
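One LMS step can be sketched in Python as below. The sketch assumes the standard form of the rule, w_i ← w_i + η (V_train(b) − Ṽ(b)) x_i, with a constant input of 1 for the term w0; the learning rate `eta` and the sample numbers are illustrative assumptions.

```python
from typing import List, Sequence


def v_hat(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear evaluation w0 + w1*x1 + ... + w6*x6 (w0 is an assumed constant term)."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def lms_update(
    weights: Sequence[float],
    features: Sequence[float],
    v_train: float,
    eta: float = 0.1,
) -> List[float]:
    """Return new weights after one LMS step on the example (features, v_train)."""
    error = v_train - v_hat(weights, features)
    inputs = (1.0, *features)  # constant input 1.0 pairs with w0
    return [w + eta * error * x for w, x in zip(weights, inputs)]


# One step on a made-up example, starting from all-zero weights.
old_weights = [0.0] * 7
features = (1, 0, 0, 0, 0, 0)
new_weights = lms_update(old_weights, features, v_train=10.0, eta=0.1)
old_error = 10.0 - v_hat(old_weights, features)
new_error = 10.0 - v_hat(new_weights, features)
```

Weights whose feature is zero are left unchanged, and a step with a small `eta` moves the estimate toward the training value, which is why the rule can be viewed as stochastic gradient descent on the squared error.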
Summary of the Whole Design Process
Issues in Machine Learning What algorithms exist for learning general target functions from training examples? Do the algorithms converge given sufficient examples? Which algorithms work best for which kinds of target functions? How does the number of examples influence the accuracy of learned functions? How does the character of the hypothesis space affect accuracy? How can the learner's prior knowledge help?
Issues in Machine Learning (2) What specific functions should the learner attempt to learn? Can this process be automated? How can the learner  automatically  alter its representation to improve its ability to represent and learn the target function?
Ch1 Introduction What is machine learning (ML)? Design a learning system: an example ML applications Miscellaneous issues
Machine Learning* Speech Recognition Automated Control learning Reinforcement learning Predictive modeling Pattern discovery Hidden Markov models Convex optimization Explanation-based learning .... Extracting facts from text Object recognition Data Mining
Example: Self-Learning Robot iCub iCub is a humanoid robot the size of a 3.5-year-old child. It has been under development for five years in the RobotCub project, funded by the European Commission to study human cognition. RobotCub is an open source project.
Application Successes Speech recognition Two training stages: speaker-independent and speaker-dependent Computer vision Face recognition; sorting of letters with handwritten addresses by the US Postal Service Bio-surveillance Detecting and tracking outbreaks of disease Robot control Robots that drive autonomously
Ch1 Introduction What is machine learning (ML)? Design a learning system: an example ML applications Miscellaneous issues
Research on ML Current research questions Long-term questions For the above two items, see “The Discipline of Machine Learning” by Tom Mitchell for a sample of questions. Machine learning for tough problems: relevant novelty detection, structural learning, active learning.* *: from a slide by Jaime Carbonell et al. in April 2007.
Ethical Questions When and where should ML technology be applied? For example, when collecting data for security or law enforcement, or for marketing purposes, what about our privacy? Privacy-preserving data mining. Borrow something from Secure Multiparty Computation (SMC)?
Major Conferences and Journals International Conference on Machine Learning (ICML) Conference on Neural Information Processing Systems (NIPS) Annual Conference on Learning Theory (COLT) Journal of Machine Learning Research (JMLR) Machine Learning
Some Interesting References Pattern Recognition in Industry, by Phiroz Bhagat, Elsevier, 2005. UCI Machine Learning Repository “machine learning” entry on Wikipedia
HW Read “The Discipline of Machine Learning” by Tom Mitchell 1.2 (10pt, due Sept 22) Bonus problem: pick one challenge from Jaime’s paper written in 1992, and write a detailed report on the progress made since then. (10pt)

Lecture notes (last updated: 2009-9-25)


Editor's Notes

  • #8 Learn some patterns!
  • #16 8*8 board, 12 pieces each side. The pieces are usually made of wood and are flat and cylindrical. Two kinds of pieces: kings and men. The black side always initiates the game. Moves: simple move, jump. Kings: when does a piece become “kinged”? How the game ends: A player wins by capturing all of the opposing player's pieces, or by leaving the opposing player with no legal moves, or a player's piece jumps into the kings row.
  • #20 Active learning by querying teacher is much more data-efficient than random observation.
  • #30 Explain intuitively!
  • #33 Successor: after two moves. How to compute Successor ( b )?
  • #34 Meaning under Bayesian learning setting
  • #35 LMS: viewed as stochastic gradient-descent search
  • #40 *: This slide is from a presentation by Jaime Carbonell et al. in April 2007.
  • #42 See “The Discipline of Machine Learning” by Tom Mitchell. Public opinion analysis.
  • #44 Tell a story about Bioinformatics