Lecture23

  1. Introduction to Machine Learning. Lecture 23: Learning Classifier Systems. Albert Orriols i Puig, http://www.albertorriols.net, aorriols@salle.url.edu. Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull
  2. Recap of Lectures 21-22. Value functions: Vπ(s) is the long-term reward estimate from state s following policy π; Qπ(s,a) is the long-term reward estimate from state s, executing action a and then following policy π. The long-term reward is a recency-weighted average of the received rewards: … st, at, rt+1, st+1, at+1, rt+2, st+2, at+2, rt+3, st+3 … Slide 2. Artificial Intelligence – Machine Learning
  3. Recap of Lectures 21-22: Q-learning (algorithm shown on the slide).
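The tabular Q-learning update recapped here can be sketched as follows; the function name, state/action labels, and hyperparameter values are illustrative, not from the slides:

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)  # all estimates start at 0
q_update(Q, "s0", "a0", r=1.0, s_next="s1", actions=["a0", "a1"], alpha=0.5)
```

With all estimates at zero, the first update moves Q(s0,a0) halfway toward the received reward.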
  4. Today's Agenda. The origins of LCSs; Michigan-style LCSs; Pittsburgh-style LCSs; Michigan-style LCSs in depth.
  5. Original Idea of LCS. Holland's vision: cognitive systems — create true artificial intelligence itself. True intelligence requires adaptive behavior in the face of changing circumstances (Holland & Reitman, 1978). Holland's vision goes back to the late 1950s and early 1960s: roving bands of computer programs. Holland's notion of genetic search as program search (1962): "The free generation procedure … requires the generators (and combinations of generators) to 'shift' and 'connect' at random in the computer … two or more generators occupying adjacent modules ('in contact') may become connected. Such connected sets of generators are to shift as a unit." From stimulus-response to internal states and modifiable detectors and effectors.
  6. First LCS Implementation: CS-1 (Holland & Reitman, 1978), a post-production system with a general memory containing classifiers. Process: code the situation and find in memory the actions appropriate to both CS-1's goal and the situation; store in memory the consequences of these actions (learning); generate new good productions (classifiers) to endure. Components: a population of classifiers (the current system knowledge), a performance component (the short-term behavior of the system), and a rule-discovery component (to get new promising rules).
  7. Meanwhile, at the University of Pittsburgh. Smith's interpretation of Holland's GA vision: Smith's notion of learning as adaptive search (1980, 1983). LS-1: "Learns a set of heuristics, represented as production system programs, to govern the application of a set of operators in performing a particular task." Great success! LS-1 took Waterman's poker player to the cleaners (not bluffing).
  8. Two Models. And here, two ways started: Michigan vs. Pittsburgh LCSs. Pittsburgh-style LCSs: straight GA; individual = set of rules; solution = the best individual; usually offline systems. Michigan-style LCSs: cognitive system; individual = rule; solution = the whole population; apportionment of credit; reinforcement learning. We focus on Michigan-style LCSs.
  9. Michigan-style LCSs: general schema. The learning classifier system interacts with the environment: it senses the state, executes an action, and receives a reward. It maintains a population of classifiers (classifier 1 … classifier n); any representation may be used — production rules, genetic programs, perceptrons, SVMs. Online rule evaluator: XCS uses Q-learning (Sutton & Barto, 1998) with the Widrow-Hoff delta rule. Rule evolution: typically a GA (Holland, 1975; Goldberg, 1989) applied to the population.
  10. Knowledge Representation. The knowledge representation consists of a population of classifiers, usually independent of each other. Each classifier has a condition part C, an action part A, and a prediction part P, interpreted as: if condition C is satisfied and action A is executed, then P is expected to be true. To solve a new problem: get the classifiers that match the sensed state, then decide which action to take among the actions of the matching classifiers.
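A minimal sketch of this C/A/P triple, with field names of my own choosing (a real XCS classifier carries more parameters, as a later slide shows):

```python
from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str     # C: e.g. a ternary string over {'0', '1', '#'}
    action: int        # A
    prediction: float  # P: payoff expected if A is executed when C matches

def best_action(match_set):
    """Among the classifiers matching the state, pick the highest-prediction action."""
    return max(match_set, key=lambda cl: cl.prediction).action

match_set = [Classifier("1#0", 0, 48.0), Classifier("##0", 1, 90.5)]
best_action(match_set)  # -> 1
```

Picking the single highest-prediction classifier is the simplest decision rule; XCS instead aggregates predictions per action, as shown on the learning-interaction slide.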
  11. Condition Structures. The condition structure depends on the attribute types. Binary attributes: ternary encoding over {0, 1, #} — if v1 is '0' and v2 is '1' and v3 is '#' … and vn is '0', then actioni. Continuous attributes: interval-based encoding — if v1 in [l1,u1] and v2 in [l2,u2] … and vn in [ln,un], then actioni — or hyperellipsoids.
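Both matching schemes are simple to sketch (function names are mine, not from the slides):

```python
def matches_ternary(condition, state):
    """Ternary match: '#' is a wildcard; any other symbol must equal the input bit."""
    return all(c == '#' or c == s for c, s in zip(condition, state))

def matches_interval(bounds, x):
    """Interval match: every attribute must fall inside its [l, u] interval."""
    return all(l <= v <= u for (l, u), v in zip(bounds, x))

matches_ternary("1#0", "110")                           # -> True
matches_interval([(0.0, 0.5), (0.2, 0.9)], [0.3, 0.4])  # -> True
```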
  12. Condition Structures (continued). Many other representations exist: partial matching (Booker, 1985); default hierarchies (Holland et al., 1986); fuzzy conditions (Bonarini, 2000; Valenzuela-Rendón, 1991; Casillas et al., 2008; Orriols et al., 2009); neural-network-based encodings (Bull & O'Hara, 2002); GP tree encodings with S-expressions (Lanzi, 1999).
  13. Prediction. The prediction can be a scalar number, a line, a polynomial, a neural network, … We will consider the initial idea: the prediction is a scalar number.
  14. Learning Interaction in XCS (originally a diagram). The environment supplies a problem instance. The match set [M] is generated from the population [P]; each classifier carries the parameters C, A, P, ε, F, num, as, ts, exp. A prediction array is computed over the actions represented in [M], an action is selected and executed, and the environment returns a reward. The classifiers in [M] advocating the selected action form the action set [A]. The classifier parameters of [A] — or of the previous action set [A]-1 when the reward is delayed — are updated with the Widrow-Hoff rule. The genetic algorithm is applied in the niche, with competition and fitness sharing, performing selection, reproduction, mutation, and deletion.
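The prediction array in this cycle can be sketched as a fitness-weighted average of the predictions advocating each action; the dictionary keys below are my own naming:

```python
def prediction_array(match_set):
    """Fitness-weighted average prediction per action, as in XCS.
    Each classifier is a dict with 'action', 'prediction' (P), and 'fitness' (F)."""
    sums, weights = {}, {}
    for cl in match_set:
        a = cl["action"]
        sums[a] = sums.get(a, 0.0) + cl["prediction"] * cl["fitness"]
        weights[a] = weights.get(a, 0.0) + cl["fitness"]
    return {a: sums[a] / weights[a] for a in sums}

m = [{"action": 0, "prediction": 100.0, "fitness": 0.9},
     {"action": 0, "prediction": 60.0,  "fitness": 0.1},
     {"action": 1, "prediction": 50.0,  "fitness": 1.0}]
prediction_array(m)  # -> {0: 96.0, 1: 50.0}
```

Under greedy selection the action with the highest array value is executed; during exploration a random action may be taken instead.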
  15. Estimating Classifier Prediction. Three key parameters: prediction (what I will get if I select the action); error (the error of that prediction — does it sound familiar? Q-learning!); and fitness (how good my classifier is). These parameters are estimated online.
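A sketch of these online updates, following common XCS notation (β, ε0, α, ν and their values are illustrative, and the exact update ordering varies between presentations):

```python
def update_classifier(cl, payoff, beta=0.2, eps0=1.0, alpha=0.1, nu=5):
    """Widrow-Hoff updates of error and prediction, then an accuracy estimate."""
    cl["eps"] += beta * (abs(payoff - cl["p"]) - cl["eps"])  # prediction error
    cl["p"] += beta * (payoff - cl["p"])                     # prediction
    # accuracy: 1 if the error is below eps0, else a decaying power law;
    # fitness is then updated toward a relative share of accuracy in the action set
    cl["kappa"] = 1.0 if cl["eps"] < eps0 else alpha * (cl["eps"] / eps0) ** (-nu)
    return cl

cl = {"p": 0.0, "eps": 10.0}
update_classifier(cl, payoff=100.0)  # eps -> 28.0, p -> 20.0
```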
  16. Evolutionary Search. A GA is applied to [A] from time to time: select two parents, cross them, mutate them, and introduce the two new offspring into the population; if the population is full, remove poor classifiers.
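One GA invocation on [A] might be sketched like this; the dictionary keys 'C' (condition) and 'F' (fitness) are my own, and a full XCS additionally uses subsumption and niche-based deletion:

```python
import random

def ga_step(population, action_set, max_pop, crossover=0.8, mutation=0.04):
    """Select two parents from [A] by fitness, cross and mutate their ternary
    conditions, insert the offspring, and delete poor classifiers if full."""
    parents = random.choices(action_set, weights=[cl["F"] for cl in action_set], k=2)
    kids = [dict(p) for p in parents]
    if random.random() < crossover:                  # one-point crossover
        cut = random.randrange(1, len(kids[0]["C"]))
        c0, c1 = kids[0]["C"], kids[1]["C"]
        kids[0]["C"], kids[1]["C"] = c0[:cut] + c1[cut:], c1[:cut] + c0[cut:]
    for kid in kids:                                 # per-symbol mutation
        kid["C"] = "".join(random.choice("01#") if random.random() < mutation else b
                           for b in kid["C"])
    population.extend(kids)
    while len(population) > max_pop:                 # remove lowest-fitness classifiers
        population.remove(min(population, key=lambda cl: cl["F"]))
    return population
```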
  17. LCS Learning Pressures. The parameter updates identify the most accurate classifiers. The GA causes different pressures: a set pressure toward generality; a fitness pressure toward highly fit classifiers; a mutation pressure toward diversification; and a subsumption pressure toward the deletion of accurate, over-specialized classifiers.
  18. Next Class: applications of LCSs.
  19. Introduction to Machine Learning. Lecture 23: Learning Classifier Systems. Albert Orriols i Puig, http://www.albertorriols.net, aorriols@salle.url.edu. Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull
