Lecture23

    Lecture23 - Presentation Transcript

    1. Introduction to Machine Learning. Lecture 23: Learning Classifier Systems. Albert Orriols i Puig, http://www.albertorriols.net, aorriols@salle.url.edu. Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull
    2. Recap of Lectures 21-22. Value functions. Vπ(s): long-term reward estimation from state s following policy π. Qπ(s,a): long-term reward estimation from state s executing action a and then following policy π. The long-term reward is a recency-weighted average of the received rewards. Interaction sequence: … st, at, rt+1, st+1, at+1, rt+2, st+2, at+2, rt+3, st+3, at+3 …
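One common way to read "recency-weighted average" is the constant step-size update used in reinforcement learning. The block below is a sketch of that reading, not a formula taken from the slide; the step size \alpha (with 0 < \alpha \le 1) and the initial estimate V_0 are assumed symbols.

```latex
V_{k+1} = V_k + \alpha\,(r_{k+1} - V_k)
        = (1-\alpha)^{k+1}\,V_0 + \sum_{i=1}^{k+1} \alpha\,(1-\alpha)^{k+1-i}\,r_i
```

The weight on reward r_i shrinks geometrically with its age k+1-i, so recent rewards count more than old ones.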
    3. Recap of Lectures 21-22. Q-learning
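The body of this slide did not survive in the transcript. As a reminder, a minimal sketch of the standard tabular Q-learning update is given below. The function name, the dictionary-based table, and the values of `alpha` and `gamma` are illustrative assumptions, not taken from the lecture.

```python
from collections import defaultdict


def q_learning_update(Q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """One tabular Q-learning step.

    Moves Q(s, a) toward the target r + gamma * max_a' Q(s', a').
    alpha (learning rate) and gamma (discount) are assumed values.
    """
    best_next = max(Q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    return Q[(state, action)]


# Hypothetical two-action problem; unseen entries default to 0.0.
Q = defaultdict(float)
actions = ["left", "right"]
new_value = q_learning_update(Q, "s0", "right", 1.0, "s1", actions,
                              alpha=0.5, gamma=0.9)
```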
    4. Today’s Agenda. The Origins of LCSs. Michigan-style LCSs. Pittsburgh-style LCSs. Michigan-style LCSs
    5. Original Idea of LCS. Holland’s vision: Cognitive Systems. Create true artificial intelligence itself. True intelligence requires adaptive behavior in the face of changing circumstances (Holland & Reitman, 1978). Holland’s vision, going back to the late 50s and early 60s, of roving bands of computer programs. Holland’s notion of genetic search as program searching (1962): The free generation procedure. . . Requires the generators (and combinations of generators) to “shift” and “connect” at random in the computer…two or more generators occupying adjacent modules (“in contact”) may become connected. Such connected sets of generators are to shift as a unit. From stimulus-response to internal states and modifiable detectors and effectors
    6. First LCS Implementation. CS-1 (Holland & Reitman, 1978). Post-production system. General memory containing classifiers. Process: code the situation and find in memory the actions that are appropriate to both goal and situation; store in memory the consequences of these actions (learning); generate new good productions (classifiers) to endure. Population of classifiers: current system knowledge. Performance component: short-term behavior of the system. Rule discovery component: get new promising rules
    7. Meanwhile, at the University of Pittsburgh. Smith’s interpretation of Holland’s GA vision. Smith’s notion of learning as adaptive search (1980, 1983). LS-1: “Learns a set of heuristics, represented as production system programs, to govern the application of a set of operators in performing a particular task”. Great success! LS-1 took Waterman’s poker player to the cleaners (not bluffing)
    8. Two Models. From here, two approaches emerged: Michigan-style vs. Pittsburgh-style LCSs. Pittsburgh-style LCSs: straight GA; individual = set of rules; solution: best individual; usually offline systems. Michigan-style LCSs: cognitive system; individual = rule; solution: the whole population; apportionment of credit; reinforcement learning. We focus on Michigan-style LCSs
    9. Michigan-style LCSs. General schema (figure): the environment supplies the sensorial state and the reward and receives the action. The learning classifier system holds classifiers 1 … n in any representation: production rules, genetic programs, perceptrons, SVMs. Online rule evaluator: XCS: Q-learning (Sutton & Barto, 1998); uses the Widrow-Hoff delta rule. Rule evolution: typically, a GA (Holland, 75; Goldberg, 89) applied on the population
    10. Knowledge Representation. The knowledge representation consists of a population of classifiers, usually independent of each other. Each classifier has a condition part C, an action part A, and a prediction part P. Interpreted as: if condition C is satisfied and action A is executed, then P is expected to be true. Solution for a new problem: get the classifiers that match the sensorial state; decide which action should be used among the actions of the selected classifiers
    11. Condition Structures. Condition structure depends on the types of attributes. Binary: ternary encoding {0, 1, #}: if v1 is ‘0’ and v2 is ‘1’ and v3 is ‘#’ … and vn is ‘0’ then action i. Continuous: interval-based encoding: if v1 in [l1,u1] and v2 in [l2,u2] … and vn in [ln,un] then action i; hyperellipsoids
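The two encodings on this slide can be illustrated with a small matching sketch. It assumes the usual convention that ‘#’ is a "don't care" symbol matching either bit; the function names and the example values are hypothetical.

```python
def matches_ternary(condition: str, state: str) -> bool:
    """Ternary encoding over {0, 1, #}: '#' matches either bit."""
    return len(condition) == len(state) and all(
        c == "#" or c == s for c, s in zip(condition, state)
    )


def matches_interval(bounds, state) -> bool:
    """Interval-based encoding: every attribute must lie in its [l, u]."""
    return len(bounds) == len(state) and all(
        lower <= x <= upper for (lower, upper), x in zip(bounds, state)
    )


# Binary attributes: the third position is a wildcard.
binary_hit = matches_ternary("01#0", "0110")
binary_miss = matches_ternary("01#0", "1110")

# Continuous attributes: one interval per attribute.
box = [(0.0, 0.5), (0.2, 0.8)]
real_hit = matches_interval(box, [0.3, 0.7])
real_miss = matches_interval(box, [0.6, 0.7])
```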
    12. Condition Structures. Condition structure depends on the types of attributes. Many other representations: partial matching (Booker, 1985); default hierarchies (Holland et al., 1986); fuzzy conditions (Bonarini, 2000; Valenzuela-Rendón, 1991; Casillas et al., 2008; Orriols et al., 2009); neural-network-based encodings (Bull & O’Hara, 2002); GP tree encodings with S-expressions (Lanzi, 1999)
    13. Prediction. Prediction can be: scalar number; line; polynomial; neural network; … We will consider the initial idea: prediction is a scalar number
    14. Learning Interaction in XCS. Figure (XCS learning cycle): ENVIRONMENT → problem instance → match set generation from Population [P] → Match Set [M] → Prediction Array → selected action → Action Set [A] → selected action sent to the environment → REWARD → classifier parameters update (Widrow-Hoff rule), with the delayed reward applied to the previous action set [A]-1 → Genetic Algorithm (selection, reproduction, and mutation; deletion). Each classifier in the figure carries the fields C, A, P, ε, F, num, as, ts, exp. Other labels: competition in the niche; fitness sharing
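The path from [P] to [M], the prediction array, and [A] can be sketched as follows. This is a simplified illustration, not the lecture's implementation: the reading of the field abbreviations (num = numerosity, as = action set size estimate, ts = time stamp, exp = experience) and the fitness-weighted average used for the prediction array follow the way XCS is commonly described, and the example population is made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Classifier:
    condition: str                # C: ternary string over {0, 1, #}
    action: int                   # A
    prediction: float             # P
    error: float = 0.0            # epsilon
    fitness: float = 0.1          # F
    numerosity: int = 1           # num (assumed meaning)
    action_set_size: float = 1.0  # as (assumed meaning)
    time_stamp: int = 0           # ts (assumed meaning)
    experience: int = 0           # exp (assumed meaning)


def matches(cl: Classifier, state: str) -> bool:
    return all(c == "#" or c == s for c, s in zip(cl.condition, state))


def build_match_set(population, state):
    """[M]: classifiers in [P] whose condition matches the input."""
    return [cl for cl in population if matches(cl, state)]


def prediction_array(match_set):
    """Fitness-weighted average prediction for each advocated action."""
    weighted, total = defaultdict(float), defaultdict(float)
    for cl in match_set:
        weighted[cl.action] += cl.prediction * cl.fitness
        total[cl.action] += cl.fitness
    return {a: weighted[a] / total[a] for a in weighted if total[a] > 0}


def build_action_set(match_set, action):
    """[A]: classifiers in [M] that advocate the selected action."""
    return [cl for cl in match_set if cl.action == action]


population = [
    Classifier("0#", 0, prediction=1000.0, fitness=0.5),
    Classifier("#1", 0, prediction=500.0, fitness=0.5),
    Classifier("01", 1, prediction=200.0, fitness=1.0),
    Classifier("1#", 1, prediction=900.0, fitness=1.0),  # does not match "01"
]
M = build_match_set(population, "01")
pa = prediction_array(M)
best_action = max(pa, key=pa.get)  # greedy choice; exploration is omitted
A = build_action_set(M, best_action)
```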
    15. Estimate Classifier Prediction. Three key parameters. Prediction: what I will get if I select the action. Error: error on that prediction. Does it sound familiar? Q-learning! Fitness: how good is my classifier. These parameters are estimated online
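A minimal sketch of how these three parameters might be estimated online with the Widrow-Hoff rule is shown below. The slide gives no formulas, so everything here is an assumption: the accuracy function and the constants `beta`, `eps0`, `alpha`, and `nu` follow the form in which XCS is commonly described, numerosity is ignored, and the order of the error and prediction updates is one of several used in practice.

```python
from dataclasses import dataclass


@dataclass
class Params:
    prediction: float
    error: float
    fitness: float


def update_prediction_and_error(cl, payoff, beta=0.2):
    """Widrow-Hoff updates: move each estimate a fraction beta toward its target."""
    cl.error += beta * (abs(payoff - cl.prediction) - cl.error)
    cl.prediction += beta * (payoff - cl.prediction)


def update_fitness(action_set, beta=0.2, eps0=10.0, alpha=0.1, nu=5.0):
    """Fitness tracks accuracy relative to the other classifiers in the set."""
    kappas = [
        1.0 if cl.error < eps0 else alpha * (cl.error / eps0) ** -nu
        for cl in action_set
    ]
    total = sum(kappas)
    for cl, kappa in zip(action_set, kappas):
        cl.fitness += beta * (kappa / total - cl.fitness)


# Two hypothetical classifiers in the same action set receive payoff 1000.
inaccurate = Params(prediction=0.0, error=0.0, fitness=0.1)
accurate = Params(prediction=1000.0, error=0.0, fitness=0.1)
for cl in (inaccurate, accurate):
    update_prediction_and_error(cl, payoff=1000.0)
update_fitness([inaccurate, accurate])
```

Because the relative accuracies sum to one within the action set, classifiers in the same niche share a fixed amount of fitness, which is one way to picture the fitness sharing named in the XCS cycle.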
    16. Evolutionary Search. The GA is applied from time to time to [A]: select two parents; cross them; mutate them; introduce the two new offspring into the population; if the population is full, remove poor classifiers
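The steps on this slide can be sketched as one GA invocation. This is a deliberately simplified illustration under stated assumptions: classifiers are plain dictionaries, parents are picked by fitness-proportionate selection, crossover is one-point, offspring start with a fraction of the parent's fitness, and "poor" is read as lowest fitness. XCS implementations typically use more elaborate selection and deletion schemes; all names and constants here are hypothetical.

```python
import random

SYMBOLS = "01#"


def crossover(c1, c2, rng):
    """One-point crossover on two equal-length condition strings."""
    point = rng.randrange(1, len(c1))
    return c1[:point] + c2[point:], c2[:point] + c1[point:]


def mutate(condition, rng, mu=0.05):
    """With probability mu, replace a symbol by a different one."""
    out = []
    for ch in condition:
        if rng.random() < mu:
            ch = rng.choice([s for s in SYMBOLS if s != ch])
        out.append(ch)
    return "".join(out)


def ga_step(population, action_set, max_size, rng, mu=0.05):
    # Select two parents from [A], proportionally to fitness.
    weights = [cl["fitness"] for cl in action_set]
    p1, p2 = rng.choices(action_set, weights=weights, k=2)
    # Cross and mutate, then insert both offspring into the population.
    c1, c2 = crossover(p1["condition"], p2["condition"], rng)
    for cond, parent in ((c1, p1), (c2, p2)):
        population.append({
            "condition": mutate(cond, rng, mu),
            "action": parent["action"],
            "fitness": 0.1 * parent["fitness"],  # assumed discount
        })
    # If the population is over capacity, remove the poorest classifiers.
    while len(population) > max_size:
        worst = min(range(len(population)),
                    key=lambda i: population[i]["fitness"])
        population.pop(worst)


rng = random.Random(0)
population = [
    {"condition": "01#0", "action": 1, "fitness": 0.9},
    {"condition": "0##0", "action": 1, "fitness": 0.6},
    {"condition": "1#01", "action": 0, "fitness": 0.4},
    {"condition": "##11", "action": 0, "fitness": 0.2},
]
action_set = population[:2]  # both advocate action 1
ga_step(population, action_set, max_size=5, rng=rng)
```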
    17. LCS Learning Pressures. Parameter updates identify the most accurate classifiers. Different pressures caused by the GA: set pressure toward generality; fitness pressure toward highly fit classifiers; mutation pressure toward diversification; subsumption pressure toward the deletion of accurate, over-specialized classifiers
    18. Next Class. Applications of LCS

    © All Rights Reserved
