Machine Learning


  1. Machine Learning
     Robert Stengel
     Robotics and Intelligent Systems, MAE 345, Princeton University, 2009

     • Markov Decision Processes
       – Optimal and near-optimal control
     • Finding Decision Rules in Data
       – ID3 algorithm
     • Search

     Copyright 2009 by Robert Stengel. All rights reserved. For educational use only.
     http://www.princeton.edu/~stengel/MAE345.html

     Proposed Term Paper Topics (MAE 345, Fall 2009)
     • Multistep NN with Memory
     • Maze-Navigating Robot
     • Robotic Prosthetic Device
     • Optimal Control of an Ambiguous Robot
     • Game-Playing NN
     • NN for Object Recognition
     • Robotic Cloth Folder
     • SAGA Simulated Creature
     • NN to Optimize Problem Set Solution
     • Blob-Tracking NN
     • Dust-Collecting Robot that Learns
     • NN for Stock Return Prediction

     Finding Decision Rules in Data
     • Identification of key attributes and outcomes
     • Taxonomies developed by experts
     • First principles of science and mathematics
     • Trial and error
     • Probability theory and fuzzy logic
     • Simulation and empirical results

     Example of On-Line Code Modification
     • Execute a decision tree
       – Get wrong answer
     • Add logic to distinguish between right and wrong cases
       – If Comfort Zone = Water,
         • then Animal = Hippo,
         • else Animal = Rhino
       – True, but Animal is Dinosaur, not Hippo
       – Ask user for right answer
       – Ask user for a rule that distinguishes between right and wrong answer: If Animal is extinct, …
  2. Markov Decision Process
     • Model for decision making under uncertainty
       $$\left\{ S,\; A,\; P_{a_m}(x_k, x'),\; R_{a_m}(x_k, x') \right\}$$
       where
       S : finite set of states, $x_1, x_2, \ldots, x_K$
       A : finite set of actions, $a_1, a_2, \ldots, a_M$
       $$P_{a_m}(x_k, x') = \Pr\left[ x_k(t_{i+1}) = x' \mid x_k(t_i) = x_k,\; a(t_i) = a_m \right]$$
       $R_{a_m}(x_k, x')$ = Expected immediate reward for transition from $x_k$ to $x'$
     • Optimal decision maximizes expected total reward (or minimizes expected total cost) by choosing best set of actions (or control policy)
       – Linear-quadratic-Gaussian (LQG) control
       – Dynamic programming -> HJB equation ~> A* search
       – Reinforcement learning ~> Heuristic search

     Maximizing the Utility Function of a Markov Process
     • Utility function:
       $$J = \sum_{t=0}^{\infty} \gamma(t)\, R_{a(t)}\left[ x(t), x(t+1) \right]$$
       $\gamma(t)$ : discount rate, $0 < \gamma(t) < 1$
     • Utility function to go = Value function:
       $$V = \sum_{t=t_{current}}^{\infty} \gamma(t)\, R_{a(t)}\left[ x(t), x(t+1) \right]$$
     • Optimal control at t
       $$u_{opt}(t) = \arg\max_{a} \left\{ R_{a(t)}\left[ x(t), x(t+1) \right] + \gamma(t) \sum_{t=t_{current}}^{\infty} P_{a(t)}\left[ x(t), x(t+1) \right] V\left[ x(t+1) \right] \right\}$$
     • Optimized value function
       $$V^*(t) = R_{u_{opt}(t)}\left[ x^*(t) \right] + \gamma(t) \sum_{t=t_{current}}^{\infty} P_{u_{opt}(t)}\left[ x^*(t), x_{est}^*(t+1) \right] V\left[ x_{est}^*(t+1) \right]$$

     Reinforcement (“Q”) Learning Control of a Markov Process
     • Q: quality of a state-action function
     • Heuristic value function
     • One-step philosophy for heuristic optimization
       $$Q\left[ x(t+1), u(t+1) \right] = Q\left[ x(t), u(t) \right] + \alpha(t) \left\{ R_{u(t)}\left[ x(t) \right] + \gamma(t) \max_{u} Q\left[ x(t+1), u \right] - Q\left[ x(t), u(t) \right] \right\}$$
       $\alpha(t)$ : learning rate, $0 < \alpha(t) < 1$
     • Various algorithms for computing best control value
       $$u_{best}(t) = \arg\max_{u} Q\left[ x(t+1), u \right]$$

     Q Learning Control of a Markov Process is Analogous to LQG Control in the LTI Case
       $$Q\left[ x(t+1), u(t+1) \right] = Q\left[ x(t), u(t) \right] + \alpha(t) \left\{ R_{u(t)}\left[ x(t) \right] + \gamma(t) \max_{u} Q\left[ x(t+1), u \right] - Q\left[ x(t), u(t) \right] \right\}$$
       $\alpha(t)$ : learning rate, $0 < \alpha(t) < 1$
     • Controller
       $$x_{k+1} = \Phi x_k + \Gamma C \left( \hat{x}_k - x_k^* \right)$$
     • Estimator
       $$\hat{x}_k = \Phi \hat{x}_{k-1} - \Gamma C \left( \hat{x}_{k-1} - x_{k-1}^* \right) + K \left( z_k - H_x \hat{x}_{k-1} \right)$$

     Q-Learning Snail

     Q-Learning, Ball on Plate
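The one-step Q-learning update above can be tried on a very small problem. The following is a minimal sketch, not taken from the slides: the five-state chain, the `step` dynamics, the reward of 1 at the goal, and the constants `ALPHA` and `GAMMA` are all assumptions chosen for illustration. In code, the updated table entry is the one for the current state-action pair.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is the goal
ACTIONS = (0, 1)      # 0 = step left, 1 = step right
ALPHA = 0.5           # learning rate, 0 < alpha < 1 (assumed value)
GAMMA = 0.9           # discount rate, 0 < gamma < 1 (assumed value)

def step(state, action):
    """Deterministic toy dynamics: reward 1 only when the goal is reached."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def q_learning(episodes=500, max_steps=50, seed=0):
    """Tabular Q-learning with a purely random (exploratory) behavior policy."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)
            nxt, reward = step(state, action)
            # One-step target: immediate reward plus discounted best next value
            target = reward + GAMMA * max(q[nxt])
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
            if state == N_STATES - 1:   # goal reached, end the episode
                break
    return q

q_table = q_learning()
# Best control in each non-goal state: the action with the largest Q value
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a])
                 for s in range(N_STATES - 1)]
print(greedy_policy)
```

Because Q-learning is off-policy, the table can be learned from random actions while the greedy policy is read out afterwards with the arg max rule.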
  3. LQG Control Optimizes Discrete-Time LTI Markov Process
       $$\left\{ S,\; A,\; P_{a_m}(x_k, x'),\; R_{a_m}(x_k, x') \right\}$$
       where
       S : infinite set of states, $x_1, x_2, \ldots, x_K$
       A : infinite set of actions, $a_1, a_2, \ldots, a_M$
       $$P_{a_m}(x_k, x') = \Pr\left[ x_k(t_{i+1}) = x' \mid x_k(t_i) = x_k,\; a(t_i) = a_m \right]$$
       $R_{a_m}(x_k, x')$ = Expected immediate reward for transition from $x_k$ to $x'$

     Structuring an Efficient Decision Tree (Off-Line)
     • Choose most important attributes first
     • Recognize when no result can be deduced
     • Exclude irrelevant factors
     • Iterative Dichotomizer*: the ID3 Algorithm
       – Build an efficient decision tree from a fixed set of examples (supervised learning)
     *Dichotomy: Division into two (usually contradictory) parts or opinions

     Fuzzy Ball-Game Training Set

                 Attributes                                   Decisions
       Case #    Forecast    Temperature    Humidity    Wind      Play Ball?
       1         Sunny       Hot            High        Weak      No
       2         Sunny       Hot            High        Strong    No
       3         Overcast    Hot            High        Weak      Yes
       4         Rain        Mild           High        Weak      Yes
       5         Rain        Cool           Low         Weak      Yes
       6         Rain        Cool           Low         Strong    No
       7         Overcast    Cool           Low         Strong    Yes
       8         Sunny       Mild           High        Weak      No
       9         Sunny       Cool           Low         Weak      Yes
       10        Rain        Mild           Low         Weak      Yes
       11        Sunny       Mild           Low         Strong    Yes
       12        Overcast    Mild           High        Strong    Yes
       13        Overcast    Hot            Low         Weak      Yes
       14        Rain        Mild           High        Strong    No

     Parameters of the ID3 Algorithm
     • Decisions, e.g., Play ball or don't play ball
       – D = Number of possible decisions
         • Decision: Yes, no
  4. Parameters of the ID3 Algorithm
     • Attributes, e.g., Temperature, humidity, wind, weather forecast
       – M = Number of attributes to be considered in making a decision
       – I_m = Number of values that the ith attribute can take
         • Temperature: Hot, mild, cool
         • Humidity: High, low
         • Wind: Strong, weak
         • Forecast: Sunny, overcast, rain

     Parameters of the ID3 Algorithm
     • Training trials, e.g., all the games played last month
       – N = Number of training trials
       – n(i) = Number of examples with ith attribute

     Example: Probability Spaces for Three Attributes
     • Probability of an attribute value represented by area in diagram
     [Figure: Attribute #1 (2 possible values), Attribute #2 (6 possible values), Attribute #3 (4 possible values)]

     Example: Decision, given Values of Three Attributes
     [Figure: Attribute #1 (2 possible values), Attribute #2 (6 possible values), Attribute #3 (4 possible values)]
  5. Accurate Detection of Events Depends on Their Probability of Occurrence
     [Figure panels: σ_noise = 0.1, σ_noise = 0.2, σ_noise = 0.4]

     Entropy Measures Information Content of a Signal
     • S = Entropy of a signal encoding I distinct events
       $$S = -\sum_{i=1}^{I} \Pr(i) \log_2 \Pr(i)$$
       $$0 \le \Pr(\cdot) \le 1, \qquad \log_2 \Pr(\cdot) \le 0$$
     • i = Index identifying an event encoded by a signal
     • Pr(i) = Probability of ith event
     • log2 Pr(i) = Number of bits required to characterize the probability that the ith event occurs

     Entropy of Two Events with Various Frequencies of Occurrence
     • Pr(i) log2 Pr(i) represents the channel capacity (i.e., average number of bits) required to portray the ith event
     • Frequencies of occurrence estimate probabilities of each event (#1 and #2)
       – Pr(#1) = n(#1)/N
       – Pr(#2) = n(#2)/N = 1 – n(#1)/N
       $$\log_2 \Pr(\#1 \text{ or } \#2) \le 0$$
       $$S = S_{\#1} + S_{\#2} = -\Pr(\#1) \log_2 \Pr(\#1) - \Pr(\#2) \log_2 \Pr(\#2)$$
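The two-event entropy formula is easy to evaluate numerically. The sketch below is illustrative only: the function names `entropy` and `two_event_entropy` are hypothetical, and the trial count N = 128 is an assumed value.

```python
import math

def entropy(probabilities):
    """S = -sum Pr(i) * log2 Pr(i); terms with Pr(i) = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def two_event_entropy(n1, total):
    """Entropy when event #1 occurs n1 times in `total` trials
    and event #2 occurs the rest of the time."""
    p1 = n1 / total
    return entropy([p1, 1.0 - p1])

N = 128  # assumed number of trials
for n in (1, 16, 32, 64):
    print(n, round(two_event_entropy(n, N), 3))
```

The entropy peaks at 1 bit when both events are equally frequent and falls toward zero as one event becomes rare.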
  6. Best Decision is Related to Entropy and the Probability of Occurrence
     • High entropy
       – Signal provides high coding precision of distinct events
       – Differences coded with few bits
     • Low entropy
       – Lack of distinction between signal values
       – Detecting differences requires many bits
     • Best classification of events when S = 1...
       – but that may not be achievable

     Entropy of Two Events with Various Frequencies of Occurrence
       $$S = -\sum_{i=1}^{I} \Pr(i) \log_2 \Pr(i)$$

     Entropies for 128 Trials

               Pr(#1)    # of Bits(#1)    Pr(#2)     # of Bits(#2)      Entropy
       n       n/N       log2(n/N)        1 - n/N    log2(1 - n/N)      S
       1       0.008     -7               0.992      -0.011             0.066
       2       0.016     -6               0.984      -0.023             0.116
       4       0.031     -5               0.969      -0.046             0.201
       8       0.063     -4               0.938      -0.093             0.337
       16      0.125     -3               0.875      -0.193             0.544
       32      0.25      -2               0.75       -0.415             0.811
       64      0.50      -1               0.50       -1                 1
       96      0.75      -0.415           0.25       -2                 0.811
       112     0.875     -0.193           0.125      -3                 0.544
       120     0.938     -0.093           0.063      -4                 0.337
       124     0.969     -0.046           0.031      -5                 0.201
       126     0.984     -0.023           0.016      -6                 0.116
       127     0.992     -0.011           0.008      -7                 0.066

     Decision-Making Parameters for ID3
     • S_D = Entropy of all possible decisions
       $$S_D = -\sum_{d=1}^{D} \Pr(d) \log_2 \Pr(d)$$
     • G_i = Information gain of ith attribute
       $$G_i = S_D + \sum_{i=1}^{I_m} \Pr(i) \sum_{d=1}^{D} \Pr(i_d) \log_2 \Pr(i_d)$$
     • Pr(i_d) = n(i_d)/N(d) = Probability that ith attribute correlates with dth decision

     Decision Tree Produced by ID3 Algorithm
     [Table: the 14-case Fuzzy Ball-Game Training Set, repeated from the earlier slide]
     • Root Attribute gains, G_i
       – Forecast: 0.246
       – Temperature: 0.029
       – Humidity: 0.151
       – Wind: 0.048
     • Temperature is inconsequential and is not included in the decision tree
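The root attribute gains can be checked against the 14-case ball-game training set. This is a minimal sketch under stated assumptions: the gain is computed in the usual ID3 way, as the decision entropy minus the weighted entropy remaining after splitting on an attribute, and the names `TRAINING_SET`, `entropy`, and `information_gain` are hypothetical.

```python
import math
from collections import Counter

ATTRIBUTES = ("Forecast", "Temperature", "Humidity", "Wind")

# Each row: Forecast, Temperature, Humidity, Wind, Play Ball?
TRAINING_SET = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Low", "Weak", "Yes"),
    ("Rain", "Cool", "Low", "Strong", "No"),
    ("Overcast", "Cool", "Low", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Low", "Weak", "Yes"),
    ("Rain", "Mild", "Low", "Weak", "Yes"),
    ("Sunny", "Mild", "Low", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Low", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def entropy(labels):
    """Entropy (in bits) of a list of decision labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(rows, attr_index):
    """Decision entropy minus the weighted entropy left after the split."""
    remainder = 0.0
    for value in {row[attr_index] for row in rows}:
        subset = [row[-1] for row in rows if row[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([row[-1] for row in rows]) - remainder

root_gains = {name: information_gain(TRAINING_SET, i)
              for i, name in enumerate(ATTRIBUTES)}
root_attribute = max(root_gains, key=root_gains.get)

for name, gain in root_gains.items():
    print(f"{name}: {gain:.3f}")
print("Root attribute:", root_attribute)
```

ID3 places the attribute with the largest gain at the root, then repeats the same calculation on the subset of cases in each branch.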
  7. Decision Tree Produced by ID3 Algorithm
     • Sunny Branch Attribute gains, G_i
       – Temperature: 0.57
       – Humidity: 0.97
       – Wind: 0.019

     Search
     • Typical AI textbook problems
       – Prove a theorem
       – Solve a puzzle (e.g., Tower of Hanoi)
       – Find a sequence of moves that wins a game (e.g., chess)
       – Find the shortest path connecting a set of points (e.g., Traveling salesman problem)
       – Find a sequence of symbolic transformations that solves a calculus problem (e.g., Mathematica)
     • The common thread: search
       – Structures for search
       – Strategies for search

     Curse of Dimensionality
     • Feasible search paths may grow without bound
       – Possible combinatorial explosion
       – Checkers: 5 x 10^20 possible moves
       – Chess: 10^120 moves
       – Protein folding: ?
     • Limiting search complexity
       – Redefine search space
       – Employ heuristic (i.e., pragmatic) rules
       – Establish restricted search range
       – Invoke decision models that have worked in the past

     Structures for Search
     • Trees
       – Single path between root and any node
       – Path between adjacent nodes = arc
       – Root node
         • no precursors
       – Leaf node
         • no successors
         • possible terminator
  8. Structures for Search
     • Graphs
       – Multiple paths between root and some nodes
       – Trees are subsets of graphs

     Directions of Search
     • Forward chaining
       – Reason from premises to actions
       – Data-driven: draw conclusions from facts
     • Backward chaining
       – Reason from actions to premises
       – Goal-driven: find facts that support hypotheses

     Strategies for Search
     • Search forward from opening?
     • Search backward from end game?
     • Both?
     • Realistic assessment
       – Not necessary to consider all 10^120 possible moves to play good chess
       – Playing excellent chess may require much forward and backward chaining, but not 10^120 evaluations
       – Most applications are more procedural
     • Search categories
       – Blind search
       – Heuristic search
       – Probabilistic search
       – Optimization

     Blind Search
     • Node expansion
       – Find all successors to that node
     • Depth-first forward search
       – Expand nodes descended from most recently expanded node
       – Consider other paths only after reaching a node with no successors
     • Breadth-first forward search
       – Expand nodes in order of proximity to the start node
       – Consider all sequences of arc number n (from root node) before considering any of number (n + 1)
       – Exhaustive, but guaranteed to find the shortest path to a terminator
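The difference between depth-first and breadth-first forward search shows up even on a five-node graph. The sketch below is an illustration, not the slides' own code: the `GRAPH` dictionary and the node names are assumptions.

```python
from collections import deque

# Hypothetical graph: two routes from root "A" to terminator "G"
GRAPH = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["G"],
    "D": ["G"],
    "G": [],
}

def breadth_first(graph, start, goal):
    """Expand nodes in order of proximity (arc count) to the start node."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for successor in graph[node]:
            if successor not in visited:
                visited.add(successor)
                frontier.append(path + [successor])
    return None

def depth_first(graph, start, goal):
    """Expand nodes descended from the most recently expanded node first."""
    stack = [[start]]
    visited = set()
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        # Push in reverse so the first-listed successor is expanded first
        for successor in reversed(graph[node]):
            stack.append(path + [successor])
    return None

bfs_path = breadth_first(GRAPH, "A", "G")
dfs_path = depth_first(GRAPH, "A", "G")
print("Breadth-first:", bfs_path)
print("Depth-first:  ", dfs_path)
```

On this graph the breadth-first search returns the two-arc path, while the depth-first search commits to the first branch and returns a three-arc path.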
  9. AND/OR Graph Search
     • A node is “solved” if
       – It is a leaf node with a satisfactory goal state
       – It has solved AND nodes as successors
       – It has OR nodes as successors, at least one of which is solved.
     • Goal: Solve the root node

     Blind Search
     • Bidirectional search
       – Search forward from root node and backward from one or more leaf nodes
       – Terminate when search nodes coincide
     • Minimal-cost forward search
       – Each arc is assigned a cost
       – Expand nodes in order of minimum cost

     Heuristic Search
     • For large problems, blind search typically leads to combinatorial explosion
     • Employ heuristic knowledge about the quality of possible paths
       – Decide which node to expand next
       – Discard (or prune) nodes that are unlikely to be fruitful
     • Search for feasible (approximately optimal) rather than optimal solutions
     • Ordered or best-first search
       – Always expand “most promising” node

     Heuristic Optimal Search
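The “solved” rule for AND/OR graphs can be written as a short recursion. This is a sketch under assumptions: each node is tagged by how its successors combine (all must be solved for AND, at least one for OR), the graph is acyclic, and the node names in `PLAN` are hypothetical.

```python
# Each entry: node -> (kind, payload)
#   "LEAF": payload is True if the leaf is a satisfactory goal state
#   "AND" : payload lists successors that must all be solved
#   "OR"  : payload lists successors of which at least one must be solved
PLAN = {
    "root":   ("OR", ["plan_a", "plan_b"]),
    "plan_a": ("AND", ["a1", "a2"]),
    "plan_b": ("AND", ["b1", "b2"]),
    "a1":     ("LEAF", True),
    "a2":     ("LEAF", False),
    "b1":     ("LEAF", True),
    "b2":     ("LEAF", True),
}

def is_solved(node, graph):
    """Return True if `node` is solved under the AND/OR rule (acyclic graphs only)."""
    kind, payload = graph[node]
    if kind == "LEAF":
        return payload
    results = [is_solved(child, graph) for child in payload]
    return all(results) if kind == "AND" else any(results)

print("plan_a solved:", is_solved("plan_a", PLAN))
print("root solved:  ", is_solved("root", PLAN))
```

Here `plan_a` fails because one of its required leaves is not a goal state, but the root is still solved through `plan_b`.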
  10. Heuristic Dynamic Programming: A* Search
     • Each arc bears an incremental cost
     • Cost, J, estimated at kth instant =
       – Cost accrued to k
       – Remaining cost to reach final point, k_f
       $$\hat{J}_{k_f} = \sum_{i=1}^{k} J_i + \sum_{i=k+1}^{k_f} \hat{J}_i(arc_i)$$
     • Goal: minimize estimated cost by choice of remaining arcs
     • Choose arc_{k+1}, arc_{k+2} accordingly
     • Use heuristics to estimate remaining cost

     Mechanical Control System
     [Figure labels: Aft Rotor, Forward Rotor, Cockpit Controls]

     Inferential Fault Analyzer for Helicopter Control System
     • Local failure analysis
       – Set of hypothetical models of specific failure
     • Global failure analysis
       – Forward reasoning assesses failure impact
       – Backward reasoning deduces possible causes

     Local Failure Analysis
     • Frames store facts and facilitate search and inference
       – Components and up-/downstream linkages of control system
       – Failure model parameters
       – Rule base for failure analysis (LISP)
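The A* cost estimate, accrued cost plus a heuristic estimate of the remaining cost, can be sketched on a small grid. Everything specific here is an assumption made for illustration: the `GRID` layout, unit arc costs, and the Manhattan-distance heuristic, which never overestimates the remaining cost on this grid.

```python
import heapq

# Hypothetical map: "#" cells are blocked, every move costs 1
GRID = [
    "S.#.",
    "..#.",
    "...G",
]
START, GOAL = (0, 0), (2, 3)

def neighbors(cell):
    """Yield (successor, arc cost) for the four axis-aligned moves."""
    row, col = cell
    for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] != "#":
            yield (r, c), 1.0

def manhattan(cell):
    """Heuristic estimate of the remaining cost to reach GOAL."""
    return abs(cell[0] - GOAL[0]) + abs(cell[1] - GOAL[1])

def a_star(start, goal):
    """Expand the node with the lowest (accrued cost + estimated remaining cost)."""
    frontier = [(manhattan(start), 0.0, start, [start])]
    best_cost = {start: 0.0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best_cost.get(node, float("inf")):
            continue  # a cheaper route to this node was already found
        for successor, arc_cost in neighbors(node):
            new_cost = cost + arc_cost
            if new_cost < best_cost.get(successor, float("inf")):
                best_cost[successor] = new_cost
                estimate = new_cost + manhattan(successor)
                heapq.heappush(frontier,
                               (estimate, new_cost, successor, path + [successor]))
    return None

total_cost, route = a_star(START, GOAL)
print(total_cost, route)
```

With a heuristic that never overestimates, the first time the goal is popped from the queue its accrued cost is the minimum.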
  11. Heuristic Search
     • Global failure analysis
       – Determination based on aggregate of local models
     • Heuristic score based on
       – Criticality of failure
       – Reliability of component
       – Extensiveness of failure
       – Implicated devices
       – Level of backtracking
       – Severity of failure
       – Net probability of failure model

     Global Failure Analysis

     Shortest Path Problems
     • Find the shortest (or least costly) path that visits all selected cities just once
       – Traveling Salesman
       – MapQuest/GPS/GIS
     • Simulated annealing solution
     • Genetic algorithm solution
     • Neural network solution

     Modified Dijkstra Algorithm

     Next Time: Knowledge Representation
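For a handful of cities, the shortest path that visits every city once can be found by blind exhaustive search, which gives a baseline for judging the heuristic solutions listed above. This is a sketch under assumptions: the city coordinates in `CITIES` are hypothetical, the path starts at a fixed city and does not return to it, and distances are straight-line.

```python
import itertools
import math

# Hypothetical city coordinates
CITIES = {
    "A": (0.0, 0.0),
    "B": (0.0, 1.0),
    "C": (1.0, 1.0),
    "D": (1.0, 0.0),
    "E": (2.0, 0.0),
}

def path_length(order):
    """Total straight-line length of a path through the cities in `order`."""
    return sum(math.dist(CITIES[a], CITIES[b])
               for a, b in zip(order, order[1:]))

def shortest_path_exhaustive(start):
    """Try every ordering of the remaining cities: (n - 1)! candidates."""
    others = [name for name in CITIES if name != start]
    candidates = ((start,) + perm for perm in itertools.permutations(others))
    return min(candidates, key=path_length)

best_path = shortest_path_exhaustive("A")
best_length = path_length(best_path)
print(best_path, best_length)
```

Five cities need only 24 orderings, but the count grows factorially, which is why approximate methods such as simulated annealing or genetic algorithms are used for larger problems.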
