# Machine Learning

### Machine Learning

1. Machine Learning
   Robert Stengel, Robotics and Intelligent Systems, MAE 345, Princeton University, 2009. Copyright 2009 by Robert Stengel. All rights reserved. For educational use only. http://www.princeton.edu/~stengel/MAE345.html
   - Markov Decision Processes
     - Optimal and near-optimal control
   - Finding Decision Rules in Data
     - ID3 algorithm
   - Search

   Proposed Term Paper Topics (MAE 345, Fall 2009)
   - Multistep NN with Memory
   - Maze-Navigating Robot
   - Robotic Prosthetic Device
   - Optimal Control of an Ambiguous Robot
   - Game-Playing NN
   - NN for Object Recognition
   - Robotic Cloth Folder
   - SAGA Simulated Creature
   - NN to Optimize Problem Set Solution
   - Blob-Tracking NN
   - Dust-Collecting Robot that Learns
   - NN for Stock Return Prediction

   Finding Decision Rules in Data
   - Identification of key attributes and outcomes
   - Taxonomies developed by experts
   - First principles of science and mathematics
   - Trial and error
   - Probability theory and fuzzy logic
   - Simulation and empirical results

   Example of On-Line Code Modification
   - Execute a decision tree
     - Get wrong answer
   - Add logic to distinguish between right and wrong cases
     - If Comfort Zone = Water, then Animal = Hippo, else Animal = Rhino
     - True, but Animal is Dinosaur, not Hippo
     - Ask user for right answer
     - Ask user for a rule that distinguishes between right and wrong answer: If Animal is extinct, …
2. Markov Decision Process
   - Model for decision making under uncertainty:
     $$\left[ S,\; A,\; P_{a_m}(x_k, x'),\; R_{a_m}(x_k, x') \right]$$
     where
     - $S$: finite set of states, $x_1, x_2, \ldots, x_K$
     - $A$: finite set of actions, $a_1, a_2, \ldots, a_M$
     - $P_{a_m}(x_k, x') = \Pr\left[ x(t_{i+1}) = x' \mid x(t_i) = x_k,\; a(t_i) = a_m \right]$
     - $R_{a_m}(x_k, x')$ = expected immediate reward for transition from $x_k$ to $x'$
   - Optimal decision maximizes expected total reward (or minimizes expected total cost) by choosing best set of actions (or control policy)
     - Linear-quadratic-Gaussian (LQG) control
     - Dynamic programming -> HJB equation ~> A* search
     - Reinforcement learning ~> Heuristic search

   Maximizing the Utility Function of a Markov Process
   - Utility function:
     $$J = \sum_{t=0}^{\infty} \gamma(t)\, R_{a(t)}[x(t), x(t+1)]$$
     $\gamma(t)$: discount rate, $0 < \gamma(t) < 1$
   - Utility function to go = value function:
     $$V = \sum_{t=t_{current}}^{\infty} \gamma(t)\, R_{a(t)}[x(t), x(t+1)]$$
   - Optimal control at $t$:
     $$u_{opt}(t) = \arg\max_{a} \left\{ R_{a(t)}[x(t), x(t+1)] + \gamma(t) \sum_{t=t_{current}}^{\infty} P_{a(t)}[x(t), x(t+1)]\, V[x(t+1)] \right\}$$
   - Optimized value function:
     $$V^*(t) = R_{u_{opt}(t)}[x^*(t)] + \gamma(t) \sum_{t=t_{current}}^{\infty} P_{u_{opt}(t)}[x^*(t), x^*_{est}(t+1)]\, V[x^*_{est}(t+1)]$$

   Reinforcement (“Q”) Learning Control of a Markov Process
   - Q: quality of a state-action function
   - Heuristic value function
   - One-step philosophy for heuristic optimization:
     $$Q[x(t+1), u(t+1)] = Q[x(t), u(t)] + \alpha(t) \left\{ R_{u(t)}[x(t)] + \gamma(t) \max_{u} Q[x(t+1), u] - Q[x(t), u(t)] \right\}$$
     $\alpha(t)$: learning rate, $0 < \alpha(t) < 1$
   - Various algorithms for computing best control value:
     $$u_{best}(t) = \arg\max_{u} Q[x(t+1), u]$$

   Q Learning Control of a Markov Process is Analogous to LQG Control in the LTI Case
   - Q-learning update:
     $$Q[x(t+1), u(t+1)] = Q[x(t), u(t)] + \alpha(t) \left\{ R_{u(t)}[x(t)] + \gamma(t) \max_{u} Q[x(t+1), u] - Q[x(t), u(t)] \right\}$$
     $\alpha(t)$: learning rate, $0 < \alpha(t) < 1$
   - Controller:
     $$x_{k+1} = \Phi x_k - \Gamma C \left( \hat{x}_k - x_k^* \right)$$
   - Estimator:
     $$\hat{x}_k = \Phi \hat{x}_{k-1} - \Gamma C \left( \hat{x}_{k-1} - x_{k-1}^* \right) + K \left( z_k - H_x \hat{x}_{k-1} \right)$$

   Q-Learning Snail (slide content not captured in transcript)

   Q-Learning, Ball on Plate (slide content not captured in transcript)
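The Q-learning update in item 2 can be illustrated with a minimal tabular sketch. Everything specific here is an assumption made for illustration and does not come from the slides: the five-state chain, the reward of 1 for entering the goal, the constant values of the learning rate and discount rate, and the deterministic sweep over all state-action pairs (used instead of random exploration so the result is reproducible).

```python
GAMMA = 0.9       # discount rate, 0 < gamma < 1 (assumed constant)
ALPHA = 0.5       # learning rate, 0 < alpha < 1 (assumed constant)
N_STATES = 5      # hypothetical chain of states 0..4; state 4 is the goal
ACTIONS = (0, 1)  # 0 = step left, 1 = step right


def step(state, action):
    """Hypothetical deterministic transition: reward 1 on entering the goal."""
    if action == 0:
        nxt = max(0, state - 1)
    else:
        nxt = min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def q_learning(sweeps=500):
    """Apply the one-step Q update to every non-goal state-action pair."""
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s in range(N_STATES - 1):  # the goal state is terminal, never updated
            for a in ACTIONS:
                nxt, r = step(s, a)
                target = r + GAMMA * max(q[nxt])
                q[s][a] += ALPHA * (target - q[s][a])
    return q


def greedy_policy(q):
    """Best control value per state: the action with the largest Q."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = q_learning()
policy = greedy_policy(q_table)
```

In this toy chain the greedy policy steps right from every non-goal state, and the learned Q values decay by one factor of the discount rate per step of distance from the goal.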
3. LQG Control Optimizes Discrete-Time LTI Markov Process
   $$\left[ S,\; A,\; P_{a_m}(x_k, x'),\; R_{a_m}(x_k, x') \right]$$
   where
   - $S$: infinite set of states, $x_1, x_2, \ldots, x_K$
   - $A$: infinite set of actions, $a_1, a_2, \ldots, a_M$
   - $P_{a_m}(x_k, x') = \Pr\left[ x(t_{i+1}) = x' \mid x(t_i) = x_k,\; a(t_i) = a_m \right]$
   - $R_{a_m}(x_k, x')$ = expected immediate reward for transition from $x_k$ to $x'$

   Structuring an Efficient Decision Tree (Off-Line)
   - Choose most important attributes first
   - Recognize when no result can be deduced
   - Exclude irrelevant factors
   - Iterative Dichotomizer*: the ID3 Algorithm
     - Build an efficient decision tree from a fixed set of examples (supervised learning)

   *Dichotomy: Division into two (usually contradictory) parts or opinions

   Fuzzy Ball-Game Training Set

   | Case # | Forecast | Temperature | Humidity | Wind | Play Ball? |
   |---|---|---|---|---|---|
   | 1 | Sunny | Hot | High | Weak | No |
   | 2 | Sunny | Hot | High | Strong | No |
   | 3 | Overcast | Hot | High | Weak | Yes |
   | 4 | Rain | Mild | High | Weak | Yes |
   | 5 | Rain | Cool | Low | Weak | Yes |
   | 6 | Rain | Cool | Low | Strong | No |
   | 7 | Overcast | Cool | Low | Strong | Yes |
   | 8 | Sunny | Mild | High | Weak | No |
   | 9 | Sunny | Cool | Low | Weak | Yes |
   | 10 | Rain | Mild | Low | Weak | Yes |
   | 11 | Sunny | Mild | Low | Strong | Yes |
   | 12 | Overcast | Mild | High | Strong | Yes |
   | 13 | Overcast | Hot | Low | Weak | Yes |
   | 14 | Rain | Mild | High | Strong | No |

   Forecast, Temperature, Humidity, and Wind are the attributes; Play Ball? is the decision.

   Parameters of the ID3 Algorithm
   - Decisions, e.g., play ball or don't play ball
     - $D$ = number of possible decisions
     - Decision: Yes, no
4. Parameters of the ID3 Algorithm
   - Attributes, e.g., temperature, humidity, wind, weather forecast
     - $M$ = number of attributes to be considered in making a decision
     - $I_m$ = number of values that the $i$th attribute can take
       - Temperature: Hot, mild, cool
       - Humidity: High, low
       - Wind: Strong, weak
       - Forecast: Sunny, overcast, rain

   Parameters of the ID3 Algorithm
   - Training trials, e.g., all the games played last month
     - $N$ = number of training trials
     - $n(i)$ = number of examples with $i$th attribute

   Example: Probability Spaces for Three Attributes
   - Probability of an attribute value represented by area in diagram
   - Attribute #1: 2 possible values
   - Attribute #2: 6 possible values
   - Attribute #3: 4 possible values

   Example: Decision, given Values of Three Attributes
   - Attribute #1: 2 possible values
   - Attribute #2: 6 possible values
   - Attribute #3: 4 possible values
5. Accurate Detection of Events Depends on Their Probability of Occurrence
   (two slides of figures; panels for noise = 0.1, 0.2, and 0.4)

   Entropy Measures Information Content of a Signal
   - $S$ = entropy of a signal encoding $I$ distinct events:
     $$S = -\sum_{i=1}^{I} \Pr(i) \log_2 \Pr(i)$$
     $$0 \le \Pr(\cdot) \le 1, \qquad \log_2 \Pr(\cdot) \le 0$$
   - $i$ = index identifying an event encoded by a signal
   - $\Pr(i)$ = probability of $i$th event
   - $\log_2 \Pr(i)$ = number of bits required to characterize the probability that the $i$th event occurs

   Entropy of Two Events with Various Frequencies of Occurrence
   - $\Pr(i) \log_2 \Pr(i)$ represents the channel capacity (i.e., average number of bits) required to portray the $i$th event
   - Frequencies of occurrence estimate probabilities of each event (#1 and #2)
     - $\Pr(\#1) = n(\#1)/N$
     - $\Pr(\#2) = n(\#2)/N = 1 - n(\#1)/N$
   - $\log_2 \Pr(\#1 \text{ or } \#2) \le 0$
   - Combined entropy:
     $$S = S_{\#1} + S_{\#2} = -\Pr(\#1) \log_2 \Pr(\#1) - \Pr(\#2) \log_2 \Pr(\#2)$$
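The entropy definition in item 5 can be checked numerically with a short sketch. The function names are illustrative choices, and zero-probability events are skipped under the usual convention that they contribute nothing to the sum.

```python
import math


def entropy(probabilities):
    """S = -sum(Pr(i) * log2(Pr(i))), skipping zero-probability events."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)


def two_event_entropy(n1, n_trials):
    """Entropy of two events whose probabilities are estimated from counts."""
    p1 = n1 / n_trials          # Pr(#1) = n(#1) / N
    return entropy([p1, 1.0 - p1])  # Pr(#2) = 1 - n(#1) / N


balanced = two_event_entropy(64, 128)  # both events equally frequent
skewed = two_event_entropy(32, 128)    # event #1 occurs a quarter of the time
rare = two_event_entropy(1, 128)       # event #1 occurs once in 128 trials
```

With 128 trials, the balanced case gives exactly 1 bit, and the entropy falls as one event dominates (about 0.811 for 32 occurrences and about 0.066 for a single occurrence).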
6. Best Decision is Related to Entropy and the Probability of Occurrence
   $$S = -\sum_{i=1}^{I} \Pr(i) \log_2 \Pr(i)$$
   - High entropy
     - Signal provides high coding precision of distinct events
     - Differences coded with few bits
   - Low entropy
     - Lack of distinction between signal values
     - Detecting differences requires many bits
   - Best classification of events when $S = 1$...
     - but that may not be achievable

   Entropy of Two Events with Various Frequencies of Occurrence

   Entropies for 128 Trials

   | n | Pr(#1) = n/N | log2(n/N) | Pr(#2) = 1 - n/N | log2(1 - n/N) | Entropy S |
   |---|---|---|---|---|---|
   | 1 | 0.008 | -7 | 0.992 | -0.011 | 0.066 |
   | 2 | 0.016 | -6 | 0.984 | -0.023 | 0.116 |
   | 4 | 0.031 | -5 | 0.969 | -0.046 | 0.201 |
   | 8 | 0.063 | -4 | 0.938 | -0.093 | 0.337 |
   | 16 | 0.125 | -3 | 0.875 | -0.193 | 0.544 |
   | 32 | 0.25 | -2 | 0.75 | -0.415 | 0.811 |
   | 64 | 0.50 | -1 | 0.50 | -1 | 1 |
   | 96 | 0.75 | -0.415 | 0.25 | -2 | 0.811 |
   | 112 | 0.875 | -0.193 | 0.125 | -3 | 0.544 |
   | 120 | 0.938 | -0.093 | 0.063 | -4 | 0.337 |
   | 124 | 0.969 | -0.046 | 0.031 | -5 | 0.201 |
   | 126 | 0.984 | -0.023 | 0.016 | -6 | 0.116 |
   | 127 | 0.992 | -0.011 | 0.008 | -7 | 0.066 |

   The log2 columns give the (negative) number of bits for events #1 and #2.

   Decision-Making Parameters for ID3
   - $S_D$ = entropy of all possible decisions:
     $$S_D = -\sum_{d=1}^{D} \Pr(d) \log_2 \Pr(d)$$
   - $G_i$ = information gain of $i$th attribute:
     $$G_i = S_D + \sum_{i=1}^{I_m} \Pr(i) \sum_{d=1}^{D} \Pr(i_d) \log_2 \Pr(i_d)$$
   - $\Pr(i_d) = n(i_d)/N(d)$ = probability that $i$th attribute correlates with $d$th decision

   Decision Tree Produced by ID3 Algorithm
   (Fuzzy Ball-Game Training Set repeated from item 3)
   - Root attribute gains, $G_i$
     - Forecast: 0.246
     - Temperature: 0.029
     - Humidity: 0.151
     - Wind: 0.048
   - Temperature is inconsequential and is not included in the decision tree
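The root attribute gains in item 6 can be reproduced from the 14-case Fuzzy Ball-Game Training Set with the sketch below. It assumes the conventional ID3 reading of the gain formula, in which the inner sum is the decision entropy within the subset of cases sharing one attribute value; the function and variable names are illustrative.

```python
import math
from collections import Counter

ATTRIBUTES = ("Forecast", "Temperature", "Humidity", "Wind")

# (Forecast, Temperature, Humidity, Wind, Play Ball?) for cases 1 to 14
CASES = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Low", "Weak", "Yes"),
    ("Rain", "Cool", "Low", "Strong", "No"),
    ("Overcast", "Cool", "Low", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Low", "Weak", "Yes"),
    ("Rain", "Mild", "Low", "Weak", "Yes"),
    ("Sunny", "Mild", "Low", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Low", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def entropy(labels):
    """Entropy in bits of the empirical distribution of the labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(cases, attr_index):
    """Decision entropy minus the weighted entropy after splitting on one attribute."""
    gain = entropy([case[-1] for case in cases])
    for value in set(case[attr_index] for case in cases):
        subset = [case[-1] for case in cases if case[attr_index] == value]
        gain -= (len(subset) / len(cases)) * entropy(subset)
    return gain


root_gains = {name: information_gain(CASES, i) for i, name in enumerate(ATTRIBUTES)}
root_attribute = max(root_gains, key=root_gains.get)

# Repeat the computation on the Sunny branch, excluding the attribute already used.
sunny_cases = [case for case in CASES if case[0] == "Sunny"]
sunny_gains = {
    name: information_gain(sunny_cases, i)
    for i, name in enumerate(ATTRIBUTES)
    if name != "Forecast"
}
```

Forecast has the largest gain and so becomes the root of the tree; on the Sunny branch, Humidity separates the remaining cases completely.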
7. Decision Tree Produced by ID3 Algorithm
   - Sunny branch attribute gains, $G_i$
     - Temperature: 0.57
     - Humidity: 0.97
     - Wind: 0.019

   Search
   - Typical AI textbook problems
     - Prove a theorem
     - Solve a puzzle (e.g., Tower of Hanoi)
     - Find a sequence of moves that wins a game (e.g., chess)
     - Find the shortest path connecting a set of points (e.g., Traveling salesman problem)
     - Find a sequence of symbolic transformations that solve a calculus problem (e.g., Mathematica)
   - The common thread: search
     - Structures for search
     - Strategies for search

   Curse of Dimensionality
   - Feasible search paths may grow without bound
     - Possible combinatorial explosion
     - Checkers: $5 \times 10^{20}$ possible moves
     - Chess: $10^{120}$ moves
     - Protein folding: ?
   - Limiting search complexity
     - Redefine search space
     - Employ heuristic (i.e., pragmatic) rules
     - Establish restricted search range
     - Invoke decision models that have worked in the past

   Structures for Search
   - Trees
     - Single path between root and any node
     - Path between adjacent nodes = arc
     - Root node
       - no precursors
     - Leaf node
       - no successors
       - possible terminator
8. Structures for Search
   - Graphs
     - Multiple paths between root and some nodes
     - Trees are subsets of graphs

   Directions of Search
   - Forward chaining
     - Reason from premises to actions
     - Data-driven: draw conclusions from facts
   - Backward chaining
     - Reason from actions to premises
     - Goal-driven: find facts that support hypotheses

   Strategies for Search
   - Search forward from opening?
   - Search backward from end game?
   - Both?
   - Realistic assessment
     - Not necessary to consider all $10^{120}$ possible moves to play good chess
     - Playing excellent chess may require much forward and backward chaining, but not $10^{120}$ evaluations
     - Most applications are more procedural
   - Search categories
     - Blind search
     - Heuristic search
     - Probabilistic search
     - Optimization

   Blind Search
   - Node expansion
     - Find all successors to that node
   - Depth-first forward search
     - Expand nodes descended from most recently expanded node
     - Consider other paths only after reaching a node with no successors
   - Breadth-first forward search
     - Expand nodes in order of proximity to the start node
     - Consider all sequences of arc number $n$ (from root node) before considering any of number $(n + 1)$
     - Exhaustive, but guaranteed to find the shortest path to a terminator
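The two blind forward-search strategies in item 8 differ only in which partial path is expanded next, as the following sketch suggests. The small directed graph and its node names are hypothetical, chosen so that the two strategies return different paths.

```python
from collections import deque

# Hypothetical directed graph: each node maps to its successors.
GRAPH = {
    "root": ["a", "b"],
    "a": ["c", "d"],
    "b": ["goal"],
    "c": [],
    "d": ["goal"],
    "goal": [],
}


def breadth_first(graph, start, is_goal):
    """Expand paths in order of arc count, so the first goal found is the closest."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()  # oldest path first (FIFO queue)
        node = path[-1]
        if is_goal(node):
            return path
        for successor in graph[node]:
            if successor not in visited:
                visited.add(successor)
                frontier.append(path + [successor])
    return None


def depth_first(graph, start, is_goal):
    """Expand the most recently generated path first, backing up at dead ends."""
    stack = [[start]]
    visited = set()
    while stack:
        path = stack.pop()  # newest path first (LIFO stack)
        node = path[-1]
        if is_goal(node):
            return path
        if node in visited:
            continue
        visited.add(node)
        # Push in reverse so the first-listed successor is expanded first.
        for successor in reversed(graph[node]):
            if successor not in visited:
                stack.append(path + [successor])
    return None


bfs_path = breadth_first(GRAPH, "root", lambda node: node == "goal")
dfs_path = depth_first(GRAPH, "root", lambda node: node == "goal")
```

On this graph the breadth-first search returns the two-arc path through `b`, while the depth-first search commits to the `a` subtree first and returns a three-arc path.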
9. AND/OR Graph Search
   - A node is “solved” if
     - It is a leaf node with a satisfactory goal state
     - It has solved AND nodes as successors
     - It has OR nodes as successors, at least one of which is solved
   - Goal: Solve the root node

   Blind Search
   - Bidirectional search
     - Search forward from root node and backward from one or more leaf nodes
     - Terminate when search nodes coincide
   - Minimal-cost forward search
     - Each arc is assigned a cost
     - Expand nodes in order of minimum cost

   Heuristic Search
   - For large problems, blind search typically leads to combinatorial explosion
   - Employ heuristic knowledge about the quality of possible paths
     - Decide which node to expand next
     - Discard (or prune) nodes that are unlikely to be fruitful
   - Search for feasible (approximately optimal) rather than optimal solutions
   - Ordered or best-first search
     - Always expand “most promising” node

   Heuristic Optimal Search (slide content not captured in transcript)
10. Mechanical Control System (slide content not captured in transcript)

    Heuristic Dynamic Programming: A* Search
    $$\hat{J}_{k_f} = \sum_{i=1}^{k} J_i + \sum_{i=k+1}^{k_f} \hat{J}_i(\text{arc}_i)$$
    - Each arc bears an incremental cost
    - Cost, $J$, estimated at $k$th instant =
      - Cost accrued to $k$
      - Remaining cost to reach final point, $k_f$
    - Goal: minimize estimated cost by choice of remaining arcs
    - Choose $\text{arc}_{k+1}$, $\text{arc}_{k+2}$ accordingly
    - Use heuristics to estimate remaining cost

    Inferential Fault Analyzer for Helicopter Control System
    - Local failure analysis
      - Set of hypothetical models of specific failure
    - Global failure analysis
      - Forward reasoning assesses failure impact
      - Backward reasoning deduces possible causes
    - Figure labels: Aft Rotor, Forward Rotor, Cockpit Controls

    Local Failure Analysis
    - Frames store facts and facilitate search and inference
      - Components and up-/downstream linkages of control system
      - Failure model parameters
      - Rule base for failure analysis (LISP)
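The A* cost estimate in item 10 (cost accrued so far plus a heuristic estimate of the remaining cost) can be sketched on a small grid. The grid layout, the unit arc cost, and the Manhattan-distance heuristic are assumptions made for this example and are not taken from the slides.

```python
import heapq

# Hypothetical map: "S" start, "G" goal, "#" blocked cell, "." free cell.
GRID = [
    "S.#.",
    ".##.",
    "...G",
]


def a_star(grid, start, goal):
    """Return (cost, path) minimizing accrued cost plus estimated remaining cost."""

    def remaining_estimate(cell):
        # Manhattan distance never overestimates the cost of unit-cost grid moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(remaining_estimate(start), 0, start, [start])]
    best_cost = {start: 0}
    while frontier:
        _, cost, cell, path = heapq.heappop(frontier)
        if cell == goal:
            return cost, path
        if cost > best_cost.get(cell, float("inf")):
            continue  # a cheaper route to this cell was already expanded
        row, col = cell
        for nxt in ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)):
            r, c = nxt
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] != "#":
                new_cost = cost + 1  # each arc bears an incremental cost of 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    estimate = new_cost + remaining_estimate(nxt)
                    heapq.heappush(frontier, (estimate, new_cost, nxt, path + [nxt]))
    return None


result = a_star(GRID, (0, 0), (2, 3))
```

Because the blocked cells close off the top row, the only route runs down the left column and along the bottom row, for a total cost of 5.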
11. Heuristic Search (slide content not captured in transcript)

    Global Failure Analysis
    - Global failure analysis
      - Determination based on aggregate of local models
    - Heuristic score based on
      - Criticality of failure
      - Reliability of component
      - Extensiveness of failure
      - Implicated devices
      - Level of backtracking
      - Severity of failure
      - Net probability of failure model

    Shortest Path Problems
    - Find the shortest (or least costly) path that visits all selected cities just once
      - Traveling Salesman
      - MapQuest/GPS/GIS
    - Simulated annealing solution
    - Genetic algorithm solution
    - Neural network solution
    - Modified Dijkstra Algorithm

    Next Time: Knowledge Representation
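As a point of reference for the shortest-path slide in item 11, here is a sketch of the standard Dijkstra algorithm (not the modified variant the slide names, whose details the transcript does not give). It finds least-cost routes from one source to every reachable node, which is a simpler problem than the traveling-salesman tour. The road network and its costs are hypothetical.

```python
import heapq

# Hypothetical road network: ROADS[city][neighbor] = travel cost.
ROADS = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 1},
    "C": {"B": 2, "D": 5},
    "D": {},
}


def dijkstra(graph, source):
    """Return the least cost from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist[node]:
            continue  # stale entry: a cheaper route to node was already found
        for neighbor, weight in graph[node].items():
            new_cost = cost + weight
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return dist


distances = dijkstra(ROADS, "A")
```

In this network the cheapest route from `A` to `B` goes through `C` (cost 3) rather than along the direct road (cost 4), and the cheapest route to `D` continues on through `B` (cost 4).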