Machine Learning
Robert Stengel
Robotics and Intelligent Systems MAE 345,
Princeton University, 2009
• Markov Decision Processes
– Optimal and near-optimal control
• Finding Decision Rules in Data
– ID3 algorithm
• Search

Proposed Term Paper Topics
MAE 345, Fall 2009
• Multistep NN with Memory
• Maze-Navigating Robot
• Robotic Prosthetic Device
• Optimal Control of an Ambiguous Robot
• Game-Playing NN
• NN for Object Recognition
• Robotic Cloth Folder
• SAGA Simulated Creature
• NN to Optimize Problem Set Solution
• Blob-Tracking NN
• Dust-Collecting Robot that Learns
• NN for Stock Return Prediction
Copyright 2009 by Robert Stengel. All rights reserved. For educational use only.
http://www.princeton.edu/~stengel/MAE345.html
Finding Decision Rules in Data
• Identification of key attributes and outcomes
• Taxonomies developed by experts
• First principles of science and mathematics
• Trial and error
• Probability theory and fuzzy logic
• Simulation and empirical results

Example of On-Line Code Modification
• Execute a decision tree
– Get wrong answer
• Add logic to distinguish between right and wrong cases
– If Comfort Zone = Water,
• then Animal = Hippo,
• else Animal = Rhino
– True, but Animal is Dinosaur, not Hippo
– Ask user for right answer
– Ask user for a rule that distinguishes between right and wrong answer: If Animal is extinct, …
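The on-line modification steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the `classify` and `refine` helpers, and the question strings are hypothetical names introduced here, and the user's right answer and distinguishing rule are passed in as arguments rather than requested interactively.

```python
class Node:
    """Internal test node: a yes/no question with one branch per answer."""

    def __init__(self, question, yes, no):
        self.question = question
        self.yes = yes      # subtree or leaf label (str)
        self.no = no        # subtree or leaf label (str)


def classify(tree, facts):
    """Follow the branches selected by `facts` (question -> bool) to a leaf."""
    while isinstance(tree, Node):
        tree = tree.yes if facts[tree.question] else tree.no
    return tree


def refine(tree, facts, right_answer, new_question):
    """Return a tree in which the leaf reached by `facts` is replaced by a
    new test separating the old (wrong) answer from the right answer."""
    if not isinstance(tree, Node):
        # Leaf reached: the new rule is assumed true for the right answer only.
        return Node(new_question, yes=right_answer, no=tree)
    if facts[tree.question]:
        return Node(tree.question,
                    refine(tree.yes, facts, right_answer, new_question),
                    tree.no)
    return Node(tree.question, tree.yes,
                refine(tree.no, facts, right_answer, new_question))


# Original tree: If Comfort Zone = Water, then Hippo, else Rhino.
tree = Node("comfort zone is water", yes="Hippo", no="Rhino")
dinosaur = {"comfort zone is water": True, "animal is extinct": True}
hippo = {"comfort zone is water": True, "animal is extinct": False}

first_guess = classify(tree, dinosaur)   # wrong answer from the old tree
# The "user" supplies the right answer and a distinguishing rule.
tree = refine(tree, dinosaur, "Dinosaur", "animal is extinct")
```

After the refinement, `classify(tree, dinosaur)` follows the new test, while the hippo and rhino cases are classified as before.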
Markov Decision Process
• Model for decision making under uncertainty
$\left[ S, A, P_{a_m}(x_k, x'), R_{a_m}(x_k, x') \right]$
where
$S$ : finite set of states, $x_1, x_2, \ldots, x_K$
$A$ : finite set of actions, $a_1, a_2, \ldots, a_M$
$P_{a_m}(x_k, x') = \Pr\left[ x(t_{i+1}) = x' \mid x(t_i) = x_k,\ a(t_i) = a_m \right]$
$R_{a_m}(x_k, x')$ = Expected immediate reward for transition from $x_k$ to $x'$
• Optimal decision maximizes expected total reward (or minimizes expected total cost) by choosing best set of actions (or control policy)
– Linear-quadratic-Gaussian (LQG) control
– Dynamic programming -> HJB equation ~> A* search
– Reinforcement learning ~> Heuristic search

Maximizing the Utility Function of a Markov Process
Utility function: $J = \sum_{t=0}^{\infty} \gamma(t)\, R_{a(t)}[x(t), x(t+1)]$
$\gamma(t)$ : discount rate, $0 < \gamma(t) < 1$
Utility function to go = Value function: $V = \sum_{t=t_{current}}^{\infty} \gamma(t)\, R_{a(t)}[x(t), x(t+1)]$
• Optimal control at t
$u_{opt}(t) = \arg\max_{a} \left\{ R_{a(t)}[x(t), x(t+1)] + \gamma(t) \sum_{t=t_{current}}^{\infty} P_{a(t)}[x(t), x(t+1)]\, V[x(t+1)] \right\}$
• Optimized value function
$V^*(t) = R_{u_{opt}(t)}[x^*(t)] + \gamma(t) \sum_{t=t_{current}}^{\infty} P_{u_{opt}(t)}[x^*(t), x_{est}^*(t+1)]\, V[x_{est}^*(t+1)]$
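As a rough illustration of how an optimal decision can be computed for a finite Markov decision process, the sketch below applies value iteration (a dynamic programming method) to a two-state, two-action example. The constant discount rate, the transition probabilities, the rewards, and all names (`value_iteration`, `STAY`, `MOVE`) are assumptions made for the example, not values from the lecture.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-9):
    """P[a][s][s2]: transition probability; R[a][s][s2]: expected reward.
    Returns the converged value function and a greedy policy."""
    n_actions, n_states = len(P), len(P[0])
    V = [0.0] * n_states
    while True:
        # Expected return of each action in each state (one Bellman backup).
        Q = [[sum(P[a][s][s2] * (R[a][s][s2] + gamma * V[s2])
                  for s2 in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
        V_new = [max(q) for q in Q]
        if max(abs(v1 - v0) for v1, v0 in zip(V_new, V)) < tol:
            policy = [q.index(max(q)) for q in Q]
            return V_new, policy
        V = V_new


STAY, MOVE = 0, 1
P = [
    [[1.0, 0.0], [0.0, 1.0]],   # STAY: remain in the current state
    [[0.2, 0.8], [0.8, 0.2]],   # MOVE: switch state with probability 0.8
]
# Reward 1 for any transition that lands in state 1, otherwise 0.
R = [[[0.0, 1.0], [0.0, 1.0]] for _ in (STAY, MOVE)]

V, policy = value_iteration(P, R)
```

In this example the greedy policy is to move out of state 0 and to stay in state 1; the value of state 1 is 1/(1 - 0.9) = 10, and the value of state 0 solves V0 = 8 + 0.18 V0.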
Reinforcement (“Q”) Learning Control of a Markov Process
• Q: quality of a state-action function
• Heuristic value function
• One-step philosophy for heuristic optimization
$Q[x(t+1), u(t+1)] = Q[x(t), u(t)] + \alpha(t) \left\{ R_{u(t)}[x(t)] + \gamma(t) \max_{u} Q[x(t+1), u] - Q[x(t), u(t)] \right\}$
$\alpha(t)$ : learning rate, $0 < \alpha(t) < 1$
• Various algorithms for computing best control value
$u_{best}(t) = \arg\max_{u} Q[x(t+1), u]$

Q Learning Control of a Markov Process is Analogous to LQG Control in the LTI Case
Q-learning update and learning rate $\alpha(t)$ as above
Controller
$x_{k+1} = \Phi x_k + \Gamma C (\hat{x}_k - x_k^*)$
Estimator
$\hat{x}_k = \Phi \hat{x}_{k-1} - \Gamma C (\hat{x}_{k-1} - x_{k-1}^*) + K (z_k - H_x \hat{x}_{k-1})$

[Figures: Q-Learning Snail; Q-Learning, Ball on Plate]
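The one-step Q-learning update can be sketched on a small deterministic problem. Everything specific here is an assumption for illustration: a four-state chain with a rewarded terminal state, a purely random exploration policy, constant learning and discount rates, and the names `step` and `q_learning`. The table entry for the visited state-action pair is updated in place.

```python
import random

N_STATES, GOAL = 4, 3          # states 0..3; state 3 is terminal
ACTIONS = (0, 1)               # 0 = move left, 1 = move right


def step(x, u):
    """Deterministic chain: reward 1 only on arrival at the goal state."""
    x_next = max(0, x - 1) if u == 0 else x + 1
    reward = 1.0 if x_next == GOAL else 0.0
    return x_next, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        x = 0
        while x != GOAL:
            u = rng.choice(ACTIONS)                 # random exploration
            x_next, r = step(x, u)
            # One-step update: move Q toward reward plus discounted best Q.
            target = r + gamma * max(Q[x_next])
            Q[x][u] += alpha * (target - Q[x][u])
            x = x_next
    return Q


Q = q_learning()
# Best control in each non-terminal state: arg max over u of Q[x][u].
policy = [max(ACTIONS, key=lambda u: Q[x][u]) for x in range(GOAL)]
```

With these settings the learned values approach 0.81, 0.9, and 1.0 for moving right from states 0, 1, and 2, so the greedy policy moves right everywhere.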
LQG Control Optimizes Discrete-Time LTI Markov Process
$\left[ S, A, P_{a_m}(x_k, x'), R_{a_m}(x_k, x') \right]$
where
$S$ : infinite set of states, $x_1, x_2, \ldots, x_K$
$A$ : infinite set of actions, $a_1, a_2, \ldots, a_M$
$P_{a_m}(x_k, x') = \Pr\left[ x(t_{i+1}) = x' \mid x(t_i) = x_k,\ a(t_i) = a_m \right]$
$R_{a_m}(x_k, x')$ = Expected immediate reward for transition from $x_k$ to $x'$

Structuring an Efficient Decision Tree (Off-Line)
• Choose most important attributes first
• Recognize when no result can be deduced
• Exclude irrelevant factors
• Iterative Dichotomizer*: the ID3 Algorithm
– Build an efficient decision tree from a fixed set of examples (supervised learning)
*Dichotomy: Division into two (usually contradictory) parts or opinions
Fuzzy Ball-Game Training Set
        Attributes                                  Decisions
Case #  Forecast  Temperature  Humidity  Wind    Play Ball?
1       Sunny     Hot          High      Weak    No
2       Sunny     Hot          High      Strong  No
3       Overcast  Hot          High      Weak    Yes
4       Rain      Mild         High      Weak    Yes
5       Rain      Cool         Low       Weak    Yes
6       Rain      Cool         Low       Strong  No
7       Overcast  Cool         Low       Strong  Yes
8       Sunny     Mild         High      Weak    No
9       Sunny     Cool         Low       Weak    Yes
10      Rain      Mild         Low       Weak    Yes
11      Sunny     Mild         Low       Strong  Yes
12      Overcast  Mild         High      Strong  Yes
13      Overcast  Hot          Low       Weak    Yes
14      Rain      Mild         High      Strong  No

Parameters of the ID3 Algorithm
• Decisions, e.g., Play ball or don't play ball
– D = Number of possible decisions
• Decision: Yes, no
Parameters of the ID3 Algorithm
• Attributes, e.g., Temperature, humidity, wind, weather forecast
– M = Number of attributes to be considered in making a decision
– I_m = Number of values that the mth attribute can take
• Temperature: Hot, mild, cool
• Humidity: High, low
• Wind: Strong, weak
• Forecast: Sunny, overcast, rain

Parameters of the ID3 Algorithm
• Training trials, e.g., all the games played last month
– N = Number of training trials
– n(i) = Number of examples with ith attribute value
Example: Probability Spaces for Three Attributes
• Probability of an attribute value represented by area in diagram
[Diagram: Attribute #1, 2 possible values; Attribute #2, 6 possible values; Attribute #3, 4 possible values]

Example: Decision, given Values of Three Attributes
[Diagram: Attribute #1, 2 possible values; Attribute #2, 6 possible values; Attribute #3, 4 possible values]
Accurate Detection of Events Depends on Their Probability of Occurrence
[Plots for $\sigma_{noise}$ = 0.1, 0.2, and 0.4]
Entropy Measures Information Content of a Signal
• S = Entropy of a signal encoding I distinct events
$S = -\sum_{i=1}^{I} \Pr(i) \log_2 \Pr(i)$
$0 \le \Pr(\cdot) \le 1$
$\log_2 \Pr(\cdot) \le 0$
• i = Index identifying an event encoded by a signal
• Pr(i) = Probability of ith event
• log2 Pr(i) = Number of bits required to characterize the probability that the ith event occurs

Entropy of Two Events with Various Frequencies of Occurrence
• Pr(i) log2 Pr(i) represents the channel capacity (i.e., average number of bits) required to portray the ith event
• Frequencies of occurrence estimate probabilities of each event (#1 and #2)
– Pr(#1) = n(#1)/N
– Pr(#2) = n(#2)/N = 1 – n(#1)/N
$\log_2 \Pr(\#1 \text{ or } \#2) \le 0$
$S = S_{\#1} + S_{\#2} = -\Pr(\#1) \log_2 \Pr(\#1) - \Pr(\#2) \log_2 \Pr(\#2)$
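The two-event entropy can be evaluated directly from frequencies of occurrence. The sketch below is a minimal illustration; the function names `entropy` and `two_event_entropy` are introduced here, and N = 128 trials is used only as an example.

```python
import math


def entropy(probabilities):
    """S = -sum Pr(i) log2 Pr(i); zero-probability events contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def two_event_entropy(n1, N):
    """Entropy of events #1 and #2, with Pr(#1) = n1/N and Pr(#2) = 1 - n1/N."""
    p1 = n1 / N
    return entropy([p1, 1 - p1])


# Entropy for several frequencies of event #1 out of 128 trials.
entropies = {n: two_event_entropy(n, 128) for n in (1, 32, 64, 96, 127)}
```

The entropy peaks at 1 bit when the two events are equally likely (n = 64) and is symmetric about that point.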
Entropy of Two Events with Various Frequencies of Occurrence
Entropies for 128 Trials
      Pr(#1)   # of Bits(#1)   Pr(#2)    # of Bits(#2)    Entropy
n     n/N      log2(n/N)       1 - n/N   log2(1 - n/N)    S
1     0.008    -7              0.992     -0.011           0.066
2     0.016    -6              0.984     -0.023           0.116
4     0.031    -5              0.969     -0.046           0.201
8     0.063    -4              0.938     -0.093           0.337
16    0.125    -3              0.875     -0.193           0.544
32    0.25     -2              0.75      -0.415           0.811
64    0.50     -1              0.50      -1               1
96    0.75     -0.415          0.25      -2               0.811
112   0.875    -0.193          0.125     -3               0.544
120   0.938    -0.093          0.063     -4               0.337
124   0.969    -0.046          0.031     -5               0.201
126   0.984    -0.023          0.016     -6               0.116
127   0.992    -0.011          0.008     -7               0.066

Best Decision is Related to Entropy and the Probability of Occurrence
$S = -\sum_{i=1}^{I} \Pr(i) \log_2 \Pr(i)$
• High entropy
– Signal provides high coding precision of distinct events
– Differences coded with few bits
• Low entropy
– Lack of distinction between signal values
– Detecting differences requires many bits
• Best classification of events when S = 1...
– but that may not be achievable
Decision-Making Parameters for ID3
• S_D = Entropy of all possible decisions
$S_D = -\sum_{d=1}^{D} \Pr(d) \log_2 \Pr(d)$
• G_i = Information gain of ith attribute
$G_i = S_D + \sum_{i=1}^{I_m} \Pr(i) \sum_{d=1}^{D} \Pr(i_d) \log_2 \Pr(i_d)$
• Pr(i_d) = n(i_d)/N(d) = Probability that ith attribute correlates with dth decision

Decision Tree Produced by ID3 Algorithm
(Training set: the 14 cases of the Fuzzy Ball-Game Training Set)
• Root Attribute gains, G_i
– Forecast: 0.246
– Temperature: 0.029
– Humidity: 0.151
– Wind: 0.048
• Temperature is inconsequential and is not included in the decision tree
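The root attribute gains can be checked against the 14-case ball-game training set. The sketch below is an illustration under one stated assumption: inside each attribute value, the decision probabilities are taken as the fraction of that value's cases with each decision, which is the usual ID3 form of the gain. The names `CASES`, `entropy`, and `gain` are introduced here.

```python
import math
from collections import Counter

ATTRIBUTES = ("Forecast", "Temperature", "Humidity", "Wind")
# (Forecast, Temperature, Humidity, Wind, Play Ball?) for cases 1 to 14.
CASES = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Low", "Weak", "Yes"),
    ("Rain", "Cool", "Low", "Strong", "No"),
    ("Overcast", "Cool", "Low", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Low", "Weak", "Yes"),
    ("Rain", "Mild", "Low", "Weak", "Yes"),
    ("Sunny", "Mild", "Low", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Low", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]


def entropy(labels):
    """Entropy (bits) of the decision labels in a list."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain(cases, attr_index):
    """Information gain: decision entropy minus the weighted entropy
    remaining after splitting on the attribute at attr_index."""
    remainder = 0.0
    for value in {c[attr_index] for c in cases}:
        subset = [c[-1] for c in cases if c[attr_index] == value]
        remainder += len(subset) / len(cases) * entropy(subset)
    return entropy([c[-1] for c in cases]) - remainder


root_gains = {name: gain(CASES, i) for i, name in enumerate(ATTRIBUTES)}

# The same function applies to a branch, e.g. only the Sunny cases.
sunny = [c for c in CASES if c[0] == "Sunny"]
sunny_gains = {name: gain(sunny, i)
               for i, name in enumerate(ATTRIBUTES) if name != "Forecast"}
```

Forecast has the largest gain, so it is chosen as the root attribute; within the Sunny cases, Humidity separates the decisions completely.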
Decision Tree Produced by ID3 Algorithm
• Sunny Branch Attribute gains, G_i
– Temperature: 0.57
– Humidity: 0.97
– Wind: 0.019

Search
• Typical AI textbook problems
– Prove a theorem
– Solve a puzzle (e.g., Tower of Hanoi)
– Find a sequence of moves that wins a game (e.g., chess)
– Find the shortest path connecting a set of points (e.g., Traveling salesman problem)
– Find a sequence of symbolic transformations that solve a calculus problem (e.g., Mathematica)
• The common thread: search
– Structures for search
– Strategies for search
Curse of Dimensionality
• Feasible search paths may grow without bound
– Possible combinatorial explosion
– Checkers: 5 x 10^20 possible moves
– Chess: 10^120 moves
– Protein folding: ?
• Limiting search complexity
– Redefine search space
– Employ heuristic (i.e., pragmatic) rules
– Establish restricted search range
– Invoke decision models that have worked in the past

Structures for Search
• Trees
– Single path between root and any node
– Path between adjacent nodes = arc
– Root node
• no precursors
– Leaf node
• no successors
• possible terminator
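The tree terminology can be made concrete with a small data structure. This is a minimal sketch; the class name `TreeNode` and its methods are hypothetical names chosen for the illustration.

```python
class TreeNode:
    """Tree node with one precursor (parent) and any number of successors."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

    def add_child(self, name):
        """Create a successor joined to this node by a single arc."""
        child = TreeNode(name, parent=self)
        self.children.append(child)
        return child

    def is_root(self):
        return self.parent is None          # root node: no precursors

    def is_leaf(self):
        return not self.children            # leaf node: no successors

    def path_from_root(self):
        """The single path between the root and this node."""
        node, path = self, []
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path[::-1]


root = TreeNode("root")
a = root.add_child("a")
b = root.add_child("b")
a1 = a.add_child("a1")
```

Because each node stores exactly one parent, the path from the root to any node is unique, which is the defining property of a tree.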
Structures for Search
• Graphs
– Multiple paths between root and some nodes
– Trees are subsets of graphs

Directions of Search
• Forward chaining
– Reason from premises to actions
– Data-driven: draw conclusions from facts
• Backward chaining
– Reason from actions to premises
– Goal-driven: find facts that support hypotheses

Strategies for Search
• Search forward from opening?
• Search backward from end game?
• Both?
• Realistic assessment
– Not necessary to consider all 10^120 possible moves to play good chess
– Playing excellent chess may require much forward and backward chaining, but not 10^120 evaluations
– Most applications are more procedural
• Search categories
– Blind search
– Heuristic search
– Probabilistic search
– Optimization

Blind Search
• Node expansion
– Find all successors to that node
• Depth-first forward search
– Expand nodes descended from most recently expanded node
– Consider other paths only after reaching a node with no successors
• Breadth-first forward search
– Expand nodes in order of proximity to the start node
– Consider all sequences of arc number n (from root node) before considering any of number (n + 1)
– Exhaustive, but guaranteed to find the shortest path to a terminator
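Depth-first and breadth-first forward search differ only in which node is expanded next. The sketch below illustrates both on a small hypothetical graph; the node names, the `SUCCESSORS` table, and the function names are assumptions made for the example.

```python
from collections import deque

# Hypothetical search graph: node -> list of successors.
SUCCESSORS = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["G"],
    "D": [],
    "E": ["G"],
    "G": [],
}


def depth_first(successors, start, goal):
    """Expand the most recently generated node first (stack of paths)."""
    stack, visited = [[start]], set()
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        # Push in reverse so the first-listed successor is expanded first.
        for nxt in reversed(successors[node]):
            stack.append(path + [nxt])
    return None


def breadth_first(successors, start, goal):
    """Expand nodes in order of distance (number of arcs) from the start."""
    queue, visited = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in successors[node]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


dfs_path = depth_first(SUCCESSORS, "A", "G")
bfs_path = breadth_first(SUCCESSORS, "A", "G")
```

On this graph the depth-first search backs out of the dead end at D and reaches G through B and E, while the breadth-first search returns the two-arc path through C.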
Blind Search
• Bidirectional search
– Search forward from root node and backward from one or more leaf nodes
– Terminate when search nodes coincide
• Minimal-cost forward search
– Each arc is assigned a cost
– Expand nodes in order of minimum cost

AND/OR Graph Search
• A node is “solved” if
– It is a leaf node with a satisfactory goal state
– It has solved AND nodes as successors
– It has OR nodes as successors, at least one of which is solved.
• Goal: Solve the root node
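The "solved" test for an AND/OR graph is naturally recursive. The sketch below assumes one common reading of the definition: a node labeled AND needs all of its successors solved, and a node labeled OR needs at least one. The graph, the node labels, and the function name `solved` are hypothetical, and the graph is assumed to be acyclic.

```python
# Hypothetical AND/OR graph: node -> (kind, successors).
GRAPH = {
    "root": ("OR", ["plan1", "plan2"]),
    "plan1": ("AND", ["a", "b"]),
    "plan2": ("AND", ["c"]),
    "a": ("LEAF", []),
    "b": ("LEAF", []),
    "c": ("LEAF", []),
}
GOAL_LEAVES = {"a", "c"}        # leaves with a satisfactory goal state


def solved(node, graph, goal_leaves):
    """Return True if `node` is solved (assumes the graph has no cycles)."""
    kind, successors = graph[node]
    if not successors:
        return node in goal_leaves
    results = (solved(s, graph, goal_leaves) for s in successors)
    # AND: every successor must be solved; OR: one solved successor suffices.
    return all(results) if kind == "AND" else any(results)


root_solved = solved("root", GRAPH, GOAL_LEAVES)
```

Here `plan1` fails because leaf `b` is not a goal state, but the root is still solved through `plan2`.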
Heuristic Search
• For large problems, blind search typically leads to combinatorial explosion
• Employ heuristic knowledge about the quality of possible paths
– Decide which node to expand next
– Discard (or prune) nodes that are unlikely to be fruitful
• Search for feasible (approximately optimal) rather than optimal solutions
• Ordered or best-first search
– Always expand “most promising” node

[Figure: Heuristic Optimal Search]
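An ordered (best-first) search can be sketched with a priority queue keyed on a heuristic score. The graph and the heuristic values `H` below are invented for illustration, and `best_first` is a hypothetical helper name; a lower score is treated as more promising.

```python
import heapq

# Hypothetical graph and heuristic estimates of distance to the goal "G".
SUCCESSORS = {"S": ["A", "B"], "A": ["C"], "B": ["G"], "C": ["G"], "G": []}
H = {"S": 3, "A": 1, "B": 2, "C": 1, "G": 0}


def best_first(successors, h, start, goal):
    """Always expand the frontier node with the lowest heuristic score."""
    frontier = [(h[start], [start])]
    visited = set()
    while frontier:
        _, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt in successors[node]:
            if nxt not in visited:
                heapq.heappush(frontier, (h[nxt], path + [nxt]))
    return None


path = best_first(SUCCESSORS, H, "S", "G")
```

In this example the search follows the most promising-looking nodes and returns a three-arc path, although a two-arc path through B exists: the result is feasible rather than optimal.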
[Figure: Mechanical Control System]

Heuristic Dynamic Programming: A* Search
$\hat{J}_{k_f} = \sum_{i=1}^{k} J_i + \sum_{i=k+1}^{k_f} \hat{J}_i(arc_i)$
• Each arc bears an incremental cost
• Cost, J, estimated at kth instant =
– Cost accrued to k
– Remaining cost to reach final point, k_f
• Goal: minimize estimated cost by choice of remaining arcs
• Choose arc_{k+1}, arc_{k+2} accordingly
• Use heuristics to estimate remaining cost
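The estimated cost (cost accrued so far plus a heuristic estimate of the remaining cost) is what an A* search orders its frontier by. The sketch below is an illustration only: the arc costs, the heuristic values `H`, and the function name `a_star` are assumptions, and the heuristic is chosen so that it never overestimates the true remaining cost.

```python
import heapq

# Hypothetical graph: node -> list of (successor, arc cost).
ARCS = {
    "S": [("A", 1), ("B", 4)],
    "A": [("B", 2), ("G", 6)],
    "B": [("G", 2)],
    "G": [],
}
# Heuristic estimates of remaining cost to "G" (never overestimating here).
H = {"S": 4, "A": 3, "B": 2, "G": 0}


def a_star(arcs, h, start, goal):
    """Expand the path with the lowest accrued cost plus estimated remainder."""
    frontier = [(h[start], 0.0, [start])]
    best_cost = {start: 0.0}
    while frontier:
        _, cost, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return cost, path
        for nxt, arc_cost in arcs[node]:
            new_cost = cost + arc_cost
            # Keep a successor only if this is the cheapest route found to it.
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(frontier,
                               (new_cost + h[nxt], new_cost, path + [nxt]))
    return None


total_cost, path = a_star(ARCS, H, "S", "G")
```

For this graph the search returns the path through A and B with total cost 5, which is cheaper than either the direct arc from S to B or the arc from A to G.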
Inferential Fault Analyzer for Helicopter Control System
• Local failure analysis
– Set of hypothetical models of specific failure
• Global failure analysis
– Forward reasoning assesses failure impact
– Backward reasoning deduces possible causes
[Figure labels: Cockpit Controls, Forward Rotor, Aft Rotor]

Local Failure Analysis
• Frames store facts and facilitate search and inference
– Components and up-/downstream linkages of control system
– Failure model parameters
– Rule base for failure analysis (LISP)
Global Failure Analysis
• Global failure analysis
– Determination based on aggregate of local models

Heuristic Search
• Heuristic score based on
– Criticality of failure
– Reliability of component
– Extensiveness of failure
– Implicated devices
– Level of backtracking
– Severity of failure
– Net probability of failure model
Shortest Path Problems
• Find the shortest (or least costly) path that visits all selected cities just once
– Traveling Salesman
– MapQuest/GPS/GIS
• Simulated annealing solution
• Genetic algorithm solution
• Neural network solution
Modified Dijkstra Algorithm

Next Time: Knowledge Representation