Machine Learning

Robert Stengel
Robotics and Intelligent Systems, MAE 345,
Princeton University, 2009

• Markov Decision Processes
– Optimal and near-optimal control
• Finding Decision Rules in Data
– ID3 algorithm
• Search

Proposed Term Paper Topics
MAE 345, Fall 2009
• Multistep NN with Memory
• Maze-Navigating Robot
• Robotic Prosthetic Device
• Optimal Control of an Ambiguous Robot
• Game-Playing NN
• NN for Object Recognition
• Robotic Cloth Folder
• SAGA Simulated Creature
• NN to Optimize Problem Set Solution
• Blob-Tracking NN
• Dust-Collecting Robot that Learns
• NN for Stock Return Prediction
Copyright 2009 by Robert Stengel. All rights reserved. For educational use only.
http://www.princeton.edu/~stengel/MAE345.html

Finding Decision Rules in Data

• Identification of key attributes and outcomes
• Taxonomies developed by experts
• First principles of science and mathematics
• Trial and error
• Probability theory and fuzzy logic
• Simulation and empirical results

Example of On-Line Code Modification

• Execute a decision tree
– Get wrong answer
• Add logic to distinguish between right and wrong cases
– If Comfort Zone = Water,
  • then Animal = Hippo,
  • else Animal = Rhino
– True, but Animal is Dinosaur, not Hippo
– Ask user for right answer
– Ask user for a rule that distinguishes between right and wrong answer: If Animal is extinct, …
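The on-line modification in the Hippo/Rhino example can be sketched in Python as follows. This is only an illustration: the ordered rule list, the attribute names `comfort_zone` and `extinct`, and the helper functions are assumptions, not part of the original example.

```python
# Hypothetical rule list: each entry is (predicate, answer); the first match wins.
def make_tree():
    return [
        (lambda a: a.get("comfort_zone") == "water", "Hippo"),
        (lambda a: True, "Rhino"),  # default (else) branch
    ]

def classify(rules, attributes):
    """Return the answer of the first rule whose predicate is satisfied."""
    for test, animal in rules:
        if test(attributes):
            return animal
    return None

def add_rule(rules, test, animal):
    """Insert a user-supplied distinguishing rule ahead of the existing ones."""
    rules.insert(0, (test, animal))

rules = make_tree()
dinosaur = {"comfort_zone": "water", "extinct": True}
before = classify(rules, dinosaur)   # wrong answer from the original tree
# The user supplies the right answer and a rule that separates the two cases.
add_rule(rules, lambda a: a.get("extinct", False), "Dinosaur")
after = classify(rules, dinosaur)    # answer after the modification
hippo = classify(rules, {"comfort_zone": "water", "extinct": False})
```

Placing the new rule first is one simple design choice; a real decision tree would instead split the leaf that produced the wrong answer.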

Markov Decision Process

• Model for decision making under uncertainty
$[ S, A, P_{a_m}(x_k, x'), R_{a_m}(x_k, x') ]$
where
$S$: finite set of states, $x_1, x_2, \ldots, x_K$
$A$: finite set of actions, $a_1, a_2, \ldots, a_M$
$P_{a_m}(x_k, x') = \Pr[ x_k(t_{i+1}) = x' \mid x_k(t_i) = x_k, a(t_i) = a_m ]$
$R_{a_m}(x_k, x')$ = Expected immediate reward for transition from $x_k$ to $x'$
• Optimal decision maximizes expected total reward (or minimizes expected total cost) by choosing best set of actions (or control policy)
– Linear-quadratic-Gaussian (LQG) control
– Dynamic programming -> HJB equation ~> A* search
– Reinforcement learning ~> Heuristic search

Maximizing the Utility Function of a Markov Process

Utility function:
$J = \sum_{t=0}^{\infty} \gamma(t) R_{a(t)}[x(t), x(t+1)]$
$\gamma(t)$: discount rate, $0 < \gamma(t) < 1$
Utility function to go = Value function:
$V = \sum_{t=t_{current}}^{\infty} \gamma(t) R_{a(t)}[x(t), x(t+1)]$
• Optimal control at t
$u_{opt}(t) = \arg\max_{a} \{ R_{a(t)}[x(t), x(t+1)] + \gamma(t) \sum_{t=t_{current}}^{\infty} P_{a(t)}[x(t), x(t+1)] V[x(t+1)] \}$
• Optimized value function
$V^*(t) = R_{u_{opt}(t)}[x^*(t)] + \gamma(t) \sum_{t=t_{current}}^{\infty} P_{u_{opt}(t)}[x^*(t), x_{est}^*(t+1)] V[x_{est}^*(t+1)]$
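As a minimal sketch of how a value function and control policy can be computed for a finite Markov decision process, the Python below runs value iteration with the standard Bellman backup. The two-state, two-action transition probabilities `P`, rewards `R`, and the constant discount rate `gamma` are invented for illustration, and value iteration is one common dynamic-programming method rather than necessarily the one intended in the lecture.

```python
# Hypothetical MDP: P[a][k][j] = Pr(x' = j | x = k, action a); R[a][k][j] = reward.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[[0.0, 1.0], [0.0, 2.0]],
     [[0.0, 1.0], [0.0, 2.0]]]
gamma = 0.9  # constant discount rate (assumed)

def value_iteration(P, R, gamma, tol=1e-10, max_iter=10_000):
    n_actions, n_states = len(P), len(P[0])
    V = [0.0] * n_states

    def q(a, k, V):
        # Expected immediate reward plus discounted value of the next state.
        return sum(P[a][k][j] * (R[a][k][j] + gamma * V[j])
                   for j in range(n_states))

    for _ in range(max_iter):
        V_new = [max(q(a, k, V) for a in range(n_actions))
                 for k in range(n_states)]
        done = max(abs(v1 - v0) for v1, v0 in zip(V_new, V)) < tol
        V = V_new
        if done:
            break
    # Greedy policy with respect to the converged value function.
    policy = [max(range(n_actions), key=lambda a: q(a, k, V))
              for k in range(n_states)]
    return V, policy

V_opt, policy = value_iteration(P, R, gamma)
```

For this toy problem the greedy policy selects the second action in both states, since that action reaches and holds the rewarding state most reliably.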

Reinforcement (“Q”) Learning Control of a Markov Process

• Q: quality of a state-action function
• Heuristic value function
• One-step philosophy for heuristic optimization
$Q[x(t+1), u(t+1)] = Q[x(t), u(t)] + \alpha(t) \{ R_{u(t)}[x(t)] + \gamma(t) \max_u Q[x(t+1), u] - Q[x(t), u(t)] \}$
$\alpha(t)$: learning rate, $0 < \alpha(t) < 1$
• Various algorithms for computing best control value
$u_{best}(t) = \arg\max_u Q[x(t+1), u]$
[Examples: Q-Learning Snail; Q-Learning, Ball on Plate]

Q Learning Control of a Markov Process is Analogous to LQG Control in the LTI Case

$Q[x(t+1), u(t+1)] = Q[x(t), u(t)] + \alpha(t) \{ R_{u(t)}[x(t)] + \gamma(t) \max_u Q[x(t+1), u] - Q[x(t), u(t)] \}$
$\alpha(t)$: learning rate, $0 < \alpha(t) < 1$
Controller
$x_{k+1} = \Phi x_k - \Gamma C (\hat{x}_k - x_k^*)$
Estimator
$\hat{x}_k = \Phi \hat{x}_{k-1} - \Gamma C (\hat{x}_{k-1} - x_{k-1}^*) + K (z_k - H_x \hat{x}_{k-1})$
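The one-step Q update can be illustrated with a small tabular example in Python. Everything about the environment is an assumption made for the sketch: a four-state corridor, two actions, a reward of 1 on reaching the goal, constant learning and discount rates, and purely random exploration.

```python
import random

N_STATES, GOAL = 4, 3          # hypothetical corridor: states 0..3, goal at 3
ACTIONS = (0, 1)               # 0 = move left, 1 = move right
alpha, gamma = 0.5, 0.9        # learning rate and discount rate (held constant)

def step(x, u):
    """Deterministic transition; reward 1 only on reaching the goal."""
    x_next = max(0, x - 1) if u == 0 else min(GOAL, x + 1)
    return x_next, (1.0 if x_next == GOAL else 0.0)

def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        x = 0
        while x != GOAL:
            u = rng.choice(ACTIONS)                 # purely exploratory behavior
            x_next, r = step(x, u)
            best_next = 0.0 if x_next == GOAL else max(Q[x_next])
            # One-step update: Q <- Q + alpha * (R + gamma * max_u Q' - Q)
            Q[x][u] += alpha * (r + gamma * best_next - Q[x][u])
            x = x_next
    return Q

Q = q_learning()
# Best control in each non-goal state: the action with the largest Q value.
greedy = [max(ACTIONS, key=lambda u: Q[x][u]) for x in range(GOAL)]
```

Because the update uses the maximum over next actions, the learned Q values approach the optimal ones even though the behavior during learning is random.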

LQG Control Optimizes Discrete-Time LTI Markov Process

$[ S, A, P_{a_m}(x_k, x'), R_{a_m}(x_k, x') ]$
where
$S$: infinite set of states, $x_1, x_2, \ldots, x_K$
$A$: infinite set of actions, $a_1, a_2, \ldots, a_M$
$P_{a_m}(x_k, x') = \Pr[ x_k(t_{i+1}) = x' \mid x_k(t_i) = x_k, a(t_i) = a_m ]$
$R_{a_m}(x_k, x')$ = Expected immediate reward for transition from $x_k$ to $x'$

Structuring an Efficient Decision Tree (Off-Line)

• Choose most important attributes first
• Recognize when no result can be deduced
• Exclude irrelevant factors
• Iterative Dichotomizer*: the ID3 Algorithm
– Build an efficient decision tree from a fixed set of examples (supervised learning)

*Dichotomy: Division into two (usually contradictory) parts or opinions

Fuzzy Ball-Game Training Set

        Attributes                                   Decisions
Case #  Forecast  Temperature  Humidity  Wind    Play Ball?
1       Sunny     Hot          High      Weak    No
2       Sunny     Hot          High      Strong  No
3       Overcast  Hot          High      Weak    Yes
4       Rain      Mild         High      Weak    Yes
5       Rain      Cool         Low       Weak    Yes
6       Rain      Cool         Low       Strong  No
7       Overcast  Cool         Low       Strong  Yes
8       Sunny     Mild         High      Weak    No
9       Sunny     Cool         Low       Weak    Yes
10      Rain      Mild         Low       Weak    Yes
11      Sunny     Mild         Low       Strong  Yes
12      Overcast  Mild         High      Strong  Yes
13      Overcast  Hot          Low       Weak    Yes
14      Rain      Mild         High      Strong  No

Parameters of the ID3 Algorithm

• Decisions, e.g., Play ball or don't play ball
– D = Number of possible decisions
  • Decision: Yes, no

Parameters of the ID3 Algorithm

• Attributes, e.g., Temperature, humidity, wind, weather forecast
– M = Number of attributes to be considered in making a decision
– $I_m$ = Number of values that the mth attribute can take
  • Temperature: Hot, mild, cool
  • Humidity: High, low
  • Wind: Strong, weak
  • Forecast: Sunny, overcast, rain

Parameters of the ID3 Algorithm

• Training trials, e.g., all the games played last month
– N = Number of training trials
– n(i) = Number of examples with ith attribute

Example: Probability Spaces for Three Attributes

• Probability of an attribute value represented by area in diagram
[Figure: Attribute #1, 2 possible values; Attribute #2, 6 possible values; Attribute #3, 4 possible values]

Example: Decision, given Values of Three Attributes

[Figure: Attribute #1, 2 possible values; Attribute #2, 6 possible values; Attribute #3, 4 possible values]

Accurate Detection of Events Depends on Their Probability of Occurrence

[Figure panels: $\sigma_{noise} = 0.1$; $\sigma_{noise} = 0.2$; $\sigma_{noise} = 0.4$]

Entropy Measures Information Content of a Signal

• S = Entropy of a signal encoding I distinct events
$S = -\sum_{i=1}^{I} \Pr(i) \log_2 \Pr(i)$
$0 \le \Pr(\cdot) \le 1$
$\log_2 \Pr(\cdot) \le 0$
• i = Index identifying an event encoded by a signal
• Pr(i) = Probability of ith event
• $\log_2 \Pr(i)$ = Number of bits required to characterize the probability that the ith event occurs

Entropy of Two Events with Various Frequencies of Occurrence

• $\Pr(i) \log_2 \Pr(i)$ represents the channel capacity (i.e., average number of bits) required to portray the ith event
• Frequencies of occurrence estimate probabilities of each event (#1 and #2)
– Pr(#1) = n(#1)/N
– Pr(#2) = n(#2)/N = 1 – n(#1)/N
$\log_2 \Pr(\#1 \text{ or } \#2) \le 0$
$S = S_{\#1} + S_{\#2} = -\Pr(\#1) \log_2 \Pr(\#1) - \Pr(\#2) \log_2 \Pr(\#2)$

Entropy of Two Events with Various Frequencies of Occurrence

Entropies for 128 Trials
     Pr(#1)  # of Bits(#1)  Pr(#2)   # of Bits(#2)  Entropy
n    n/N     log2(n/N)      1 - n/N  log2(1 - n/N)  S
1    0.008   -7             0.992    -0.011         0.066
2    0.016   -6             0.984    -0.023         0.116
4    0.031   -5             0.969    -0.046         0.201
8    0.063   -4             0.938    -0.093         0.337
16   0.125   -3             0.875    -0.193         0.544
32   0.25    -2             0.75     -0.415         0.811
64   0.50    -1             0.50     -1             1
96   0.75    -0.415         0.25     -2             0.811
112  0.875   -0.193         0.125    -3             0.544
120  0.938   -0.093         0.063    -4             0.337
124  0.969   -0.046         0.031    -5             0.201
126  0.984   -0.023         0.016    -6             0.116
127  0.992   -0.011         0.008    -7             0.066

Best Decision is Related to Entropy and the Probability of Occurrence

$S = -\sum_{i=1}^{I} \Pr(i) \log_2 \Pr(i)$
• High entropy
– Signal provides high coding precision of distinct events
– Differences coded with few bits
• Low entropy
– Lack of distinction between signal values
– Detecting differences requires many bits
• Best classification of events when S = 1...
– but that may not be achievable
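The two-event entropies for 128 trials can be checked with a few lines of Python. This is a sketch using only the entropy definition; the function names and the selection of n values are chosen here for illustration.

```python
import math

def entropy(probabilities):
    """S = -sum Pr(i) log2 Pr(i); terms with Pr(i) = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def two_event_entropy(n1, N):
    """Entropy when event #1 occurs n1 times and event #2 occurs N - n1 times."""
    p1 = n1 / N
    return entropy([p1, 1 - p1])

N = 128
# Entropy, rounded to three decimals, for a few frequencies of event #1.
table = {n: round(two_event_entropy(n, N), 3) for n in (1, 16, 32, 64, 96, 127)}
```

The entropy peaks at 1 bit when both events are equally likely (n = 64) and is symmetric about that point.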

Decision-Making Parameters for ID3

[Training set: the 14 cases of the Fuzzy Ball-Game Training Set]

• $S_D$ = Entropy of all possible decisions
$S_D = -\sum_{d=1}^{D} \Pr(d) \log_2 \Pr(d)$
• $G_i$ = Information gain of ith attribute
$G_i = S_D + \sum_{i=1}^{I_m} \Pr(i) \sum_{d=1}^{D} \Pr(id) \log_2 \Pr(id)$
• Pr(id) = n(id)/N(d) = Probability that ith attribute correlates with dth decision

Decision Tree Produced by ID3 Algorithm

• Root Attribute gains, $G_i$
– Forecast: 0.246
– Temperature: 0.029
– Humidity: 0.151
– Wind: 0.048
• Temperature is inconsequential and is not included in the decision tree
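The root attribute gains can be reproduced from the 14-case ball-game training set with a short Python sketch. It assumes that Pr(id) is taken as the fraction of the examples having attribute value i that lead to decision d, which is the usual ID3 convention; the function and variable names are illustrative.

```python
import math
from collections import Counter

ATTRIBUTES = ("Forecast", "Temperature", "Humidity", "Wind")
# Rows: (Forecast, Temperature, Humidity, Wind, Play Ball?), cases 1..14
CASES = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Low", "Weak", "Yes"),
    ("Rain", "Cool", "Low", "Strong", "No"),
    ("Overcast", "Cool", "Low", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Low", "Weak", "Yes"),
    ("Rain", "Mild", "Low", "Weak", "Yes"),
    ("Sunny", "Mild", "Low", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Low", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def entropy(labels):
    """Entropy (bits) of the empirical distribution of the labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(cases, attr_index):
    """Decision entropy minus the weighted entropy within each attribute value."""
    decisions = [row[-1] for row in cases]
    remainder = 0.0
    for value in set(row[attr_index] for row in cases):
        subset = [row[-1] for row in cases if row[attr_index] == value]
        remainder += len(subset) / len(cases) * entropy(subset)
    return entropy(decisions) - remainder

root_gains = {name: gain(CASES, i) for i, name in enumerate(ATTRIBUTES)}
root = max(root_gains, key=root_gains.get)   # attribute chosen for the root

# The same calculation restricted to one branch (here, Forecast = Sunny).
sunny = [row for row in CASES if row[0] == "Sunny"]
sunny_gains = {name: gain(sunny, i)
               for i, name in enumerate(ATTRIBUTES) if i > 0}
```

ID3 repeats this step recursively: the attribute with the largest gain becomes the node, and each of its values defines a branch whose examples are analyzed in the same way.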

Decision Tree Produced by ID3 Algorithm

• Sunny Branch Attribute gains, $G_i$
– Temperature: 0.57
– Humidity: 0.97
– Wind: 0.019

Search

• Typical AI textbook problems
– Prove a theorem
– Solve a puzzle (e.g., Tower of Hanoi)
– Find a sequence of moves that wins a game (e.g., chess)
– Find the shortest path connecting a set of points (e.g., Traveling salesman problem)
– Find a sequence of symbolic transformations that solve a calculus problem (e.g., Mathematica)
• The common thread: search
– Structures for search
– Strategies for search

Curse of Dimensionality

• Feasible search paths may grow without bound
– Possible combinatorial explosion
– Checkers: 5 x 10^20 possible moves
– Chess: 10^120 moves
– Protein folding: ?
• Limiting search complexity
– Redefine search space
– Employ heuristic (i.e., pragmatic) rules
– Establish restricted search range
– Invoke decision models that have worked in the past

Structures for Search

• Trees
– Single path between root and any node
– Path between adjacent nodes = arc
– Root node
  • no precursors
– Leaf node
  • no successors
  • possible terminator

Structures for Search

• Graphs
– Multiple paths between root and some nodes
– Trees are subsets of graphs

Directions of Search

• Forward chaining
– Reason from premises to actions
– Data-driven: draw conclusions from facts
• Backward chaining
– Reason from actions to premises
– Goal-driven: find facts that support hypotheses

Strategies for Search

• Search forward from opening?
• Search backward from end game?
• Both?
• Realistic assessment
– Not necessary to consider all 10^120 possible moves to play good chess
– Playing excellent chess may require much forward and backward chaining, but not 10^120 evaluations
– Most applications are more procedural
• Search categories
– Blind search
– Heuristic search
– Probabilistic search
– Optimization

Blind Search

• Node expansion
– Find all successors to that node
• Depth-first forward search
– Expand nodes descended from most recently expanded node
– Consider other paths only after reaching a node with no successors
• Breadth-first forward search
– Expand nodes in order of proximity to the start node
– Consider all sequences of arc number n (from root node) before considering any of number (n + 1)
– Exhaustive, but guaranteed to find the shortest path to a terminator
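The difference between depth-first and breadth-first forward search can be seen in a small Python sketch. The tree, its node names, and the two terminator nodes `G1` and `G2` are hypothetical.

```python
from collections import deque

# Hypothetical tree: node -> list of successors (leaf nodes are omitted).
TREE = {"A": ["B", "C"], "B": ["D"], "C": ["G2"], "D": ["G1"]}
GOALS = {"G1", "G2"}

def depth_first(tree, root, is_goal):
    """Expand the most recently generated path first (LIFO stack)."""
    stack = [[root]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if is_goal(node):
            return path
        # Push successors in reverse so the first-listed one is expanded first.
        for child in reversed(tree.get(node, [])):
            stack.append(path + [child])
    return None

def breadth_first(tree, root, is_goal):
    """Expand paths in order of their number of arcs from the root (FIFO queue)."""
    queue = deque([[root]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if is_goal(node):
            return path
        for child in tree.get(node, []):
            queue.append(path + [child])
    return None

dfs_path = depth_first(TREE, "A", GOALS.__contains__)
bfs_path = breadth_first(TREE, "A", GOALS.__contains__)
```

In this tree the depth-first search commits to the first branch and reaches the deeper terminator, while the breadth-first search returns the path with the fewest arcs.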

Blind Search

• Bidirectional search
– Search forward from root node and backward from one or more leaf nodes
– Terminate when search nodes coincide
• Minimal-cost forward search
– Each arc is assigned a cost
– Expand nodes in order of minimum cost

AND/OR Graph Search

• A node is “solved” if
– It is a leaf node with a satisfactory goal state
– It has solved AND nodes as successors
– It has OR nodes as successors, at least one of which is solved.
• Goal: Solve the root node
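The "solved" definition for AND/OR graphs translates directly into a recursive check. In this Python sketch the graph, its node names, and the encoding (each non-leaf node records whether its successors combine as AND or OR) are assumptions for illustration, and the graph is assumed to be acyclic.

```python
# Hypothetical AND/OR graph: node -> (kind, payload).
#   ("LEAF", bool): leaf node; the flag says whether its goal state is satisfactory
#   ("AND", [...]): all successors must be solved
#   ("OR",  [...]): at least one successor must be solved
GRAPH = {
    "root": ("OR", ["plan_a", "plan_b"]),
    "plan_a": ("AND", ["a1", "a2"]),
    "plan_b": ("AND", ["b1", "b2"]),
    "a1": ("LEAF", True),
    "a2": ("LEAF", False),
    "b1": ("LEAF", True),
    "b2": ("LEAF", True),
}

def solved(graph, node):
    """Return True if the node is solved under the AND/OR rules."""
    kind, payload = graph[node]
    if kind == "LEAF":
        return payload
    results = (solved(graph, child) for child in payload)
    return all(results) if kind == "AND" else any(results)

root_solved = solved(GRAPH, "root")
```

Here `plan_a` fails because one of its AND successors is unsatisfactory, but the root is still solved through `plan_b`.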

Heuristic Search

• For large problems, blind search typically leads to combinatorial explosion
• Employ heuristic knowledge about the quality of possible paths
– Decide which node to expand next
– Discard (or prune) nodes that are unlikely to be fruitful
• Search for feasible (approximately optimal) rather than optimal solutions
• Ordered or best-first search
– Always expand “most promising” node

Heuristic Optimal Search

[Figure]

Heuristic Dynamic Programming: A* Search

$\hat{J}_{k_f} = \sum_{i=1}^{k} J_i + \sum_{i=k+1}^{k_f} \hat{J}_i(arc_i)$

• Each arc bears an incremental cost
• Cost, J, estimated at kth instant =
– Cost accrued to k
– Remaining cost to reach final point, $k_f$
• Goal: minimize estimated cost by choice of remaining arcs
• Choose $arc_{k+1}$, $arc_{k+2}$ accordingly
• Use heuristics to estimate remaining cost

Mechanical Control System

[Figure]
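A compact Python sketch of A* search shows how the accrued cost and the heuristic estimate of the remaining cost are combined to choose the next arc. The graph, the arc costs, and the heuristic values in `H` are made up for the example; the heuristic is chosen so that it never overestimates the true remaining cost.

```python
import heapq

def a_star(graph, start, goal, h):
    """graph: node -> {neighbor: arc cost}; h(node): estimated remaining cost."""
    # Each frontier entry: (estimated total cost, accrued cost, node, path).
    frontier = [(h(start), 0.0, start, [start])]
    best = {start: 0.0}                      # lowest accrued cost found per node
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue                         # stale entry; a cheaper path exists
        for nxt, arc in graph[node].items():
            new_cost = cost + arc
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(
                    frontier, (new_cost + h(nxt), new_cost, nxt, path + [nxt]))
    return None

# Hypothetical graph and heuristic.
GRAPH = {"S": {"A": 1, "B": 4}, "A": {"B": 2, "G": 6}, "B": {"G": 2}, "G": {}}
H = {"S": 4, "A": 3, "B": 2, "G": 0}

total_cost, best_path = a_star(GRAPH, "S", "G", H.get)
```

The priority of each candidate is the sum of cost accrued so far and estimated cost to go, which mirrors the two sums in the estimated-cost expression.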

Inferential Fault Analyzer for Helicopter Control System

• Local failure analysis
– Set of hypothetical models of specific failure
• Global failure analysis
– Forward reasoning assesses failure impact
– Backward reasoning deduces possible causes
[Figure labels: Aft Rotor; Forward Rotor; Cockpit Controls]

Local Failure Analysis

• Frames store facts and facilitate search and inference
– Components and up-/downstream linkages of control system
– Failure model parameters
– Rule base for failure analysis (LISP)

Heuristic Search

• Global failure analysis
– Determination based on aggregate of local models
• Heuristic score based on
– Criticality of failure
– Reliability of component
– Extensiveness of failure
– Implicated devices
– Level of backtracking
– Severity of failure
– Net probability of failure model

Global Failure Analysis

[Figure]

Shortest Path Problems

• Find the shortest (or least costly) path that visits all selected cities just once
– Traveling Salesman
– MapQuest/GPS/GIS
• Simulated annealing solution
• Genetic algorithm solution
• Neural network solution
[Figure: Modified Dijkstra Algorithm]

Next Time: Knowledge Representation
