Value Function Approximation via Low-Rank Models

We propose a novel value function approximation technique for Markov decision processes that compactly represents the state-action value function using a low-rank and sparse matrix model. Under minimal assumptions, this decomposition is a Robust Principal Component Analysis problem that can be solved exactly via the Principal Component Pursuit convex optimization problem.

1. Value Function Approximation via Low-Rank Models
Hao Yi Ong
AA 222, Stanford University
May 28, 2015
2. Outline
Introduction
Formulation
Approach
Numerical experiments
3. Value function approximation
A Markov decision process can be solved optimally given the state-action value function
– the value function gives the utility of taking an action in a given state; we want the action that maximizes utility
– it can be represented as a matrix for discrete problems
– for practical problems it typically has millions or billions of entries
Value function approximation finds a compact alternative
– basis functions are widely used in reinforcement learning (RL)
– e.g., Gaussian radial basis functions, neural networks
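As a concrete illustration of basis function approximation, a Gaussian radial basis feature map might look like the sketch below; the centers, bandwidth, and 1-D state are hypothetical, not values from the talk:

```python
import numpy as np

def rbf_features(state, centers, sigma=1.0):
    """Gaussian radial basis features: phi_i(s) = exp(-||s - c_i||^2 / (2 sigma^2))."""
    diffs = centers - state  # (k, d) differences from the state to each center
    return np.exp(-np.sum(diffs**2, axis=1) / (2.0 * sigma**2))

# The value function is then approximated as Q(s, a) ~= w_a . phi(s),
# with one learned weight vector w_a per action.
centers = np.array([[0.0], [0.5], [1.0]])   # hypothetical 1-D basis centers
phi = rbf_features(np.array([0.4]), centers)
```

Each feature responds most strongly to states near its center, so a handful of weights per action can stand in for a huge table.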
4. Value function decomposition
Idea: approximate the value function as a low-rank component plus a sparse component
This assumes intrinsic low-dimensionality
– i.e., the value function can be captured by a small set of features
– hinted at by the success of basis function approximation in RL
The problem falls under Robust Principal Component Analysis (PCA)
– widely used in image/video analysis and collaborative filtering, e.g., the Netflix challenge
– a novel application of Robust PCA as far as the author is aware
5. Outline
Introduction
Formulation
Approach
Numerical experiments
6. Markov decision process
Defined by the tuple (S, A, T, R)
– S and A are the sets of all possible states and actions, respectively
– T(s, a, s′) gives the probability of transitioning into state s′ after taking action a in the current state s
– R(s, a) gives the scalar immediate reward for taking action a in the current state s
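For a small discrete MDP, the tuple (S, A, T, R) can be stored directly as NumPy arrays; the two-state, two-action numbers below are purely illustrative:

```python
import numpy as np

# A toy two-state, two-action MDP (S, A, T, R); the values are hypothetical.
n_states, n_actions = 2, 2

# T[s, a, s'] = probability of moving to s' from s under action a
T = np.zeros((n_states, n_actions, n_states))
T[0, 0] = [0.9, 0.1]
T[0, 1] = [0.2, 0.8]
T[1, 0] = [0.0, 1.0]
T[1, 1] = [0.5, 0.5]

# R[s, a] = immediate reward for taking action a in state s
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])

# Sanity check: each (s, a) slice of T must be a probability distribution
assert np.allclose(T.sum(axis=2), 1.0)
```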
7. Value iteration
We want to find the optimal policy π(s), which returns the action that maximizes the utility from any given state. It is related to the state-action value function Q(s, a) by
π(s) = argmax_{a∈A} Q(s, a)
Value iteration updates the value function guess Q̂ until convergence:
Q̂(s, a) := R(s, a) + Σ_{s′∈S} T(s, a, s′) max_{a′∈A} Q̂(s′, a′)
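The update above can be sketched in a few lines of NumPy. Note two assumptions beyond the slide: a discount factor γ is added so the iteration is a contraction, and the two-state MDP used to exercise it is hypothetical:

```python
import numpy as np

def value_iteration(T, R, gamma=0.95, tol=1e-8):
    """Iterate Q(s,a) <- R(s,a) + gamma * sum_s' T(s,a,s') max_a' Q(s',a') to a fixed point."""
    Q = np.zeros_like(R)
    while True:
        V = Q.max(axis=1)           # V(s') = max_a' Q(s', a')
        Q_new = R + gamma * (T @ V) # T @ V sums over s' for every (s, a) pair
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new

# Hypothetical two-state, two-action MDP for illustration
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])
Q = value_iteration(T, R)
policy = Q.argmax(axis=1)  # pi(s) = argmax_a Q(s, a)
```

The matrix Q produced here is exactly the object the next slides propose to decompose: rows indexed by states, columns by actions.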
8. Matrix decomposition
Suppose the matrix M ∈ R^{m×n} encodes Q(s, a)
– m and n are the cardinalities of the state and action spaces
Approximate M with the decomposition M = L0 + S0
– L0 and S0 are the true low-rank and sparse components
Why should this work?
– there is an implicit assumption that utility values are correlated across actions
9. Matrix decomposition
[Figure: the m×n matrix M decomposes as the low-rank product A_{L0} (m×r) B_{L0}^T (r×n) plus the sparse component S0 (m×n)]
10. Outline
Introduction
Formulation
Approach
Numerical experiments
11. Principal Component Pursuit (PCP)
The best (known) convex estimate of Robust PCA:
minimize ‖L‖∗ + λ‖S‖₁
subject to L + S = M
Intuitively,
– the nuclear norm ‖·‖∗ is the best convex surrogate for minimizing rank
– the 1-norm has a sparsifying property
Remarkably, the solution to PCP recovers the decomposition of M exactly under mild conditions [CLMW11]
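The PCP problem can be solved with an augmented-Lagrangian (ADMM) scheme that alternates singular value thresholding on L with soft-thresholding on S. Below is a minimal NumPy sketch, not the author's implementation (see the linked repository for that); it uses the standard λ = 1/√max(m, n) from [CLMW11], and the demo matrix at the end is hypothetical:

```python
import numpy as np

def shrink(X, tau):
    """Soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: the proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def pcp(M, tol=1e-7, max_iter=500):
    """Solve: minimize ||L||_* + lam * ||S||_1  subject to  L + S = M."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))            # standard weight from [CLMW11]
    mu = (m * n) / (4.0 * np.abs(M).sum())    # common penalty initialization
    L, S, Y = (np.zeros_like(M) for _ in range(3))
    for _ in range(max_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)     # nuclear-norm prox step
        S = shrink(M - L + Y / mu, lam / mu)  # l1 prox step
        resid = M - L - S
        Y += mu * resid                       # dual (Lagrange multiplier) update
        if np.linalg.norm(resid) <= tol * np.linalg.norm(M):
            break
    return L, S

# Hypothetical demo: recover a rank-1 matrix corrupted by sparse outliers
rng = np.random.default_rng(0)
L0 = rng.standard_normal((20, 1)) @ rng.standard_normal((1, 20))
S0 = np.zeros((20, 20))
S0.flat[rng.choice(400, size=10, replace=False)] = 5.0
L, S = pcp(L0 + S0)
```

Applied to the value function, M is the m×n matrix of Q(s, a) values, L the low-rank approximation, and S the sparse corrections.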
12. Outline
Introduction
Formulation
Approach
Numerical experiments
13. Mountain car
14. Inverted pendulum
15. Implementation
https://github.com/haoyio/LowRankMDP
16. References
[CLMW11] Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? Journal of the Association for Computing Machinery, 58(3), 2011.