Value Function Approximation via Low-Rank Models
Hao Yi Ong
AA 222, Stanford University
May 28, 2015
We propose a novel value function approximation technique for Markov decision processes that compactly represents the state-action value function using a low-rank and sparse matrix model. Under minimal assumptions, this decomposition is a Robust Principal Component Analysis problem that can be solved exactly via the Principal Component Pursuit convex optimization problem.

  1. Value Function Approximation via Low-Rank Models
     Hao Yi Ong, AA 222, Stanford University, May 28, 2015
  2. Outline: Introduction, Formulation, Approach, Numerical experiments
  3. Value function approximation
     A Markov decision process can be solved optimally given the state-action value function.
     – the value function gives the utility of taking an action in a given state; we want the action that maximizes utility
     – it can be represented as a matrix for discrete problems
     – practical problems typically have millions or billions of dimensions
     Value function approximation finds a compact alternative.
     – basis functions are used widely in reinforcement learning (RL)
     – e.g., Gaussian radial basis functions, neural networks
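For concreteness, here is a minimal sketch of the kind of basis-function approximation the slide alludes to: Gaussian radial basis features with one linear weight vector per action. The feature centers, width, and array shapes are illustrative assumptions, not anything specified in the slides or the linked repository.

```python
import numpy as np

def rbf_features(state, centers, width=1.0):
    """Gaussian radial basis function features phi(s) for a continuous state."""
    sq_dists = np.sum((centers - state) ** 2, axis=1)
    return np.exp(-sq_dists / (2.0 * width ** 2))

def q_approx(state, action, weights, centers):
    """Linear value approximation: Q(s, a) is roughly w_a . phi(s)."""
    return weights[action] @ rbf_features(state, centers)

# Example setup: 2-D state space, 25 RBF centers on a grid, 3 actions.
centers = np.array([[i, j] for i in np.linspace(-1, 1, 5) for j in np.linspace(-1, 1, 5)])
weights = np.zeros((3, len(centers)))  # learned by an RL algorithm in practice
```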
  4. Value function decomposition
     Idea: approximate the value function as low-rank plus sparse components.
     This assumes intrinsic low-dimensionality:
     – i.e., the value function can be captured by a small set of features
     – hinted at by the success of basis function approximation in RL
     The decomposition falls under the category of Robust Principal Component Analysis (PCA):
     – widely used in image/video analysis and collaborative filtering, e.g., the Netflix challenge
     – a novel application of Robust PCA, as far as the author is aware
  5. Outline: Introduction, Formulation, Approach, Numerical experiments
  6. Markov decision process
     Defined by the tuple (S, A, T, R):
     – S and A are the sets of all possible states and actions, respectively
     – T gives the probability of transitioning into state s' when taking action a in the current state s, and is often denoted T(s, a, s')
     – R gives a scalar value indicating the immediate reward received for taking action a in the current state s, and is denoted R(s, a)
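As a reference point for the notation above, a minimal sketch of one way to hold the (S, A, T, R) tuple for a discrete MDP in code; the class name, field names, and array shapes are illustrative assumptions, not the interface of the linked repository.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class DiscreteMDP:
    """Tabular MDP with |S| states and |A| actions."""
    T: np.ndarray  # shape (|S|, |A|, |S|); T[s, a, s'] = P(s' | s, a)
    R: np.ndarray  # shape (|S|, |A|); R[s, a] = immediate reward

    @property
    def n_states(self) -> int:
        return self.R.shape[0]

    @property
    def n_actions(self) -> int:
        return self.R.shape[1]
```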
  7. Value iteration
     We want to find the optimal policy $\pi^*(s)$:
     – it returns the action that maximizes the utility from any given state
     – it is related to the state-action value function $Q^*(s, a)$ by
         $\pi^*(s) = \operatorname{argmax}_{a \in A} Q^*(s, a)$
     Value iteration updates the value function guess $\hat{Q}$ until convergence:
         $\hat{Q}(s, a) := R(s, a) + \sum_{s' \in S} T(s, a, s') \max_{a' \in A} \hat{Q}(s', a')$
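A minimal sketch of the update above on the tabular arrays sketched earlier; a discount factor gamma is included here so the iteration converges, whereas the slide's update omits it.

```python
import numpy as np

def value_iteration(T, R, gamma=0.95, tol=1e-6):
    """Iterate Q(s,a) <- R(s,a) + gamma * sum_s' T(s,a,s') max_a' Q(s',a')."""
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    while True:
        # (|S|,|A|,|S|) @ (|S|,) -> (|S|,|A|): expected value of the greedy successor state
        Q_new = R + gamma * T @ Q.max(axis=1)
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new  # this is the matrix M that the decomposition operates on
        Q = Q_new
```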
  8. Matrix decomposition
     Suppose the matrix $M \in \mathbf{R}^{m \times n}$ encodes $Q(s, a)$:
     – m and n are the cardinalities of the state and action spaces
     Approximate it with the decomposition $M = L_0 + S_0$:
     – $L_0$ and $S_0$ are the true low-rank and sparse components
     Why should this work?
     – there is an implicit assumption about the correlation of utility values across actions
  9. Matrix decomposition
     $M = A_{L_0} B_{L_0}^{T} + S_0$, with $M, S_0 \in \mathbf{R}^{m \times n}$, $A_{L_0} \in \mathbf{R}^{m \times r}$, and $B_{L_0}^{T} \in \mathbf{R}^{r \times n}$
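A small synthetic illustration of the assumed structure: a rank-r value matrix built as $A_{L_0} B_{L_0}^{T}$, corrupted by a sparse component. The sizes, rank, and corruption level below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 200, 10, 3                  # |S|, |A|, and the assumed intrinsic rank
A = rng.standard_normal((m, r))       # A_{L0}
B = rng.standard_normal((n, r))       # B_{L0}
L0 = A @ B.T                          # low-rank component, rank r
S0 = np.zeros((m, n))
mask = rng.random((m, n)) < 0.05      # roughly 5% of entries corrupted
S0[mask] = 10 * rng.standard_normal(mask.sum())
M = L0 + S0                           # the observed value matrix
```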
  10. Outline: Introduction, Formulation, Approach, Numerical experiments
  11. Principal Component Pursuit (PCP)
      The best (known) convex estimate of Robust PCA:
          minimize    $\|L\|_* + \lambda \|S\|_1$
          subject to  $L + S = M$
      Intuitively:
      – the nuclear norm $\|\cdot\|_*$ is the best convex approximation to minimizing rank
      – the 1-norm has a sparsifying property
      Remarkably, the solution to PCP decomposes M perfectly [CLMW11].
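A minimal sketch of PCP as a convex program, written here with CVXPY rather than the repository's own solver; the weighting $\lambda = 1/\sqrt{\max(m, n)}$ follows [CLMW11], and the solver choice is an assumption.

```python
import numpy as np
import cvxpy as cp

def pcp(M, lam=None):
    """Principal Component Pursuit: split M into low-rank L plus sparse S."""
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))   # standard weighting from [CLMW11]
    L = cp.Variable((m, n))
    S = cp.Variable((m, n))
    objective = cp.Minimize(cp.norm(L, "nuc") + lam * cp.sum(cp.abs(S)))
    problem = cp.Problem(objective, [L + S == M])
    problem.solve(solver=cp.SCS)
    return L.value, S.value

# Applied to the synthetic M above, pcp(M) should recover L0 and S0 closely.
```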
  12. Outline: Introduction, Formulation, Approach, Numerical experiments
  13. Mountain car (results figure)
  14. Inverted pendulum (results figure)
  15. Implementation: https://github.com/haoyio/LowRankMDP
  16. References
      [CLMW11] Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? Journal of the Association for Computing Machinery, 58(3), 2011.
