We propose a novel value function approximation technique for Markov decision processes that compactly represents the state-action value function using a low-rank plus sparse matrix model. Under minimal assumptions, this decomposition is a Robust Principal Component Analysis problem that can be solved exactly via Principal Component Pursuit, a convex optimization problem.
3. Value function approximation
a Markov decision process can be solved optimally given the state-action value function
– the value function gives the utility of taking an action in a given state; we want the action that maximizes this utility
– can be represented as a matrix for discrete problems
– typically millions or billions of dimensions for practical problems
value function approximation finds a compact alternative
– basis functions are used widely in reinforcement learning (RL)
– e.g., Gaussian radial basis functions, neural networks (see the sketch below)
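To make the basis-function idea concrete, here is a minimal sketch (not from the original slides) that approximates Q(s, a) as a weighted sum of Gaussian radial basis functions; the centers, bandwidth, and weights are hypothetical placeholders.

```python
# Minimal sketch of basis-function value approximation (illustrative only):
# approximate Q(s, a) as a weighted sum of Gaussian radial basis functions.
import numpy as np

def rbf_features(s, centers, bandwidth=1.0):
    """Gaussian RBF feature vector phi(s) for a state s."""
    d = np.linalg.norm(np.atleast_2d(centers) - s, axis=1)
    return np.exp(-(d ** 2) / (2 * bandwidth ** 2))

def q_approx(s, a, weights, centers):
    """Q_hat(s, a) = phi(s) . w_a, one weight vector per discrete action."""
    return rbf_features(s, centers) @ weights[a]

# e.g., 5 RBF centers on [0, 1] and 3 actions with random (untrained) weights
centers = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
weights = np.random.randn(3, 5)
print(q_approx(np.array([0.3]), a=1, weights=weights, centers=centers))
```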
4. Value function decomposition
idea: approximate the value function as low-rank plus sparse components
assumes intrinsic low-dimensionality
– i.e., the value function can be captured by a small set of features
– hinted at by the success of basis function approximation in RL
falls under the category of Robust Principal Component Analysis (PCA)
– widely used in image/video analysis and collaborative filtering; e.g., the Netflix challenge
– a novel application of Robust PCA, as far as the author is aware
6. Markov decision process
defined by the tuple (S, A, T, R)
S and A are the sets of all possible states and actions, respectively
T gives the probability of transitioning into state s′ when taking action a in the current state s, and is often denoted T(s, a, s′)
R gives a scalar value indicating the immediate reward received for taking action a in the current state s, and is denoted R(s, a) (see the sketch below)
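One minimal tabular encoding of this tuple, assuming numpy arrays; the sizes and random values below are assumptions for demonstration only.

```python
# A minimal tabular encoding of a finite MDP (illustrative; shapes are assumptions).
import numpy as np

n_states, n_actions = 4, 2

# T[s, a, s'] = probability of landing in s' after taking action a in state s
T = np.random.rand(n_states, n_actions, n_states)
T /= T.sum(axis=2, keepdims=True)  # normalize so each T[s, a, :] is a distribution

# R[s, a] = immediate reward for taking action a in state s
R = np.random.randn(n_states, n_actions)
```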
7. Value iteration
want to find the optimal policy π(s)
returns the action that maximizes utility from any given state
related to the state-action value function Q(s, a):
$\pi(s) = \arg\max_{a \in A} Q(s, a)$
value iteration updates a value function guess $\hat{Q}$ until convergence (sketched below):
$\hat{Q}(s, a) := R(s, a) + \sum_{s' \in S} T(s, a, s') \max_{a' \in A} \hat{Q}(s', a')$
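A minimal sketch of this update over the tabular T and R arrays from the previous sketch; the discount factor gamma is an assumption not shown on the slide, added because it is the standard way to guarantee convergence of the iteration.

```python
# Sketch of tabular value iteration (assumes T, R as in the MDP sketch above).
# gamma is an assumption: a discount factor < 1 makes the update a contraction.
import numpy as np

def value_iteration(T, R, gamma=0.95, tol=1e-8, max_iters=10_000):
    n_states, n_actions, _ = T.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(max_iters):
        # Q_hat(s, a) := R(s, a) + gamma * sum_{s'} T(s, a, s') max_{a'} Q_hat(s', a')
        Q_new = R + gamma * (T @ Q.max(axis=1))
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new
    return Q

Q = value_iteration(T, R)
policy = Q.argmax(axis=1)  # pi(s) = argmax_a Q(s, a)
```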
8. Matrix decomposition
suppose the matrix $M \in \mathbb{R}^{m \times n}$ encodes Q(s, a)
– m and n are the cardinalities of the state and action spaces
approximate with the decomposition $M = L_0 + S_0$
– $L_0$ and $S_0$ are the true low-rank and sparse components
why should this work?
– implicit assumption that utility values are correlated across actions (see the sketch below)
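For illustration, a value matrix with exactly this structure can be synthesized as follows; the sizes, rank, and corruption level are arbitrary assumptions for the demo.

```python
# Construct a synthetic Q matrix that is exactly low-rank plus sparse.
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 100, 20, 3  # states, actions, intrinsic rank (all assumed)

L0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # low-rank part
S0 = np.zeros((m, n))
mask = rng.random((m, n)) < 0.05  # ~5% of entries get sparse "corruptions"
S0[mask] = 10 * rng.standard_normal(mask.sum())

M = L0 + S0  # the observed value matrix encoding Q(s, a)
```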
11. Principal Component Pursuit (PCP)
best (known) convex estimate of Robust PCA
minimize $\|L\|_* + \lambda \|S\|_1$
subject to $L + S = M$
intuitively
– the nuclear norm $\|\cdot\|_*$ is the best convex approximation to minimizing rank
– the $\ell_1$-norm has a sparsifying property
remarkably, under mild conditions the solution to PCP recovers the true components of M exactly [CLMW11] (see the solver sketch below)
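A hedged sketch of a PCP solver using the alternating-directions method described in [CLMW11]; the parameter choices λ = 1/√max(m, n) and μ = mn/(4‖M‖₁) follow the paper's recommendations, but this is an illustrative implementation, not tuned code.

```python
# Sketch of Principal Component Pursuit via the alternating-directions
# augmented Lagrangian method of Candes et al. [CLMW11].
import numpy as np

def shrink(X, tau):
    """Soft-thresholding: the proximal operator of the l1-norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: the proximal operator of the nuclear norm."""
    U, sig, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(sig, tau)) @ Vt

def pcp(M, max_iters=500, tol=1e-7):
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))      # lambda recommended in [CLMW11]
    mu = m * n / (4.0 * np.abs(M).sum())  # mu recommended in [CLMW11]
    L, S, Y = np.zeros_like(M), np.zeros_like(M), np.zeros_like(M)
    for _ in range(max_iters):
        L = svt(M - S + Y / mu, 1.0 / mu)       # nuclear-norm step
        S = shrink(M - L + Y / mu, lam / mu)    # l1 step
        residual = M - L - S
        Y += mu * residual                      # dual update
        if np.linalg.norm(residual) <= tol * np.linalg.norm(M):
            break
    return L, S

L, S = pcp(M)  # with M from the previous sketch, L and S should match L0 and S0
```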
16. References
[CLMW11] Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component analysis? Journal of the ACM, 58(3), 2011.