Information-Theoretic Aspect of Reinforcement Learning
HA JONG SU
TABLE OF CONTENTS
• PAC
• MULTI SKILL RL
PAC (Probably Approximately Correct)
PAC = theoretical basis of Occam's razor
PAC (Probably Approximately Correct)
R: error of the learned model over all of the data
S: subset of the data (the sample used for training)
H: hypothesis space
h: learned parameters, i.e. the trained model
The bound decreases when the amount of data increases or the hypothesis space narrows
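The formula on this slide did not survive in the transcript. As a hedged reconstruction, one standard form of the bound for a finite hypothesis space is given below. The symbols m (number of samples in S), \delta (failure probability), D (data distribution) and \hat{R}_S (error measured on S) are assumptions introduced here, and the slide may have used a different variant.

```latex
\Pr_{S \sim D^m}\left[\ \forall h \in H:\quad
  R(h) \;\le\; \hat{R}_S(h) + \sqrt{\frac{\ln|H| + \ln(1/\delta)}{2m}}
\ \right] \;\ge\; 1 - \delta
```

The square-root term shrinks as m grows and as |H| shrinks, which matches the statement above.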
Hoeffding’s inequality
“Moment generating function”
http://cs229.stanford.edu/extra-notes/hoeffding.pdf
Infinite data is given
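A minimal sketch, assuming Bernoulli samples, that compares the two-sided Hoeffding bound 2·exp(-2·m·eps²) with a simulated deviation frequency. The function names, the number of trials and the seed are illustrative choices, not part of the slides.

```python
import math
import random


def hoeffding_bound(m: int, eps: float) -> float:
    """Upper bound on P(|sample mean - true mean| >= eps) for m i.i.d. samples in [0, 1]."""
    return 2.0 * math.exp(-2.0 * m * eps ** 2)


def empirical_deviation_rate(p: float, m: int, eps: float,
                             trials: int = 2000, seed: int = 0) -> float:
    """Fraction of simulated experiments whose sample mean deviates from p by at least eps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw m Bernoulli(p) samples and compute their mean.
        mean = sum(rng.random() < p for _ in range(m)) / m
        if abs(mean - p) >= eps:
            hits += 1
    return hits / trials
```

The simulated rate is typically far below the bound, because Hoeffding's inequality holds for any bounded distribution and is therefore loose for a specific one.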
PAC (Probably Approximately Correct)
Difference between RL and other algorithms
HMM, CRF, LSTM, GRU, memory networks, NTM and DNC each have their own memory structure, called the "hidden state". It plays the same role as the "state" in RL and therefore gives the model a small hypothesis space.
CNN-like algorithms instead deploy parameters to reconcile different data points with each other. This is why Hinton proposed Capsule Networks, which have a memory structure, arguing that "we need equivariance, not invariance".
Simple implementations
GRU implementation from scratch with TensorFlow
https://github.com/kkugosu/PYTHON-Tensorflow---Jupyter---gru
CRF-GRU implementation with PyTorch
https://github.com/kkugosu/PYTORCH---dialogue_intention_extraction/blob/master/source_code/model/discrete_model/comp_model.py
Limitation of PAC
The bounds are vacuous
A single parameter in a neural network is encoded as float32
• 60 parameters: |H| = 2^(32*60)
So we need to quantize the network to use these bounds.
• What if each parameter is itself a distribution?
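To make the "vacuous" claim concrete, here is a small sketch that plugs |H| = 2^(bits × parameters) into the standard finite-class bound train_error + sqrt((ln|H| + ln(1/delta)) / (2m)). The choice of this particular bound, the function names, and the sample sizes in the usage note are assumptions for illustration.

```python
import math


def log_hypothesis_count(n_params: int, bits: int = 32) -> float:
    """ln|H| when every parameter can take 2**bits distinct values."""
    return n_params * bits * math.log(2.0)


def finite_class_bound(train_error: float, log_h: float, m: int,
                       delta: float = 0.05) -> float:
    """Upper bound on the true error that holds with probability >= 1 - delta."""
    return train_error + math.sqrt((log_h + math.log(1.0 / delta)) / (2.0 * m))


# 60 float32 parameters versus the same network quantized to 8 bits per parameter.
float32_bound = finite_class_bound(0.0, log_hypothesis_count(60, bits=32), m=100)
int8_bound = finite_class_bound(0.0, log_hypothesis_count(60, bits=8), m=100)
```

With only 100 samples the float32 bound exceeds 1, so it says nothing about an error rate. Quantizing reduces ln|H| by a factor of four and tightens the bound accordingly.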
Bayesian neural network
Gaussian process approximation
Explanation and python implementation
https://github.com/kkugosu/bayesian_neural_network
PAC-bayes
To estimate the performance of a model, which has an output composed of a distribution like
Bayesian neural networks...
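The PAC-Bayes formulas on these slides are not preserved in the transcript. The sketch below assumes a McAllester-style bound, E_Q[R] <= E_Q[R_S] + sqrt((KL(Q||P) + ln(2·sqrt(m)/delta)) / (2m)), with a factorized Gaussian prior P and posterior Q over the weights. The bound variant and all names are assumptions, not necessarily what the slides showed.

```python
import math


def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p) -> float:
    """KL(Q || P) for factorized Gaussians given as lists of means and standard deviations."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
    return total


def pac_bayes_bound(empirical_risk: float, kl: float, m: int,
                    delta: float = 0.05) -> float:
    """McAllester-style bound on the expected true risk under the posterior Q."""
    complexity = (kl + math.log(2.0 * math.sqrt(m) / delta)) / (2.0 * m)
    return empirical_risk + math.sqrt(complexity)
```

Here the KL term plays the role that ln|H| plays in the finite-class bound: the further the posterior moves from the prior, the looser the guarantee.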
Learning dynamics of RL
• As bias increases (strong assumptions, as in 1-step TD training), the model needs less data, but the probability of distributional shift is high
• With high variance, as in Monte Carlo or naive policy gradient, the results are also inaccurate, because a tremendous amount of data is needed.
Learning dynamics of RL
• Distributional-shift-like problems are a chronic issue in generative models as well. In GANs the problem is called "mode collapse", which is essentially the same problem as distributional shift
• An iterative process of making assumptions upon assumptions.
• Expectation <-> maximization
• Q estimate based on the policy <-> policy update based on Q
Learning dynamics of RL
• Low assumption (needs lots of data and sufficient exploration) <-> high assumption (distributional shift)
• What if we are short of data and inevitably have to use a highly biased model?
• We have to deal with distributional shift!
Learning dynamics of RL
We can find an optimal n-step return to trade off these problems
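A minimal sketch of the n-step return, G = r_t + γ·r_{t+1} + ... + γ^(n-1)·r_{t+n-1} + γ^n·V(s_{t+n}). Small n leans on the value estimate (more bias), while large n leans on sampled rewards (more variance). The list layout for `rewards` and `values` is an assumption for illustration.

```python
def n_step_return(rewards, values, t, n, gamma=0.99):
    """n-step return from time t.

    rewards[i] is the reward received after step i.
    values[i] is the value estimate V(s_i); it has one more entry than rewards,
    and the last entry should be 0.0 for a terminal state.
    """
    end = min(t + n, len(rewards))  # truncate at the end of the episode
    g = 0.0
    for k, i in enumerate(range(t, end)):
        g += (gamma ** k) * rewards[i]
    # Bootstrap from the value estimate of the state reached after the last reward.
    g += (gamma ** (end - t)) * values[end]
    return g
```

With n = 1 this reduces to the 1-step TD target, and with n at least as long as the episode it becomes the Monte Carlo return.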
Learning dynamics of RL
Make the model robust with a regularization term, which usually restricts the rate of change of the distribution (reducing the hypothesis space, as in PAC-Bayes)
TRPO (local optimality)
Guarantee the improvement of J -> restrict the rate of change of the posterior distribution
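The idea of restricting how fast the distribution changes can be sketched as a backtracking search that shrinks a proposed update until the KL divergence between the old and new policy falls under a threshold. This covers only the trust-region check for a single categorical policy. Full TRPO also uses a natural-gradient step and checks that the surrogate objective improves. All names and default values are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_categorical(p, q):
    """KL(p || q) for two categorical distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def backtrack_to_trust_region(old_logits, step, max_kl=0.01, shrink=0.5, max_iters=20):
    """Scale `step` down until KL(old policy || new policy) <= max_kl."""
    old_probs = softmax(old_logits)
    scale = 1.0
    for _ in range(max_iters):
        new_logits = [l + scale * d for l, d in zip(old_logits, step)]
        if kl_categorical(old_probs, softmax(new_logits)) <= max_kl:
            return new_logits, scale
        scale *= shrink
    # No acceptable step found: keep the old policy.
    return list(old_logits), 0.0
```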
SAC (local optimality)
Other basic RL algorithms
• PyTorch implementation of PG, DQN, AC, DDPG, TRPO, PPO, SAC
• https://github.com/kkugosu/RL_BASIC
How to secure global optimality
We restricted the distribution, which shrinks the hypothesis space
But we still can't claim optimality, because we still can't apply PAC theory.
The problem is that the data depends on the learned function
We can solve this problem by representing uncertainty as a distribution
• (Bayesian reinforcement learning, distributional reinforcement learning)
PAC-MDP (global optimality)
• E3 -> Near-Optimal Reinforcement Learning in Polynomial Time
• R-max -> A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning
• On the Sample Complexity of Reinforcement Learning (modified R-max)
• MBIE -> A Theoretical Analysis of Model-Based Interval Estimation
• MBIE-EB -> An Analysis of Model-Based Interval Estimation for Markov Decision Processes
• Delayed Q-learning -> PAC Model-Free Reinforcement Learning
• Reinforcement Learning in Finite MDPs: PAC Analysis
• PAC-inspired Option Discovery in Lifelong Reinforcement Learning
PAC-MDP
• PAC Continuous State Online Multitask Reinforcement Learning
with Identification (continuous space)
MBIE-EB (exploration bonus) (2008)
original
IE version
MBIE-EB
CI for reward
CI for trajectory
MBIE-EB
Performance bound given by the CIs of reward and trajectory
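The equations on the MBIE-EB slides are missing from the transcript. As a sketch, the exploration-bonus update from the MBIE-EB paper can be written as Q(s,a) = R̂(s,a) + γ·Σ T̂(s'|s,a)·max_a' Q(s',a') + β/sqrt(n(s,a)), where R̂ and T̂ are empirical estimates and n(s,a) is the visit count. The table layout, the optimistic treatment of untried pairs, and the default constants below are assumptions.

```python
import math


def mbie_eb_q(counts, reward_sums, trans_counts,
              beta=1.0, gamma=0.9, r_max=1.0, iters=200):
    """Value iteration on the empirical model with a count-based exploration bonus.

    counts[s][a]           number of times (s, a) was tried
    reward_sums[s][a]      sum of rewards observed for (s, a)
    trans_counts[s][a][s2] number of observed transitions (s, a) -> s2
    """
    n_states, n_actions = len(counts), len(counts[0])
    q_max = r_max / (1.0 - gamma)
    q = [[q_max] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        v = [max(row) for row in q]
        for s in range(n_states):
            for a in range(n_actions):
                n = counts[s][a]
                if n == 0:
                    # Assumption: untried pairs stay maximally optimistic.
                    q[s][a] = q_max
                    continue
                r_hat = reward_sums[s][a] / n
                expected_v = sum(trans_counts[s][a][s2] / n * v[s2]
                                 for s2 in range(n_states))
                q[s][a] = r_hat + gamma * expected_v + beta / math.sqrt(n)
    return q
```

Two actions with the same estimated reward get different Q-values when their counts differ, so the less-tried action is preferred until its bonus decays.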
Sham Machandranath Kakade
• A hidden monster of the RL theory field
Thompson sampling
Variance decreases as the number of samples increases -> uncertainty decreases
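A minimal Beta-Bernoulli sketch of Thompson sampling, assuming a bandit with binary rewards and a uniform Beta(1, 1) prior per arm (both are assumptions for illustration). The posterior variance of an arm shrinks as that arm collects samples, which is the "uncertainty decreases" effect stated above.

```python
import random


def beta_variance(a: float, b: float) -> float:
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1.0))


def thompson_bernoulli(true_probs, steps=1000, seed=0):
    """Run Thompson sampling and return the posterior parameters of every arm."""
    rng = random.Random(seed)
    alpha = [1.0] * len(true_probs)  # 1 + number of observed successes
    beta = [1.0] * len(true_probs)   # 1 + number of observed failures
    for _ in range(steps):
        # Sample a plausible success rate per arm and play the best sample.
        samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        arm = max(range(len(samples)), key=samples.__getitem__)
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
    return alpha, beta
```

Arms that look bad are sampled rarely, so their posterior stays wide, while the best arm is pulled often and its posterior concentrates.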
LQR
Not a learning or approximation process
F and C are given, and we calculate K, which is the policy
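As a sketch of "F and C are given, compute K", here is the backward Riccati recursion for a scalar finite-horizon LQR problem. The scalar dynamics x' = a·x + b·u and cost q·x² + r·u² stand in for the matrices F and C on the slide. This simplification and all names are assumptions made to keep the example dependency-free.

```python
def lqr_gains(a, b, q, r, horizon):
    """Return feedback gains K_0..K_{T-1} for the control law u_t = -K_t * x_t."""
    p = q  # cost-to-go coefficient at the final step: V_T(x) = q * x**2
    gains = []
    for _ in range(horizon):
        k = (a * b * p) / (r + b * b * p)   # gain for the current step
        p = q + a * a * p - a * b * p * k   # Riccati update of the cost-to-go
        gains.append(k)
    gains.reverse()  # the recursion runs backward in time
    return gains


def rollout(a, b, gains, x0):
    """Apply the time-varying gains to the known dynamics and return the visited states."""
    xs = [x0]
    for k in gains:
        u = -k * xs[-1]
        xs.append(a * xs[-1] + b * u)
    return xs
```

Nothing is learned here: the gains follow in closed form from the known dynamics and cost.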
iLQR
Learning process
F and C are not known exactly but are learned
PILCO
(2011)
PILCO (GP + iLQR)
Deep PILCO
Bayesian NN with sampling
We can't use iLQR because a BNN can't be expressed as a computable function
GPS
• modeling environment with Bayesian NN
• Training policy with iLQR at the same time
• Dual gradient descent makes use of strong duality in convex
optimization.
• https://github.com/kkugosu/RL_MODELBASED
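Dual gradient descent can be illustrated on a toy convex problem: minimize x² subject to x >= 1. The Lagrangian is L(x, λ) = x² + λ·(1 - x). For fixed λ the inner minimizer is x = λ/2, and the dual variable is updated by projected gradient ascent. This toy problem is an assumption chosen for clarity and is unrelated to the actual GPS objective.

```python
def dual_gradient_descent(step=0.5, iters=100):
    """Solve min x**2 subject to x >= 1 by alternating primal and dual updates."""
    lam = 0.0
    x = 0.0
    for _ in range(iters):
        # Primal step: minimize the Lagrangian x**2 + lam * (1 - x) over x.
        x = lam / 2.0
        # Dual step: gradient ascent on lam using the constraint violation (1 - x),
        # projected back onto lam >= 0.
        lam = max(0.0, lam + step * (1.0 - x))
    return x, lam
```

Because the problem is convex and strictly feasible, strong duality holds and the iterates approach the constrained optimum x = 1 with multiplier λ = 2.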
PAC-Bayes for MDP (2022.11)
• PAC-Bayes Bounds for Bandit Problems: A Survey and Experimental Comparison
• https://arxiv.org/pdf/2211.16110.pdf
Auto ML
• HPO (Hyper Parameter Optimization)
• NAS (neural architecture search)
• Finding most plausible H
• meta learning
• Increasing number of data
Meta Reinforcement Learning
• Transfer learning
• hierarchical multi skill RL
• contrastive multi skill RL
• …
• They improve their performance by solving multiple tasks at the same time or cumulatively
Contrastive-Based Multi-Skill RL
• Consider how much a certain skill contributes compared to the entire skill set
• Maximize IG (information gain)
• This makes the skills push each other apart
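Information gain between skills and states can be read as the mutual information I(Z; S). A minimal sketch computes it from a joint probability table over skills z and states s. The tabular setting and the function name are assumptions for illustration.

```python
import math


def mutual_information(joint):
    """I(Z; S) in nats, where joint[z][s] = p(z, s)."""
    p_z = [sum(row) for row in joint]          # marginal over skills
    p_s = [sum(col) for col in zip(*joint)]    # marginal over states
    mi = 0.0
    for z, row in enumerate(joint):
        for s, p in enumerate(row):
            if p > 0.0:
                mi += p * math.log(p / (p_z[z] * p_s[s]))
    return mi
```

When every skill visits its own states, I(Z; S) reaches ln(number of skills). When all skills visit the same states, it is zero. Maximizing it is what pushes the skills apart.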
Contrastive-Based Multi-Skill RL
“Theoretically distributional shift doesn’t occur in this method,
Flawless theory”
diayn
The agents aren't given a specific task. Instead, they are trained on various skills that must be distinguishable from each other. The main objective is for each skill to be distinct and unique compared to the other skills
diayn
diayn
They are given a reward to maximize information gain
The regularization term strengthens the feature embedding at the same time
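The reward formula on this slide is not preserved. A sketch of the DIAYN pseudo-reward from the paper, r = log q(z | s) - log p(z), is given below, assuming a uniform skill prior p(z) and a discriminator that outputs a probability for each skill. The function name and the `eps` guard are illustrative.

```python
import math


def diayn_reward(skill_probs, skill, eps=1e-8):
    """Pseudo-reward for the active skill given discriminator outputs q(z | s).

    skill_probs  list of discriminator probabilities, one per skill
    skill        index of the skill that generated the current state
    """
    n_skills = len(skill_probs)
    log_q = math.log(skill_probs[skill] + eps)  # eps avoids log(0)
    log_p = math.log(1.0 / n_skills)            # uniform prior over skills
    return log_q - log_p
```

The reward is positive when the discriminator recognizes the skill better than chance and negative otherwise.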
diayn
• Learning "log_probability" with a neural network inherently has "tuberance", i.e. inaccuracy in the approximation
diayn
• Strengthening features toward the skills that cover a particular state, combined with that "tuberance", makes the model fall into local minima easily, which might be called the "distributional shift problem" in reinforcement learning
• So they added a regularization term, but it seems to be insufficient. This model lacked coverage of the state space
Edl, smm
• SMM: Efficient Exploration via State Marginal Matching
• They try to place skills in the uncovered areas
Edl, smm
• To address the insufficient coverage problem, they even simply fix p(s)
• EDL: Explore, Discover and Learn: Unsupervised Discovery of State-Covering Skills
apt
It gives up on learning the distinctiveness of each skill.
It still strengthens the feature embedding, but through learning contrastive representations rather than through distinctiveness.
It simply maximizes the entropy of the states occupied by the skills
apt
Give a reward to maximize the entropy of the state in the space of k-nearest neighbors
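A sketch of a k-nearest-neighbor entropy reward in the style of APT: each state embedding is rewarded by log(c + average distance to its k nearest neighbors), so states in sparsely visited regions earn more. The embeddings are assumed to be given. The representation learning part is omitted, and the names and defaults are illustrative.

```python
import math


def knn_entropy_rewards(embeddings, k=3, c=1.0):
    """Particle-based entropy reward for every embedding in a batch."""
    rewards = []
    for i, z in enumerate(embeddings):
        # Distances from z to every other embedding, nearest first.
        dists = sorted(math.dist(z, other)
                       for j, other in enumerate(embeddings) if j != i)
        nearest = dists[:k]
        rewards.append(math.log(c + sum(nearest) / len(nearest)))
    return rewards
```

The constant c keeps the logarithm well defined when neighbors coincide.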
apt
THE INFORMATION GEOMETRY OF
UNSUPERVISED REINFORCEMENT LEARNING
• Fixed p(s)
• -> lowers the understanding of the environment
• -> the learned skills become far from optimal, so we have to train and find more and more skills
• But if we don't fix p(s), the learned skills are optimal for certain reward functions.
• They implemented contrastive multi-skill RL on a "3-state MDP" without a regularization term. They found that the skills are optimal for some reward functions that were not given during learning (not for every reward function), and that the number of skills stops increasing at some point
Implementation
• I implemented every contrastive multi-skill algorithm in PyTorch on my GitHub
• VIC, DIAYN, DADS, EDL, VISR, VALOR, APT, APS, CIC…
• https://github.com/kkugosu/RL_META
reference
• https://www.youtube.com/playlist?list=PL_iWQOsE6TfURIIhCrlt-wj9ByIVpbfGc
• https://www.youtube.com/watch?v=t5GBuBD0ibc
• https://www.youtube.com/watch?v=ar9RLwgUvVQ
• https://www.edwith.org/bayesiandeeplearning
• https://www.sciencedirect.com/science/article/pii/S0022000008000767
• https://web.stanford.edu/class/cs234/CS234Win2020/slides/lecture13.pdf
• https://arxiv.org/abs/1802.06070
• https://arxiv.org/abs/1906.05274
• https://arxiv.org/abs/2103.04551
• https://arxiv.org/abs/2110.02719
• …
