Date: 2017-12-14
Machine Learning Reinforcement Learning 2017.12. JY
3 ㅣMachine Learning Definition of Machine Learning • Machine Learning is a field of study that gives computers the ability to le...
4 ㅣMachine Learning There are 3 types of Machine Learning Algorithms : • Supervised Learning • Unsupervised Learning • Rei...
10 ㅣMarkov Decision Process Def. A Markov Decision Process is a tuple $\langle S, A, P, R, \gamma \rangle$ • $S$ is a finite set of states • $A$ is a...
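As a minimal sketch, the five elements of the tuple can be held in a small Python container. The class name `MDP`, the field layout `P[s][a][s2]`, and the two-state example are assumptions made here for illustration; none of them come from the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MDP:
    states: List[str]            # S: finite set of states
    actions: List[str]           # A: finite set of actions
    P: List[List[List[float]]]   # P[s][a][s2]: probability of moving s -> s2 under action a
    R: List[List[float]]         # R[s][a]: expected immediate reward
    gamma: float                 # discount factor in [0, 1]

    def is_valid(self) -> bool:
        """Check shapes, that each P[s][a] sums to 1, and the range of gamma."""
        n_s, n_a = len(self.states), len(self.actions)
        shapes_ok = (
            len(self.P) == n_s and len(self.R) == n_s
            and all(len(row) == n_a for row in self.P)
            and all(len(row) == n_a for row in self.R)
            and all(len(dist) == n_s for row in self.P for dist in row)
        )
        if not shapes_ok:
            return False
        rows_ok = all(abs(sum(dist) - 1.0) < 1e-9 for row in self.P for dist in row)
        return rows_ok and 0.0 <= self.gamma <= 1.0


# Hypothetical two-state, two-action example (all numbers are made up).
toy = MDP(
    states=["s0", "s1"],
    actions=["stay", "move"],
    P=[[[0.9, 0.1], [0.2, 0.8]],
       [[0.5, 0.5], [0.0, 1.0]]],
    R=[[0.0, 1.0], [2.0, 0.0]],
    gamma=0.9,
)
```

Calling `toy.is_valid()` is a quick sanity check that every transition row is a probability distribution before the model is used.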
11 ㅣMarkov Decision Process State & Action Source : Fundamental of Reinforcement Learning
12 ㅣMarkov Decision Process Markov Chain • A state ONLY depends on the previous state. • State diagram • State transition ...
13 ㅣMarkov Decision Process Markov Chain • State diagram Source : Fundamental of Reinforcement Learning
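The Markov property can be sketched in a few lines: the next state is sampled using only the current state. The weather states and transition probabilities below are hypothetical, chosen only to illustrate the idea.

```python
import random

# Hypothetical transition table: T[s][s2] = Pr[next state = s2 | current state = s].
STATES = ["sunny", "rainy"]
T = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}


def step(state, rng):
    """Sample the next state; only the current state is consulted."""
    candidates = list(T[state])
    weights = list(T[state].values())
    return rng.choices(candidates, weights=weights, k=1)[0]


def simulate(start, n_steps, seed=0):
    """Return a path of n_steps + 1 states starting from `start`."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        path.append(step(path[-1], rng))
    return path


path = simulate("sunny", 10)
```

Note that `step` never looks at earlier entries of `path`, which is exactly what "a state only depends on the previous state" means.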
14 ㅣMarkov Decision Process Reward Def. $R$ is a reward function, $R_s^a = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a]$. A Markov Decision Process...
15 ㅣMarkov Decision Process Discount factor Def. $\gamma$ is a discount factor, $\gamma \in [0, 1]$. It’s reasonable to maximize the sum of r...
16 ㅣMarkov Decision Process Policy Def. A policy $\pi$ is a distribution over actions given states • $\pi(a \mid s) = \Pr[A_t = a \mid S_t =$...
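A stochastic policy can be sketched as a lookup table from states to action distributions. The state names, action names, and probabilities here are hypothetical.

```python
import random

# Hypothetical policy table: policy[s][a] = pi(a | s).
policy = {
    "s0": {"left": 0.5, "right": 0.5},
    "s1": {"left": 0.1, "right": 0.9},
}


def sample_action(policy, state, rng):
    """Draw an action from the distribution pi(. | state)."""
    actions = list(policy[state])
    probs = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=probs, k=1)[0]


rng = random.Random(0)
action = sample_action(policy, "s1", rng)
```

A deterministic policy is the special case where one action in each row has probability 1.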
17 ㅣMarkov Decision Process Value function • State-value function Def. The return $G_t$ is the total discounted reward from t...
18 ㅣMarkov Decision Process Example of state-value function Sample returns for Markov Reward Process: • Starting from $S_1 =$...
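A sample return can be computed directly from a reward sequence. The sketch below uses the standard form of the return, G_t = R_{t+1} + γ R_{t+2} + γ² R_{t+3} + ..., and a made-up reward sequence; the slide's own episode is truncated above, so these numbers are not taken from it.

```python
def discounted_return(rewards, gamma):
    """Total discounted reward: rewards[0] + gamma*rewards[1] + gamma^2*rewards[2] + ..."""
    g = 0.0
    # Work backwards so each step is just g = r + gamma * g.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical episode: three steps with reward -2, then a final reward of +10.
sample_rewards = [-2, -2, -2, 10]
g = discounted_return(sample_rewards, gamma=0.5)  # -2 - 1 - 0.5 + 1.25 = -2.25
```

With `gamma=0` only the first reward counts, and with `gamma=1` the return is the plain sum of rewards.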
19 ㅣMarkov Decision Process Value function • State-value function for policy Def. The state-value function $v_\pi(s)$ of an Ma...
20 ㅣMarkov Decision Process Value function • Action-value function Def. The action-value function $q_\pi(s, a)$ is the expecte...
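For reference, the two value functions are linked by averaging the action values under the policy. This is the standard identity, written in the deck's notation; it is added here for illustration and does not appear in the visible slide text.

```latex
v_\pi(s) = \sum_{a \in A} \pi(a \mid s)\, q_\pi(s, a)
```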
21 ㅣBellman Equation Bellman Expectation Equation • Bellman equation for value function The value function can be decompos...
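One way to make the decomposition concrete is iterative policy evaluation, which applies the Bellman expectation backup, v(s) ← Σ_a π(a|s) (R[s][a] + γ Σ_s2 P[s][a][s2] v(s2)), until the values stop changing. This is a minimal sketch; the function name, the array layout, and the toy numbers are assumptions rather than anything shown on the slides.

```python
def policy_evaluation(P, R, pi, gamma, tol=1e-10, max_iter=10_000):
    """Approximate v_pi by repeatedly applying the Bellman expectation backup.

    P[s][a][s2] : transition probability, R[s][a] : expected reward,
    pi[s][a]    : probability of taking action a in state s.
    """
    n_s = len(P)
    v = [0.0] * n_s
    for _ in range(max_iter):
        new_v = []
        for s in range(n_s):
            total = 0.0
            for a, p_a in enumerate(pi[s]):
                expected_next = sum(P[s][a][s2] * v[s2] for s2 in range(n_s))
                total += p_a * (R[s][a] + gamma * expected_next)
            new_v.append(total)
        delta = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if delta < tol:
            break
    return v


# Hypothetical two-state, two-action MDP evaluated under a uniform random policy.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
R = [[0.0, 1.0], [2.0, 0.0]]
pi = [[0.5, 0.5], [0.5, 0.5]]
v_pi = policy_evaluation(P, R, pi, gamma=0.9)
```

The loop converges here because `gamma < 1` makes each backup a contraction; with `gamma = 1` convergence is not guaranteed in general.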
22 ㅣBellman Equation Bellman Optimality Equation • Optimal value function Def. The optimal state-value function $v_*(s)$ is t...
23 ㅣBellman Equation Bellman Optimality Equation • Optimal policy An optimal policy can be found by maximizing over $q_*(s,$ ...
24 ㅣBellman Equation State Transition Probability Diagram Source : Fundamental of Reinforcement Learning
25 ㅣBellman Equation Bellman Optimality Equation The optimal value functions are recursively related by the Bellman Optima...
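The recursion can be turned into an algorithm by applying the optimality backup, v(s) ← max_a (R[s][a] + γ Σ_s2 P[s][a][s2] v(s2)), until it reaches a fixed point, then acting greedily. The sketch below is value iteration on a made-up two-state MDP; it is one common way to solve the equation, not necessarily the method the deck goes on to use.

```python
def value_iteration(P, R, gamma, tol=1e-10, max_iter=10_000):
    """Return (v, policy): approximate optimal values and a greedy policy."""
    n_s, n_a = len(P), len(P[0])

    def q(s, a, v):
        # One-step lookahead: immediate reward plus discounted expected next value.
        return R[s][a] + gamma * sum(P[s][a][s2] * v[s2] for s2 in range(n_s))

    v = [0.0] * n_s
    for _ in range(max_iter):
        new_v = [max(q(s, a, v) for a in range(n_a)) for s in range(n_s)]
        delta = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if delta < tol:
            break
    # Greedy policy: pick the action with the largest lookahead value in each state.
    policy = [max(range(n_a), key=lambda a: q(s, a, v)) for s in range(n_s)]
    return v, policy


# Hypothetical MDP: in state 0, action 0 stays (reward 0) and action 1 moves
# to the absorbing state 1 (reward 1). State 1 yields no further reward.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [0.0, 1.0]]]
R = [[0.0, 1.0], [0.0, 0.0]]
v_star, greedy = value_iteration(P, R, gamma=0.9)
```

In this toy case the optimal choice in state 0 is to take the reward immediately, so `greedy[0]` is action 1 and `v_star` is approximately `[1.0, 0.0]`.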
26 ㅣDynamic Programming Dynamic Programming divides a problem into subproblems, which are themselves usually divided into fu...
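The subproblem idea can be sketched with a finite-horizon value computation: the best value with `k` steps left is built from the best values with `k - 1` steps left, and each subproblem is solved once and cached. The MDP numbers and the function name `best_value` are hypothetical.

```python
from functools import lru_cache

# Hypothetical MDP: P[s][a][s2] transition probabilities, R[s][a] rewards.
P = (((1.0, 0.0), (0.0, 1.0)),
     ((0.0, 1.0), (0.0, 1.0)))
R = ((0.0, 1.0), (0.0, 0.0))
GAMMA = 0.9


@lru_cache(maxsize=None)
def best_value(state, steps_left):
    """Best achievable discounted reward from `state` with `steps_left` moves."""
    if steps_left == 0:
        return 0.0  # smallest subproblem: nothing left to collect
    return max(
        R[state][a] + GAMMA * sum(
            P[state][a][s2] * best_value(s2, steps_left - 1)
            for s2 in range(len(P))
        )
        for a in range(len(R[state]))
    )


value = best_value(0, 5)
info = best_value.cache_info()
```

Without the cache the recursion would recompute the same `(state, steps_left)` pairs many times; with it, at most `2 * (5 + 1)` distinct subproblems are ever evaluated.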
27 ㅣReinforcement Learning
Thank you
