Human-level control through deep reinforcement learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver et al., Google DeepMind
Presented by Muhammed Kocabas
Figure from paper
2
Outline
● Introduction
● What is Reinforcement Learning?
● Q-Learning
● Background and related work
● Methods
● Experiments
● Results
● Conclusion
● References
Figure from link
3
Introduction
● Earlier reinforcement learning agents were successful only in domains where useful features could be handcrafted, or in fully observed, low-dimensional state spaces.
● The authors used recent advances in deep neural network training to develop the Deep Q-Network (DQN).
● Deep Q-Network:
– Learns successful policies.
– End-to-end RL method.
– In Atari 2600 games, receives only the pixels and the game score as inputs, just like a human player, and reaches super-human performance in half of the games.
● Comparison: human players, linear learners.
4
What is reinforcement learning?
Types of Machine Learning:
– Supervised: task driven (regression/classification)
– Unsupervised: data driven (clustering)
– Reinforcement: learns to react to an environment; sparse and time-delayed labels (rewards)
RL is the problem of getting an agent to act in the world so as to maximize its rewards.
5
What is reinforcement learning?
Figure from link
6
What is reinforcement learning?
● No explicit training data set.
● Nature provides a reward for each of the learner's actions.
● At each time step:
– The learner observes a state and chooses an action.
– Nature responds with a new state and a reward.
– The learner learns from the reward and makes better decisions.
7
What is reinforcement learning?
● The main goal is to maximize the total reward Rt.
● Looking only at immediate rewards would not work well.
● We need to take into account "future" rewards.
● At time t, the total future reward is:
Rt = rt + rt+1 + … + rn
● We want to take the action that maximizes Rt.
● But we have to consider the fact that the environment is stochastic.
8
Discounted future rewards
● We can never be sure that we will get the same rewards the next time we perform the same actions. The further into the future we go, the more the outcome may diverge.
● The main goal is to maximize the discounted reward Rt:
Rt = rt + γ rt+1 + γ² rt+2 + … + γ^(n−t) rn
● γ is the discount factor, between 0 and 1.
● The further into the future a reward is, the less we take it into consideration.
● Simply:
Rt = rt + γ (rt+1 + γ (rt+2 + …))
Rt = rt + γ Rt+1
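A minimal sketch of this recursion in Python (illustrative, not from the slides; the reward values in the example are made up):

def discounted_returns(rewards, gamma=0.99):
    # Compute Rt = rt + gamma * Rt+1 for every step, scanning backwards.
    returns = [0.0] * len(rewards)
    future = 0.0
    for t in reversed(range(len(rewards))):
        future = rewards[t] + gamma * future
        returns[t] = future
    return returns

# Example: three steps of reward.
print(discounted_returns([1.0, 0.0, 2.0]))  # approximately [2.9602, 1.98, 2.0]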
9
Q-Function
Q(st, at) = max Rt+1
● The maximum discounted future reward when we perform action a in state s, and continue optimally from that point on.
● The way to think about Q(st, at) is that it is "the best possible score at the end of the game after performing action a in state s".
● It is called the Q-function because it represents the "quality" of a certain action in a given state.
10
Policy & Bellman equation
π(s) = argmaxa Q(s, a)
● π represents the policy: the rule for how we choose an action in each state.
Q(s, a) = r + γ maxa′ Q(s′, a′)
● This is called the Bellman equation.
● The maximum future reward for this state and action is the immediate reward plus the maximum future reward for the next state.
● The basic idea behind many reinforcement learning algorithms is to estimate the action-value function by using the Bellman equation as an iterative update.
11
Q-Learning
Q(s, a) ← Q(s, a) + α [ r + γ maxa′ Q(s′, a′) − Q(s, a) ]
● α in the algorithm is a learning rate that controls how much of the difference between the previous Q-value and the newly proposed Q-value is taken into account.
● The maxa′ Q(s′, a′) that we use to update Q(s, a) is only an approximation, and in the early stages of learning it may be completely wrong. However, the approximation gets more and more accurate with every iteration, and it has been shown that if we perform this update enough times, the Q-function converges and represents the true Q-value.
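A minimal tabular sketch of this update rule in Python (hedged: the environment interface reset/step/actions and the hyperparameter values are illustrative assumptions, not from the paper):

import random
from collections import defaultdict

def q_learning_episode(env, Q, alpha=0.1, gamma=0.99, epsilon=0.1):
    # Run one episode of tabular Q-learning with an epsilon-greedy behaviour policy.
    state = env.reset()
    done = False
    while not done:
        if random.random() < epsilon:
            action = random.choice(env.actions)                     # explore
        else:
            action = max(env.actions, key=lambda a: Q[(state, a)])  # exploit
        next_state, reward, done = env.step(action)
        # Bellman equation as an iterative update, weighted by the learning rate alpha.
        target = reward + gamma * max(Q[(next_state, a)] for a in env.actions)
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        state = next_state

Q = defaultdict(float)  # Q-table; unseen (state, action) pairs default to 0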
12
Related Work
● Neural fitted Q-iteration (2005)
– Riedmiller, M. Neural fitted Q iteration: first experiences with a data efficient neural reinforcement learning method. Mach. Learn.: ECML.
● Deep auto-encoder NN in RL (2010)
– Lange, S. & Riedmiller, M. Deep auto-encoder neural networks in reinforcement learning. Proc. Int. Jt. Conf. Neural Netw.
Comparison papers:
● The arcade learning environment: An evaluation platform for general agents (2013)
– Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. The arcade learning environment: An evaluation platform for general agents. J. Artif. Intell. Res.
● Investigating contingency awareness using Atari 2600 games (2012)
– Bellemare, M. G., Veness, J. & Bowling, M. Investigating contingency awareness using Atari 2600 games. Proc. Conf. AAAI Artif. Intell.
13
Neural Fitted Q-Iteration
● This paper introduces NFQ, an algorithm for efficient and effective training of a Q-value function represented by a multi-layer perceptron.
● The main drawback of this type of architecture is that a separate forward pass is required to compute the Q-value of each action, resulting in a cost that scales linearly with the number of actions.
● This method involves repeatedly training networks de novo over hundreds of iterations.
Figure from paper
14
Deep Auto-Encoder NN in RL
● This paper tries to solve visual reinforcement learning problems, e.g. simple mazes.
● It proposes deep auto-encoder neural networks to obtain a low-dimensional feature space by extracting representative features from states.
● It then uses kernel-based approximators, e.g. FQI (Fitted Q Iteration), to approximate the Q-function over the feature vectors output by the DAE.
Figures from paper
15
Investigating contingency awareness using Atari 2600 games
● Contingency awareness is the recognition that a future observation is under an agent's control and not solely determined by the environment.
● The contingent regions of an observation are the components whose value depends on the most recent choice of action.
● They try to learn the contingent regions during game play with a contingency learning method.
● The contingency learning method is a logistic regression classifier that can predict whether or not a pixel belongs to the contingent regions within an arbitrary Atari 2600 game.
Figure from paper
16
The arcade learning environment: An evaluation platform for general agents
● This paper introduces ALE, the Arcade Learning Environment, as a testbed for RL algorithms.
● The paper is essentially a benchmark of feature generation methods combined with linear function approximation.
● The feature generation methods are:
17
Deep Q-Learning
● The basic idea behind many reinforcement learning algorithms is to estimate the action-value function by using the Bellman equation as an iterative update:
Qi+1(s, a) = r + γ maxa′ Qi(s′, a′)
● Such value iteration algorithms converge to the optimal action-value function, Qi+1 → Q* as i → ∞.
● In practice, this basic approach is impractical, because the action-value function is estimated separately for each sequence, without any generalization.
● It is therefore common to use a function approximator, linear or nonlinear (e.g. a neural network), to estimate the action-value function.
18
Deep Q-Learning
● The Q-function can be approximated using a neural network model.
Naive formulation of the deep Q-network; the more optimized architecture of the deep Q-network, used in the DeepMind paper.
Figures from link
19
Deep Q-Learning
● To efficiently evaluate maxa′ Q(s′, a′), one should use the architecture below, which computes the Q-values of all actions in a single forward pass.
Figure from paper
Figure from link
20
Loss Function
Li(θi) = E(s,a,r,s′)~D [ ( r + γ maxa′ Q(s′, a′; θi−1) − Q(s, a; θi) )² ]
Here r + γ maxa′ Q(s′, a′; θi−1) is the target, Q(s, a; θi) is the prediction, and the expectation is over samples drawn from the experience replay memory D.
● Q-values can be any real values, which makes this a regression task that can be optimized with a simple squared-error loss.
● The targets depend on the network weights; this is in contrast with the targets used for supervised learning, which are fixed before learning begins.
● We hold the parameters from the previous iteration, θi−1, fixed when optimizing the i-th loss function.
21
Gradient Update Rule
∇θi Li(θi) = E(s,a,r,s′)~D [ ( r + γ maxa′ Q(s′, a′; θi−1) − Q(s, a; θi) ) ∇θi Q(s, a; θi) ]
● Stochastic gradient descent was used to optimize the loss function.
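A hedged PyTorch-style sketch of this objective and the gradient step (the paper actually uses RMSProp rather than plain SGD; the helper name dqn_update and the batch layout are assumptions):

import torch
import torch.nn.functional as F

def dqn_update(policy_net, target_net, optimizer, batch, gamma=0.99):
    # One gradient step on the squared TD error, with targets from the frozen network.
    states, actions, rewards, next_states, dones = batch  # minibatch tensors (dones is 0/1)
    # Prediction: Q(s, a; theta_i) for the actions actually taken.
    q_pred = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Target: r + gamma * max_a' Q(s', a'; theta_{i-1}), held fixed (no gradient flows).
    with torch.no_grad():
        q_next = target_net(next_states).max(dim=1).values
        q_target = rewards + gamma * q_next * (1 - dones)
    loss = F.mse_loss(q_pred, q_target)  # simple squared-error regression loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()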
22
Network Model
Figure from paper
23
Network Model
● The network is a classical convolutional neural network: three convolutional layers followed by two fully connected layers.
● There are no pooling layers:
– Pooling buys translation invariance.
– For games, the location of objects such as the ball (i.e. the state) is crucial in determining the potential reward.
● The input is the last 4 frames, each 84×84:
– Stacking frames lets the network infer the effect of the last taken action, e.g. ball speed, agent direction, etc.
● Sequences of actions and observations are input to the algorithm, which then learns game strategies depending on these sequences.
● The discount factor γ was set to 0.99 throughout.
● The outputs of the network are Q-values for each possible action (up to 18 in Atari).
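The layer sizes below are the ones reported in the paper; this PyTorch rendering of the model is a sketch (the input scaling to [0, 1] is a common convention, not necessarily the paper's exact preprocessing):

import torch.nn as nn

class DQN(nn.Module):
    # Three convolutional layers followed by two fully connected layers; no pooling.
    def __init__(self, n_actions=18):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),  # 4 stacked 84x84 frames in
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),  # conv output is 64 x 7 x 7 for 84x84 input
            nn.Linear(512, n_actions),              # one Q-value per possible action
        )

    def forward(self, x):
        return self.head(self.features(x / 255.0))  # scale raw pixel values to [0, 1]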
24
Training Details
● 49 Atari 2600 games.
● A different network for each game.
● Reward clipping:
– As the scale of scores varies greatly from game to game, all positive rewards were clipped at 1 and all negative rewards at −1, leaving 0 rewards unchanged.
– Clipping the rewards in this manner limits the scale of the error derivatives and makes it easier to use the same learning rate across multiple games.
– It could affect the performance of the agent, since it can no longer differentiate between rewards of different magnitude.
● Minibatch size 32.
● The behaviour policy during training was ε-greedy.
● ε decreases over time from 1 to 0.1: in the beginning the system makes completely random moves to explore the state space maximally, and then it settles down to a fixed exploration rate.
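A short sketch of these two details in Python (the linear annealing of ε over the first million frames is from the paper; the function names are assumptions):

import numpy as np

def clip_reward(r):
    # Positive rewards become +1, negative rewards become -1, zero stays 0.
    return float(np.sign(r))

def epsilon_at(step, eps_start=1.0, eps_end=0.1, anneal_steps=1_000_000):
    # Linearly anneal the exploration rate over the first million frames, then hold it fixed.
    frac = min(step / anneal_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)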
25
Training Details
● Frame skipping:
– The agent sees and selects actions on every k-th frame instead of every frame, and its last action is repeated on the skipped frames.
– This technique allows the agent to play roughly k times more games without significantly increasing the runtime.
Figure from link
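A sketch of frame skipping as an environment wrapper (hedged: the env interface is an assumption; the paper uses k = 4 for most games):

def step_with_skip(env, action, k=4):
    # Repeat the chosen action for k frames, accumulating the reward along the way.
    total_reward, done = 0.0, False
    for _ in range(k):
        obs, reward, done = env.step(action)  # assumed interface: (observation, reward, done)
        total_reward += reward
        if done:
            break
    return obs, total_reward, done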
26
Unstable Non-linear Q-Learning
● Problem: Approximating Q-values using non-linear functions is not very stable.
● Reason: Correlations present in the sequence of observations.
● How: Small updates to Q may significantly change the policy and the data distribution.
● Solution: Experience replay to remove correlations in the observations.
                A separate target network to remove correlations with the target.
27
Experience Replay
● During gameplay all the experiences <s, a, r, s′> are stored in a replay memory.
● When training the network, random minibatches from the replay memory are used instead of the most recent transition.
● This breaks the similarity of subsequent training samples, which otherwise might drive the network into a local minimum.
● Experience replay makes the training task more similar to usual supervised learning, which simplifies debugging and testing of the algorithm.
● One could actually collect all those experiences from human gameplay and then train the network on these.
28
Experience Replay
● Each step of experience is potentially used in many weight updates, which allows for greater data efficiency.
● Learning directly from consecutive samples is inefficient, owing to the strong correlations between the samples.
● Randomizing the samples breaks these correlations and therefore reduces the variance of the updates.
● By using experience replay the behaviour distribution is averaged over many of its previous states, smoothing out learning and avoiding oscillations or divergence in the parameters.
● The algorithm only stores the last N experience tuples in the replay memory D, and samples uniformly at random from D when performing updates.
29
Experience Replay
● To remove correlations, build a data set from the agent's own experience.
● Sample experiences from the data set and apply the update.
● Replay buffer: fixed size.
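A minimal replay memory sketch in Python (the capacity of one million transitions matches the paper; the class layout is an assumption):

import random
from collections import deque

class ReplayMemory:
    # Fixed-size buffer of <s, a, r, s', done> transitions.
    def __init__(self, capacity=1_000_000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences are dropped automatically

    def store(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size=32):
        # Uniform random sampling breaks the correlation between consecutive transitions.
        return random.sample(self.buffer, batch_size)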
30
Experience Replay - Analogy
Sequential, correlated samples vs. a varied data distribution.
31
Separate Target Network
● To improve the stability of the method with neural networks, use a separate network for generating the targets in the Q-learning update.
● Every C updates, clone the network Q to obtain a target network Q′ and use Q′ for generating the Q-learning targets for the following C updates to Q.
● This modification makes the algorithm more stable compared to standard online Q-learning.
● It reduces oscillations or divergence of the policy.
● Generating the targets using an older set of parameters adds a delay between the time an update to Q is made and the time the update affects the targets, making divergence or oscillations much more unlikely.
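In code, the cloning step is a single parameter copy (a PyTorch-style sketch; the paper uses C = 10,000 updates):

def maybe_sync_target(step, C, policy_net, target_net):
    # Every C updates, clone the prediction network Q into the target network Q'.
    if step % C == 0:
        target_net.load_state_dict(policy_net.state_dict())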
32
Separate Target Network
Diagram: the input feeds both the prediction network Q and the target network Q′; the parameters of Q are copied into Q′ every C iterations.
33
Exploration-Exploitation
● First observe that when a Q-table or Q-network is initialized randomly, its predictions are initially random as well.
● If we pick the action with the highest Q-value, the action will be random and the agent performs crude "exploration".
● As the Q-function converges, it returns more consistent Q-values and the amount of exploration decreases.
● So one could say that Q-learning incorporates exploration as part of the algorithm. But this exploration is "greedy": it settles on the first effective strategy it finds.
● A simple and effective fix for this problem is ε-greedy exploration: with probability ε choose a random action, otherwise go with the "greedy" action with the highest Q-value.
● In their system DeepMind actually decreases ε over time from 1 to 0.1: in the beginning the system makes completely random moves to explore the state space maximally, and then it settles down to a fixed exploration rate.
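The ε-greedy rule itself is only a few lines (a PyTorch-style sketch; select_action is an assumed name):

import random
import torch

def select_action(q_net, state, epsilon, n_actions):
    # With probability epsilon act randomly, otherwise take the greedy argmax action.
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        q_values = q_net(state.unsqueeze(0))  # add a batch dimension
    return int(q_values.argmax(dim=1).item())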
34
Algorithm
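The algorithm figure is not reproduced here. In outline, deep Q-learning with experience replay ties together the pieces above; the following Python sketch paraphrases the paper's Algorithm 1 using the helpers defined earlier (conversion of sampled tuples to tensors is omitted, and the environment interface and step count are assumptions):

def train_dqn(env, policy_net, target_net, memory, optimizer,
              batch_size=32, gamma=0.99, C=10_000, num_steps=10_000_000):
    state = env.reset()
    for step in range(num_steps):
        epsilon = epsilon_at(step)                            # annealed exploration rate
        action = select_action(policy_net, state, epsilon, env.n_actions)
        next_state, reward, done = env.step(action)           # assumed env interface
        memory.store(state, action, clip_reward(reward), next_state, done)
        if len(memory.buffer) >= batch_size:
            batch = memory.sample(batch_size)                 # random minibatch from replay
            dqn_update(policy_net, target_net, optimizer, batch, gamma)
        maybe_sync_target(step, C, policy_net, target_net)    # clone Q into Q' every C updates
        state = env.reset() if done else next_state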
35
Experiments
● Atari 2600 platform, 49 games.
● Same network architecture for all tasks.
● Input: visual images & the number of actions.
● Results compared with:
– Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. The arcade learning environment: An evaluation platform for general agents. J. Artif. Intell. Res.
– Bellemare, M. G., Veness, J. & Bowling, M. Investigating contingency awareness using Atari 2600 games. Proc. Conf. AAAI Artif. Intell.
– Human players.
36
Results
● "Our DQN method outperforms the best existing reinforcement learning methods on 43 of the games without incorporating any of the additional prior knowledge about Atari 2600 games used by other approaches."
● "Our DQN agent performed at a level that was comparable to that of a professional human games tester across the set of 49 games, achieving more than 75% of the human score on more than half of the games." (29 games)
37
Figure from paper
38
Figure from paper
States that are close in terms of reward expectation but perceptually dissimilar.
39
Results
● In certain games DQN is able to discover a relatively long-term strategy.
– Breakout: first dig a tunnel around the side of the wall, then send the ball around the back to destroy many blocks.
Figure from paper
40
Results
● Nevertheless, games demanding more temporally extended planning strategies still constitute a major challenge for all existing agents, including DQN.
– Example: Montezuma's Revenge.
Figure from link
41
Results
Figure from paper
42
Results
Figure from paper
43
Results
Figure from paper
44
Figure from paper
45
Conclusion
● A single architecture can successfully learn control policies in a range of different environments:
– Minimal prior knowledge,
– Only pixels and game score as input,
– Same algorithm,
– Same architecture,
– Same hyperparameters,
– "Just like a human player."
46
Thank you!
? / + / -