https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks that were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and a computational perspective.
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec... (Balázs Hidasi)
Slides of my presentation at CIKM2018 about version 2 of the GRU4Rec algorithm, a recurrent neural network based algorithm for the session-based recommendation task.
We discuss sampling strategies and introduce additional negative sampling into the algorithm, and we redesign the loss function to cope with it. The resulting BPR-max loss function efficiently handles many negative samples without running into the vanishing gradient problem. We also introduce constrained embeddings, which speed up the convergence of item representations and reduce memory usage by a factor of 4. These improvements increase offline metrics by up to 52%.
In the talk we also discuss online A/B tests and the implications of long-term observations. Most of these observations are exclusive to this talk and are not in the paper.
You can access the preprint version of the paper on arXiv: https://arxiv.org/abs/1706.03847
The code is available on GitHub: https://github.com/hidasib/GRU4Rec
Parallel Recurrent Neural Network Architectures for Feature-rich Session-base... (Balázs Hidasi)
Slides for my RecSys 2016 talk on integrating image and textual information into session-based recommendations using novel parallel RNN architectures.
Link to the paper: http://www.hidasi.eu/en/publications.html#p_rnn_recsys16
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur... (Alessandro Suglia)
Presentation for "Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks" at the 7th Italian Information Retrieval Workshop.
See paper: http://ceur-ws.org/Vol-1653/paper_11.pdf
Deep learning to the rescue - solving long standing problems of recommender ... (Balázs Hidasi)
I gave this talk at the 1st Budapest RecSys and Personalization Meetup about using deep learning to solve long-standing problems of recommender systems. I also presented our approach to using RNNs for session-based recommendations in detail.
In this talk we discuss the application of Reinforcement Learning to games. Recently, OpenAI created an algorithm capable of beating a human team at Dota 2, a game with a great amount of complexity and strategy. We'll evaluate the role Reinforcement Learning plays in the world of games, looking at some of the main achievements and what they look like in terms of implementation. We'll also look at some of the history of AI applied to games and how things have evolved over time.
Reinforcement Learning (RL) approaches deal with finding an optimal reward-based policy to act in an environment (talk given in English).
However, what has led to their widespread use is their combination with deep neural networks (DNNs), i.e., deep reinforcement learning (Deep RL). Recent successes, not only in learning to play games but in surpassing humans at them, and academia-industry research collaborations on manipulation of objects, locomotion skills, smart grids, etc., have demonstrated its effectiveness on a wide variety of challenging tasks.
With applications spanning games, robotics, dialogue, healthcare, marketing, energy and many more domains, Deep RL might just be the power that drives the next generation of Artificial Intelligence (AI) agents!
With the explosive growth of online information, recommender systems have become an effective tool to overcome information overload and promote sales. In recent years, deep learning's revolutionary advances in speech recognition, image analysis and natural language processing have gained significant attention, and recent studies also demonstrate its efficacy in information retrieval and recommendation tasks. Applying deep learning techniques to recommender systems has been gaining momentum due to their state-of-the-art performance. In this talk, I will present recent developments in deep-learning-based recommender models and highlight some future challenges and open issues in this research field.
Sara Hooker & Sean McPherson, Delta Analytics, at MLconf Seattle 2017 (MLconf)
Data Science for Good: Stopping Illegal Deforestation Using Deep Learning:
Interested in using your data skills to give back? Delta Analytics is a Bay Area non-profit that provides free data science consulting to grant recipients all over the world. Rainforest Connection, a Delta grant recipient, worked with Delta fellows to detect illegal deforestation by applying deep learning to audio streamed from rainforests in Peru, Ecuador and Brazil. We will share insights from our work with Rainforest Connection, discuss our fellowship and partnership process, and suggest some best practices for skill-based volunteering.
Speaker: Donghyun Kwak (곽동현), PhD student at Seoul National University, currently at NAVER Clova
An overview of reinforcement learning and an introduction to recent trends in deep-learning-based RL.
Video:
http://tv.naver.com/v/2024376
https://youtu.be/dw0sHzE1oAc
[GAN by Hung-yi Lee] Part 2: The application of GAN to speech and text processing (NAVER Engineering)
Generative Adversarial Network and its Applications on Speech and Natural Language Processing, Part 2.
Speaker: Hung-yi Lee (Professor, National Taiwan University)
Date: July 2018
Generative adversarial networks (GANs) are a new idea for training models, in which a generator and a discriminator compete against each other to improve generation quality. Recently, GANs have shown amazing results in image generation, and a large number and a wide variety of new ideas, techniques, and applications have been developed based on them. Although there are only a few successful cases so far, GANs have great potential to be applied to text and speech generation to overcome limitations of conventional methods.
In the first part of the talk, I will give an introduction to GAN and provide a thorough review of this technology. In the second part, I will focus on the applications of GAN to speech and natural language processing, demonstrating its use in voice conversion, unsupervised abstractive summarization and a sentiment-controllable chatbot. I will also talk about research directions towards unsupervised speech recognition with GANs.
Review :: Demystifying deep reinforcement learning (written by Tambet Matiisen) (Hogeon Seo)
The link of the original article: https://ai.intel.com/demystifying-deep-reinforcement-learning/
This review summarizes:
- How do I learn reinforcement learning?
- Reinforcement Learning is hot!
- What is RL?
- A general approach to modeling the RL problem
- Maximize the total future reward
- A function Q(s,a) = the maximum discounted future reward (DFR)
- How to get the Q-function?
- Deep Q-Network
- Experience replay
- Exploration vs. exploitation
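The points above can be sketched as a minimal tabular version of the ideas the review covers (Q-function, experience replay, epsilon-greedy exploration). The chain environment, hyperparameters and buffer size below are all made up for illustration; a real DQN replaces the Q table with a neural network.

```python
import random
from collections import defaultdict, deque

# Tiny deterministic chain: states 0..4, actions 0 (left) / 1 (right).
# Reaching state 4 yields reward 1 and ends the episode.
GOAL = 4

def step(state, action):
    nxt = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

random.seed(0)
Q = defaultdict(float)       # Q[(state, action)] -> estimated max discounted future reward
replay = deque(maxlen=1000)  # experience replay buffer
gamma, alpha, eps = 0.9, 0.5, 0.2

def greedy(s):
    if Q[(s, 0)] == Q[(s, 1)]:
        return random.choice([0, 1])  # break ties randomly
    return 0 if Q[(s, 0)] > Q[(s, 1)] else 1

for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy: explore with probability eps, else exploit
        a = random.choice([0, 1]) if random.random() < eps else greedy(s)
        s2, r, done = step(s, a)
        replay.append((s, a, r, s2, done))
        s = s2
        # learn from a random minibatch of stored transitions
        for ps, pa, pr, ps2, pdone in random.sample(replay, min(len(replay), 8)):
            target = pr if pdone else pr + gamma * max(Q[(ps2, 0)], Q[(ps2, 1)])
            Q[(ps, pa)] += alpha * (target - Q[(ps, pa)])

policy = [greedy(s) for s in range(GOAL)]
print(policy)  # expected: [1, 1, 1, 1] -- always move toward the goal
```

Replay decorrelates consecutive transitions, which is exactly why DQN needs it when the Q table becomes a network trained by gradient descent.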
Iterative Multi-document Neural Attention for Multiple Answer Prediction (Claudio Greco)
Slides for the presentation of the paper "Iterative Multi-document Neural Attention for Multiple Answer Prediction" at the Deep Understanding and Reasoning: A challenge for Next-generation Intelligent Agents (URANIA) workshop, held in the context of the AI*IA 2016 conference.
Speaker: Kyunghyun Cho (조경현), Professor at NYU
Kyunghyun Cho is an assistant professor of computer science and data science at New York University.
He was a postdoctoral fellow at the University of Montreal until summer 2015, and received his PhD and MSc degrees from Aalto University in early 2014.
He tries his best to find a balance among machine learning, natural language processing and life, but often fails to do so.
Abstract:
There are three axes along which advances in machine learning and deep learning happen: (1) network architectures, (2) learning algorithms and (3) spatio-temporal abstraction.
In this talk, I will describe a set of research topics I’ve pursued in each of these axes.
- For network architectures, I will describe how recurrent neural networks, which were largely forgotten during the 90s and early 2000s, have evolved over time and have finally become a de facto standard in machine translation.
- I then discuss various learning paradigms, how they relate to each other, and how they can be combined to build a strong learning system. Along this line, I briefly discuss my latest research on designing a query-efficient imitation learning algorithm for autonomous driving.
- Lastly, I present my view on what it means to be a higher-level learning system. Under this view, every end-to-end trainable neural network serves as a module, regardless of how it was trained, and modules interact with each other to solve a higher-level task.
I will describe my latest research on trainable decoding algorithms as a first step toward building such a framework.
Video: https://youtu.be/soZXAH3leeQ (the talk is given in English)
Deep learning: the future of recommendations (Balázs Hidasi)
An informative talk about deep learning and its potential uses in recommender systems. Presented at the Budapest Startup Safary, 21 April, 2016.
The breakthroughs of the last decade in neural network research, together with the rapid increase in computational power, resulted in the revival of deep neural networks and of the field focusing on their training: deep learning. Deep learning methods have succeeded in complex tasks where other machine learning methods have failed, such as computer vision and natural language processing. Recently, deep learning has begun to gain ground in recommender systems as well. This talk introduces deep learning and its applications, with emphasis on how deep learning methods can solve long-standing recommendation problems.
Deep Learning in Robotics
- There are two major branches in applying deep learning techniques in robotics.
- One is to combine DL with Q-learning algorithms. For example, DeepMind's work on playing Atari games is a representative study. While this approach can effectively handle several problems that can hardly be solved via traditional methods, it is often not appropriate for real manipulators, as it requires an enormous amount of training data.
- The other branch of work uses the concept of guided policy search. It combines trajectory optimization methods with supervised learning algorithms such as CNNs to come up with a robust 'policy' function that can actually be used on real robots, e.g., Baxter or PR2.
https://telecombcn-dl.github.io/2017-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks that were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both an algorithmic and a computational perspective.
Knowledge Graphs have proven to be extremely valuable to recommender systems, as they enable hybrid graph-based recommendation models encompassing both collaborative and content information. Leveraging this wealth of heterogeneous information for top-N item recommendation is a challenging task, as it requires the ability to effectively encode a diversity of semantic relations and connectivity patterns. In this work, we propose entity2rec, a novel approach to learning user-item relatedness from knowledge graphs for top-N item recommendation. We start from a knowledge graph modeling user-item and item-item relations, and we learn property-specific vector representations of users and items by applying neural language models on the network. These representations are used to create property-specific user-item relatedness features, which are in turn fed into learning-to-rank algorithms to learn a global relatedness model that optimizes top-N item recommendations. We evaluate the proposed approach in terms of ranking quality on the MovieLens 1M dataset, outperforming a number of state-of-the-art recommender systems, and we assess the importance of property-specific relatedness scores on the overall ranking quality.
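As a rough illustration of the pipeline described above (not the authors' implementation), property-specific relatedness features can be combined into a global score. The embeddings, property names and weights below are entirely made up; in entity2rec the embeddings would come from neural language models run over the knowledge graph and the weights from a learning-to-rank algorithm.

```python
import math

# Toy property-specific embeddings for one user and two items.
embeddings = {
    "feedback": {"u1": [0.9, 0.1], "movieA": [0.8, 0.2], "movieB": [0.1, 0.9]},
    "genre":    {"u1": [0.2, 0.8], "movieA": [0.3, 0.7], "movieB": [0.9, 0.1]},
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def relatedness_features(user, item):
    # one relatedness score per property of the knowledge graph
    return [cosine(embeddings[p][user], embeddings[p][item]) for p in sorted(embeddings)]

def global_score(user, item, weights):
    # stand-in for the learned ranker: a weighted sum of property features
    return sum(w * f for w, f in zip(weights, relatedness_features(user, item)))

weights = [0.6, 0.4]  # would normally be fit by a learning-to-rank algorithm
ranked = sorted(["movieA", "movieB"],
                key=lambda m: global_score("u1", m, weights), reverse=True)
print(ranked)  # movieA scores higher for u1 on both properties
```

Keeping one feature per property is what lets the final step assess how much each relation type contributes to ranking quality.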
Deep Reinforcement Learning with Double Q-Learning (SeungHyeok Baek)
Presentation for a lab seminar.
DeepMind's Double DQN algorithm.
Van Hasselt, Hado, Arthur Guez, and David Silver. "Deep Reinforcement Learning with Double Q-Learning." AAAI. Vol. 2. 2016.
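The core idea of the paper can be shown in a few lines: Double DQN decouples action selection (online network) from action evaluation (target network) when forming the bootstrap target. The action values below are made-up numbers purely to illustrate the two targets.

```python
# Double Q-learning target vs. the standard DQN target.
gamma = 0.99
reward = 1.0

# Action values for the next state s' from two separate estimators.
q_online = {"left": 2.0, "right": 5.0}   # online network: selects the action
q_target = {"left": 3.0, "right": 1.5}   # target network: evaluates it

# Standard DQN: max over the SAME network -> prone to overestimation.
dqn_target = reward + gamma * max(q_target.values())

# Double DQN: the online network picks the action, the target network scores it.
best_action = max(q_online, key=q_online.get)           # "right"
double_dqn_target = reward + gamma * q_target[best_action]

print(dqn_target)         # 1 + 0.99 * 3.0 = 3.97
print(double_dqn_target)  # 1 + 0.99 * 1.5 = 2.485
```

When the two estimators disagree (as here), decoupling selection from evaluation removes the upward bias of taking a max over noisy estimates.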
https://telecombcn-dl.github.io/drl-2020/
This course presents the principles of reinforcement learning as an artificial intelligence tool based on the interaction of the machine with its environment, with applications to control tasks (e.g., robotics, autonomous driving) or decision making (e.g., resource optimization in wireless communication networks). It also advances the development of deep neural networks trained with little or no supervision, both for discriminative and generative tasks, with special attention to multimedia applications (vision, language and speech).
Matineh Shaker, Artificial Intelligence Scientist, Bonsai, at MLconf SF 2017 (MLconf)
Deep Reinforcement Learning with Shallow Trees:
In this talk, I present Concept Network Reinforcement Learning (CNRL), developed at Bonsai. It is an industrially applicable approach to solving complex tasks using reinforcement learning, which facilitates problem decomposition, allows component reuse, and simplifies reward functions. Inspired by Sutton’s options framework, we introduce the notion of “Concept Networks”, which are tree-like structures in which leaves are “sub-concepts” (sub-tasks), representing policies on a subset of the state space. The parent (non-leaf) nodes are “Selectors”, containing policies on which sub-concept to choose from the child nodes at each time step during an episode. There will be a high-level overview of reinforcement learning fundamentals at the beginning of the talk.
Bio: Matineh Shaker is an Artificial Intelligence Scientist at Bonsai in Berkeley, CA, where she builds machine learning, reinforcement learning, and deep learning tools and algorithms for general purpose intelligent systems. She was previously a Machine Learning Researcher at Geometric Intelligence, Data Science Fellow at Insight Data Science, Predoctoral Fellow at Harvard Medical School. She received her PhD from Northeastern University with a dissertation in geometry-inspired manifold learning.
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur... (Claudio Greco)
Slides for the presentation of the paper "Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks" at the 7th Italian Information Retrieval Workshop.
Deep Learning in Recommender Systems - RecSys Summer School 2017Balázs Hidasi
This is the presentation accompanying my tutorial on deep learning methods in the recommender systems domain. The tutorial consists of a brief general overview of deep learning and an introduction to the four most prominent research directions of DL in recsys as of 2017. Presented during the RecSys Summer School 2017 in Bolzano, Italy.
NAVER: Learning to rank question-answer pairs using HRDE-LTC (NAVER Engineering)
The automatic question answering (QA) task has long been considered a primary objective of artificial intelligence.
Among the QA sub-systems, we focused on the answer-ranking part. In particular, we investigated a novel neural network architecture with an additional data clustering module to improve performance in ranking answer candidates that are longer than a single sentence. This work can be used not only for the QA ranking task, but also to evaluate the relevance of the next utterance to a given dialogue generated from a dialogue model.
In this talk, I'll present our research results (NAACL 2018) and their potential use cases (e.g., fake news detection). Finally, I'll conclude by discussing some issues with previous research and introducing recent approaches in academia.
Yinyin Liu presents at the SD Robotics Meetup on November 8th, 2016. Deep learning has achieved great success in image understanding, speech and text recognition, and natural language processing. It also has tremendous potential to tackle the challenges of robotic vision and sensorimotor learning in a robotic learning environment. In this talk, we will discuss how current and future deep learning technologies can be applied to robotic applications.
Deep Convolutional GANs - the meaning of latent space (Hansol Kang)
DCGAN not only applies convolutional networks to GAN, but also finds meaning in the latent space.
A review of the DCGAN paper and a PyTorch-based implementation.
A review of the issues raised in the VAE seminar.
my github : https://github.com/messy-snail/GAN_PyTorch
[References]
https://github.com/znxlwm/pytorch-MNIST-CelebA-GAN-DCGAN
https://github.com/taeoh-kim/Pytorch_DCGAN
Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).
Asynchronous Methods for Deep Reinforcement Learning (Ssu-Rui Lee)
My first paper-presentation slides, on a paper from ICML 2016.
2017/05/31 (with Mion) in NCTU course: Deep Learning and Practice
2017/09/08 in NTHU course: The Cutting Edge of Deep Learning
Paper Information:
Asynchronous Methods for Deep Reinforcement Learning
Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu
https://arxiv.org/abs/1602.01783
A Deep Reinforcement Learning talk at PI School, covering the following contents:
1- Deep Reinforcement Learning
2- Q-Learning
3- Deep Q-Learning (DQN)
4- Google DeepMind paper (DQN for Atari)
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions (Victor Morales)
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
Understanding Inductive Bias in Machine Learning (SUTEJAS)
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
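A concrete way to see inductive bias is to train two learners with different built-in assumptions on the same data and compare how they generalize. The toy dataset below (points from y = 2x) is made up for illustration.

```python
# Two learners with different inductive biases, trained on identical data.
train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# Bias 1: "the relationship is linear" -> least-squares line through the data.
n = len(train)
mx = sum(x for x, _ in train) / n
my = sum(y for _, y in train) / n
slope = (sum((x - mx) * (y - my) for x, y in train)
         / sum((x - mx) ** 2 for x, _ in train))
intercept = my - slope * mx

def linear_predict(x):
    return slope * x + intercept

# Bias 2: "nearby points have similar labels" -> 1-nearest-neighbor.
def knn_predict(x):
    return min(train, key=lambda p: abs(p[0] - x))[1]

# Both fit the training data perfectly, but extrapolate very differently.
print(linear_predict(10.0))  # 20.0: the linear bias extrapolates the trend
print(knn_predict(10.0))     # 6.0: 1-NN can never predict beyond seen labels
```

Neither bias is "wrong" in general: the linear model wins here because its assumption matches the data-generating process, while 1-NN would win on data where locality matters more than global trends.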
NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS... (ssuser7dcef0)
Power plants release a large amount of water vapor into the
atmosphere through the stack. The flue gas can be a potential
source for obtaining much needed cooling water for a power
plant. If a power plant could recover and reuse a portion of this
moisture, it could reduce its total cooling water intake
requirement. One of the most practical way to recover water
from flue gas is to use a condensing heat exchanger. The power
plant could also recover latent heat due to condensation as well
as sensible heat due to lowering the flue gas exit temperature.
Additionally, harmful acids released from the stack can be
reduced in a condensing heat exchanger by acid condensation. reduced in a condensing heat exchanger by acid condensation.
Condensation of vapors in flue gas is a complicated
phenomenon since heat and mass transfer of water vapor and
various acids simultaneously occur in the presence of noncondensable
gases such as nitrogen and oxygen. Design of a
condenser depends on the knowledge and understanding of the
heat and mass transfer processes. A computer program for
numerical simulations of water (H2O) and sulfuric acid (H2SO4)
condensation in a flue gas condensing heat exchanger was
developed using MATLAB. Governing equations based on
mass and energy balances for the system were derived to
predict variables such as flue gas exit temperature, cooling
water outlet temperature, mole fraction and condensation rates
of water and sulfuric acid vapors. The equations were solved
using an iterative solution technique with calculations of heat
and mass transfer coefficients and physical properties.
A review on techniques and modelling methodologies used for checking electrom...nooriasukmaningtyas
The proper function of the integrated circuit (IC) in an inhibiting electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from disjunct devices to today’s integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry and smart vehicles in particular, are confronting design issues such as being prone to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI and sensors give misleading values which can prove fatal in case of automotives. In this paper, the authors have non exhaustively tried to review research work concerned with the investigation of EMI in ICs and prediction of this EMI using various modelling methodologies and measurement setups.
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
[RLPR S2] Deep Recurrent Q-Learning for Partially Observable MDPs
1. [Season2] Reinforcement Learning Paper Review
Deep Recurrent Q-Learning for Partially Observable MDPs
M. Hausknecht and P. Stone, AAAI, 2015
Department of Artificial Intelligence,
Korea University, Seoul
Minsuk Sung
2. [RLPR-S2] Matthew Hausknecht and Peter Stone, “Deep Recurrent Q-Learning for Partially Observable MDPs”,
Association for the Advancement of Artificial Intelligence (AAAI), 2015.
Table of Contents
• Introduction
• Deep Q-Learning
• Partial Observability
• DRQN Architecture
• Atari Games: MDP or POMDP?
• Flickering Pong POMDP
• Generalization Performance
• Can DRQN Mimic an Unobscured Pong Player?
• Evaluation on Standard Atari Games
• MDP to POMDP Generalization
• Discussion and Conclusion
• Implementation
3. INTRODUCTION
4. Deep Q-Network
• Q-Learning
• Estimating the state-action values (Q-values) of executing an action
from a given state
• Q-values function
• $Q(s,a) \leftarrow Q(s,a) + \alpha\left(r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right)$
• Limitations
• Hard to estimate Q-value when too many unique states exist
• Deep Q-Network (DQN) [Mnih et al., 2015]
• Approximating the Q-values with a neural network parameterized by weights and biases, collectively denoted as $\theta$
• Loss function
• $L(s,a \mid \theta_i) = \left(r + \gamma \max_{a'} Q(s',a' \mid \theta_i) - Q(s,a \mid \theta_i)\right)^2$
• Weight parameter update
• $\theta_{i+1} = \theta_i - \alpha \nabla_{\theta} L(\theta_i)$
• Limitations
• Hard to approximate Q-value on partially-observable system
[Figures: Q-Learning vs DQN; DQN Architecture]
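The tabular Q-learning update above can be sketched in a few lines. This is an illustrative toy example, not code from the paper:

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]

# Toy transition: reward 1.0, all Q-values initialized to zero.
Q = defaultdict(float)
q_update(Q, s=0, a=1, r=1.0, s_next=1, actions=[0, 1])
print(Q[(0, 1)])  # 0.1 * (1.0 + 0.99 * 0 - 0) = 0.1
```

The "too many unique states" limitation is exactly the size of this table; DQN replaces the table lookup with a network forward pass.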
5. Full Observability
• Markov Decision Process (MDP)
• Described by a 4-tuple $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R})$
• At each timestep $t$, an agent interacting with the MDP observes a state $s_t \in \mathcal{S}$, which determines the reward $r_t \sim \mathcal{R}(s_t, a_t)$ and the next state $s_{t+1} \sim \mathcal{P}(s_t, a_t)$
• Limitations of MDP
• Rare that the full state of the system can be provided to the agent or even determined
6. Partial Observability
• Partially-Observable Markov Decision Process (POMDP)
• Described by a 6-tuple $(\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \Omega, \mathcal{O})$
• $\mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}$ are the same as in the MDP
• The agent receives an observation $o \in \Omega$, generated from the underlying system state according to the probability distribution $o \sim \mathcal{O}(s)$
• Better captures the dynamics of many real-world environments
• In the general case, estimating a Q-value from an observation can be arbitrarily bad, since $Q(o, a \mid \theta) \neq Q(s, a \mid \theta)$
Examples of Frozen Lake as an MDP (left) and a POMDP (right)
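A tiny sketch of why $Q(o, a \mid \theta) \neq Q(s, a \mid \theta)$: when two distinct states emit the same observation (state aliasing), a memoryless agent cannot recover the true values. All names below are illustrative, in the spirit of the Frozen Lake example:

```python
# Two distinct underlying states emit the same observation ("state aliasing").
true_Q = {
    ("left_of_hole", "right"): -1.0,
    ("left_of_goal", "right"): +1.0,
}
observation = {"left_of_hole": "ice", "left_of_goal": "ice"}  # aliased

# The best a memoryless agent can do is average over the states
# consistent with the observation, losing the true values entirely.
states_for_obs = [s for s, o in observation.items() if o == "ice"]
Q_obs_right = sum(true_Q[(s, "right")] for s in states_for_obs) / len(states_for_obs)
print(Q_obs_right)  # 0.0, while the true values are -1.0 and +1.0
```

Memory (e.g. an LSTM over past observations) is one way to disambiguate such states, which is the motivation for the next slide.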
7. Goal
To narrow the gap between $Q(o, a \mid \theta)$ and $Q(s, a \mid \theta)$
by adding recurrency to the Deep Q-Network on POMDPs
8. DRQN ARCHITECTURE
9. Architecture of DRQN
• DRQN = DQN + LSTM
• DRQN replaces DQN’s first fully connected layer with a
Long Short-Term Memory (LSTM) [Hochreiter and Schmidhuber 1997]
• Overall Architecture
• Input Layer
• Taking a single 84 × 84 preprocessed image,
instead of the last four images required by DQN
• Conv1 ~ Conv3 Layer
• Detecting the paddle, the ball, and interactions between these objects
• LSTM Layer
• Detecting high-level features
• The agent missing the ball
• Ball reflections off of paddles
• Ball reflections off the walls
• Fully Connected Layer
[Figure: DRQN architecture annotated with features per layer: paddle, ball movement, ball and paddle interaction, deflections, ball velocity and direction of travel, high-level features]
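Assuming DRQN keeps DQN's standard convolution parameters (8×8 stride 4, 4×4 stride 2, 3×3 stride 1, as in Mnih et al. 2015), the per-frame feature sizes can be checked with a few lines:

```python
def conv_out(size, kernel, stride):
    """Spatial output size of a valid (no-padding) convolution."""
    return (size - kernel) // stride + 1

# (kernel, stride, filters) per conv layer, DQN-style.
layers = [(8, 4, 32), (4, 2, 64), (3, 1, 64)]
size = 84  # a single 84x84 preprocessed frame
for kernel, stride, filters in layers:
    size = conv_out(size, kernel, stride)  # 84 -> 20 -> 9 -> 7

# The flattened conv output feeds the LSTM layer (which replaces
# DQN's first fully connected layer) once per timestep.
print(size * size * filters)  # 7 * 7 * 64 = 3136
```

So the LSTM consumes one 3136-dimensional conv feature vector per timestep instead of a stacked four-frame input.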
10. Q&A ( 1 / 3 )
11. STABLE RECURRENT UPDATES
12. Two Types of Updates
• Bootstrapped Sequential Updates
• Episodes are selected randomly from replay memory
• Updates begin at the beginning of the episode and proceed forward through time to the conclusion of the episode
• The targets at each timestep are generated from the target Q-network
• RNN’s hidden state is carried forward throughout the episode
• Bootstrapped Random Updates
• Episodes are selected randomly from replay memory
• Updates begin at random points in the episode and proceed for only "unroll iterations" timesteps
• The targets at each timestep are generated from the target Q-network
• RNN’s initial state is zeroed at the start of the update
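The two update schemes can be sketched as replay-sampling functions. This is a minimal illustration with hypothetical names, not the authors' implementation:

```python
import random

def sample_sequential_update(episode):
    """Bootstrapped sequential update: replay the whole episode in order,
    carrying the LSTM hidden state forward from the first timestep."""
    return episode

def sample_random_update(episode, unroll):
    """Bootstrapped random update: start at a random point and take
    `unroll` consecutive transitions; the LSTM hidden state is zeroed
    at the start of each update."""
    start = random.randrange(len(episode) - unroll + 1)
    return episode[start:start + unroll]

episode = list(range(100))  # stand-in for 100 stored transitions
segment = sample_random_update(episode, unroll=10)
print(len(segment))  # 10 consecutive timesteps
```

The random scheme keeps minibatch samples closer to i.i.d., at the cost of the zeroed initial hidden state discussed on the next slide.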
13. Sequential Updates vs Random Updates
• Sequential Updates
• Has the advantage of carrying the LSTM's hidden state forward from the beginning of the episode
• Violates DQN's random sampling policy by sampling experiences sequentially for a full episode
• Random Updates
• Better adheres to the policy of randomly sampling experience
• LSTM hidden state must be zeroed at the start of each update
• Making it harder for the LSTM to learn functions that span longer time scales
• Experiments show both types of updates yield convergent policies with similar performance
• All results in this paper use the randomized update strategy to limit complexity
14. ATARI GAMES: MDP or POMDP?
15. Atari Games: MDP or POMDP?
• Atari 2600 Games
• Fully described by 128 bytes of console RAM
• Humans and agents, however, observe only the console-generated game screens
• POMDP → MDP
• A single game screen is insufficient to determine the state of the system
• DQN infers the full state of an Atari game by expanding the state representation to encompass the last
four game screens
• Modification to the Pong game
• To introduce partial observability to Atari games without reducing the number of input frames
16. FLICKERING PONG POMDP
17. Flickering Pong POMDP
• Flickering Pong POMDP
• A modification to the classic game of Pong
• At each time step, the screen is either fully revealed or fully obscured with probability $p = 0.5$
• Obscuring frames induces an incomplete memory of observations
• Three types of networks to play Flickering Pong
• The recurrent 1-frame DRQN
• A standard 4-frame DQN [Mnih et al., 2015]
• An augmented 10-frame DQN
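The obscuring step is simple to reproduce. A minimal sketch, assuming an all-zeros frame stands in for a fully obscured screen:

```python
import random

def flicker(frame, p=0.5):
    """With probability p, return a fully obscured (all-zeros) frame;
    otherwise return the frame unchanged."""
    if random.random() < p:
        return [[0] * len(row) for row in frame]
    return frame

random.seed(0)
frame = [[1, 2], [3, 4]]
blank = [[0, 0], [0, 0]]
obscured = sum(flicker(frame) == blank for _ in range(10_000))
print(obscured / 10_000)  # close to p = 0.5
```

On average half the observations carry no information, so any successful policy must integrate information across timesteps.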
18. Generalization Performance
• Evaluating the Best Policies for DRQN, 10-frame DQN, and 4-frame DQN
• Trained on Flickering Pong with 𝑝 = 0.5
• Evaluated against different 𝑝 values
• Observation Quality
• DRQN learns a policy which allows performance to scale as a function of observation quality
• More information yields higher scores
• Valuable for domains in which the quality of observations
varies through time
19. Q&A ( 2 / 3 )
20. EVALUATION ON STANDARD ATARI GAMES
21. Evaluation on Standard Atari Games
• Nine games selected for evaluating DRQN
1. Asteroids
2. Beam Rider → Worse than DQN
3. Bowling
4. Centipede
5. Chopper Command
6. Double Dunk → Better than DQN
7. Frostbite → Better than DQN
8. Ice Hockey
9. Ms. Pacman
23. MDP to POMDP GENERALIZATION
24. MDP to POMDP Generalization
• Performance comparison between DQN and DRQN
• Train: POMDP / Test: MDP
• Train: MDP / Test: POMDP
• Recurrent Controller
• Robustness against missing information, even when trained with full state information
25. DISCUSSION AND CONCLUSION
26. Discussion and Conclusion
• Better performance than DQN on POMDPs
• DRQN handles the noisy and incomplete characteristics of POMDPs by combining an LSTM with DQN
• Given only a single frame at each step, DRQN still integrates information across frames to detect relevant information
• Generalization Performance
• Trained with partial observations
• DRQN learns policies that are both robust enough to handle missing game screens, and scalable enough to improve performance as more data becomes available
• Trained with full observations
• DRQN performs better than DQN at all levels of partial information
• Adding Recurrency
• Experiments show that the LSTM is a viable method for handling multiple state observations
27. IMPLEMENTATION
28. DRQN
• GitHub
• Caffe : https://github.com/mhauskn/dqn/tree/recurrent
• PyTorch : https://github.com/mynkpl1998/Recurrent-Deep-Q-Learning
29. DRQN (PyTorch ver.)
30. Q&A ( 3 / 3 )
31. THANK YOU FOR LISTENING
32. References
• Paper
• https://www.aaai.org/ocs/index.php/FSS/FSS15/paper/viewPaper/11673
• Blog
• https://sumniya.tistory.com/22
• https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-6-partial-observability-and-deep-recurrent-q-68463e9aeefc
• https://jay.tech.blog/2017/02/06/deep-recurrent-q-learning/
• https://whereisend.tistory.com/108
• https://ddanggle.github.io/demystifyingDL
• Code
• https://github.com/mhauskn/dqn/tree/recurrent
• https://github.com/mynkpl1998/Recurrent-Deep-Q-Learning
• https://github.com/qfettes/DeepRL-Tutorials/blob/master/11.DRQN.ipynb
• https://github.com/awjuliani/DeepRL-Agents/blob/master/Deep-Recurrent-Q-Network.ipynb