https://telecombcn-dl.github.io/2017-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
Reinforcement Learning (RL) approaches deal with finding an optimal reward-based policy to act in an environment (talk in English).
However, what has led to their widespread use is their combination with deep neural networks (DNNs), i.e., deep reinforcement learning (Deep RL). Recent successes, not only in learning to play games but in surpassing human players, together with academia-industry research collaborations on manipulation of objects, locomotion skills, smart grids, etc., have demonstrated their value on a wide variety of challenging tasks.
With applications spanning games, robotics, dialogue, healthcare, marketing, energy and many more domains, Deep RL might just be the power that drives the next generation of Artificial Intelligence (AI) agents!
The lecture slides in DSAI 2018, National Cheng Kung University. They cover the well-known deep reinforcement learning algorithm Actor-Critic. In these slides, we introduce the advantage function and A3C/A2C.
These are the lecture slides in DSAI 2018, National Cheng Kung University. In these slides, we introduce transfer learning and some examples in reinforcement learning. We also give a brief introduction to curriculum learning.
Deep Reinforcement Learning Talk at PI School, covering the following contents:
1- Deep Reinforcement Learning
2- Q-Learning
3- Deep Q-Learning (DQN)
4- Google DeepMind Paper (DQN for Atari)
Lecture slides in DSAI spring 2018, National Cheng Kung University, Taiwan. The content covers deep reinforcement learning: policy gradient, including variance reduction and importance sampling.
Miriam Bellver, Xavier Giro-i-Nieto, Ferran Marques, and Jordi Torres. "Hierarchical Object Detection with Deep Reinforcement Learning." In Deep Reinforcement Learning Workshop (NIPS). 2016.
We present a method for performing hierarchical object detection in images guided by a deep reinforcement learning agent. The key idea is to focus on those parts of the image that contain richer information and zoom on them. We train an intelligent agent that, given an image window, is capable of deciding where to focus the attention among five different predefined region candidates (smaller windows). This procedure is iterated, providing a hierarchical image analysis. We compare two different candidate proposal strategies to guide the object search: with and without overlap. Moreover, our work compares two different strategies to extract features from a convolutional neural network for each region proposal: a first one that computes new feature maps for each region proposal, and a second one that computes the feature maps for the whole image to later generate crops for each region proposal. Experiments indicate better results for the overlapping candidate proposal strategy and a loss of performance for the cropped image features due to the loss of spatial resolution. We argue that, while this loss seems unavoidable when working with large amounts of object candidates, the much smaller number of region proposals generated by our reinforcement learning agent makes it feasible to extract features for each location without sharing convolutional computation among regions.
https://imatge-upc.github.io/detection-2016-nipsws/
Reinforcement learning is an area of machine learning inspired by behaviorist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.
This presentation contains an introduction to reinforcement learning, a comparison with other learning paradigms, an introduction to Q-Learning, and some applications of reinforcement learning in video games.
Presenter: Donghyun Kwak (PhD student at Seoul National University, currently at NAVER Clova)
An overview of reinforcement learning and recent deep-learning-based RL trends.
Presentation video:
http://tv.naver.com/v/2024376
https://youtu.be/dw0sHzE1oAc
In this talk we discuss the application of Reinforcement Learning to games. Recently, OpenAI created an algorithm capable of beating a human team at DOTA, a game considered to involve a great amount of complexity and strategy. In this talk, we'll evaluate the role Reinforcement Learning plays in the world of games, taking a look at some of the main achievements and what they look like in terms of implementation. We'll also take a look at some of the history of AI applied to games and how things evolved over time.
Financial Trading as a Game: A Deep Reinforcement Learning Approach, by 謙益 黃
An automatic program that generates constant profit from the financial market is lucrative for every market practitioner. Recent advances in deep reinforcement learning provide a framework toward end-to-end training of such a trading agent. In this paper, we propose a Markov Decision Process (MDP) model suitable for the financial trading task and solve it with the state-of-the-art deep recurrent Q-network (DRQN) algorithm. We propose several modifications to the existing learning algorithm to make it more suitable for the financial trading setting, namely: 1. We employ a substantially smaller replay memory (only a few hundred entries) compared to those used in modern deep reinforcement learning algorithms (often millions of entries). 2. We develop an action augmentation technique to mitigate the need for random exploration by providing extra feedback signals for all actions to the agent. This enables us to use a greedy policy over the course of learning, and it shows strong empirical performance compared to the more commonly used ε-greedy exploration. However, this technique is specific to financial trading under a few market assumptions. 3. We sample longer sequences for recurrent neural network training. A side product of this mechanism is that we can now train the agent every T steps, which greatly reduces training time since the overall computation goes down by a factor of T. We combine all of the above into a complete online learning algorithm and validate our approach on the spot foreign exchange market.
Slides explaining how AlphaGo works.
English version: http://www.slideshare.net/ShaneSeungwhanMoon/how-alphago-works
- Teaser for non-specialists: how would you build a Go-playing AI? Everyone talks about deep learning, but what is it? And where else could a Go AI be applied?
- Teaser for specialists: interestingly, AlphaGo's main components are a CNN (Convolutional Neural Network), the Reinforcement Learning framework that has been around for 30 years, and MCTS (Monte Carlo Tree Search). None of the ingredients are new, but the way they are put to use is refreshing.
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity, by Hung Le
Despite remarkable successes in various domains such as robotics and games, Reinforcement Learning (RL) still struggles with exploration inefficiency. For example, in hard Atari games, state-of-the-art agents often require billions of trial actions, equivalent to years of practice, while a moderately skilled human player can achieve the same score in just a few hours of play. This contrast emerges from the difference in exploration strategies: humans leverage memory, intuition and experience, while current RL agents primarily rely on random trial and error. This tutorial reviews recent advances in enhancing RL exploration efficiency through intrinsic motivation or curiosity, allowing agents to navigate environments without external rewards. Unlike previous surveys, we analyze intrinsic motivation through a memory-centric perspective, drawing parallels between human and agent curiosity, and providing a memory-driven taxonomy of intrinsic motivation approaches.
The talk consists of three main parts. Part A provides a brief introduction to RL basics, delves into the historical context of the explore-exploit dilemma, and raises the challenge of exploration inefficiency. In Part B, we present a taxonomy of self-motivated agents leveraging deliberate, RAM-like, and replay memory models to compute surprise, novelty, and goal, respectively. Part C explores advanced topics, presenting recent methods using language models and causality for exploration. Whenever possible, case studies and hands-on coding demonstrations will be presented.
Matineh Shaker, Artificial Intelligence Scientist, Bonsai, at MLconf SF 2017, by MLconf
Deep Reinforcement Learning with Shallow Trees:
In this talk, I present Concept Network Reinforcement Learning (CNRL), developed at Bonsai. It is an industrially applicable approach to solving complex tasks using reinforcement learning, which facilitates problem decomposition, allows component reuse, and simplifies reward functions. Inspired by Sutton's options framework, we introduce the notion of "Concept Networks", which are tree-like structures in which leaves are "sub-concepts" (sub-tasks) representing policies on a subset of the state space. The parent (non-leaf) nodes are "Selectors", containing policies on which sub-concept to choose from the child nodes at each time during an episode. There will be a high-level overview of reinforcement learning fundamentals at the beginning of the talk.
Bio: Matineh Shaker is an Artificial Intelligence Scientist at Bonsai in Berkeley, CA, where she builds machine learning, reinforcement learning, and deep learning tools and algorithms for general-purpose intelligent systems. She was previously a Machine Learning Researcher at Geometric Intelligence, a Data Science Fellow at Insight Data Science, and a Predoctoral Fellow at Harvard Medical School. She received her PhD from Northeastern University with a dissertation on geometry-inspired manifold learning.
This presentation introduces Google DeepMind's DDPG (Deep DPG) algorithm to my colleagues.
I tried my best to make it easy to understand...
Comments are always welcome :)
hiddenmaze91.blogspot.com
An efficient use of the temporal difference technique in Computer Game Learning, by Prabhu Kumar
A computer game using the temporal difference algorithm of machine learning, which improves the computer's ability to learn and to explore the best next move, using greedy movement techniques and exploration techniques over the future states of the game.
https://telecombcn-dl.github.io/drl-2020/
This course presents the principles of reinforcement learning as an artificial intelligence tool based on the interaction of the machine with its environment, with applications to control tasks (e.g., robotics, autonomous driving) or decision making (e.g., resource optimization in wireless communication networks). It also advances the development of deep neural networks trained with little or no supervision, both for discriminative and generative tasks, with special attention to multimedia applications (vision, language and speech).
We present our approach for the NIPS 2017 "Learning To Run" challenge. The goal of the challenge is to develop a controller able to run in a complex environment, by training a model with Deep Reinforcement Learning methods.
We follow the approach of the team Reason8 (3rd place). We start from the algorithm that performed best on the task, DDPG. We implement and benchmark several improvements over vanilla DDPG, including parallel sampling, parameter noise, layer normalization and domain-specific changes. We were able to reproduce the results of the Reason8 team, obtaining a model able to run for more than 30 m.
This talk was an introduction to Reinforcement Learning based on the book by Richard S. Sutton and Andrew G. Barto. We explain the main components of an RL problem and detail the tabular and approximate solution methods.
Similar to Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intelligence)
This document provides an overview of deep generative learning and summarizes several key generative models including GANs, VAEs, diffusion models, and autoregressive models. It discusses the motivation for generative models and their applications such as image generation, text-to-image synthesis, and enhancing other media like video and speech. Example state-of-the-art models are provided for each application. The document also covers important concepts like the difference between discriminative and generative modeling, sampling techniques, and the training procedures for GANs and VAEs.
Machine translation and computer vision have greatly benefited from the advances in deep learning. Large and diverse amounts of textual and visual data have been used to train neural networks, whether in a supervised or self-supervised manner. Nevertheless, the convergence of the two fields in sign language translation and production still poses multiple open challenges, like the low video resources, limitations in hand pose estimation, or 3D spatial grounding from poses.
The transformer is the neural architecture that has received the most attention in the early 2020s. It removed the recurrence of RNNs, replacing it with an attention mechanism across the input and output tokens of a sequence (cross-attention) and between the tokens composing the input (and output) sequences, named self-attention.
These slides review the research of our lab since 2016 on applied deep learning, starting from our participation in the TRECVID Instance Search 2014, moving into video analysis with CNN+RNN architectures, and our current efforts in sign language translation and production.
Machine translation and computer vision have greatly benefited from the advances in deep learning. Large and diverse amounts of textual and visual data have been used to train neural networks, whether in a supervised or self-supervised manner. Nevertheless, the convergence of the two fields in sign language translation and production still poses multiple open challenges, like the low video resources, limitations in hand pose estimation, or 3D spatial grounding from poses. This talk will present these challenges and the How2✌️Sign dataset (https://how2sign.github.io) recorded at CMU in collaboration with UPC, BSC, Gallaudet University and Facebook.
https://imatge.upc.edu/web/publications/sign-language-translation-and-production-multimedia-and-multimodal-challenges-all
https://imatge-upc.github.io/synthref/
Integrating computer vision with natural language processing has achieved significant progress over the last years owing to the continuous evolution of deep learning. A novel vision and language task, which is tackled in the present Master thesis, is referring video object segmentation, in which a language query defines which instance to segment from a video sequence. One of the biggest challenges for this task is the lack of relatively large annotated datasets, since a tremendous amount of time and human effort is required for annotation. Moreover, existing datasets suffer from poor-quality annotations, in the sense that approximately one out of ten language expressions fails to uniquely describe the target object.
The purpose of the present Master thesis is to address these challenges by proposing a novel method for generating synthetic referring expressions for an image (video frame). This method produces synthetic referring expressions by using only the ground-truth annotations of the objects as well as their attributes, which are detected by a state-of-the-art object detection deep neural network. One of the advantages of the proposed method is that its formulation allows its application to any object detection or segmentation dataset.
By using the proposed method, the first large-scale dataset with synthetic referring expressions for video object segmentation is created, based on an existing large benchmark dataset for video instance segmentation. A statistical analysis and comparison of the created synthetic dataset with existing ones is also provided in the present Master thesis.
The conducted experiments on three different datasets used for referring video object segmentation prove the efficiency of the generated synthetic data. More specifically, the obtained results demonstrate that pre-training a deep neural network with the proposed synthetic dataset improves its ability to generalize across different datasets, without any additional annotation cost.
Master MATT thesis defense by Juan José Nieto
Advised by Víctor Campos and Xavier Giro-i-Nieto.
27th May 2021.
Pre-training Reinforcement Learning (RL) agents in a task-agnostic manner has shown promising results. However, previous works still struggle to learn and discover meaningful skills in high-dimensional state spaces. We approach the problem by leveraging unsupervised skill discovery and self-supervised learning of state representations. In our work, we learn a compact latent representation using variational or contrastive techniques. We demonstrate that both allow learning a set of basic navigation skills by maximizing an information-theoretic objective. We assess our method in Minecraft 3D maps of different complexities. Our results show that representations and conditioned policies learned from pixels are enough for toy examples, but do not scale to realistic and complex maps. We also explore alternative rewards and input observations to overcome these limitations.
https://imatge.upc.edu/web/publications/discovery-and-learning-navigation-goals-pixels-minecraft
Peter Muschick, MSc thesis
Universitat Politècnica de Catalunya, 2020
Sign language recognition and translation has been an active research field in recent years, with most approaches using deep neural networks to extract information from sign language data. This work investigates the mostly disregarded approach of using human keypoint estimation from image and video data with OpenPose, in combination with a transformer network architecture. Firstly, it was shown that it is possible to recognize individual signs (4.5% word error rate (WER)). Continuous sign language recognition, though, was more error-prone (77.3% WER), and sign language translation was not possible using the proposed methods, which might be due to the low accuracy of human keypoint estimation by OpenPose and the accompanying loss of information, or to insufficient capacity of the transformer model used. Results may improve with datasets containing higher repetition rates of individual signs, or by focusing more precisely on keypoint extraction of the hands.
https://github.com/telecombcn-dl/lectures-all/
These slides review techniques for interpreting the behavior of deep neural networks. The talk reviews basic techniques such as the display of filters and tensors, as well as more advanced ones that try to interpret which part of the input data is responsible for the predictions, or generate data that maximizes the activation of certain neurons.
https://telecombcn-dl.github.io/dlai-2020/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
Giro-i-Nieto, X. One Perceptron to Rule Them All: Language, Vision, Audio and Speech. In Proceedings of the 2020 International Conference on Multimedia Retrieval (pp. 7-8).
Tutorial page:
https://imatge.upc.edu/web/publications/one-perceptron-rule-them-all-language-vision-audio-and-speech-tutorial
Deep neural networks have boosted the convergence of multimedia data analytics in a unified framework shared by practitioners in natural language, vision and speech. Image captioning, lip reading or video sonorization are some of the first applications of a new and exciting field of research exploiting the generalization properties of deep neural representations. This tutorial will firstly review the basic neural architectures to encode and decode vision, text and audio, and will later review those models that have successfully translated information across modalities.
Image segmentation is a classic computer vision task that aims at labeling pixels with semantic classes. These slides provide an overview of the basic approaches applied from the deep learning field to tackle this challenge and presents the basic subtasks (semantic, instance and panoptic segmentation) and related datasets.
Presented at the International Summer School on Deep Learning (ISSonDL) 2020 held online and organized by the University of Gdansk (Poland) between the 30th August and 2nd September.
http://2020.dl-lab.eu/virtual-summer-school-on-deep-learning/
https://imatge-upc.github.io/rvos-mots/
Video object segmentation can be understood as a sequence-to-sequence task that can benefit from curriculum learning strategies for better and faster training of deep neural networks. This work explores different schedule sampling and frame skipping variations to significantly improve the performance of a recurrent architecture. Our results on the car class of the KITTI-MOTS challenge indicate that, surprisingly, inverse schedule sampling is a better option than the classic forward one, and that progressively skipping frames during training is beneficial, but only when training with the ground-truth masks instead of the predicted ones.
Deep neural networks have achieved outstanding results in various applications such as vision, language, audio, speech, or reinforcement learning. These powerful function approximators typically require large amounts of data to be trained, which poses a challenge in the usual case where little labeled data is available. During the last year, multiple solutions have been proposed to alleviate this problem, based on the concept of self-supervised learning, which can be understood as a specific case of unsupervised learning. This talk will cover its basic principles and provide examples in the field of multimedia.
Deep neural networks have revolutionized the data analytics scene by improving results in several and diverse benchmarks with the same recipe: learning feature representations from data. These achievements have raised interest across multiple scientific fields, especially in those where large amounts of data and computation are available. This change of paradigm in data analytics has several ethical and economic implications that are driving large investments, political debates and sounding press coverage under the generic label of artificial intelligence (AI). This talk will present the fundamentals of deep learning through the classic example of image classification, and point at how the same principle has been adopted for several tasks. Finally, some of the forthcoming potentials and risks of AI will be pointed out.
More from Universitat Politècnica de Catalunya
Analysis insight about a Flyball dog competition team's performance, by roli9797
Insights from my analysis of a Flyball dog competition team's performance last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round-table discussion of vector databases, unstructured data, AI, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad, and Procure.FYI's Co-Found
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23..., by John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf, by GetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
Adjusting OpenMP PageRank: SHORT REPORT / NOTES, by Subhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). On the other hand, the hybrid approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
2. Acknowledgements
Bellver M, Giró-i-Nieto X, Marqués F, Torres J. "Hierarchical Object Detection with Deep Reinforcement Learning." In Deep Reinforcement Learning Workshop, NIPS 2016.
6. Motivation
What is Reinforcement Learning?
"A way of programming agents by reward and punishment without needing to specify how the task is to be achieved" [Kaelbling, Littman, & Moore, 96]
Kaelbling, Leslie Pack, Michael L. Littman, and Andrew W. Moore. "Reinforcement learning: A survey." Journal of Artificial Intelligence Research 4 (1996): 237-285.
8. Motivation
We can categorize three types of learning procedures:
1. Supervised Learning: y = ƒ(x), i.e., predict label y corresponding to observation x.
2. Unsupervised Learning: ƒ(x), i.e., estimate the distribution of observation x.
3. Reinforcement Learning (RL): y = ƒ(x), i.e., predict action y based on observation x, to maximize a future reward z.
9. Motivation
We can categorize three types of learning procedures:
1. Supervised Learning: y = ƒ(x)
2. Unsupervised Learning: ƒ(x)
3. Reinforcement Learning (RL): y = ƒ(x)
11. Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. "Playing Atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).
18. Architecture. Figure (UCL Course on RL by David Silver): the Environment sends a state (st) to the Agent.
19. Architecture. Figure (UCL Course on RL by David Silver): the Agent responds with an action (At).
20. Architecture. Figure (UCL Course on RL by David Silver): the same Environment-Agent loop with state (st) and action (At).
21. Architecture. Figure (UCL Course on RL by David Silver): the Environment also returns a reward (rt).
22. Architecture. Figure (UCL Course on RL by David Silver): note that the reward is given to the agent delayed with respect to previous states and actions!
23. Architecture. Figure (UCL Course on RL by David Silver): after the action, the Environment emits the next state (st+1) together with the reward (rt).
24. Architecture. Figure (UCL Course on RL by David Silver). GOAL: complete the game with the highest score.
25. Architecture. Figure (UCL Course on RL by David Silver). GOAL: learn how to take actions to maximize the cumulative reward.
26. Architecture. Other problems can be formulated with an RL architecture: the Cart-Pole Problem. Objective: balance a pole on top of a movable cart. (Slide credit: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017.)
27. Architecture (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). Cart-Pole as an Environment-Agent loop: state (st) = angle, angular speed, position, horizontal velocity; action (At) = horizontal force applied to the cart; reward (rt) = 1 at each time step if the pole is upright.
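As a concrete sketch of this loop, the snippet below runs a random policy on OpenAI Gym's CartPole environment; it is a minimal illustration, assuming the classic gym API where step() returns (observation, reward, done, info). The observation is the four-dimensional state listed above.

```python
import gym

# Cart-Pole loop: at each time step the agent receives a state, picks an
# action, and the environment returns a reward and the next state.
env = gym.make("CartPole-v1")

state = env.reset()  # state = (position, velocity, pole angle, angular speed)
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()            # random policy: push left/right
    state, reward, done, info = env.step(action)  # rt = +1 while the pole is upright
    total_reward += reward
print("episode return:", total_reward)
```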
28. Architecture. Other problems can be formulated with an RL architecture: Robot Locomotion. Objective: make the robot move forward. Schulman, John, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. "High-dimensional continuous control using generalized advantage estimation." ICLR 2016 [project page]
29. Architecture (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). Robot locomotion as an Environment-Agent loop: state (st) = angle and position of the joints; action (At) = torques applied on the joints; reward (rt) = 1 at each time step the robot is upright + forward movement.
30. Schulman, John, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. "High-dimensional continuous control using generalized advantage estimation." ICLR 2016 [project page]
31. Outline
1. Motivation
2. Architecture
3. Markov Decision Process (MDP)
○ Policy
○ Optimal Policy
○ Value Function
○ Q-value function
○ Optimal Q-value function
○ Bellman equation
○ Value iteration algorithm
4. Deep Q-learning
5. RL Frameworks
6. Learn more
32. Markov Decision Processes (MDP) (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). Markov Decision Processes provide a formalism for reinforcement learning problems. Markov property: the current state completely characterises the state of the world.
33. Markov Decision Processes (MDP) (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). An MDP is defined by the tuple (S, A, R, P, γ): the set of states S, the set of actions A, the reward distribution R, the transition probabilities P, and the discount factor γ.
34. Markov Decision Processes (MDP) (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). The (S, A, R, P, γ) interaction loop: the Environment samples the initial state s0 ~ p(s0); the Agent selects an action at; the Environment samples the next state st+1 ~ P(·|st, at); the Environment samples a reward rt ~ R(·|st, at).
35. MDP: Policy (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). The Agent selects its action at according to a policy π. A policy π is a function S ➝ A that specifies which action to take in each state.
36. MDP: Policy (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). A policy π is a function S ➝ A that specifies which action to take in each state. The Agent's GOAL is to learn how to take actions that maximize reward; in MDP terms: find the policy π* that maximizes the cumulative discounted reward.
37. MDP: Policy (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). Another problem that can be formulated with an RL architecture: Grid World (a simple MDP). Objective: reach one of the terminal states (greyed out) in the least number of actions.
38. MDP: Policy (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). Grid World as an Environment-Agent loop: each cell is a state (st), actions (At) move between cells, and a negative "reward" (penalty) is given for each transition: rt = r = -1.
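To make this MDP concrete, here is a minimal sketch of the Grid World as code; the 4x4 layout with terminal corner cells is an assumption for illustration, not taken from the slides.

```python
# Minimal deterministic Grid World MDP (assumed 4x4 layout, terminal corners).
N = 4
STATES = [(r, c) for r in range(N) for c in range(N)]
TERMINALS = {(0, 0), (N - 1, N - 1)}                      # greyed-out goal cells
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Deterministic transition: returns (next_state, reward)."""
    if state in TERMINALS:
        return state, 0.0                                  # episode already over
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    next_state = (r, c) if 0 <= r < N and 0 <= c < N else state  # bump into wall
    return next_state, -1.0                                # rt = r = -1 per move
```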
39. MDP: Policy (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). Example: the actions resulting from applying a random policy on this Grid World problem.
40. MDP: Optimal Policy π* (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). Exercise: draw the actions resulting from applying an optimal policy in this Grid World problem.
41. MDP: Optimal Policy π* (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). Solution: the actions resulting from applying an optimal policy in this Grid World problem.
42. MDP: Optimal Policy π* (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). GOAL: find the policy π* that maximizes the cumulative discounted reward. How do we handle the randomness (initial state s0, transition probabilities, actions...)? The Environment samples the initial state s0 ~ p(s0); the Agent selects an action at ~ π(·|st); the Environment samples the next state st+1 ~ P(·|st, at); the Environment samples a reward rt ~ R(·|st, at).
43. MDP: Optimal Policy π* (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). How do we handle the randomness (initial state s0, transition probabilities, actions)? The optimal policy π* will maximize the expected cumulative discounted reward:
π* = argmax_π E[ Σ_{t≥0} γ^t r_t | π ]
with initial state s0 ~ p(s0), selected action at t given by a_t ~ π(·|s_t), and sampled state for t+1 given by s_{t+1} ~ p(·|s_t, a_t).
44. MDP: Policy: Value function Vπ(s) (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). How do we estimate how good state s is for a given policy π? With the value function at state s, Vπ(s): the expected cumulative reward from following policy π from state s:
Vπ(s) = E[ Σ_{t≥0} γ^t r_t | s_0 = s, π ]
45. MDP: Policy: Q-value function Qπ(s,a) (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). How do we estimate how good a state-action pair (s,a) is for a given policy π? With the Q-value function at state s and action a, Qπ(s,a): the expected cumulative reward from taking action a in state s and then following policy π:
Qπ(s,a) = E[ Σ_{t≥0} γ^t r_t | s_0 = s, a_0 = a, π ]
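The "expected cumulative reward" in both definitions is the expectation of a plain discounted sum over trajectories, so a Monte Carlo estimate simply averages that sum over many rollouts. A minimal sketch, where rollout is a hypothetical helper that plays policy π from state s and returns the rewards it collects:

```python
def discounted_return(rewards, gamma=0.99):
    """Computes sum_{t>=0} gamma^t * r_t for one sampled trajectory."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

def mc_value(s, rollout, n_episodes=1000, gamma=0.99):
    """Monte Carlo estimate of V_pi(s): average return over many rollouts.

    `rollout(s)` is a hypothetical helper: it follows policy pi from state s
    and returns the list of rewards collected until the episode ends.
    """
    returns = [discounted_return(rollout(s), gamma) for _ in range(n_episodes)]
    return sum(returns) / len(returns)
```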
46. MDP: Policy: Optimal Q-value function Q*(s,a) (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). The optimal Q-value function at state s and action a, Q*(s,a), is the maximum expected cumulative reward achievable from a given (state, action) pair; starting from the Q-value function of the previous slide, choose the policy that maximizes the expected cumulative reward:
Q*(s,a) = max_π E[ Σ_{t≥0} γ^t r_t | s_0 = s, a_0 = a, π ]
47. MDP: Policy: Bellman equation (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). The optimal Q-value function Q*(s,a) satisfies the following Bellman equation:
Q*(s,a) = E_{s'} [ r + γ max_{a'} Q*(s',a') | s, a ]
Here r is the reward for the considered pair (s,a), the expectation is taken across the possible future states s' (randomness), γ is the discount factor, and max_{a'} Q*(s',a') is the maximum expected cumulative reward for the future pair (s',a'), i.e., the FUTURE REWARD term. The left-hand side is the maximum expected cumulative reward for the considered pair (s,a).
48. MDP: Policy: Bellman equation (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). Q*(s,a) satisfies the Bellman equation above. The optimal policy π* (the one that maximizes the cumulative discounted reward) corresponds to taking the best action in any state according to Q*: select the action that maximizes the expected cumulative reward, π*(s) = argmax_{a'} Q*(s,a').
49. MDP: Policy: Solving the Optimal Policy (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). Value iteration algorithm: estimate the Bellman equation with an iterative update:
Q_{i+1}(s,a) = E [ r + γ max_{a'} Q_i(s',a') | s, a ]
(the updated Q-value function on the left, the current Q-value of the future pair (s',a') on the right). The iterative estimate Q_i(s,a) will converge to the optimal Q*(s,a) as i ➝ ∞.
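On the toy Grid World sketched earlier the transitions are deterministic, so the expectation reduces to a single sample and the update can be written directly. A minimal sketch, reusing the assumed STATES, TERMINALS, ACTIONS and step from above:

```python
GAMMA = 0.9  # discount factor (illustrative choice)

# Q[s][a], initialized to zero for every state-action pair.
Q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}

for i in range(100):                     # value iteration sweeps
    for s in STATES:
        if s in TERMINALS:
            continue
        for a in ACTIONS:
            s2, r = step(s, a)           # deterministic: E[.] is one sample
            # Bellman update: Q_{i+1}(s,a) = r + gamma * max_a' Q_i(s',a')
            Q[s][a] = r + GAMMA * max(Q[s2].values())

# The greedy policy reads the best action off the converged Q-table.
policy = {s: max(Q[s], key=Q[s].get) for s in STATES if s not in TERMINALS}
```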
50. MDP: Policy: Solving the Optimal Policy (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). Q_i(s,a) will converge to the optimal Q*(s,a) as i ➝ ∞, but this iterative approach is not scalable: it requires computing Q_i(s,a) for every state-action pair. E.g., if the state is the current game pixels, it is computationally unfeasible to compute Q_i(s,a) for the entire state space!
51. MDP: Policy: Solving the Optimal Policy (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). This iterative approach is not scalable because it requires computing Q(s,a) for every state-action pair. Solution: use a deep neural network as a function approximator of Q*(s,a):
Q(s,a,Ө) ≈ Q*(s,a), where Ө are the neural network parameters.
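A minimal sketch of such an approximator in Keras (the layer sizes and the Cart-Pole-like dimensions are illustrative assumptions): the network maps a state vector to one Q-value per action, which is exactly the DQN layout discussed in the next slides.

```python
from tensorflow import keras
from tensorflow.keras import layers

STATE_DIM, NB_ACTIONS = 4, 2   # e.g. Cart-Pole; illustrative assumption

# Q(s, ., theta): one forward pass yields the Q-values of ALL actions.
q_network = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(STATE_DIM,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(NB_ACTIONS, activation="linear"),    # one output per action
])
target_network = keras.models.clone_model(q_network) # frozen copy for targets
```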
52. Outline
1. Motivation
2. Architecture
3. Markov Decision Process (MDP)
4. Deep Q-learning
○ Forward and Backward passes
○ DQN
○ Experience Replay
○ Examples
5. RL Frameworks
6. Learn more
○ Coming next…
53. Deep Q-learning (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). The function to approximate is a Q-function that satisfies the Bellman equation: Q(s,a,Ө) ≈ Q*(s,a).
54. Deep Q-learning (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). The function to approximate is a Q-function that satisfies the Bellman equation: Q(s,a,Ө) ≈ Q*(s,a).
Forward pass. Loss function:
L_i(Ө_i) = E_{s,a} [ ( y_i − Q(s,a,Ө_i) )² ], with target y_i = E_{s'} [ r + γ max_{a'} Q(s',a',Ө_{i−1}) ]
That is: sample a (s,a) pair, predict its Q-value with Ө_i, sample a future state s', and predict its Q-value with the previous parameters Ө_{i−1}.
55. Deep Q-learning (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). Train the DNN to approximate a Q-value function that satisfies the Bellman equation.
56. Deep Q-learning (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). The reward must be computed during training.
57. Deep Q-learning (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017).
Forward pass. Loss function: L_i(Ө_i) = E [ ( y_i − Q(s,a,Ө_i) )² ].
Backward pass. Gradient update (with respect to the Q-function parameters Ө):
∇_{Ө_i} L_i(Ө_i) = E [ ( r + γ max_{a'} Q(s',a',Ө_{i−1}) − Q(s,a,Ө_i) ) ∇_{Ө_i} Q(s,a,Ө_i) ]
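A hedged sketch of this forward/backward step in TensorFlow 2, reusing the assumed q_network and target_network from above (the target network holds the previous parameters Ө_{i−1}); batch shapes and hyperparameters are assumptions:

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(1e-3)
GAMMA = 0.99

def dqn_train_step(states, actions, rewards, next_states, dones):
    """One gradient step on the DQN loss E[(y - Q(s,a;theta_i))^2]."""
    # Forward pass for the target: y = r + gamma * max_a' Q(s',a'; theta_{i-1}).
    next_q = target_network(next_states)
    y = rewards + GAMMA * (1.0 - dones) * tf.reduce_max(next_q, axis=1)
    with tf.GradientTape() as tape:
        q_all = q_network(states)                                 # Q(s,.;theta_i)
        idx = tf.stack([tf.range(tf.shape(actions)[0]), actions], axis=1)
        q_sa = tf.gather_nd(q_all, idx)                           # Q(s,a;theta_i)
        loss = tf.reduce_mean(tf.square(y - q_sa))
    # Backward pass: gradient of the loss w.r.t. theta_i only.
    grads = tape.gradient(loss, q_network.trainable_variables)
    optimizer.apply_gradients(zip(grads, q_network.trainable_variables))
    return loss
```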
58. Deep Q-learning: Deep Q-Network (DQN) (source: Tambet Matiisen, "Demystifying Deep Reinforcement Learning", Nervana). Q(s,a,Ө) ≈ Q*(s,a).
59. Deep Q-learning: Deep Q-Network (DQN) (source: Tambet Matiisen, "Demystifying Deep Reinforcement Learning", Nervana). Q(s,a,Ө) ≈ Q*(s,a). Efficiency: a single feedforward pass computes the Q-values for all actions from the current state.
60. Deep Q-learning: Deep Q-Network (DQN). The number of actions is between 4 and 18, depending on the Atari game. Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves et al. "Human-level control through deep reinforcement learning." Nature 518, no. 7540 (2015): 529-533.
61. Deep Q-learning: Deep Q-Network (DQN) (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). The network outputs Q(st, ⬅), Q(st, ➡), Q(st, ⬆), Q(st, ⬇).
62. Deep Q-learning: Experience Replay (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). Learning from batches of consecutive samples is problematic:
● Samples are too correlated ➡ inefficient learning.
● The Q-network parameters determine the next training samples ➡ can lead to bad feedback loops.
63. Deep Q-learning: Experience Replay (slide concept: Serena Yeung, "Deep Reinforcement Learning", Stanford University CS231n, 2017). Experience replay:
● Continually update a replay memory table of transitions (st, at, rt, st+1) as game (experience) episodes are played.
● Train the Q-network on random minibatches of transitions from the replay memory, instead of consecutive samples.
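A minimal replay-memory sketch (the capacity and batch size are illustrative assumptions):

```python
import random
from collections import deque

class ReplayMemory:
    """Stores transitions (s_t, a_t, r_t, s_{t+1}, done); samples random minibatches."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are evicted

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Random minibatches break the correlation between consecutive samples.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones
```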
65. Deep Q-learning: DQN: Computer Vision. A method for performing hierarchical object detection in images guided by a deep reinforcement learning agent. [Figure: the agent zooms into regions until OBJECT FOUND.] Bellver, Miriam, Xavier Giro-i-Nieto, Ferran Marques, and Jordi Torres. "Hierarchical Object Detection with Deep Reinforcement Learning." Deep Reinforcement Learning Workshop, NIPS 2016.
66. Deep Q-learning: DQN: Computer Vision. State: the agent decides which action to choose based on (a) a visual description of the currently observed region and (b) a history vector that encodes the past actions performed.
67. Deep Q-learning: DQN: Computer Vision. Reward: one reward for movement actions and one for the terminal action (defined in the paper).
68. Deep Q-learning: DQN: Computer Vision. Actions: two kinds of actions:
● movement actions: to which of the 5 possible regions defined by the hierarchy to move;
● terminal action: the agent indicates that the object has been found.
69. Deep Q-learning: DQN: Computer Vision. Bellver, Miriam, Xavier Giro-i-Nieto, Ferran Marques, and Jordi Torres. "Hierarchical Object Detection with Deep Reinforcement Learning." Deep Reinforcement Learning Workshop, NIPS 2016.
71. RL Frameworks: OpenAI Gym + keras-rl. keras-rl implements some state-of-the-art deep reinforcement learning algorithms in Python and seamlessly integrates with the deep learning library Keras. Just like Keras, it works with either Theano or TensorFlow, which means that you can train your algorithm efficiently either on CPU or GPU. Furthermore, keras-rl works with OpenAI Gym out of the box. (Slide credit: Míriam Bellver.)
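A hedged usage sketch along the lines of the keras-rl examples (hyperparameters are illustrative, and the keras-rl API may vary across versions):

```python
import gym
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import EpsGreedyQPolicy

env = gym.make("CartPole-v1")
nb_actions = env.action_space.n

# keras-rl feeds a window of observations, hence the Flatten input layer.
model = keras.Sequential([
    layers.Flatten(input_shape=(1,) + env.observation_space.shape),
    layers.Dense(16, activation="relu"),
    layers.Dense(nb_actions, activation="linear"),
])

agent = DQNAgent(model=model, nb_actions=nb_actions,
                 memory=SequentialMemory(limit=50_000, window_length=1),
                 policy=EpsGreedyQPolicy(eps=0.1),
                 nb_steps_warmup=100, target_model_update=1e-2)
agent.compile(Adam(learning_rate=1e-3), metrics=["mae"])

agent.fit(env, nb_steps=10_000, verbose=1)   # train in the Gym environment
agent.test(env, nb_episodes=5, visualize=False)
```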
77. Learn more. Nando de Freitas, "Machine Learning" (University of Oxford).
78. Learn more. Pieter Abbeel and John Schulman, CS 294-112 Deep Reinforcement Learning, Berkeley. Slides: "Reinforcement Learning - Policy Optimization", OpenAI / UC Berkeley (2017).
79. Learn more (slide credit: Míriam Bellver). Actor-Critic algorithm: the actor receives the state and performs an action; the critic receives the state and the action and assesses how good the action was (a "q-value"), and the gradients are used to train both the actor and the critic.
Grondman, Ivo, Lucian Busoniu, Gabriel A. D. Lopes, and Robert Babuska. "A survey of actor-critic reinforcement learning: Standard and natural policy gradients." IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42, no. 6 (2012): 1291-1307.
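A hedged sketch of one actor-critic update in TensorFlow 2, matching the diagram: the actor maps the state to an action distribution, the critic scores the outcome, and the critic's TD error drives both gradients. Network sizes, dimensions and the TD(0) target are standard illustrative choices, not taken from the slides.

```python
import tensorflow as tf
from tensorflow.keras import layers

STATE_DIM, NB_ACTIONS, GAMMA = 4, 2, 0.99        # illustrative assumptions

actor = tf.keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(STATE_DIM,)),
    layers.Dense(NB_ACTIONS, activation="softmax"),   # policy pi(a|s)
])
critic = tf.keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(STATE_DIM,)),
    layers.Dense(1),                                  # state value V(s)
])
actor_opt = tf.keras.optimizers.Adam(1e-4)
critic_opt = tf.keras.optimizers.Adam(1e-3)

def actor_critic_update(state, action, reward, next_state, done):
    """One TD(0) actor-critic step for a single transition."""
    state = tf.reshape(tf.cast(state, tf.float32), (1, -1))
    next_state = tf.reshape(tf.cast(next_state, tf.float32), (1, -1))
    with tf.GradientTape() as at, tf.GradientTape() as ct:
        v = critic(state)[0, 0]
        target = reward + GAMMA * (1.0 - done) * critic(next_state)[0, 0]
        td_error = tf.stop_gradient(target) - v       # critic's assessment
        critic_loss = tf.square(td_error)
        log_prob = tf.math.log(actor(state)[0, action])
        actor_loss = -tf.stop_gradient(td_error) * log_prob  # policy gradient
    actor_opt.apply_gradients(zip(at.gradient(actor_loss, actor.trainable_variables),
                                  actor.trainable_variables))
    critic_opt.apply_gradients(zip(ct.gradient(critic_loss, critic.trainable_variables),
                                   critic.trainable_variables))
```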
81. Outline
1. Motivation
2. Architecture
3. Markov Decision Process (MDP)
○ Policy
○ Optimal Policy
○ Value Function
○ Q-value function
○ Optimal Q-value function
○ Bellman equation
○ Value iteration algorithm
4. Deep Q-learning
○ Forward and Backward passes
○ DQN
○ Experience Replay
○ Examples
5. RL Frameworks
6. Learn more
○ Coming next…
82. Conclusions (slide credit: UCL Course on RL by David Silver). Reinforcement Learning:
● There is no supervisor, only a reward signal.
● Feedback is delayed, not instantaneous.
● Time really matters (sequential, non-i.i.d. data).
84. Coming next... Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M. and Dieleman, S., 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), pp. 484-489.