Presentation slides for the PyCon Korea 2018 tutorial session "RL Adventure: From DQN to Rainbow DQN".
This tutorial is intended to help you understand Rainbow, the value-based reinforcement learning model published by DeepMind in 2017. It walks from DQN to Rainbow step by step, summarizing only the key points along the way.
Part 1: DQN, Double & Dueling DQN - 성태경
Part 2: PER and NoisyNet - 양홍선
Part 3: Distributional RL - 이의령
Part 4: RAINBOW - 김예찬
The accompanying code and implementations are available at
https://github.com/hongdam/pycon2018-RL_Adventure
11. DQN
NEURAL NETWORKS IN ONE SLIDE
Convolutional neural network
Max-pooling
Softmax
Weight operations
Backpropagation
Non-linear function
12. DQN
Q-LEARNING
‣ Goal: figure out which action is best to take in the current state
V. Mnih, et al. Playing Atari with Deep Reinforcement Learning. NIPS, 2013
C. J. C. H. Watkins, P. Dayan. Q-learning. 1992.
Q_new(s_t, a_t) ← (1 − α) Q(s_t, a_t) + α (r_t + γ max_a Q(s_{t+1}, a))
[Value iteration update]
(r_t is the current reward; γ max_a Q(s_{t+1}, a) is the discounted reward of the next state)
Q^π(s, a) = E[ Σ_{t=0}^{∞} γ^t R(x_t, a_t) ],  γ ∈ (0, 1)
[Expected rewards]
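As a concrete illustration of the update rule above, here is a minimal tabular Q-learning step in Python. This is only a sketch, not code from the tutorial repository; the environment interface, the ε-greedy helper, and the hyperparameter values are assumptions.

```python
import random
from collections import defaultdict

# Q-table: maps (state, action) -> estimated return; unseen pairs default to 0.0
Q = defaultdict(float)

ALPHA = 0.1   # learning rate (assumed value)
GAMMA = 0.99  # discount factor (assumed value)

def q_learning_update(state, action, reward, next_state, actions):
    """Q(s_t, a_t) <- (1 - alpha) * Q(s_t, a_t) + alpha * (r_t + gamma * max_a Q(s_{t+1}, a))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    Q[(state, action)] = (1 - ALPHA) * Q[(state, action)] + ALPHA * target

def epsilon_greedy(state, actions, eps=0.1):
    """Explore with probability eps, otherwise act greedily w.r.t. the current Q-table."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])
```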
13. DQN
MOTIVATION
V. Mnih, et al. Playing Atari with Deep Reinforcement Learning. NIPS, 2013
Q(s, a) → [neural network] → Q(s, a; θ)
L_i(θ_i) = E_{s,a,r,s′}[ (r + γ max_{a′} Q(s′, a′; θ_i) − Q(s, a; θ_i))^2 ]
(target value − predicted value = TD error)
Approximate the Q-function with a neural network.
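A rough PyTorch sketch of this objective, assuming a small fully connected network and a simple batch format (the layer sizes, batch layout, and function names are my own assumptions, not the tutorial's implementation):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state to one Q-value per action: Q(s, ·; θ)."""
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, state):
        return self.net(state)

def dqn_loss(q_net, batch, gamma=0.99):
    """L(θ) = E[(r + γ max_a' Q(s', a'; θ) − Q(s, a; θ))^2]."""
    states, actions, rewards, next_states, dones = batch
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)  # predicted Q(s, a)
    with torch.no_grad():
        q_next = q_net(next_states).max(dim=1).values                  # max_a' Q(s', a')
        target = rewards + gamma * q_next * (1.0 - dones)              # TD target
    return nn.functional.mse_loss(q_pred, target)
```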
14. DQN
PROBLEM
‣ Unstable updates
‣ High correlations between consecutive input samples
V. Mnih, et al. Playing Atari with Deep Reinforcement Learning. NIPS, 2013
https://curt-park.github.io/2018-05-17/dqn/
‣ Non-stationary targets (the same network parameters are used for both the prediction and the target)
[Objective function]
L_i(θ_i) = E_{s,a,r,s′}[ (r + γ max_{a′} Q(s′, a′; θ_i) − Q(s, a; θ_i))^2 ]
15. DQN
SOLUTION
‣ Experience replay
Matiisen, Tambet. Demystifying Deep Reinforcement Learning. Computational Neuroscience Lab, 2015.
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature, 2015.
[Diagram] An episode {s_1, a_1, r_1, s_2, …, s_{T−1}, a_{T−1}, r_{T−1}, s_T} is broken into individual experiences, stored in a replay buffer, and mini-batches are sampled at random from the buffer for training.
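A minimal replay buffer along those lines; this is a generic sketch (names and capacity are assumptions), not the repository's implementation:

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores transitions (s, a, r, s', done) and samples random mini-batches,
    breaking the correlation between consecutive experiences."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences are dropped automatically

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```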
16. DQN
SOLUTION
‣ Experience replay
Matiisen, Tambet. Demystifying Deep Reinforcement Learning. Computational Neuroscience Lab, 2015.
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature, 2015.
[Diagram] Same replay-buffer diagram as the previous slide: episodes are broken into experiences, stored in a buffer, and sampled for training.
‣ Fixed Q-targets
[Objective function]
L_i(θ_i) = E_{s,a,r,s′}[ (r + γ max_{a′} Q̂(s′, a′; θ_i^−) − Q(s, a; θ_i))^2 ]   (target computed with a separate, periodically updated target network Q̂)
L_i(θ_i) = E_{s,a,r,s′}[ (r + γ max_{a′} Q(s′, a′; θ_i) − Q(s, a; θ_i))^2 ]   (original: the same network produces both terms)
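To make the fixed-target idea concrete, a hedged PyTorch-style sketch (variable names and the sync schedule are my own assumptions):

```python
import copy
import torch
import torch.nn as nn

def make_target_network(q_net):
    """Create a frozen copy of the online network. Its parameters θ⁻ are only
    refreshed periodically, so the TD target stays stationary between syncs."""
    target_net = copy.deepcopy(q_net)
    for p in target_net.parameters():
        p.requires_grad_(False)
    return target_net

def dqn_loss_with_target(q_net, target_net, batch, gamma=0.99):
    """L(θ) = E[(r + γ max_a' Q̂(s', a'; θ⁻) − Q(s, a; θ))^2]."""
    states, actions, rewards, next_states, dones = batch
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(next_states).max(dim=1).values  # uses θ⁻, not θ
        target = rewards + gamma * q_next * (1.0 - dones)
    return nn.functional.mse_loss(q_pred, target)

# Every N training steps, copy the online weights into the target network:
# target_net.load_state_dict(q_net.state_dict())
```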
31. DOUBLE Q-LEARNING
MOTIVATION
‣ The problem with DQN:
van Hasselt H., Guez A. and Silver D. Deep reinforcement learning with double Q-learning, AAAI, 2015
van Hasselt H., Double Q-learning, NIPS, 2011
Q(s, a) = r(s, a) + γ max_a Q(s′, a)
(Q-target = accumulated reward + maximum Q-value of the next state)
Overestimating the action values. What if the environment is noisy?
32. DOUBLE Q-LEARNING
MOTIVATION
‣ The problem with DQN:
van Hasselt H., Guez A. and Silver D. Deep reinforcement learning with double Q-learning, AAAI, 2015
van Hasselt H., Double Q-learning, NIPS, 2011
Q(s, a) = r(s, a) + γ max_a Q(s′, a)
(Q-target = accumulated reward + maximum Q-value of the next state)
Overestimating the action values.
‣ The fix:
Q(s, a) = r(s, a) + γ Q(s′, argmax_a Q(s′, a))
(the online DQN network chooses the action for the next state)
What if the environment is noisy?
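In the Double DQN paper the online network picks the argmax action while the target network evaluates it; a short sketch of that target computation (assumed PyTorch interface and names):

```python
import torch

def double_dqn_target(q_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double DQN target: a* = argmax_a Q(s', a; θ) is chosen by the online network,
    and Q̂(s', a*; θ⁻) is evaluated by the target network. Decoupling selection from
    evaluation reduces the overestimation caused by noisy action values."""
    with torch.no_grad():
        best_actions = q_net(next_states).argmax(dim=1, keepdim=True)        # a* from θ
        q_eval = target_net(next_states).gather(1, best_actions).squeeze(1)  # value from θ⁻
        return rewards + gamma * q_eval * (1.0 - dones)
```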
37. DUELING DQN
MOTIVATION
Q(s, a) = V(s) + A(s, a)
[Q-value decomposition: state value V(s) plus advantage value A(s, a)]
‣ Adds information to the value of the current state in the form of a comparative value
‣ The difference in values (the advantage) → faster learning
With a single Q-value, an update reflects only the value of the chosen action; the other actions stay as they are.
The advantage expresses how much better each action is compared with the chosen one.
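A minimal dueling head in PyTorch (layer sizes and names are assumptions). It also applies the mean-advantage subtraction from the Dueling DQN paper, which keeps V and A identifiable:

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Dueling head: Q(s, a) = V(s) + A(s, a) − mean_a A(s, a)."""
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
        self.value_head = nn.Linear(128, 1)                 # V(s)
        self.advantage_head = nn.Linear(128, num_actions)   # A(s, a)

    def forward(self, state):
        h = self.feature(state)
        value = self.value_head(h)                          # shape (B, 1)
        advantage = self.advantage_head(h)                  # shape (B, num_actions)
        return value + advantage - advantage.mean(dim=1, keepdim=True)
```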
51. A Motivating Example
Two actions: 'right (→→)' and 'wrong (→)'
The environment requires an exponential number of random steps until the first non-zero reward.
The most relevant transitions are hidden in a mass of highly redundant failure cases.
54. Weakness
A transition with a low TD-error on its first visit may not be replayed for a long time
PER with TD-error priorities is sensitive to noise spikes
Greedy prioritization focuses on a small subset of the experience
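The PER paper addresses these weaknesses with stochastic prioritization, P(i) ∝ (|TD-error_i| + ε)^α, instead of greedy replay. A simplified proportional-sampling sketch in plain Python/NumPy (no sum-tree and no importance-sampling weights, which the full method uses):

```python
import numpy as np

class ProportionalReplay:
    """Simplified prioritized replay: sample transition i with P(i) ∝ (|δ_i| + eps)^alpha.
    Stochastic sampling keeps low-priority transitions reachable, unlike greedy replay."""
    def __init__(self, capacity=100_000, alpha=0.6, eps=1e-5):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.transitions, self.priorities = [], []

    def push(self, transition):
        # New transitions get the current maximum priority so they are replayed at least once.
        max_prio = max(self.priorities, default=1.0)
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)
            self.priorities.pop(0)
        self.transitions.append(transition)
        self.priorities.append(max_prio)

    def sample(self, batch_size):
        probs = np.array(self.priorities) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.transitions), batch_size, p=probs)
        return idx, [self.transitions[i] for i in idx]

    def update_priorities(self, idx, td_errors):
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + self.eps
```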
83. NoisyNet
• A noisy linear layer with p inputs and q outputs
• Independent Gaussian noise
• Uses an independent Gaussian noise entry per weight: pq + q noise variables
• Factorised Gaussian noise
• Uses an independent noise variable per input and per output: p + q noise variables
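A sketch of how the factorised noise is generated for one noisy linear layer, following the f(x) = sgn(x)·√|x| scaling from the NoisyNet paper (function and tensor names are my own):

```python
import torch

def scale_noise(size):
    """f(x) = sgn(x) * sqrt(|x|), the scaling used for factorised noise in NoisyNet."""
    x = torch.randn(size)
    return x.sign() * x.abs().sqrt()

def factorised_noise(p, q):
    """Draw only p + q noise variables for a p-input, q-output layer and combine them
    with an outer product, instead of the pq + q variables needed for independent noise."""
    eps_in = scale_noise(p)                    # p noise variables
    eps_out = scale_noise(q)                   # q noise variables
    weight_eps = torch.outer(eps_out, eps_in)  # (q, p) noise for the weight matrix
    bias_eps = eps_out                         # (q,) noise for the bias
    return weight_eps, bias_eps
```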
105. Expected RL
If we look at the reward as a random variable…
§ The value function returns the expected value of the discounted future reward.
§ An expected value is a scalar (o), not a distribution (x).
§ Future reward values are complex and multimodal.
§ The expected value cannot capture the intrinsic characteristics of the individual rewards.
E[R(x)] = (35/36) × 200 − (1/36) × 1,800 ≈ 144
111. Distributional RL
§ Expected RL → Distributional RL
§ Build a value distribution over the return.
§ C51 = a categorical (discrete) distribution
§ The distribution is built from 51 bins (atoms).
A Distributional Perspective on Reinforcement Learning (C51)
https://arxiv.org/abs/1707.06887
112. Distributional RL
§ Distributional Bellman equation
§ Cf.) Bellman equation: Q(x, a) = R(x, a) + γ Q(x′, a′)
§ Z(s, a) denotes a distribution, and it is used to construct the return distribution.
Q(s, a) = E[Z(s, a)] = Σ_{i=0}^{N} p_i x_i
A Distributional Perspective on Reinforcement Learning (C51)
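A tiny sketch of how the scalar Q-value is recovered from the categorical return distribution (the support bounds and shapes below are illustrative assumptions):

```python
import torch

def expected_q(probabilities, support):
    """Q(s, a) = E[Z(s, a)] = Σ_i p_i x_i, where `support` holds the fixed atom values x_i
    and `probabilities` has shape (batch, num_actions, num_atoms)."""
    return (probabilities * support).sum(dim=-1)          # -> (batch, num_actions)

# Illustrative usage: 51 atoms spread between assumed bounds V_min = -10 and V_max = 10.
support = torch.linspace(-10.0, 10.0, 51)
probs = torch.softmax(torch.randn(4, 6, 51), dim=-1)      # dummy batch: 4 states, 6 actions
q_values = expected_q(probs, support)
```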
118. Distributional RL
Distributional DQN
1. Build a value distribution over the return (51 bins).
2. At every step, compute the distance between the value distributions.
→ The paper defines this distance theoretically with the Wasserstein distance, but in the experiments it is computed with the KL divergence.
3. Compute the loss between the distributions with cross-entropy.
A Distributional Perspective on Reinforcement Learning (C51)
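A heavily simplified sketch of that cross-entropy step, assuming the target distribution has already been projected onto the fixed 51-atom support (the projection itself is omitted, and none of the names here come from the tutorial code):

```python
import torch

def c51_cross_entropy_loss(pred_logits, target_probs):
    """Cross-entropy between the predicted return distribution and an already projected
    target distribution.
    pred_logits:  (batch, num_atoms) logits for the chosen actions
    target_probs: (batch, num_atoms) projected target probabilities (each row sums to 1)"""
    log_p = torch.log_softmax(pred_logits, dim=-1)
    return -(target_probs * log_p).sum(dim=-1).mean()
```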