Presenter: 곽동현 (PhD student at Seoul National University; currently at NAVER Clova)
An overview of reinforcement learning (RL) and an introduction to recent deep-learning-based RL trends.
Talk videos:
http://tv.naver.com/v/2024376
https://youtu.be/dw0sHzE1oAc
Reinforcement Learning (RL) deals with finding an optimal reward-based policy for acting in an environment (talk in English).
However, what has led to its widespread use is its combination with deep neural networks (DNNs), i.e., deep reinforcement learning (Deep RL). Recent successes, not only in learning to play games but in surpassing humans at them, along with academia-industry research collaborations on object manipulation, locomotion skills, smart grids, and more, have demonstrated its value on a wide variety of challenging tasks.
With applications spanning games, robotics, dialogue, healthcare, marketing, energy, and many more domains, Deep RL might just be the power that drives the next generation of Artificial Intelligence (AI) agents!
This material summarizes the PG Travel (피지여행) project by RLKorea, which works through Policy Gradient methods, a key ingredient of Deep RL. It covers theory and code together, from REINFORCE, where PG began, to PPO, the current de facto baseline.
Deep Reinforcement Learning talk at PI School, covering the following topics:
1- Deep Reinforcement Learning
2- Q-Learning
3- Deep Q-Learning (DQN)
4- Google DeepMind paper (DQN for Atari)
Hello, this is Dongmin Lee. :)
These are the slides for "Safe Reinforcement Learning," presented at the Korea Aerospace Research Institute on August 9, 2018.
The table of contents is as follows:
1. Reinforcement Learning
2. Safe Reinforcement Learning
3. Optimization Criterion
4. Exploration Process
While studying reinforcement learning, I came to believe that for many people to actually use it, it has to become safer and faster. So I studied papers and other materials on this topic and put this talk together.
I hope it helps many people. Thank you!
These slides describe Deep SARSA, Deep Q-learning, and DQN, and were used for a lecture in the Reinforcement Learning study group of the Korea Artificial Intelligence Laboratory.
Building a CookieRun AI That Plays Better Than Me with Deep Learning and Reinforcement Learning, DEVIEW 2016 - Taehoon Kim
Talk video: https://goo.gl/jrKrvf
Demo video: https://youtu.be/exXD6wJLJ6s
Introduces techniques such as Deep Q-Network, Double Q-learning, and Dueling Network, and shares the engineering process, including hyperparameter tuning, debugging, and ensembling, that pushed performance up.
Reinforcement Learning 5. Monte Carlo Methods - Seung Jae Lee
A summary of Chapter 5: Monte Carlo Methods of the book 'Reinforcement Learning: An Introduction' by Sutton and Barto. You can find the full book on Professor Sutton's website: http://incompleteideas.net/book/the-book-2nd.html
Check my website for more slides of books and papers!
https://www.endtoend.ai
MIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL) by Lex Fridman - Peerasak C.
Watch video: https://youtu.be/zR11FLZ-O9M
First lecture of MIT course 6.S091: Deep Reinforcement Learning, introducing the fascinating field of Deep RL. For more lecture videos on deep learning, reinforcement learning (RL), artificial intelligence (AI & AGI), and podcast conversations, visit our website or follow TensorFlow code tutorials on our GitHub repo.
INFO:
Website: https://deeplearning.mit.edu
This presentation contains an introduction to reinforcement learning, a comparison with other learning paradigms, an introduction to Q-Learning, and some applications of reinforcement learning in video games.
Reinforcement Learning 6. Temporal Difference Learning - Seung Jae Lee
A summary of Chapter 6: Temporal Difference Learning of the book 'Reinforcement Learning: An Introduction' by Sutton and Barto. You can find the full book on Professor Sutton's website: http://incompleteideas.net/book/the-book-2nd.html
Check my website for more slides of books and papers!
https://www.endtoend.ai
Maximum Entropy Reinforcement Learning (Stochastic Control) - Dongmin Lee
I reviewed the following papers.
- T. Haarnoja, et al., “Reinforcement Learning with Deep Energy-Based Policies", ICML 2017
- T. Haarnoja, et al., “Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor", ICML 2018
- T. Haarnoja, et al., “Soft Actor-Critic Algorithms and Applications", arXiv preprint 2018
Thank you.
For details, see https://www.youtube.com/watch?v=oPT9hHXrEpo.
Explains the principles behind AlphaGo's implementation and how it achieved such strong playing ability. Basic knowledge of artificial intelligence and computer science may be needed to follow this material.
Presenter: 김은솔 (PhD student at Seoul National University)
Date: June 2017
She has been in the integrated MS/PhD program in Computer Science and Engineering at Seoul National University since September 2010, and was selected as a young woman scientist in June 2014.
Abstract:
This talk introduces a machine learning engine that lets humans and machines watch content together and ask and answer questions about it in natural language.
Based on hierarchical multimodal recurrent neural networks, it sequentially combines the images, subtitles (text), and audio contained in the content to build a multimodal episodic memory, then answers a given question by selecting the relevant memories and extracting the answer.
It also presents a method that incorporates reinforcement learning ideas to learn long-term sequences efficiently when building the multimodal memory with recurrent neural networks.
Online video object segmentation via convolutional trident network - NAVER Engineering
Presenter: 장원동 (PhD student at Korea University)
Date: August 2017
Abstract:
A semi-supervised online video object segmentation algorithm, which accepts user annotations about a target object at the first frame, will be presented. It propagates the segmentation labels at the previous frame to the current frame using optical flow vectors.
However, the propagation is error-prone. Therefore, I’ve developed the convolutional trident network, which has three decoding branches: separative, definite foreground, and definite background decoders.
Then, the algorithm performs Markov random field optimization based on outputs of the three decoders.
This process is carried out sequentially from the second frame to the last to extract a segment track of the target object.
Experimental results will demonstrate that this algorithm significantly outperforms the state-of-the-art conventional algorithms on the DAVIS benchmark dataset.
Presenter: 고영준 (PhD student at Korea University)
Date: June 2017
Abstract:
Algorithms to segment objects in a video sequence will be presented.
First, I will introduce a primary object segmentation algorithm based on region augmentation and reduction. Second, collaborative detection, tracking, and segmentation for online multiple object segmentation will be presented.
Presenter: 민세원 (undergraduate student at Seoul National University)
Date: August 2017
Sewon Min(민세원) is a student at Seoul National University, majoring in computer science. She did her research at University of Washington with Minjoon Seo, Hannaneh Hajishirzi and Ali Farhadi. Her main interest is natural language understanding with a focus on question answering.
Abstract:
To achieve human-level understanding of natural language, it is crucial to carefully analyze the current state of machine ability, judging what a machine can and cannot do, and then to consider how to expand that ability toward the human level. In this talk, I will first describe the current state of machine question answering by analyzing the recently well-studied SQuAD dataset. Next, focusing on its limitations, I will introduce some approaches I consider promising next steps. Lastly, I will introduce my work on transfer learning in question answering as one of those approaches.
Presenter: 조경현 (Professor at NYU)
Kyunghyun Cho is an assistant professor of computer science and data science at New York University.
He was a postdoctoral fellow at the University of Montreal until summer 2015, and received his PhD and MSc degrees from Aalto University in early 2014.
He tries his best to find a balance among machine learning, natural language processing, and life, but often fails to do so.
Abstract:
There are three axes along which advances in machine learning and deep learning happen. They are (1) network architectures, (2) learning algorithms and (3) spatio-temporal abstraction.
In this talk, I will describe a set of research topics I’ve pursued in each of these axes.
- For network architectures, I will describe how recurrent neural networks, which were largely forgotten during 90s and early 2000s, have evolved over time and have finally become a de facto standard in machine translation.
- I then discuss various learning paradigms, how they relate to each other, and how they can be combined to build a strong learning system. Along this line, I briefly discuss my latest research on designing a query-efficient imitation learning algorithm for autonomous driving.
- Lastly, I present my view on what it means to be a higher-level learning system. Under this view, each end-to-end trainable neural network serves as a module, regardless of how it was trained, and these modules interact with each other to solve a higher-level task.
I will describe my latest research on trainable decoding algorithm as a first step toward building such a framework.
Talk video: https://youtu.be/soZXAH3leeQ (this talk is in English)
Presenter: 김경민 (PhD student at Seoul National University)
Date: June 2017
He is currently a PhD student in the Biointelligence Lab at Seoul National University and a principal researcher at Surromind Robotics; his interests include deep-learning-based dialogue/question-answering systems, video mining, and knowledge base construction.
Abstract:
Question answering over video stories is an important problem on the way to human-level artificial intelligence, because it deals with both real-world vision and language.
This seminar introduces the 'PororoQA' (뽀로로QA) dataset for video story question answering.
PororoQA contains 16,066 scene-dialogue pairs from 20.5 hours of the cartoon video 'Pororo', 27,328 scene description sentences, and 8,913 story-related question-answer pairs.
It also introduces a deep-learning-based video story question-answering model, the deep embedded memory network, which understands a story by mapping the video's scene-dialogue stream into a latent embedding space.
Presenter: 박태성 (PhD student at UC Berkeley)
Date: June 2017
Taesung Park is a Ph.D. student at UC Berkeley in AI and computer vision, advised by Prof. Alexei Efros.
His research interest lies between computer vision and computational photography, such as generating realistic images or enhancing photo qualities. He received B.S. in mathematics and M.S. in computer science from Stanford University.
Abstract:
Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs.
However, for many tasks, paired training data will not be available.
We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples.
Our goal is to learn a mapping G: X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss.
Because this mapping is highly under-constrained, we couple it with an inverse mapping F: Y → X and introduce a cycle consistency loss to push F(G(X)) ≈ X (and vice versa).
Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc.
Quantitative comparisons against several prior methods demonstrate the superiority of our approach.
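As an illustrative sketch of the cycle-consistency term described above (my own toy tensors and stand-in generators, not the authors' code):

import torch
import torch.nn as nn

G = nn.Linear(8, 8)   # toy stand-in for G: X -> Y
F = nn.Linear(8, 8)   # toy stand-in for F: Y -> X
l1 = nn.L1Loss()

x = torch.randn(4, 8)  # a batch from domain X
y = torch.randn(4, 8)  # a batch from domain Y

# Cycle consistency pushes F(G(x)) back toward x, and G(F(y)) back toward y
cycle_loss = l1(F(G(x)), x) + l1(G(F(y)), y)
print(cycle_loss.item())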
Presenter: 최윤제 (MS student at Korea University)
최윤제 (Yunjey Choi) majored in computer science at Korea University and is now an MS student studying machine learning. He loves coding and sharing what he has understood with others. He studied deep learning with TensorFlow for a year and is now studying Generative Adversarial Networks with PyTorch. He has implemented several papers in TensorFlow and published a PyTorch tutorial on GitHub.
Abstract:
The Generative Adversarial Network (GAN), first proposed by Ian Goodfellow in 2014, is a generative model that estimates the distribution of real data through adversarial training. GANs have recently become one of the most popular research topics, with countless related papers pouring out every day.
Finding it hard to read all of the GAN papers pouring out? That's fine. If you understand the basic GAN thoroughly, you can easily understand new papers as well.
In this talk I want to share everything I know about GANs. It should suit those who know nothing about GANs, those curious about the theory behind GANs, and those wondering how GANs can be applied.
Talk video: https://youtu.be/odpjk7_tGY0
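For readers new to GANs, a minimal sketch of the adversarial objective the talk describes (my own toy networks and dimensions, not the presenter's code):

import torch
import torch.nn as nn

D = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
G = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
bce = nn.BCELoss()

real = torch.randn(8, 2)      # stand-in samples from the real data distribution
fake = G(torch.randn(8, 4))   # generator output from random noise

# Discriminator: classify real as 1 and fake as 0 (detach so G is not updated)
d_loss = bce(D(real), torch.ones(8, 1)) + bce(D(fake.detach()), torch.zeros(8, 1))
# Generator: try to make the discriminator output 1 on fakes
g_loss = bce(D(fake), torch.ones(8, 1))
print(d_loss.item(), g_loss.item())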
Slides explaining how AlphaGo works.
English version: http://www.slideshare.net/ShaneSeungwhanMoon/how-alphago-works
- Teaser for non-specialists: how would you build a Go-playing AI? Everyone talks about deep learning, but what is it? And where else could a Go AI be used?
- Teaser for specialists: interestingly, AlphaGo's main components are just a CNN (Convolutional Neural Network), the reinforcement learning framework that has been around for 30 years, and MCTS (Monte Carlo Tree Search). None of the ingredients are new, but the way they are put to use is refreshing.
Let Android dream electric sheep: Making emotion model for chat-bot with Pyth... - Jeongkyu Shin
Summary
Chatbots are the underlying technology of conversational interfaces. One of the problems to solve for the popularization of chatbots is the unnaturalness of inhuman conversation. This presentation introduces the process of implementing emotion-state reading in Python 3 for more human-like conversation, and our experience simulating the bot's own emotional state, with a demonstration. We also share the problems we encountered in implementing the emotion models and how we solved them.
Details
Chatbots are the base technology for conversational interfaces and, combined with speech recognition, for zero-input interfaces. They are widely used in fields ranging from customer service through online shopping to digital assistants, and are spreading quickly through interfaces from text-based messengers to speech-driven smart speakers. This popularization of chatbots owes much, technically, to recent performance gains in machine-learning-based natural language processing and to the end-to-end models made possible by advances in deep learning.
As chatbot applications broaden and chatbot services and businesses grow across many fields, the fatigue caused by inhuman conversation, the so-called 'uncanny valley', is becoming a growing concern. This fatigue strongly affects user experience and continued use, so it has become one of the difficult but high-priority problems to solve in popularizing conversational digital assistants.
An emotion model is one of several ways to escape inhuman conversation and achieve natural dialogue. This talk presents our experience and approach to implementing emotion-state reading in Python 3 and simulating emotional states. We first build an emotion lexicon using Python's NLTK package. We then preprocess existing dialogue data with Python 3 and Pandas, use the processed data to define a wordvec space, and attach emotion-derived tags to each word in that space to define a suitable topological space. Words arriving in real-time conversation are then fed in at fixed intervals to track the current speaker's emotional state and its changes. Next we build and train a machine learning model in charge of the bot's emotions, and have the bot change its replies or express itself through other interfaces according to its current emotion. Finally, we demonstrate a conversation, combining the bot interface with some fun interface ideas.
We also cover the problems met along the way and their solutions: building the emotion lexicon on top of NLTK and customizing the lexicon indices for Korean, and handling the problem that the bot's emotion in the defined emotion space fluctuates far more wildly than a real human's would. Along with solutions to these various problems, we share, with an entertaining demo, the process of wiring the context engine, dialogue engine, and emotion model into a multi-mode model chain for a real service.
An efficient use of temporal difference technique in Computer Game Learning - Prabhu Kumar
A computer game built with the temporal-difference algorithm from machine learning, which improves the computer's ability to learn and to find the best next move using greedy-movement and exploration techniques over the future states of the game.
https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks, or Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course covers the basic principles of deep learning from both algorithmic and computational perspectives.
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Va... - Dongmin Lee
I reviewed the PEARL paper.
PEARL (Probabilistic Embeddings for Actor-critic RL) is an off-policy meta-RL algorithm that achieves both meta-training and adaptation efficiency. It performs probabilistic encoder filtering of latent task variables to enable posterior sampling for structured and efficient exploration.
Outline
- Abstract
- Introduction
- Probabilistic Latent Context
- Off-Policy Meta-Reinforcement Learning
- Experiments
Link: https://arxiv.org/abs/1903.08254
Thank you!
TensorFlow and Deep Learning Tips and Tricks - Ben Ball
Presented at https://www.meetup.com/TensorFlow-and-Deep-Learning-Singapore/events/241183195/. Tips and tricks for using TensorFlow with deep reinforcement learning.
See our blog for more information at http://prediction-machines.com/blog/
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017 - MLconf
Deep Reinforcement Learning with Shallow Trees:
In this talk, I present Concept Network Reinforcement Learning (CNRL), developed at Bonsai. It is an industrially applicable approach to solving complex tasks using reinforcement learning, which facilitates problem decomposition, allows component reuse, and simplifies reward functions. Inspired by Sutton’s options framework, we introduce the notion of “Concept Networks” which are tree-like structures in which leaves are “sub-concepts” (sub-tasks), representing policies on a subset of state space. The parent (non-leaf) nodes are “Selectors”, containing policies on which sub-concept to choose from the child nodes, at each time during an episode. There will be a high-level overview on the reinforcement learning fundamentals at the beginning of the talk.
Bio: Matineh Shaker is an Artificial Intelligence Scientist at Bonsai in Berkeley, CA, where she builds machine learning, reinforcement learning, and deep learning tools and algorithms for general purpose intelligent systems. She was previously a Machine Learning Researcher at Geometric Intelligence, Data Science Fellow at Insight Data Science, Predoctoral Fellow at Harvard Medical School. She received her PhD from Northeastern University with a dissertation in geometry-inspired manifold learning.
A presentation about NGBoost (Natural Gradient Boosting), which I presented in the Information Theory and Probabilistic Programming course at the University of Oklahoma.
Review :: Demystifying deep reinforcement learning (written by Tambet Matiisen) - Hogeon Seo
The link of the original article: https://ai.intel.com/demystifying-deep-reinforcement-learning/
This review summarizes:
How do I learn reinforcement learning?
Reinforcement Learning is Hot!
What is the RL?
General approach to model the RL problem
Maximize the total future reward
A function Q(s,a) = the maximum discounted future reward (DFR)
How to get Q-function?
Deep Q Network
Experience Replay
Exploration-Exploitation
Big data is set to offer tremendous insight. But with terabytes and petabytes of data pouring in to organizations today, traditional architectures and infrastructures are not up to the challenge. This begs the question: How do you present big data in a way that can be quickly understood and used? These data present tremendous opportunities in data mining, a burgeoning field in computer science that focuses on the development of methods that can extract knowledge from data. In many real world problems, data mining algorithms have access to massive amounts of data. Mining all the available data is prohibitive due to computational (time and memory) constraints. Much of the current research is concerned with scaling up data mining algorithms (i.e. improving on existing data mining algorithms for larger datasets). An alternative approach is to scale down the data. Thus, determining a smallest sufficient training set size that obtains the same accuracy as the entire available dataset remains an important research question. Our research focuses on selecting how many (sampling) instances to present to the data mining algorithm and also how to improve the quality of the data.
Dr. Ashwin Satyanarayana is an Assistant Professor in the Computer Systems Technology department at CityTech. Prior to joining CityTech, Ashwin was a Research Scientist at Microsoft, where he worked on several Big Data problems including Query Reformulation on Microsoft's search engine Bing. Ashwin's prior experience also includes a Senior Research Scientist on the area of Location Analytics at Placed Inc. He holds a PhD in Computer Science (Data Mining) from SUNY, with particular emphasis on Data Mining, Machine Learning and Applied Probability with applications in Real World Learning Problems.
We present our approach for the NIPS 2017 "Learning To Run" challenge. The goal of the challenge is to develop a controller able to run in a complex environment, by training a model with Deep Reinforcement Learning methods.
We follow the approach of the team Reason8 (3rd place). We begin with the algorithm that performed best on the task, DDPG. We implement and benchmark several improvements over vanilla DDPG, including parallel sampling, parameter noise, layer normalization, and domain-specific changes. We were able to reproduce the Reason8 team's results, obtaining a model able to run for more than 30m.
4. Deep Learning in RL
• In Deep RL, deep learning is used as just one module.
• Strengths of deep learning:
1. The only algorithm whose feature learning becomes more abstract as more layers are stacked.
2. A universal function approximator.
7. Machine Learning
• Supervised learning: y = f(x)
• Unsupervised learning: x ~ p(x) or x = f(x)
• Reinforcement learning: find a policy p(a|s) that maximizes the sum of rewards
9. Example of Unsupervised Learning: Clustering
http://www.frankichamaki.com/data-driven-market-segmentation-more-effective-marketing-to-segments-using-ai/
10. Example of Reinforcement Learning: Optimal Control Problem
http://graveleylab.cam.uchc.edu/WebData/mduff/older_papers.html
https://studywolf.wordpress.com/2015/03/29/reinforcement-learning-part-3-egocentric-learning/
12. Markov Processes
• Discrete state space: S = {A, B}
• State transition probability: P(S'|S) = {P_AA = 0.7, P_AB = 0.3, P_BA = 0.5, P_BB = 0.5}
• Purpose: finding a steady-state distribution
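As a quick illustration of the steady-state computation (my own sketch, not from the slides), the stationary distribution π of this two-state chain satisfies π = πP, i.e. it is the left eigenvector of P for eigenvalue 1:

import numpy as np

# Transition matrix from the slide: rows = current state (A, B),
# columns = next state (A, B)
P = np.array([[0.7, 0.3],
              [0.5, 0.5]])

# The steady-state distribution pi satisfies pi = pi @ P, so take the
# left eigenvector of P with eigenvalue 1, normalized to sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi /= pi.sum()
print(pi)  # [0.625 0.375]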
13. Markov Reward Processes
• Discrete state space: S = {A, B}
• State transition probability: P(S'|S) = {P_AA = 0.7, P_AB = 0.3, P_BA = 0.5, P_BB = 0.5}
• Reward function: R(S'=A) = +1, R(S'=B) = -1
• Purpose: finding an expected reward distribution
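To make the expected reward concrete (again my own sketch), the expected immediate reward of each state is r(s) = Σ_s' P(s'|s) R(s'):

import numpy as np

P = np.array([[0.7, 0.3],    # transition probabilities from A and from B
              [0.5, 0.5]])
R = np.array([1.0, -1.0])    # R(S'=A) = +1, R(S'=B) = -1
print(P @ R)                 # [0.4 0. ] : expected one-step reward from A, B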
14. Markov Decision Processes
• Discrete state space: S = {A, B}
• Discrete action space: A = {X, Y}
• (Action-conditional) state transition probability: P(S'|S, A) = { ... }
• Reward function: R(S'=A) = +1, R(S'=B) = -1
• Purpose: finding an optimal policy (one that maximizes the expected sum of future rewards)
15. Markov Decision Processes
• Markov decision processes: a mathematical framework for modeling decision making.
• MDPs are solved via dynamic programming and reinforcement learning.
• Applications: robotics, automated control, economics, and manufacturing.
• Examples of MDPs:
1) AlphaGo defines the game of Go as an MDP
2) Driving a car defined as an MDP
3) The stock market defined as an MDP
16. Agent-Environment Interaction
• Objective: finding an optimal policy that maximizes the expected sum of future rewards
• Algorithms
1) Planning: exhaustive search / dynamic programming
2) Reinforcement learning: MC methods / TD learning (Q-learning, ...)
17. Discount Factor
• Sum of future rewards in episodic tasks:
G_t := R_{t+1} + R_{t+2} + R_{t+3} + ... + R_T
• Sum of future rewards in continuing tasks:
G_t := R_{t+1} + R_{t+2} + R_{t+3} + ... + R_T + ...
→ G_t → ∞ (diverges)
• Sum of discounted future rewards, which works in both cases:
G_t := R_{t+1} + γ R_{t+2} + γ² R_{t+3} + ... + γ^(T-t-1) R_T + ... = Σ_{k=1}^{∞} γ^(k-1) R_{t+k} (converges)
(R_t is bounded; γ is the discount factor, 0 ≤ γ < 1)
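A tiny sketch (my own example, not from the deck) of computing the discounted return G_t from a list of sampled rewards:

def discounted_return(rewards, gamma=0.9):
    # G_t = R_{t+1} + gamma*R_{t+2} + gamma^2*R_{t+3} + ...
    # Accumulate backwards so each reward is discounted the right number of times.
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([1, -1, 1, 1]))  # 1 - 0.9 + 0.81 + 0.729 = 1.639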
22. Value Function
• We introduce value functions, which tell us the expected sum of future rewards at a given state s when following policy π:
1) State-value function
2) Action-value function
23. Optimal Control (Policy) with Value Function
1) Policy from the state-value function:
one-step-ahead search over all actions, using the state transition probabilities (the model).
2) Policy from the action-value function:
a = f(s) = argmax_a q_π(s, a)
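As a minimal sketch of case 2), with a made-up tabular q-function (hypothetical values, two states by two actions):

import numpy as np

q = np.array([[0.4, 0.1],   # q(A, X), q(A, Y)
              [0.0, 0.3]])  # q(B, X), q(B, Y)

def greedy_policy(q, s):
    # a = argmax_a q(s, a): no model needed, just compare action values
    return np.argmax(q[s])

print(greedy_policy(q, 0))  # 0, i.e. action X in state A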
24. Planning vs Learning
• Objective: finding an optimal policy that maximizes the expected sum of future rewards
• Algorithms
1) Planning: exhaustive search / dynamic programming
→ mean of a die roll = 1×1/6 + 2×1/6 + ... + 6×1/6 = 3.5
2) Reinforcement learning: MC methods / TD learning
→ mean of a die roll = the average of the pips over 100 throws ≈ 3.5
25. Solution of the MDP: Planning
• So our last job is to find the optimal policy, and there are two approaches:
1) Exhaustive search
2) Dynamic programming
26. Find Optimal Policy with Exhaustive Search
If we know the one-step dynamics of the MDP, P(s', r|s, a), we can search exhaustively, expanding every action at every step until the final step T, and then choose the optimal action path. But this needs O(N^T) computation!
[Diagram: a search tree that branches on actions X and Y at every step, each branch landing in state A (+1) or B (-1).]
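A toy sketch of this exhaustive expansion (the dynamics P(s'|s,a) are my own made-up numbers, since slide 14 left them unspecified):

import numpy as np

P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # from A: actions X, Y
              [[0.7, 0.3], [0.1, 0.9]]])  # from B: actions X, Y
R = np.array([1.0, -1.0])                  # R(S'=A) = +1, R(S'=B) = -1

def best_return(s, depth):
    # Recursively expand every action at every step up to the horizon:
    # the call tree enumerates all O(N^depth) action sequences.
    if depth == 0:
        return 0.0
    return max(sum(P[s, a, s2] * (R[s2] + best_return(s2, depth - 1))
                   for s2 in range(2))
               for a in range(2))

print(best_return(0, depth=5))  # best expected 5-step return from state A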
27. Dynamic Programming
• Solving the whole problem at once produces duplicated computations.
→ Split it into subproblems to eliminate the duplicated work.
• Computing whole paths at a time means the early segments are recomputed again and again.
→ Finish the subproblem computation two steps at a time, then move on to the next step.
28. Dynamic Programming
• We can apply DP to this problem, and the computational cost reduces to O(N²T). (But we still need to know the environment dynamics.)
• DP is a computer science technique that computes the final goal value by composing cumulative partial values.
29. Policy Iteration
• Policy iteration consists of two simultaneous, interacting processes.
• Policy evaluation (iterative formulation):
first, make the value function more accurate under the current policy.
• Policy improvement (greedy selection):
second, make the policy greedy with respect to the current value function.
30. Policy Iteration
• How can we get the state-value function with DP?
(The state-value computation is the subproblem; the action-value function is computed similarly.)
Policy iteration = policy evaluation + policy improvement
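A compact policy-iteration sketch on the same made-up two-state dynamics used in the exhaustive-search sketch above (illustrative only):

import numpy as np

P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[s, a, s'] for s in {A,B}, a in {X,Y}
              [[0.7, 0.3], [0.1, 0.9]]])
R = np.array([1.0, -1.0])                  # reward on entering A / B
gamma = 0.9

policy = np.zeros(2, dtype=int)            # start with action X everywhere
V = np.zeros(2)

while True:
    # Policy evaluation: sweep the Bellman expectation backup to convergence
    for _ in range(500):
        V = np.array([P[s, policy[s]] @ (R + gamma * V) for s in range(2)])
    # Policy improvement: act greedily with respect to the current V
    q = np.array([[P[s, a] @ (R + gamma * V) for a in range(2)]
                  for s in range(2)])
    new_policy = q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print(policy, V)  # greedy action per state and the corresponding values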
32. Solution of the MDP: Learning
• The planning methods must know the perfect dynamics of the environment, P(s', r|s, a).
• But typically this is very hard to know and often empirically impossible. Therefore we ignore it and just compute means from samples.
• This is the point where machine learning enters.
1) Monte Carlo methods
2) Temporal-difference learning
(a.k.a. reinforcement learning)
33. Monte Carlo Methods
Tabular state-value function:
Starting state | Value
S1 | average of G(S1)
S2 | average of G(S2)
S3 | average of G(S3)
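A minimal every-visit Monte Carlo sketch (my own toy example) that estimates V(s) as the average of returns observed from s:

from collections import defaultdict

def mc_state_values(episodes, gamma=0.9):
    # episodes: list of trajectories [(state, reward-after-leaving-state), ...]
    returns = defaultdict(list)
    for episode in episodes:
        g = 0.0
        for state, reward in reversed(episode):
            g = reward + gamma * g      # return observed from this visit
            returns[state].append(g)
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}

# One toy episode: A --(+1)--> A --(-1)--> terminal
print(mc_state_values([[('A', 1.0), ('A', -1.0)]]))  # {'A': -0.45}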
34. Monte Carlo Methods
• We need a full-length episode of experience for each starting state.
• It is very time-consuming to wait for the end of an episode just to update one state.
• Not applicable to continuing tasks.
37. Temporal-Difference Learning
• TD learning is a combination of Monte Carlo ideas and dynamic programming (DP) ideas.
• Like Monte Carlo methods, TD methods can learn directly from raw experience without a model of the environment's dynamics.
• Like DP, TD methods update estimates based in part on other learned estimates, without waiting for a final outcome (they bootstrap).
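The one-line TD(0) update behind all of this, as a sketch (a toy example of mine):

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    # Move V(s) a little toward the bootstrapped target r + gamma * V(s')
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

V = {'A': 0.0, 'B': 0.0}
td0_update(V, 'A', 1.0, 'B')   # one observed transition A --(+1)--> B
print(V)                        # {'A': 0.1, 'B': 0.0}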
41. On-Policy / Off-Policy
• On-policy: target policy = behavior policy
→ there can be only one policy.
→ can learn a stochastic policy. Ex) SARSA
• Off-policy: target policy ≠ behavior policy
→ there can be several policies.
→ sample efficient. Ex) Q-learning
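The two tabular updates side by side make the distinction concrete (standard textbook forms, sketched by me with a toy q-table):

def sarsa_update(q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    # On-policy: bootstrap with the action a2 the behavior policy actually takes
    q[(s, a)] += alpha * (r + gamma * q[(s2, a2)] - q[(s, a)])

def q_learning_update(q, s, a, r, s2, actions, alpha=0.1, gamma=0.9):
    # Off-policy: bootstrap with the greedy action, whatever we actually do next
    best = max(q[(s2, b)] for b in actions)
    q[(s, a)] += alpha * (r + gamma * best - q[(s, a)])

q = {(s, a): 0.0 for s in 'AB' for a in 'XY'}
sarsa_update(q, 'A', 'X', 1.0, 'B', 'Y')
q_learning_update(q, 'A', 'X', 1.0, 'B', actions='XY')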
46. Planning + Learning
• There is only one difference between planning and learning: the existence of a model.
• So we call planning a model-based method and learning a model-free method.
49. • Value function approximation with deep learning
→ Large-scale or even infinite-dimensional state spaces become solvable.
→ Generalization comes from deep learning.
→ This requires supervised learning techniques and regression against an online moving target.
50. Deep Q Networks (DQN)
• In Q-learning, a CNN is used as the action-value function for a given state:
State → Q(State, Action=0), Q(State, Action=1), ..., Q(State, Action=K)
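A minimal Q-network sketch in PyTorch (a small dense network of my own, not the DeepMind CNN) that maps a state to one Q-value per action, as in the diagram above:

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, num_actions),   # one output head per action
        )

    def forward(self, state):
        return self.net(state)

q_net = QNetwork(state_dim=4, num_actions=2)
print(q_net(torch.zeros(1, 4)))  # Q(s, a) for every action at once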
52. Target Network
• Moving-target regression: the moment Q is updated, the target values move with it → learning quality degrades.
→ Target network: a separate network with the same architecture as the DQN, whose weights do not change during training; using it to compute Q(s', a') provides a fixed target value and improves training stability.
(The target network's weights are periodically copied from the DQN.)
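A sketch of the mechanism (using a tiny stand-in model; assumes PyTorch as above):

import copy
import torch.nn as nn

q_net = nn.Linear(4, 2)          # stand-in for the Q-network
q_target = copy.deepcopy(q_net)  # same architecture, frozen weights
for p in q_target.parameters():
    p.requires_grad_(False)      # the target network is never trained directly

def sync_target():
    # Periodically copy the learned weights into the target network
    q_target.load_state_dict(q_net.state_dict())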
53. Replay Memory
• Data inefficiency: reuse previously collected experience through off-policy learning (mini-batch learning).
• Data correlation: when mini-batches are built online, the data within a mini-batch are similar to one another (correlated) → training becomes lopsided and unbalanced.
→ Replay memory: store (state, action, reward, next state) tuples in a buffer and build mini-batches by random sampling from it.
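A minimal replay-memory sketch (my own illustration):

import random
from collections import deque

class ReplayMemory:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # old experience is evicted first

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation of online data
        return random.sample(self.buffer, batch_size)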
54. Reward Clipping
• Reward-scale problem: the scale of rewards differs by domain, so the magnitudes of the learned Q-values vary just as much. When the Q-values to be learned have very large variance (e.g., -1000 to 1000), training a neural network is hard.
→ Reward clipping: clip rewards into the range -1 to 1 for stable training.
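In code this is a one-liner, e.g. (my illustration):

import numpy as np

raw_reward = -250.0  # e.g., a large game-score delta
clipped_reward = float(np.clip(raw_reward, -1.0, 1.0))  # -> -1.0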
55. DQN Performance
• Tested on classic ATARI 2600 games
• Outperforms humans on more than half of the games
• A huge improvement over the previous (linear) approach
• Fails to learn some games
(sparse rewards, games with complex stages)
57. Improved DQN Models - Double DQN
• Positive bias: the max operator tends to overestimate Q-values.
→ Double Q-learning prevents this phenomenon and enables faster learning than DQN.
58. Improved DQN Models - Prioritized Replay Memory
• Plain DQN samples uniformly from the replay memory.
→ Increasing the probability of sampling data with high TD error enables faster and more efficient learning.
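A sketch of the sampling rule (probabilities proportional to |TD error|^α, with made-up numbers):

import numpy as np

td_errors = np.array([0.1, 2.0, 0.5, 1.0])  # hypothetical per-transition TD errors
alpha = 0.6                                  # how strongly to prioritize
p = np.abs(td_errors) ** alpha
p /= p.sum()                                 # sampling distribution over the buffer
batch = np.random.choice(len(td_errors), size=2, p=p, replace=False)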
59. References
[1] Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[2] Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.
[3] Van Hasselt, Hado, Arthur Guez, and David Silver. "Deep Reinforcement Learning with Double Q-Learning." AAAI, 2016.
[4] Wang, Ziyu, et al. "Dueling network architectures for deep reinforcement learning." arXiv preprint arXiv:1511.06581 (2015).
[5] Schaul, Tom, et al. "Prioritized experience replay." arXiv preprint arXiv:1511.05952 (2015).