The presentation of the article "Mastering the game of Go with deep neural networks and tree search", given at the Optimization Seminar 2015/2016.
Notes:
- All URLs are clickable.
- All citations are clickable (when hovered over the "year" part of "[author year]").
- To download without a SlideShare account, use https://www.dropbox.com/s/p4rnlhoewbedkjg/AlphaGo.pdf?dl=0
- The corresponding leaflet is available at http://www.slideshare.net/KarelHa1/leaflet-for-the-talk-on-alphago
- The source code is available at https://github.com/mathemage/AlphaGo-presentation
This document discusses Go and strategies for developing Go-playing AI programs. It summarizes the state space and game tree complexity of Go compared to other games. Early Go programs used rule-based strategies and domain knowledge. More recent programs like AlphaGo use neural networks trained through reinforcement learning from self-play to predict moves and evaluate board positions, combined with Monte Carlo tree search to achieve superhuman performance at Go.
The document discusses how AlphaGo, a computer program developed by DeepMind, was able to defeat world champion Lee Sedol at the game of Go. It achieved this through a combination of deep learning and tree search techniques. Four deep neural networks were used: three convolutional networks to reduce the action space and search depth through imitation learning, self-play reinforcement learning, and value prediction; and a smaller network for faster simulations. This combination of deep learning and search allowed AlphaGo to master the complex game of Go, demonstrating the capabilities of modern AI.
These slides explain how AlphaGo works.
English version: http://www.slideshare.net/ShaneSeungwhanMoon/how-alphago-works
- Teaser for non-specialists: How do you actually build a Go-playing AI? Everyone keeps talking about "deep learning", but what is it? And where else could a Go AI be applied?
- Teaser for specialists: Interestingly, AlphaGo's main components are just a CNN (Convolutional Neural Network), plus the reinforcement learning framework and MCTS (Monte Carlo Tree Search) that have been around for some 30 years. None of the ingredients are new, but the way they are put to use is refreshing.
The document provides an introduction and overview of AlphaGo Zero, including:
- AlphaGo Zero achieved superhuman performance at Go without human data by using self-play reinforcement learning.
- It selects moves using a combined policy-and-value network together with Monte Carlo tree search. The network is trained from self-play games, with the MCTS search probabilities and the final game outcomes serving as training targets (the training loss is sketched after this list).
- Experiments showed AlphaGo Zero outperformed previous AlphaGo versions and human-trained networks, and continued improving with deeper networks and more self-play training.
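For reference, the training loss reported in the AlphaGo Zero paper (Silver et al., 2017) combines the value error, the policy cross-entropy against the MCTS search probabilities, and L2 regularization:

    l = (z - v)^2 - \pi^\top \log \mathbf{p} + c \lVert \theta \rVert^2

Here z is the final game outcome, (\mathbf{p}, v) are the network's policy and value outputs, \pi are the search probabilities produced by MCTS, and c weighs the regularizer.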
TensorFlow and Keras are popular deep learning frameworks. TensorFlow is an open source library for numerical computation using data flow graphs. It was developed by Google and is widely used for machine learning and deep learning. Keras is a higher-level neural network API that can run on top of TensorFlow. It focuses on user-friendliness, modularization and extensibility. Both frameworks make building and training neural networks easier through modular layers and built-in optimization algorithms.
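As a hedged illustration of the modular-layer style both frameworks encourage (layer sizes are arbitrary, chosen only for this example):

    # A minimal Keras sketch: a small feed-forward classifier built from modular
    # layers with a built-in optimizer (sizes are illustrative only).
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(784,)),                    # e.g. flattened 28x28 images
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(x_train, y_train, epochs=5)   # assuming training data is at hand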
AlphaGo uses a novel combination of Monte Carlo tree search and neural networks to master the game of Go. It trains two neural networks - a policy network to predict expert moves and a value network to evaluate board positions. During gameplay, AlphaGo runs multiple Monte Carlo tree simulations that use the neural networks to guide search and evaluate positions. The move selected is the one most frequently visited after all simulations. This approach allowed AlphaGo to defeat world champion Lee Sedol 4-1, achieving a milestone in artificial intelligence.
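To make the selection rule concrete, here is a hedged Python sketch of PUCT-style search statistics and final move selection (a schematic illustration with a hypothetical Node class, not DeepMind's implementation):

    # Schematic sketch of AlphaGo-style search statistics and move selection.
    import math

    class Node:
        def __init__(self, prior):
            self.prior = prior        # P(s, a): move probability from the policy network
            self.visits = 0           # N(s, a): simulation count through this edge
            self.value_sum = 0.0      # W(s, a): accumulated value-network evaluations
            self.children = {}        # move -> Node

        def q(self):
            return self.value_sum / self.visits if self.visits else 0.0

    def select_child(node, c_puct=1.0):
        # PUCT rule: exploit high Q, but explore moves the policy network favors.
        total = sum(child.visits for child in node.children.values())
        score = lambda ch: ch.q() + c_puct * ch.prior * math.sqrt(total) / (1 + ch.visits)
        return max(node.children.items(), key=lambda kv: score(kv[1]))

    def best_move(root):
        # After all simulations, play the most frequently visited move.
        return max(root.children.items(), key=lambda kv: kv[1].visits)[0]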
Speaker: Yunjey Choi (Master's student, Korea University)
Yunjey Choi majored in Computer Science at Korea University and is currently a Master's student studying Machine Learning. He enjoys coding and sharing what he has learned with others. He studied Deep Learning with TensorFlow for a year and is now studying Generative Adversarial Networks with PyTorch. He has implemented several papers in TensorFlow and published a PyTorch tutorial on GitHub.
Overview:
The Generative Adversarial Network (GAN), first proposed by Ian Goodfellow in 2014, is a generative model that estimates the distribution of real data through adversarial training. GANs have recently emerged as one of the most popular research areas, with countless related papers pouring out every day.
Finding it hard to keep up with the flood of GAN papers? That's fine: once you understand the basic GAN thoroughly, the new papers become easy to follow too.
In this talk I aim to pass on everything I know about GANs. It should suit those entirely new to GANs, those curious about the theory behind them, and those wondering how GANs can be put to use.
Video of the talk: https://youtu.be/odpjk7_tGY0
Zaikun Xu from the Università della Svizzera Italiana presented this deck at the 2016 Switzerland HPC Conference.
“In the past decade, deep learning, as a life-changing technology, has achieved huge success on various tasks, including image recognition, speech recognition, and machine translation. Pioneered by several research groups around Geoffrey Hinton (U Toronto), Yoshua Bengio (U Montreal), Yann LeCun (NYU), and Juergen Schmidhuber (IDSIA, Switzerland), deep learning is a renaissance of neural networks in the Big Data era.
A neural network is a learning algorithm consisting of an input layer, hidden layers, and an output layer, where each circle represents a neuron and each arrow connection carries a weight. The network learns from the discrepancy between the output layer's output and the ground truth: it computes the gradients of this discrepancy with respect to the weights and adjusts the weights accordingly. Ideally, it finds weights that map input X to target y with as low an error as possible.”
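The learning rule the quote describes can be sketched in a few lines of numpy for a single linear layer (a toy illustration, not the presenter's code):

    # Compute the output, measure the discrepancy to the ground truth, take its
    # gradient with respect to the weights, and adjust the weights accordingly.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))            # toy inputs
    true_w = np.array([1.0, -2.0, 0.5])
    y = X @ true_w                           # toy targets

    w = np.zeros(3)                          # initial weights
    lr = 0.1                                 # learning rate
    for _ in range(200):
        y_hat = X @ w                            # network output (one linear layer)
        grad = 2 * X.T @ (y_hat - y) / len(X)    # gradient of mean squared error w.r.t. w
        w -= lr * grad                           # adjust weights against the gradient
    # w now approximates true_w: the error has been driven as low as possible.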
Watch the video presentation: http://insidehpc.com/2016/03/deep-learning/
See more talks in the Swiss Conference Video Gallery: http://insidehpc.com/2016-swiss-hpc-conference/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
The slides go through the implementation details of Google Deepmind's AlphaGo, a computer Go AI that defeated the European champion. The slides are targeted for beginners in the machine learning area.
Korean version: http://www.slideshare.net/ShaneSeungwhanMoon/ss-59226902
Hello.
I'm Dongmin Lee, and I gave a talk titled "Safety First: Reinforcement Learning" at the 1st Deep Learning Conference All Together.
The conference link is:
https://tykimos.github.io/2018/06/28/ISS_1st_Deep_Learning_Conference_All_Together/
The rough outline is as follows:
1. What is Artificial Intelligence?
2. What is Reinforcement Learning?
3. What is Artificial General Intelligence?
4. Planning and Learning
5. Safe Reinforcement Learning
The slides also explain the paper "Imagination-Augmented Agents for Deep Reinforcement Learning" in detail.
I hope many of you find it helpful!
This document summarizes generative adversarial networks (GANs) and their applications. It begins by introducing GANs and how they work by having a generator and discriminator play an adversarial game. It then discusses several variants of GANs including DCGAN, LSGAN, conditional GAN, and others. It provides examples of applications such as image-to-image translation, text-to-image synthesis, image generation, and more. It concludes by discussing major GAN variants and potential future applications like helping children learn to draw.
Generative Adversarial Networks (GANs) are a type of deep learning model used for unsupervised machine learning tasks like image generation. GANs work by having two neural networks, a generator and discriminator, compete against each other. The generator creates synthetic images and the discriminator tries to distinguish real images from fake ones. This allows the generator to improve over time at creating more realistic images that can fool the discriminator. The document discusses the intuition behind GANs, provides a PyTorch implementation example, and describes variants like DCGAN, LSGAN, and semi-supervised GANs.
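A minimal PyTorch sketch of the adversarial game described above (illustrative only, with stand-in data; not the implementation from the slides):

    import torch
    import torch.nn as nn

    latent_dim, data_dim = 16, 2
    G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
    D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCEWithLogitsLoss()

    real = torch.randn(64, data_dim) + 3.0    # stand-in "real" data for the sketch
    for step in range(1000):
        # Discriminator: label real samples 1, generated samples 0.
        z = torch.randn(64, latent_dim)
        fake = G(z).detach()
        loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

        # Generator: try to make D label generated samples as real.
        z = torch.randn(64, latent_dim)
        loss_g = bce(D(G(z)), torch.ones(64, 1))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()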
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments (Jisang Yoon)
MADDPG is a multi-agent actor-critic reinforcement learning algorithm that can operate in mixed cooperative-competitive environments. It uses a decentralized actor and centralized critic architecture. The centralized critic takes the observations and actions of all agents as input to guide learning, even though each agent only controls its own actor. To deal with non-stationary environments, it approximates other agents' policies when they are unknown. It also trains with policy ensembles to prevent overfitting to competitors' strategies. Experiments show MADDPG outperforms decentralized methods on cooperative tasks and its performance benefits from approximating other agents and using policy ensembles in competitive settings.
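A hedged sketch of the decentralized-actor, centralized-critic wiring (shapes and layer sizes are assumptions for illustration, not the paper's code):

    import torch
    import torch.nn as nn

    n_agents, obs_dim, act_dim = 3, 8, 2

    # Decentralized actor: sees only its own observation (one shared net here for brevity).
    actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                          nn.Linear(64, act_dim), nn.Tanh())

    # Centralized critic: sees every agent's observation and action during training.
    critic_in = n_agents * (obs_dim + act_dim)
    critic = nn.Sequential(nn.Linear(critic_in, 64), nn.ReLU(), nn.Linear(64, 1))

    obs = torch.randn(n_agents, obs_dim)                     # one observation per agent
    acts = torch.stack([actor(o) for o in obs])              # each actor acts on its own obs
    q = critic(torch.cat([obs.flatten(), acts.flatten()]))   # joint value estimate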
“Toward Principled Methods for Training GANs” (ICLR 2017, 172 citations) raises fundamental questions about Ian Goodfellow's GAN. We use GANs simply because they generate images well, without ever thinking deeply about the underlying principles; we have never even paid attention to whether the generator and discriminator converge. This paper poses mathematical questions about these issues, starting from the notion of distance, and ultimately examines how the data distribution behaves as a probability distribution. It does not offer a solution, but raising these questions set a major current in the history of GANs.
- Common misconceptions about GANs
- Kullback-Leibler divergence and Jensen-Shannon divergence (definitions are sketched after this list)
- A mathematical analysis of the GAN algorithm
- Fatal problems that arise when training GANs
- Attempts to fix these problems
- A GAN tech tree: so which GAN should you use?
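For reference, the two divergences named above have the standard definitions (not taken from the slides):

    \mathrm{KL}(P \| Q) = \mathbb{E}_{x \sim P}\left[ \log \tfrac{P(x)}{Q(x)} \right]
    \mathrm{JS}(P \| Q) = \tfrac{1}{2}\mathrm{KL}(P \| M) + \tfrac{1}{2}\mathrm{KL}(Q \| M), \quad M = \tfrac{1}{2}(P + Q)

The JS divergence is symmetric and bounded, and the original GAN objective, under an optimal discriminator, reduces to minimizing it.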
Tutorial on Deep Learning in Recommender Systems, LARS Summer School 2019 (Anoop Deoras)
This document provides an outline for a tutorial on deep learning in recommender systems. The tutorial covers various models from linear families such as matrix factorization and topic models, as well as non-linear models using deep learning techniques. It discusses modeling context, interpreting neural network recommender models, and using reinforcement learning in recommender systems. The outline also includes background on Netflix's recommender system and an evolution of recommender models from explicit to implicit feedback and linear to non-linear approaches.
What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be... (Simplilearn)
This presentation on TensorFlow will help you understand what exactly TensorFlow is and how it is used in deep learning. TensorFlow is a software library developed by Google for conducting machine learning and deep neural network research. In this tutorial, you will learn the fundamentals of TensorFlow concepts, functions, and operations required to implement deep learning algorithms and leverage data like never before. This TensorFlow tutorial is ideal for beginners who want to pursue a career in deep learning. Now, let us dive into this TensorFlow tutorial and understand what TensorFlow actually is and how to use it.
Below topics are explained in this TensorFlow presentation:
1. What is Deep Learning?
2. Top Deep Learning Libraries
3. Why TensorFlow?
4. What is TensorFlow?
5. What are Tensors?
6. What is a Data Flow Graph?
7. Program Elements in TensorFlow
8. Use case implementation using TensorFlow
Simplilearn’s Deep Learning course will transform you into an expert in deep learning techniques using TensorFlow, the open-source software library designed to conduct machine learning and deep neural network research. With our deep learning course, you’ll master deep learning and TensorFlow concepts, learn to implement algorithms, build artificial neural networks, and traverse layers of data abstraction to understand the power of data and prepare for your new role as a deep learning scientist.
Why Deep Learning?
TensorFlow is one of the most popular software platforms used for deep learning and contains powerful tools to help you build and implement artificial neural networks.
You can gain in-depth knowledge of Deep Learning by taking our Deep Learning certification training course. With Simplilearn’s Deep Learning course, you will prepare for a career as a Deep Learning engineer as you master concepts and techniques including supervised and unsupervised learning, mathematical and heuristic aspects, and hands-on modeling to develop algorithms. Those who complete the course will be able to:
1. Understand the concepts of TensorFlow, its main functions, operations and the execution pipeline
2. Implement deep learning algorithms, understand neural networks and traverse the layers of data abstraction which will empower you to understand data like never before
3. Master and comprehend advanced topics such as convolutional neural networks, recurrent neural networks, training deep networks and high-level interfaces
4. Build deep learning models in TensorFlow and interpret the results
5. Understand the language and fundamental concepts of artificial neural networks
6. Troubleshoot and improve deep learning models
7. Build your own deep learning project
8. Differentiate between machine learning, deep learning and artificial intelligence
Learn more at: https://www.simplilearn.com
Transfer Learning -- The Next Frontier for Machine Learning (Sebastian Ruder)
Sebastian Ruder gave a presentation on transfer learning in machine learning. He began by defining transfer learning as applying knowledge gained from solving one problem to a different but related problem. Transfer learning is now important because machine learning models have matured and are being widely deployed, but often lack labeled data for new tasks or domains. Ruder discussed examples of transfer learning in computer vision and natural language processing. He described his research focus on finding better ways to transfer knowledge between domains, tasks, and languages in large-scale, real-world applications.
Reinforcement Learning (RL) approaches deal with finding an optimal reward-based policy to act in an environment (talk given in English).
However, what has led to their widespread use is their combination with deep neural networks (DNNs), i.e., deep reinforcement learning (Deep RL). Recent successes, not only in learning to play games but in surpassing humans at them, and academia-industry research collaborations on manipulation of objects, locomotion skills, smart grids, and more, have demonstrated their worth on a wide variety of challenging tasks.
With applications spanning games, robotics, dialogue, healthcare, marketing, energy, and many more domains, Deep RL might just be the power that drives the next generation of Artificial Intelligence (AI) agents!
[1312.5602] Playing Atari with Deep Reinforcement Learning (Seung Jae Lee)
Presentation slides for 'Playing Atari with Deep Reinforcement Learning' by Mnih et al.
You can find more presentation slides on my website:
https://www.endtoend.ai
We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
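As a hedged illustration of the Q-learning variant the abstract refers to, here is a generic DQN-style temporal-difference target in PyTorch (names and hyperparameters are assumptions, not the paper's exact setup):

    import torch

    def dqn_target(reward, next_state, done, target_net, gamma=0.99):
        # y = r                               if the episode ended (done == 1)
        # y = r + gamma * max_a' Q(s', a')    otherwise
        with torch.no_grad():
            next_q = target_net(next_state).max(dim=1).values
        return reward + gamma * next_q * (1.0 - done)

    # The online network is then trained to minimize (Q(s, a) - y)^2.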
1) The document discusses AlphaGo and its use of machine learning techniques like deep neural networks, reinforcement learning, and Monte Carlo tree search to master the game of Go.
2) AlphaGo uses reinforcement learning to learn Go strategies and evaluate board positions by playing many games against itself. It also uses deep neural networks and convolutional neural networks to pattern-match board positions and Monte Carlo tree search to simulate future moves and strategies.
3) By combining these techniques, AlphaGo was able to defeat top human Go players by developing an intuitive understanding of the game and strategizing several moves in advance.
Continuous Control with Deep Reinforcement Learning, Lillicrap et al., 2015 (Chris Ohk)
The paper introduces Deep Deterministic Policy Gradient (DDPG), a model-free reinforcement learning algorithm for problems with continuous action spaces. DDPG combines actor-critic methods with experience replay and target networks similar to DQN. It uses a replay buffer to minimize correlations between samples and target networks to provide stable learning targets. The algorithm was able to solve challenging control problems with high-dimensional observation and action spaces, demonstrating the ability of deep reinforcement learning to handle complex, continuous control tasks.
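A hedged sketch of the two stabilizers just mentioned, a replay buffer and slowly tracking target networks (names and constants are illustrative; the networks are assumed to be torch.nn.Module instances):

    import random
    from collections import deque

    buffer = deque(maxlen=100_000)    # replay buffer of (s, a, r, s2, done) tuples

    def sample_batch(batch_size=64):
        # Uniform sampling breaks the temporal correlation between samples.
        return random.sample(buffer, batch_size)

    def soft_update(target_net, online_net, tau=0.005):
        # Polyak averaging: target parameters track the online ones slowly,
        # providing stable learning targets for the critic.
        for t_param, param in zip(target_net.parameters(), online_net.parameters()):
            t_param.data.mul_(1 - tau).add_(tau * param.data)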
Reinforcement Learning with Deep Energy-Based Policies (Sangwoo Mo)
This document discusses reinforcement learning with deep energy-based policies. It motivates using maximum entropy reinforcement learning to find policies that not only maximize reward but also explore possibilities. It presents an approach using energy-based models for the policy and soft Q-learning to find the optimal maximum entropy policy. The method uses neural networks to approximate the soft Q-function and a sampling network to draw samples from the policy. Experiments show maximum entropy policies provide better exploration, initialization, compositionality and robustness compared to deterministic policies.
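A hedged sketch of an energy-based policy over a discrete action set (illustrative only; the paper's soft Q-learning handles continuous actions with a separate sampling network, as noted above):

    import torch

    def energy_based_policy(q_values, alpha=1.0):
        # pi(a|s) proportional to exp(Q(s, a) / alpha): higher temperature alpha
        # means more exploration; alpha -> 0 recovers the greedy policy.
        probs = torch.softmax(q_values / alpha, dim=-1)
        return torch.distributions.Categorical(probs).sample()

    q = torch.tensor([1.0, 2.0, 0.5])   # toy soft Q-values for three actions
    action = energy_based_policy(q)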
Mastering the game of Go with deep neural networks and tree search: Presentation (Karel Ha)
The presentation of the article "Mastering the game of Go with deep neural networks and tree search", given at the Spring School of Combinatorics 2016.
Notes:
- All URLs are clickable.
- All citations are clickable (when hovered over the "year" part of "[author year]").
- To download without a SlideShare account, use https://www.dropbox.com/s/4njuiaaou1po0y4/AlphaGo.pdf?dl=0
- The corresponding handout is available at http://www.slideshare.net/KarelHa1/mastering-the-game-of-go-with-deep-neural-networks-and-tree-search-handout
- The video is available at https://youtu.be/Lso2kE58JrI
- The source code is available at https://github.com/mathemage/AlphaGo-presentation
AI Supremacy in Games: Deep Blue, Watson, Cepheus, AlphaGo, DeepStack and TensorCFR (Karel Ha)
My presentation on AI Supremacy in Games: Deep Blue, Watson, Cepheus, AlphaGo, DeepStack and TensorCFR, given at the STRETNUTIE DOKTORANDOV (PhD students' meeting) on 13 June 2018 (https://zona.fmph.uniba.sk/detail-novinky/back_to_page/fmfi-uk-zona/article/stretnutie-doktorandov-1362018/calendar_date/2018/june/)
LaTeX source code is available at https://github.com/mathemage/AISupremacyInGames-presentation
Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges (AI Frontiers)
Recently, substantial progress in AI has been made in applications that require advanced pattern recognition, including computer vision, speech recognition, and natural language processing. However, it remains an open problem whether AI will make the same level of progress in tasks that require sophisticated reasoning, planning, and decision making in complicated game environments similar to the real world. In this talk, I present the state-of-the-art approaches to building such an AI, our recent contributions in designing more effective algorithms and in building extensive and fast general environments and platforms, as well as open issues and challenges.
This document describes a system for recommending indie video games to users based on games they already like. It contains information on about 11,000 indie games drawn from databases like IndieDB. The system takes a summary of a game the user likes, stems and vectorizes its words, then computes similarity scores to return the top-matching indie games; it can optionally narrow the matches by genre tags. The goal is to provide better recommendations than copy-pasted or otherwise inadequate summary data alone.
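A hedged sketch of such a pipeline using scikit-learn (hypothetical data; stemming omitted for brevity; not the project's actual code):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    game_summaries = [
        "a roguelike dungeon crawler with permadeath",
        "a cozy farming and crafting simulator",
        "a turn-based tactics game with procedural maps",
    ]

    vectorizer = TfidfVectorizer(stop_words="english")   # tokenize and weight terms
    matrix = vectorizer.fit_transform(game_summaries)    # one row vector per game

    query = vectorizer.transform(["dungeon crawler with procedural levels"])
    scores = cosine_similarity(query, matrix).ravel()    # similarity to every game
    top = scores.argsort()[::-1][:2]                     # indices of the best matches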
Novel machine learning techniques come from spending time with people who have distinct needs. This talk addresses how listening to end users can give rise to novel machine learning applications.
The document is a presentation about gaming programs at libraries. It discusses why libraries should offer gaming, how to create gaming experiences for patrons, popular games and gaming devices, and examples of successful gaming programs at other libraries. It provides guidance on collection development, programming, and next steps for starting a gaming program.
How We Made a Multiplayer Browser Game with Voxel Graphics for HL++ (Ontico)
HighLoad++ 2017
"Moscow" Hall, November 7, 14:00
Abstract:
http://www.highload.ru/2017/abstracts/2881.html
Ingram Micro Cloud has a booth at HL. There we are running TheURBN (urbn.odn.pw), a browser game with voxel graphics in which anyone can claim territory in a shared world and build skyscrapers out of blocks, and the action can be watched at the booth. On the screens we will show, in real time, the virtual 3D world in which participants build their skyscrapers. For deeper immersion, try the Oculus Rift VR headset at our booth.
...
Artificial Intelligence, the Past, the Present & the Future (Mathieu Croset)
The document provides an overview of the history and future of artificial intelligence. It discusses how the cost of memory has declined tremendously over the past 60 years, from $32,000 per megabyte in 1956 to less than $0.0001 per megabyte today. The document also chronicles several important milestones in AI, including Deep Blue defeating Kasparov at chess in 1997, self-driving cars being developed by Google in 2009, and AI assistants being created by companies like Apple, Google and Microsoft between 2011-2014. It speculates that fully autonomous vehicles could replace human-driven cars within the next 20 years.
"Embark On a Cloud Odyssey" Cloud Campaign Induction ProgramAnayPund
🌟 "Embark On a Cloud Odyssey"🌟
🌟 GDSC Google Cloud Study Jam 2023 Induction Program 🌟
🚀 Welcome to the Future of Cloud Technology!
Are you ready to embark on an exciting journey into the world of Google Cloud? Join us for the GDSC Google Cloud Study Jam 2023 Induction Program, a thrilling event designed to kickstart your cloud computing adventure.
🎉 Why Attend:
- Explore the cutting-edge technologies powering Google Cloud.
- Build practical skills through guided exercises.
- Boost your resume with valuable cloud computing experience.
- Connect with a dynamic community of tech enthusiasts.
- Win prizes and bragging rights!
🌥️ Cloudy skies are ahead – let's soar together!🚀
In this talk we discuss the application of Reinforcement Learning to games. Recently, OpenAI created an algorithm capable of beating a human team at DOTA, a game considered to involve a great amount of complexity and strategy. We'll evaluate the role Reinforcement Learning plays in the world of games, looking at some of its main achievements and what they look like in terms of implementation. We'll also take a look at some of the history of AI applied to games and how things have evolved over time.
Adam Streck - Reinforcement Learning in Unity. Teach Your Monsters - Codemoti... (Codemotion)
With the advent of deep learning, many of the tasks in computer science that had been deemed impossible suddenly became only a few clicks away. One of the approaches made available is reinforcement learning - a method for solving problems by establishing an action-reward scheme. Combined with the power and availability of general-purpose game engines, anyone with rudimentary knowledge of the topic can create and train their own virtual creatures. In this talk we will use this power to solve one of the most frustratingly difficult (according to the internet) games of our era.
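A minimal tabular Q-learning sketch of such an action-reward scheme (a generic toy environment for illustration; not Unity's ML-Agents API, which wraps this kind of loop for you):

    import random

    n_states, n_actions = 5, 2
    Q = [[0.0] * n_actions for _ in range(n_states)]
    alpha, gamma, eps = 0.1, 0.9, 0.1

    def step(state, action):
        # Hypothetical environment: moving "right" toward the last state pays off.
        next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
        reward = 1.0 if next_state == n_states - 1 else 0.0
        return next_state, reward

    state = 0
    for _ in range(2000):
        # Epsilon-greedy action choice: mostly exploit, sometimes explore.
        if random.random() < eps:
            action = random.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: Q[state][a])
        next_state, reward = step(state, action)
        # The reward signal drives the update: nudge Q toward the observed return.
        Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
        state = next_state if next_state != n_states - 1 else 0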
AI is used to create parts of our games. It provides intelligent enemy behavior, techniques such as pathfinding or can be used to generate in-game content procedurally. AI can also play our games. The idea to train computers to beat humans in game-like environments such as Jeopardy!, Chess, or soccer is not a new one. But can AI also design our games? The role of Artificial Intelligence in the game development process is constantly expanding. In this talk, Dr. Pirker will talk about the importance of AI in the past, the present, and especially the future of game development.
Learning Ethics with the Game, Fallout Shelter (Sherry Jones, Apr. 6, 2018)
April 6, 2018 - This presentation was shown at the 2018 eLearning Consortium of Colorado Conference. It addresses the rhetoric and ethics of the game Fallout Shelter (2015), a free-to-play simulation game developed by Bethesda Game Studios and Behaviour Interactive and published by Bethesda Softworks.
The presentation explores the rhetoric and the ethics of Fallout Shelter using the following theories (from philosophy, political science, cultural studies, and psychology): Capitalism; Authoritarianism; Plutocracy; McCarthyism; Eugenics; Ageism; Egoism; Altruism.
Additional topics explored are: Nuclear War; Nuclear Fallout; Counterfactual History; Red Scare; Atomic Bomb; Atomic Age; U.S. History in the 1950s.
Presentation covers a number of Google's services and their applications to reference service or one's own curiosity (Presented as a 45-minute talk at the Mansfield / Richland County Public Library Staff Day, September 2016)
The document provides an introduction to the game Arimaa, summarizing what the game is, its history and development, the ongoing challenge for AI to defeat top human players, why the game is difficult for computers to master, the current status of the challenge, and how individuals can participate in playing or developing for the game.
Game Hacking discusses various techniques for hacking console, DOS, and Windows games. These include using devices like Game Genie to modify NES games, memory scanning DOS games to change values like health and ammo, hex editing save files, using debuggers like OllyDbg to modify StarCraft map code, and exploiting flaws in game logic or servers. Memory hacking is described as a common technique to achieve hacks like teleporting or speed increases in games like World of Warcraft.
The document discusses the Yahoo! User Interface (YUI) library and how the author initially dismissed it as bloated but came to appreciate its benefits for collaboration, cross-browser compatibility, and providing robust reusable components. It provides an overview of the key features of YUI including components for DOM manipulation, events, animation, AJAX, and design patterns. It encourages developers to use YUI APIs in their hacks and provides resources for getting started.
Recent technological advances in DNA/RNA sequencing allow tackling some of the most important questions in many biological fields, including evolutionary genetics. Monitoring genomic signatures of natural selection is key to gaining insights into such diverse phenomena as the evolution of susceptibility to common human diseases, as well as resistance to antibiotics and pesticides.
With access to large-scale population genomic data, we now have the opportunity to understand how evolution has shaped individual genomes. In particular, we can look into one of the most elusive questions in evolutionary biology: the extent to which natural selection has driven beneficial alleles to spread in time and space, within and among populations.
For this, the use of deep neural networks is a natural and effective solution, as it integrates the predictive power of machine learning with scalability to large datasets.
As a particular test case, my research will focus on novel deep learning methods to study the spread of insecticide-resistance in Anopheles gambiae, the malaria vector mosquitoes. Specifically, I will strive to:
a) incorporate the temporal dimension for time-series predictions using sequence models (such as recurrent neural networks or dilated convolutional networks);
b) seek the optimal representation of population genomic data for machine learning;
c) and experiment with various ways to estimate probabilities of mutations migrating between populations.
- Capsule networks aim to address limitations of CNNs by modeling part-whole relationships through capsules that encode object properties like pose.
- A capsule is a group of neurons that outputs a vector encoding the probability that an entity is present and its instantiation parameters, such as pose (a sketch of the squash nonlinearity behind such vectors follows this list).
- Routing by agreement dynamically routes information between capsules, with lower-level capsules voting for higher-level capsules by adjusting routing weights to agree on instantiation parameters.
- Capsule networks show promise in tasks requiring reasoning about relationships like digit recognition, achieving state-of-the-art or competitive performance with much smaller networks.
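A hedged PyTorch sketch of the squash nonlinearity from Sabour et al. (2017), which maps a capsule's raw vector to one whose length lies in [0, 1) and can be read as a presence probability while its direction preserves the instantiation parameters:

    import torch

    def squash(s, dim=-1, eps=1e-8):
        sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
        scale = sq_norm / (1.0 + sq_norm)             # length -> probability-like value
        return scale * s / torch.sqrt(sq_norm + eps)  # keep the orientation (pose)

    capsule_input = torch.randn(10, 8)   # 10 capsules with 8-dimensional outputs
    v = squash(capsule_input)            # output lengths now lie in [0, 1)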
AlphaZero is an AI system created by DeepMind that achieved superhuman ability in the games of chess, shogi, and Go without relying on human data. It uses a new form of deep reinforcement learning combined with Monte Carlo tree search to learn from games generated by self-play. AlphaZero was able to master each game to superhuman level in a matter of hours, defeating the previous world-champion programs in each case. It represents a major advance in unsupervised, self-taught machine learning.
Solving Endgames in Large Imperfect-Information Games such as Poker (Karel Ha)
My master's thesis on solving endgames in imperfect-information games.
keywords: algorithmic game theory, imperfect-information games, Nash equilibrium, subgame, endgame, counterfactual regret minimization, Poker
This is the final report for my project as a Technical Student at CERN.
The Intel Xeon/Phi platform is a powerful x86 multi-core engine with a very high-speed memory interface. In its next version it will be able to operate as a stand-alone system with a very high-speed interconnect. This makes it a very interesting candidate for (near) real-time applications such as event-building, event-sorting and event preparation for subsequent processing by high level trigger software algorithms.
The document outlines key concepts in algorithmic game theory, including solution concepts like Nash equilibrium, dominant strategies, and correlated equilibrium. It also discusses different representations of games and examples like the prisoner's dilemma. The document provides definitions for fundamental game theory topics and outlines the structure of simultaneous move games involving multiple players with their own strategy sets.
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intelligent Systems (University of Maribor)
Slides from talk:
Aleš Zamuda: Remote Sensing and Computational, Evolutionary, Supercomputing, and Intelligent Systems.
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Inter-Society Networking Panel GRSS/MTT-S/CIS Panel Session: Promoting Connection and Cooperation
https://www.etran.rs/2024/en/home-english/
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati... (AbdullaAlAsif1)
The pygmy halfbeak, Dermogenys colletei, is known for its viviparous nature, yet it presents an intriguing case of relatively low fecundity, raising questions about potential compensatory reproductive strategies employed by this species. Our study delves into the examination of fecundity and the Gonadosomatic Index (GSI) in the pygmy halfbeak, D. colletei (Meisner, 2001), an intriguing viviparous fish indigenous to Sarawak, Borneo. We hypothesize that D. colletei may exhibit unique reproductive adaptations to offset its low fecundity, thus enhancing its survival and fitness. To address this, we conducted a comprehensive study utilizing 28 mature female specimens of D. colletei, carefully measuring fecundity and GSI to shed light on the reproductive adaptations of this species. Our findings reveal that D. colletei indeed exhibits low fecundity, with a mean of 16.76 ± 2.01, and a mean GSI of 12.83 ± 1.27, providing crucial insights into the reproductive mechanisms at play in this species. These results underscore the existence of unique reproductive strategies in D. colletei, enabling its adaptation and persistence in Borneo's diverse aquatic ecosystems, and call for further ecological research to elucidate these mechanisms. This study affords a better understanding of viviparous fish in Borneo and contributes to the broader field of aquatic ecology, enhancing our knowledge of species adaptations to unique ecological challenges.
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr... (Travis Hills MN)
Travis Hills of Minnesota developed a method to convert waste into high-value dry fertilizer, significantly enriching soil quality. By providing farmers with a valuable resource derived from waste, Travis Hills helps enhance farm profitability while promoting environmental stewardship. Travis Hills' sustainable practices lead to cost savings and increased revenue for farmers by improving resource efficiency and reducing waste.
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Invited talk at the Journées Nationales du GDR GPL 2024.
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Westerlund 1 and 2 Open Clusters Survey (Sérgio Sacani)
Context. With a mass exceeding several 10⁴ M⊙ and a rich and dense population of massive stars, supermassive young star clusters represent the most massive star-forming environment that is dominated by the feedback from massive stars and gravitational interactions among stars.
Aims. In this paper we present the Extended Westerlund 1 and 2 Open Clusters Survey (EWOCS) project, which aims to investigate the influence of the starburst environment on the formation of stars and planets, and on the evolution of both low and high mass stars. The primary targets of this project are Westerlund 1 and 2, the closest supermassive star clusters to the Sun.
Methods. The project is based primarily on recent observations conducted with the Chandra and JWST observatories. Specifically, the Chandra survey of Westerlund 1 consists of 36 new ACIS-I observations, nearly co-pointed, for a total exposure time of 1 Msec. Additionally, we included 8 archival Chandra/ACIS-S observations. This paper presents the resulting catalog of X-ray sources within and around Westerlund 1. Sources were detected by combining various existing methods, and photon extraction and source validation were carried out using the ACIS-Extract software.
Results. The EWOCS X-ray catalog comprises 5963 validated sources out of the 9420 initially provided to ACIS-Extract, reaching a photon flux threshold of approximately 2 × 10⁻⁸ photons cm⁻² s⁻¹. The X-ray sources exhibit a highly concentrated spatial distribution, with 1075 sources located within the central 1 arcmin. We have successfully detected X-ray emissions from 126 out of the 166 known massive stars of the cluster, and we have collected over 71 000 photons from the magnetar CXO J164710.20-455217.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... (Ana Luísa Pinho)
Functional Magnetic Resonance Imaging (fMRI) provides a means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations; further, they do not adapt to individual characteristics. In this talk, I will give an account of deep behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects of interest related to mental processes. Key to this approach is the use of fast, multi-functional paradigms rich in features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli for studying high-order cognitive mechanisms, due to their ecological nature and their capacity to elicit complex behavior composed of discrete entities. I will also discuss how deep behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures makes it possible to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms, and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development and their applicability in cognitive as well as clinical neuroscience.
The binding of cosmological structures by massless topological defectsSérgio Sacani
Assuming spherical symmetry and weak field, it is shown that if one solves the Poisson equation or the Einstein field
equations sourced by a topological defect, i.e. a singularity of a very specific form, the result is a localized gravitational
field capable of driving flat rotation (i.e. Keplerian circular orbits at a constant speed for all radii) of test masses on a thin
spherical shell without any underlying mass. Moreover, a large-scale structure which exploits this solution by assembling
concentrically a number of such topological defects can establish a flat stellar or galactic rotation curve, and can also deflect
light in the same manner as an equipotential (isothermal) sphere. Thus, the need for dark matter or modified gravity theory is
mitigated, at least in part.
Phenomics assisted breeding in crop improvementIshaGoswami9
As the population is increasing and will reach about 9 billion upto 2050. Also due to climate change, it is difficult to meet the food requirement of such a large population. Facing the challenges presented by resource shortages, climate
change, and increasing global population, crop yield and quality need to be improved in a sustainable way over the coming decades. Genetic improvement by breeding is the best way to increase crop productivity. With the rapid progression of functional
genomics, an increasing number of crop genomes have been sequenced and dozens of genes influencing key agronomic traits have been identified. However, current genome sequence information has not been adequately exploited for understanding
the complex characteristics of multiple gene, owing to a lack of crop phenotypic data. Efficient, automatic, and accurate technologies and platforms that can capture phenotypic data that can
be linked to genomics information for crop improvement at all growth stages have become as important as genotyping. Thus,
high-throughput phenotyping has become the major bottleneck restricting crop breeding. Plant phenomics has been defined as the high-throughput, accurate acquisition and analysis of multi-dimensional phenotypes
during crop growing stages at the organism level, including the cell, tissue, organ, individual plant, plot, and field levels. With the rapid development of novel sensors, imaging technology,
and analysis methods, numerous infrastructure platforms have been developed for phenotyping.
Current Ms word generated power point presentation covers major details about the micronuclei test. It's significance and assays to conduct it. It is used to detect the micronuclei formation inside the cells of nearly every multicellular organism. It's formation takes place during chromosomal sepration at metaphase.
hematic appreciation test is a psychological assessment tool used to measure an individual's appreciation and understanding of specific themes or topics. This test helps to evaluate an individual's ability to connect different ideas and concepts within a given theme, as well as their overall comprehension and interpretation skills. The results of the test can provide valuable insights into an individual's cognitive abilities, creativity, and critical thinking skills
Or: Beyond linear.
Abstract: Equivariant neural networks are neural networks that incorporate symmetries. The nonlinear activation functions in these networks result in interesting nonlinear equivariant maps between simple representations, and motivate the key player of this talk: piecewise linear representation theory.
Disclaimer: No one is perfect, so please mind that there might be mistakes and typos.
dtubbenhauer@gmail.com
Corrected slides: dtubbenhauer.com/talks.html
Equivariant neural networks and representation theory
AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search
1. AlphaGo: Mastering the game of Go
with deep neural networks and tree search
Karel Ha
article by Google DeepMind
Optimization Seminar, 20th April 2016
7. Applications of AI
spam filters
recommender systems (Netflix, YouTube)
predictive text (Swiftkey)
audio recognition (Shazam, SoundHound)
self-driving cars
15. Game of Thrones Generated Character by Character
JON
He leaned close and onions, barefoot from
his shoulder. “I am not a purple girl,” he
said as he stood over him. “The sight of
you sell your father with you a little choice.”
“I say to swear up his sea or a boy of stone
and heart, down,” Lord Tywin said. “I love
your word or her to me.”
Darknet (on Linux)
JON
Each in days and the woods followed his
king. “I understand.”
“I am not your sister Lord Robert?”
“The door was always some cellar to do his
being girls and the Magnar of Baratheon,
and there were thousands of every bite of
half the same as though he was not a great
knight should be seen, and not to look at
the Redwyne two thousand men.”
Darknet (on OS X)
http://pjreddie.com/darknet/rnns-in-darknet/
21. DeepDrumpf: a Twitter bot / neural network which learned
the language of Donald Trump from his speeches
We’ve got nuclear weapons that are obsolete. I’m going to create jobs just by making the worst thing ever.
The biggest risk to the world, is me, believe it or not.
I am what ISIS doesn’t need.
I’d like to beat that @HillaryClinton. She is a horror. I told my supporter Putin to say that all the time. He
has been amazing.
I buy Hillary, it’s beautiful and I’m happy about it.
Hayes 2016
22. Atari Player by Google DeepMind
https://youtu.be/0X-NdPtFKq0?t=21m13s
Mnih et al. 2015
34. Supervised Learning (SL)
1. data collection: Google Search, Facebook “Likes”, Siri, Netflix, YouTube views, LHC collisions, KGS Go
Server...
2. training on training set
3. testing on testing set
4. deployment
http://www.nickgillian.com/
41. Underfitting and Overfitting
Beware of overfitting!
It is like preparing for a mathematics exam by memorizing proofs.
https://www.researchgate.net/post/How_to_Avoid_Overfitting
51. Tree Search
Optimal value v∗(s) determines the outcome of the game:
from every board position or state s
under perfect play by all players.
It is computed by recursively traversing a search tree containing
approximately b^d possible sequences of moves, where
b is the game’s breadth (number of legal moves per position)
d is its depth (game length)
Silver et al. 2016
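As an illustrative aside (not part of the slides): the recursive traversal above is just exhaustive minimax. A minimal Python sketch on a toy game, where players alternately take 1 or 2 stones and whoever takes the last stone wins:

  # Toy illustration of computing an optimal value by exhaustive recursion.
  # MAX is +1, MIN is -1; the return value is the outcome under perfect play.
  def optimal_value(stones, to_move):
      if stones == 0:
          return -to_move  # the previous player took the last stone and won
      values = [optimal_value(stones - take, -to_move)
                for take in (1, 2) if take <= stones]  # breadth b = 2 here
      return max(values) if to_move == +1 else min(values)

  print(optimal_value(4, +1))  # +1: with 4 stones the first player wins

With breadth b and depth d this visits on the order of b^d move sequences, which is exactly what rules it out for Go.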
58. Game tree of Go
Sizes of trees for various games:
chess: b ≈ 35, d ≈ 80
Go: b ≈ 250, d ≈ 150 ⇒ more positions than atoms in the universe!
That makes Go a googol [10^100] times more complex than chess.
https://deepmind.com/alpha-go.html
How to handle the size of the game tree?
for the breadth: a neural network to select moves
for the depth: a neural network to evaluate the current position
for the tree traversal: Monte Carlo tree search (MCTS)
Allis et al. 1994
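A quick back-of-the-envelope check of these magnitudes (illustrative arithmetic only):

  from math import log10

  chess = 80 * log10(35)    # log10(b^d) = d * log10(b)
  go = 150 * log10(250)
  print(round(chess))  # ~124, i.e. chess has about 10^124 move sequences
  print(round(go))     # ~360, i.e. Go has about 10^360, versus roughly
                       # 10^80 atoms in the observable universe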
65. Neural Networks (NN): Inspiration
inspired by the neuronal structure of the mammalian cerebral cortex
but on much smaller scales
suitable to model systems with a high tolerance to error
e.g. audio or image recognition
http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html
69. Neural Networks: Modes
Two modes
feedforward for making predictions
backpropagation for learning
Dieterle 2003
70. Neural Networks: an Example of Feedforward
http://stevenmiller888.github.io/mind-how-to-build-a-neural-network/
73. Gradient Descent in Neural Networks
Motto: “Learn from mistakes!”
However, error functions are not necessarily convex or so “smooth”.
http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html
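As a toy illustration of the two modes and of one gradient-descent step (a single sigmoid neuron with squared error; all sizes and values here are made up for the example, not anything from the paper):

  import math

  w, b, lr = 0.5, 0.0, 0.1  # weight, bias, learning rate
  x, target = 1.0, 1.0      # one training example

  def sigmoid(z):
      return 1.0 / (1.0 + math.exp(-z))

  for step in range(100):
      y = sigmoid(w * x + b)             # feedforward: make a prediction
      # backpropagation: descend the gradient of E = (y - target)^2 / 2
      grad = (y - target) * y * (1 - y)  # dE/dz by the chain rule
      w -= lr * grad * x                 # dE/dw = dE/dz * x
      b -= lr * grad                     # dE/db = dE/dz

  print(sigmoid(w * x + b))  # the prediction has moved toward the target 1.0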
75. Convolutional Neural Networks (CNN or ConvNet)
http://code.flickr.net/2014/10/20/introducing-flickr-park-or-bird/
76. (Deep) Convolutional Neural Networks
The hierarchy of concepts is captured in the number of layers: the deep in “Deep Learning”.
http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html
87. Rules of Go
Black versus White. Black starts the game.
the rule of liberty
the “ko” rule
Handicap for difference in ranks: Black can place 1 or more stones
in advance (compensation for White’s greater strength).
91. Scoring Rules: Area Scoring
A player’s score is:
the number of stones that the player has on the board
plus the number of empty intersections surrounded by that player’s stones
plus komi(dashi) points for the White player,
which is a compensation for the first-move advantage of the Black player
https://en.wikipedia.org/wiki/Go_(game)
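A simplified scorer along these lines (illustrative sketch: it assumes dead stones were already removed, counts empty regions bordering both colours as neutral, and uses 7.5 as an example komi):

  # 'B'/'W' are stones, '.' is empty; an empty region surrounded by a single
  # colour counts as that colour's territory.
  def area_score(board, komi=7.5):
      rows, cols = len(board), len(board[0])
      score = {'B': 0.0, 'W': komi}        # komi points go to White
      seen = set()
      for r in range(rows):
          for c in range(cols):
              if board[r][c] in 'BW':
                  score[board[r][c]] += 1  # stones on the board
              elif (r, c) not in seen:
                  # flood-fill one empty region, recording bordering colours
                  region, borders, stack = 0, set(), [(r, c)]
                  while stack:
                      i, j = stack.pop()
                      if (i, j) in seen or not (0 <= i < rows and 0 <= j < cols):
                          continue
                      if board[i][j] in 'BW':
                          borders.add(board[i][j])
                          continue
                      seen.add((i, j))
                      region += 1
                      stack += [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
                  if len(borders) == 1:    # surrounded by one colour only
                      score[borders.pop()] += region
      return score

  print(area_score(["BB.", "BW.", ".WW"]))  # {'B': 3.0, 'W': 10.5}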
107. SL Policy Network (1/2)
13-layer deep convolutional neural network
goal: to predict expert human moves
task of classification
trained on 30 million positions from the KGS Go Server
stochastic gradient ascent:
∆σ ∝ ∂ log pσ(a|s) / ∂σ
(to maximize the likelihood of the human move a selected in state s)
Results:
44.4% accuracy (the state of the art from other groups)
55.7% accuracy (raw board position + move history as input)
57.0% accuracy (all input features)
Silver et al. 2016
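To make the update rule concrete, an illustrative sketch of one such gradient-ascent step, with a toy linear-softmax policy standing in for the 13-layer CNN (all names and sizes are invented for the example):

  import numpy as np

  rng = np.random.default_rng(0)
  n_features, n_moves = 8, 5            # toy sizes; a real board has 19x19 moves
  sigma = rng.normal(size=(n_moves, n_features))  # policy weights

  def policy(s):                        # p_sigma(.|s) as a softmax
      logits = sigma @ s
      e = np.exp(logits - logits.max())
      return e / e.sum()

  s = rng.normal(size=n_features)       # a toy position encoding
  a, lr = 2, 0.1                        # the expert's move, learning rate

  p = policy(s)
  grad_logits = -p                      # d log p(a|s) / d logits = onehot(a) - p
  grad_logits[a] += 1.0
  sigma += lr * np.outer(grad_logits, s)  # ascend: Delta_sigma ∝ d log p_sigma(a|s) / d sigma

  print(policy(s)[a] > p[a])            # True: the human move became more likely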
108. SL Policy Network (2/2)
Small improvements in accuracy led to large improvements
in playing strength
Silver et al. 2016
109. Training the (Deep Convolutional) Neural Networks
Silver et al. 2016
112. Rollout Policy
Rollout policy pπ(a|s) is faster but less accurate than the SL policy network.
accuracy of 24.2%
It takes 2 µs to select an action, compared to 3 ms for the SL policy
network (about 1,500 times faster).
Silver et al. 2016
113. Training the (Deep Convolutional) Neural Networks
Silver et al. 2016
123. RL Policy Network (1/2)
identical in structure to the SL policy network
goal: to win in the games of self-play
task of classification
weights ρ initialized to the same values, ρ := σ
games of self-play
between the current RL policy network and a randomly
selected previous iteration
to prevent overfitting to the current policy
stochastic gradient ascent:
∆ρ ∝ ∂ log pρ(at|st) / ∂ρ · zt
at time step t, where the reward zt is +1 for winning and −1 for losing.
Silver et al. 2016
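An illustrative sketch of the corresponding REINFORCE-style step on the same kind of toy softmax policy; the only change from the SL update is that the game outcome zt scales the gradient (hard-coded here):

  import numpy as np

  rng = np.random.default_rng(1)
  n_features, n_moves = 8, 5
  rho = rng.normal(size=(n_moves, n_features))  # in AlphaGo, initialized from sigma

  def policy(s):
      logits = rho @ s
      e = np.exp(logits - logits.max())
      return e / e.sum()

  s = rng.normal(size=n_features)
  p = policy(s)
  a = rng.choice(n_moves, p=p)  # sample the network's own move during self-play
  z, lr = +1.0, 0.1             # toy game outcome: +1 won, -1 lost

  grad_logits = -p
  grad_logits[a] += 1.0
  rho += lr * z * np.outer(grad_logits, s)  # Delta_rho ∝ zt * d log p_rho(at|st) / d rho
  # with z = +1 the sampled move is reinforced; with z = -1 it is discouraged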
127. RL Policy Network (2/2)
Results (by sampling each move at ∼ pρ(·|st)):
80% win rate against the SL policy network
85% win rate against the strongest open-source Go program,
Pachi (Baudiš and Gailly 2011)
The previous state of the art, based only on SL of CNNs:
11% win rate against Pachi
Silver et al. 2016
128. Training the (Deep Convolutional) Neural Networks
Silver et al. 2016
133. Value Network (1/2)
similar architecture to the policy network, but outputs a single
prediction instead of a probability distribution
goal: to estimate a value function
v^p(s) = E[zt | st = s, at...T ∼ p]
that predicts the outcome from position s (of games played
by using policy p)
Double approximation: vθ(s) ≈ v^pρ(s) ≈ v∗(s).
task of regression
stochastic gradient descent:
∆θ ∝ ∂vθ(s) / ∂θ · (z − vθ(s))
(to minimize the mean squared error (MSE) between the predicted vθ(s) and the true z)
Silver et al. 2016
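An illustrative sketch of one such regression step, with a toy tanh value function standing in for the network (again, all values invented):

  import numpy as np

  rng = np.random.default_rng(2)
  theta = rng.normal(size=8)  # toy parameters
  s = rng.normal(size=8)      # toy position encoding
  z, lr = +1.0, 0.1           # true game outcome, learning rate

  v = np.tanh(theta @ s)      # predicted outcome v_theta(s) in (-1, 1)
  # Delta_theta ∝ d v_theta(s) / d theta * (z - v_theta(s)),
  # which descends the squared error (z - v)^2 / 2
  theta += lr * (z - v) * (1 - v ** 2) * s  # d tanh(u)/du = 1 - tanh(u)^2

  print(abs(z - np.tanh(theta @ s)) < abs(z - v))  # True: the error decreased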
138. Value Network (2/2)
Beware of overfitting!
Consecutive positions are strongly correlated.
The value network memorized the game outcomes, rather than
generalizing to new positions.
Solution: generate 30 million (new) positions, each sampled
from a separate game
almost the accuracy of Monte Carlo rollouts (using pρ), but
with 15,000 times less computation!
Silver et al. 2016
141. Evaluation Accuracy in Various Stages of a Game
Move number is the number of moves that had been played in the given position.
Each position evaluated by:
forward pass of the value network vθ
100 rollouts, played out using the corresponding policy
Silver et al. 2016
142. Elo Ratings for Various Combinations of Networks
Silver et al. 2016
154. MCTS Algorithm
The next action is selected by lookahead search, using simulation:
1. selection phase
2. expansion phase
3. evaluation phase
4. backup phase (at the end of all simulations)
Each edge (s, a) keeps:
action value Q(s, a)
visit count N(s, a)
prior probability P(s, a) (from the SL policy network pσ)
The tree is traversed by simulation (descending the tree) from the root state.
Silver et al. 2016
157. MCTS Algorithm: Selection
At each time step t, an action at is selected from state st:
at = arg max_a (Q(st, a) + u(st, a))
where the bonus
u(st, a) ∝ P(s, a) / (1 + N(s, a))
Silver et al. 2016
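An illustrative sketch of this selection rule over made-up edge statistics (note: AlphaGo's full bonus also grows with the parent's total visit count; only the P/(1 + N) shape from the slide is kept here):

  Q = {'a1': 0.52, 'a2': 0.48, 'a3': 0.50}  # action values Q(s, a)
  N = {'a1': 120, 'a2': 40, 'a3': 5}        # visit counts N(s, a)
  P = {'a1': 0.40, 'a2': 0.35, 'a3': 0.25}  # priors from the SL policy network
  c = 1.0                                   # exploration constant (the "∝")

  def u(a):
      return c * P[a] / (1 + N[a])  # the bonus decays with repeated visits

  best = max(Q, key=lambda a: Q[a] + u(a))
  print(best)  # 'a3': its large bonus outweighs the small Q differences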
160. MCTS Algorithm: Expansion
A leaf position may be expanded (just once) by the SL policy network pσ.
The output probabilities are stored as priors P(s, a) := pσ(a|s).
Silver et al. 2016
165. MCTS: Evaluation
evaluation from the value network vθ(s)
evaluation by the outcome z, using the fast rollout policy pπ until the end of the game
Using a mixing parameter λ, the final leaf evaluation V(s) is
V(s) = (1 − λ) vθ(s) + λz
Silver et al. 2016
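In code the mixing is a one-liner (the paper reports that λ = 0.5 performed best):

  def leaf_value(v_theta, z, lam=0.5):
      # lam = 0 trusts only the value network, lam = 1 only the rollout outcome
      return (1 - lam) * v_theta + lam * z

  print(leaf_value(0.3, 1.0))  # 0.65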
167. MCTS: Backup
At the end of a simulation, each traversed edge is updated by accumulating:
the action values Q
the visit counts N
Silver et al. 2016
168. Once the search is complete, the algorithm
chooses the most visited move from the root
position.
Silver et al. 2016
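An illustrative sketch of the backup bookkeeping and of this final choice (toy edge keys, not AlphaGo's data structures):

  from collections import defaultdict

  N = defaultdict(int)    # visit counts N(s, a)
  W = defaultdict(float)  # accumulated leaf evaluations per edge
  Q = defaultdict(float)  # mean action value Q(s, a)

  def backup(path, leaf_value):
      for edge in path:   # every edge (s, a) traversed in this simulation
          N[edge] += 1
          W[edge] += leaf_value
          Q[edge] = W[edge] / N[edge]  # running mean of the evaluations

  backup([('root', 'a1')], +1.0)  # a few toy simulations
  backup([('root', 'a1')], +0.5)
  backup([('root', 'a2')], +1.0)

  chosen = max(('a1', 'a2'), key=lambda a: N[('root', a)])
  print(chosen)  # 'a1': the most visited move wins, even though a2 has higher Q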
173. Principal Variation (Path with Maximum Visit Count)
The moves are presented in a numbered sequence.
AlphaGo selected the move indicated by the red circle;
Fan Hui responded with the move indicated by the white square;
in his post-game commentary, he preferred the move (labelled 1) predicted by AlphaGo.
Silver et al. 2016
195. Fan Hui
professional 2 dan
European Go Champion in 2013, 2014 and 2015
European Professional Go Champion in 2016
biological neural network:
100 billion neurons
100 to 1,000 trillion neuronal connections
https://en.wikipedia.org/wiki/Fan_Hui
198. AlphaGo versus Fan Hui
AlphaGo won 5:0 in a formal match in October 2015.
[AlphaGo] is very strong and stable, it seems
like a wall. ... I know AlphaGo is a computer,
but if no one told me, maybe I would think
the player was a little strange, but a very
strong player, a real person.
Fan Hui
204. Lee Sedol “The Strong Stone”
professional 9 dan
2nd in the number of international titles
the 5th youngest (12 years 4 months) to become
a professional Go player in South Korean history
Lee Sedol would win 97 out of 100 games against Fan Hui.
biological neural network comparable to Fan Hui’s (in number
of neurons and connections)
https://en.wikipedia.org/wiki/Lee_Sedol
206. I heard Google DeepMind’s AI is surprisingly
strong and getting stronger, but I am
confident that I can win, at least this time.
Lee Sedol
...even beating AlphaGo by 4:1 may allow
the Google DeepMind team to claim its de
facto victory and the defeat of him
[Lee Sedol], or even humankind.
interview in JTBC Newsroom
213. AlphaGo versus Lee Sedol
In March 2016 AlphaGo won 4:1 against the legendary Lee Sedol.
AlphaGo won all but the 4th game; all games were won
by resignation.
The winner of the match was slated to win $1 million.
Since AlphaGo won, Google DeepMind stated that the prize would be
donated to charities, including UNICEF, and Go organisations.
Lee received $170,000 ($150,000 for participating in all five
games, and an additional $20,000 for each game he won).
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
221. AlphaGo versus Ke Jie?
professional 9 dan
1st on the (unofficial) world ranking list
the youngest player to win 3 major international tournaments
an 8:2 head-to-head record against Lee Sedol
biological neural network comparable to Fan Hui’s, and thus,
by transitivity, also comparable to Lee Sedol’s
https://en.wikipedia.org/wiki/Ke_Jie
224. I believe I can beat it. Machines can be very
strong in many aspects but still have
loopholes in certain calculations.
Ke Jie
Now facing AlphaGo, I do not feel the same
strong instinct of victory when I play a
human player, but I still believe I have the
advantage against it. It’s 60 percent in
favor of me.
Ke Jie
Even though AlphaGo may have defeated
Lee Sedol, it won’t beat me.
Ke Jie
228. Difficulties of Go
challenging decision-making
intractable search space
complex optimal solution
It appears infeasible to approximate the optimal solution directly using a policy or value function!
Silver et al. 2016
239. AlphaGo: summary
Monte Carlo tree search
effective move selection and position evaluation
through deep convolutional neural networks
trained by a novel combination of supervised and reinforcement
learning
new search algorithm combining
neural network evaluation
Monte Carlo rollouts
scalable implementation
multi-threaded simulations on CPUs
parallel GPU computations
distributed version over multiple machines
Silver et al. 2016
247. Novel approach
During the match against Fan Hui, AlphaGo evaluated thousands
of times fewer positions than Deep Blue did against Kasparov.
It compensated for this by:
selecting those positions more intelligently (policy network)
evaluating them more precisely (value network)
Deep Blue relied on a handcrafted evaluation function.
AlphaGo was trained directly and automatically from gameplay.
It used general-purpose learning.
This approach is not specific to the game of Go. The algorithm
can be used for a much wider class of (so far seemingly)
intractable problems in AI!
Silver et al. 2016
251. Selection of Moves by the SL Policy Network
move probabilities taken directly from the SL policy network pσ (reported as a percentage if above 0.1%).
Silver et al. 2016
252. Selection of Moves by the Value Network
evaluation of all successors s′ of the root position s, using vθ(s′)
Silver et al. 2016
253. Tree Evaluation from Value Network
action values Q(s, a) for each tree-edge (s, a) from root position s (averaged over value network evaluations only)
Silver et al. 2016
254. Tree Evaluation from Rollouts
action values Q(s, a), averaged over rollout evaluations only
Silver et al. 2016
255. Results of a tournament between different Go programs
Silver et al. 2016
256. Results of a tournament between AlphaGo and distributed AlphaGo,
testing scalability with hardware
Silver et al. 2016
262. AlphaGo versus Lee Sedol: Game 1
https://youtu.be/vFr3K2DORc8
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
263. AlphaGo versus Lee Sedol: Game 2 (1/2)
https://youtu.be/l-GsfyVCBu0
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
264. AlphaGo versus Lee Sedol: Game 2 (2/2)
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
265. AlphaGo versus Lee Sedol: Game 3
https://youtu.be/qUAmTYHEyM8
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
266. AlphaGo versus Lee Sedol: Game 4
https://youtu.be/yCALyQRN3hw
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
267. AlphaGo versus Lee Sedol: Game 5 (1/2)
https://youtu.be/mzpW10DPHeQ
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
268. AlphaGo versus Lee Sedol: Game 5 (2/2)
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
269. Further Reading I
AlphaGo:
Google Research Blog
http://googleresearch.blogspot.cz/2016/01/alphago-mastering-ancient-game-of-go.html
an article in Nature
http://www.nature.com/news/google-ai-algorithm-masters-ancient-game-of-go-1.19234
a reddit article claiming that AlphaGo is even stronger than it appears to be:
“AlphaGo would rather win by less points, but with higher probability.”
https://www.reddit.com/r/baduk/comments/49y17z/the_true_strength_of_alphago/
a video of how AlphaGo works (put in layman’s terms) https://youtu.be/qWcfiPi9gUU
Articles by Google DeepMind:
Atari player: a DeepRL system which combines Deep Neural Networks with Reinforcement Learning (Mnih
et al. 2015)
Neural Turing Machines (Graves, Wayne, and Danihelka 2014)
Artificial Intelligence:
Artificial Intelligence course at MIT
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-034-artificial-intelligence-fall-2010/index.htm
270. Further Reading II
Introduction to Artificial Intelligence at Udacity
https://www.udacity.com/course/intro-to-artificial-intelligence--cs271
General Game Playing course https://www.coursera.org/course/ggp
Singularity http://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html + Part 2
The Singularity Is Near (Kurzweil 2005)
Combinatorial Game Theory (founded by John H. Conway to study endgames in Go):
Combinatorial Game Theory course https://www.coursera.org/learn/combinatorial-game-theory
On Numbers and Games (Conway 1976)
Computer Go as a sum of local games: an application of combinatorial game theory (Müller 1995)
Chess:
Deep Blue beats G. Kasparov in 1997 https://youtu.be/NJarxpYyoFI
Machine Learning:
Machine Learning course https://www.coursera.org/learn/machine-learning/
Reinforcement Learning http://reinforcementlearning.ai-depot.com/
Deep Learning (LeCun, Bengio, and Hinton 2015)
271. Further Reading III
Deep Learning course https://www.udacity.com/course/deep-learning--ud730
Two Minute Papers https://www.youtube.com/user/keeroyz
Applications of Deep Learning https://youtu.be/hPKJBXkyTKM
Neuroscience:
http://www.brainfacts.org/
272. References I
Allis, Louis Victor et al. (1994). Searching for solutions in games and artificial intelligence. Ponsen & Looijen.
Baudiš, Petr and Jean-loup Gailly (2011). “Pachi: State of the art open source Go program”. In: Advances in
Computer Games. Springer, pp. 24–38.
Bowling, Michael et al. (2015). “Heads-up limit hold’em poker is solved”. In: Science 347.6218, pp. 145–149. url:
http://poker.cs.ualberta.ca/15science.html.
Champandard, Alex J (2016). “Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks”. In:
arXiv preprint arXiv:1603.01768.
Conway, John Horton (1976). “On Numbers and Games”. In: London Mathematical Society Monographs 6.
Dieterle, Frank Jochen (2003). “Multianalyte quantifications by means of integration of artificial neural networks,
genetic algorithms and chemometrics for time-resolved analytical data”. PhD thesis. Universität Tübingen.
Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge (2015). “A Neural Algorithm of Artistic Style”. In:
CoRR abs/1508.06576. url: http://arxiv.org/abs/1508.06576.
Graves, Alex, Greg Wayne, and Ivo Danihelka (2014). “Neural turing machines”. In: arXiv preprint
arXiv:1410.5401.
Hayes, Bradley (2016). url: https://twitter.com/deepdrumpf.
Karpathy, Andrej (2015). The Unreasonable Effectiveness of Recurrent Neural Networks. url:
http://karpathy.github.io/2015/05/21/rnn-effectiveness/ (visited on 04/01/2016).
273. References II
Kurzweil, Ray (2005). The singularity is near: When humans transcend biology. Penguin.
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton (2015). “Deep learning”. In: Nature 521.7553, pp. 436–444.
Li, Chuan and Michael Wand (2016). “Combining Markov Random Fields and Convolutional Neural Networks for
Image Synthesis”. In: CoRR abs/1601.04589. url: http://arxiv.org/abs/1601.04589.
Mnih, Volodymyr et al. (2015). “Human-level control through deep reinforcement learning”. In: Nature 518.7540,
pp. 529–533. url:
https://storage.googleapis.com/deepmind-data/assets/papers/DeepMindNature14236Paper.pdf.
Müller, Martin (1995). “Computer Go as a sum of local games: an application of combinatorial game theory”.
PhD thesis. TU Graz.
Silver, David et al. (2016). “Mastering the game of Go with deep neural networks and tree search”. In: Nature
529.7587, pp. 484–489.