The presentation of the article "Mastering the game of Go with deep neural networks and tree search", given at the Spring School of Combinatorics 2016.
Notes:
- All URLs are clickable.
- All citations are clickable (when hovered over the "year" part of "[author year]").
- To download without a SlideShare account, use https://www.dropbox.com/s/4njuiaaou1po0y4/AlphaGo.pdf?dl=0
- The corresponding handout is available at http://www.slideshare.net/KarelHa1/mastering-the-game-of-go-with-deep-neural-networks-and-tree-search-handout
- The video is available at https://youtu.be/Lso2kE58JrI
- The source code is available at https://github.com/mathemage/AlphaGo-presentation
AlphaGo: Mastering the Game of Go with Deep Neural Networks and Tree Search (Karel Ha)
The presentation of the article "Mastering the game of Go with deep neural networks and tree search", given at the Optimization Seminar 2015/2016.
Notes:
- All URLs are clickable.
- All citations are clickable (when hovered over the "year" part of "[author year]").
- To download without a SlideShare account, use https://www.dropbox.com/s/p4rnlhoewbedkjg/AlphaGo.pdf?dl=0
- The corresponding leaflet is available at http://www.slideshare.net/KarelHa1/leaflet-for-the-talk-on-alphago
- The source code is available at https://github.com/mathemage/AlphaGo-presentation
The document discusses how AlphaGo, a computer program developed by DeepMind, was able to defeat world champion Lee Sedol at the game of Go. It achieved this through a combination of deep learning and tree search techniques. Four deep neural networks were used: three convolutional networks to reduce the action space and search depth through imitation learning, self-play reinforcement learning, and value prediction; and a smaller network for faster simulations. This combination of deep learning and search allowed AlphaGo to master the complex game of Go, demonstrating the capabilities of modern AI.
AlphaGo uses a novel combination of Monte Carlo tree search and neural networks to master the game of Go. It trains two neural networks - a policy network to predict expert moves and a value network to evaluate board positions. During gameplay, AlphaGo runs multiple Monte Carlo tree simulations that use the neural networks to guide search and evaluate positions. The move selected is the one most frequently visited after all simulations. This approach allowed AlphaGo to defeat world champion Lee Sedol 4-1, achieving a milestone in artificial intelligence.
AlphaZero is an AI system created by DeepMind that achieved superhuman ability in the games of chess, shogi, and Go without relying on human data. It uses a new form of deep reinforcement learning combined with Monte Carlo tree search to learn from games generated by self-play. AlphaZero was able to master each game to a superhuman level in a matter of hours, defeating the previous world-champion programs in each case. It represents a major advance in self-taught machine learning.
1) AlphaZero was an AI developed by DeepMind that achieved master-level play in the games of chess, shogi, and Go without relying on human data or prior knowledge.
2) It was able to achieve this by using a new form of deep reinforcement learning that allowed it to learn solely from games of self-play, starting from random play.
3) AlphaZero demonstrated superhuman performance in chess, shogi, and Go by defeating the previous champion programs in these games, despite being provided no domain knowledge except the game rules.
AI Supremacy in Games: Deep Blue, Watson, Cepheus, AlphaGo, DeepStack and TensorCFR (Karel Ha)
My presentation on AI Supremacy in Games: Deep Blue, Watson, Cepheus, AlphaGo, DeepStack and TensorCFR for STRETNUTIE DOKTORANDOV (a PhD students' meeting) on 13 June 2018 (https://zona.fmph.uniba.sk/detail-novinky/back_to_page/fmfi-uk-zona/article/stretnutie-doktorandov-1362018/calendar_date/2018/june/)
LaTeX source code is available at https://github.com/mathemage/AISupremacyInGames-presentation
AlphaGo is a Go-playing program developed by DeepMind that uses a combination of Monte Carlo tree search and deep neural networks to defeat human professionals. It uses policy networks trained via supervised and reinforcement learning to guide the search by providing prior probabilities over moves, and value networks trained via reinforcement learning to evaluate board positions. By integrating neural network guidance into the tree search process, AlphaGo was able to defeat other Go programs and the European Go champion without relying solely on brute-force search of the enormous game tree.
The document discusses using genetic programming to develop chess strategies. Genetic programming uses genetic algorithms and Darwinian principles of natural selection to evolve computer programs to solve complex problems. It proposes using genetic programming to evolve chess evaluation functions and strategies. This is done by generating an initial population of random strategies, having them play each other, and using the results to breed new strategies via crossover and mutation until high-performing strategies emerge. The approach shows promise but also faces challenges like increased computational requirements as strategy complexity grows. It also suggests starting with the simpler game of "Loser's Chess" to reduce branching factors before scaling up to full chess.
The document provides an introduction and overview of AlphaGo Zero, including:
- AlphaGo Zero achieved superhuman performance at Go without human data by using self-play reinforcement learning.
- It uses a policy network and Monte Carlo tree search to select moves. The network is trained on self-play games, using the search probabilities and game outcomes as training targets (see the sketch after this list).
- Experiments showed AlphaGo Zero outperformed previous AlphaGo versions and human-trained networks, and continued improving with deeper networks and more self-play training.
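That training setup can be made concrete in a few lines. Below is a minimal sketch in Python with made-up numbers; it assumes the loss form from the AlphaGo Zero paper, (z - v)^2 - pi . log p, and omits the L2 weight penalty:

```python
import numpy as np

def zero_loss(p, v, pi, z):
    """AlphaGo Zero-style loss: (z - v)^2 - pi . log(p).
    p:  network move probabilities, v: network value in [-1, 1],
    pi: MCTS visit distribution (the policy target),
    z:  final game outcome from this player's view (the value target)."""
    return (z - v) ** 2 - np.dot(pi, np.log(p + 1e-12))

pi = np.array([0.7, 0.2, 0.1])   # the search visited move 0 most often
p  = np.array([0.5, 0.3, 0.2])   # the network does not yet match the search
print(zero_loss(p, v=0.1, pi=pi, z=1.0))  # shrinks as p -> pi and v -> z
```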
Agile Use Cases: Balancing Utility with Simplicity - May 2009 (IIBA Rochester NY)
A mainstay of conventional requirements gathering, use cases can ease the transition to agile methodologies. In this practical program, we explore:
* how to write use cases
* how to adapt use cases to agile projects
* how to automate acceptance testing with use cases
Ted Husted is a member of the International Institute for Business Analysis (IIBA) and the Executive Vice President-Elect of the Rochester NY Chapter. Ted has published three books and several magazine articles on software development and testing, and he speaks at professional conferences and conventions on a regular basis.
Ted Husted works in Pittsford NY with VanDamme Associates, a .NET integrator specializing in non-profits and associations.
Deep Blue was an IBM computer that defeated world chess champion Garry Kasparov in 1997. It used specialized chess chips that could evaluate millions of positions per second to search the game tree deeply. Each chip contained a move generator, evaluation function with fast and slow components, and search control logic. After initial losses to Kasparov in 1996, improvements to Deep Blue's evaluation function and search algorithms allowed it to win the rematch in 1997, demonstrating that a computer could defeat the top human player at chess under tournament time controls.
This document provides an overview of reinforcement learning and AlphaZero. It discusses the math behind reinforcement learning concepts like policy iteration, policy improvement, and policy evaluation. It then explains how AlphaZero uses these concepts along with a deep neural network and self-play to master the game of Go without human data. Key algorithms discussed include Monte Carlo tree search and how AlphaZero implements them in code to learn directly from games played between copies of itself.
The Slashdot Zoo: Mining a Social Network with Negative Edges (Jérôme Kunegis)
We analyse the corpus of user relationships of the Slashdot technology news site. The data was collected from the Slashdot Zoo feature, where users of the website can tag other users as friends and foes, providing positive and negative endorsements. We adapt social network analysis techniques to the problem of negative edge weights. In particular, we consider signed variants of global network characteristics such as the clustering coefficient, node-level characteristics such as centrality and popularity measures, and link-level characteristics such as distances and similarity measures. We evaluate these measures on the task of identifying unpopular users, as well as on the task of predicting the sign of links, and show that the network exhibits multiplicative transitivity, which allows algebraic methods based on matrix multiplication to be used. We compare our methods to traditional methods which are only suitable for positively weighted edges.
1. The document describes steps involved in analyzing movie reviews from Twitter using text mining and natural language processing techniques. These steps include web scraping Twitter to collect data, loading relevant Python libraries for text analysis and word clouds, and performing sentiment analysis using packages for sentiment scoring, visualization, and data manipulation.
2. The analysis found that real story-based movies elicited more anger in reviews while negative views correlated with higher anger scores. Movie sequels generated more anticipation than new films despite heavy promotion. Fear and trust were found to be less important elements. Positive reviews seemed influenced more by production houses.
3. Other findings indicated that actor and director names drove interest more than other factors. Releasing popular song videos well before
Game Hacking discusses various techniques for hacking console, DOS, and Windows games. These include using devices like Game Genie to modify NES games, memory scanning DOS games to change values like health and ammo, hex editing save files, using debuggers like OllyDbg to modify StarCraft map code, and exploiting flaws in game logic or servers. Memory hacking is described as a common technique to achieve hacks like teleporting or speed increases in games like World of Warcraft.
The slides go through the implementation details of Google DeepMind's AlphaGo, a computer Go AI that defeated the European champion. The slides are targeted at beginners in the machine learning area.
Korean version (한국어 버젼): http://www.slideshare.net/ShaneSeungwhanMoon/ss-59226902
Solving Endgames in Large Imperfect-Information Games such as Poker (Karel Ha)
My master's thesis on solving endgames in imperfect-information games.
keywords: algorithmic game theory, imperfect-information games, Nash equilibrium, subgame, endgame, counterfactual regret minimization, Poker
The document outlines key concepts in algorithmic game theory, including solution concepts like Nash equilibrium, dominant strategies, and correlated equilibrium. It also discusses different representations of games and examples like the prisoner's dilemma. The document provides definitions for fundamental game theory topics and outlines the structure of simultaneous move games involving multiple players with their own strategy sets.
1) The document discusses AlphaGo and its use of machine learning techniques like deep neural networks, reinforcement learning, and Monte Carlo tree search to master the game of Go.
2) AlphaGo uses reinforcement learning to learn Go strategies and evaluate board positions by playing many games against itself. It also uses deep neural networks and convolutional neural networks to pattern-match board positions and Monte Carlo tree search to simulate future moves and strategies.
3) By combining these techniques, AlphaGo was able to defeat top human Go players by developing an intuitive understanding of the game and strategizing several moves in advance.
AlphaGo is a computer program developed by Google DeepMind that uses machine learning to play the board game Go. It uses neural networks and reinforcement learning rather than handcrafted rules, allowing it to learn from experience. This breakthrough points towards robots that can learn physical tasks autonomously. Potential applications include virtual assistants, medical diagnostics, and data analysis.
What did AlphaGo do to beat the strongest human Go player? (Tobias Pfeiffer)
This year AlphaGo shocked the world by decisively beating the strongest human Go player, Lee Sedol, an accomplishment that wasn't expected for years to come. How did AlphaGo do this? What algorithms did it use? What advances in AI made it possible? This talk will answer these questions.
Monte-Carlo Tree Search (MCTS) is an approach for computer Go that uses Monte Carlo simulations to evaluate positions and build a search tree. The MCTS approach selects moves using the UCT algorithm, which balances exploitation of promising child nodes based on past results and exploration of lesser-visited nodes. Simulations are conducted by randomly playing out moves until the end of the game, then updating the search tree with the outcome. This allows MCTS to gradually improve its evaluations and identify stronger moves without relying on expert knowledge or complex position analysis.
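The UCT selection rule described above fits in a few lines. Here is a minimal sketch in Python; the exploration constant c and the win/visit tallies are illustrative assumptions:

```python
import math

def uct_choose(children, c=1.4):
    """children: list of (wins, visits) pairs; returns the UCT-best index."""
    total = sum(visits for _, visits in children)

    def uct(wins, visits):
        if visits == 0:
            return float("inf")        # unvisited moves are tried first
        # exploitation (average reward) + exploration bonus
        return wins / visits + c * math.sqrt(math.log(total) / visits)

    return max(range(len(children)), key=lambda i: uct(*children[i]))

print(uct_choose([(6, 10), (3, 4), (0, 0)]))  # -> 2 (the unvisited child)
```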
Jan Vitek: Distributed Random Forest, 5-2-2013 (Sri Ambati)
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Using Deep Learning to do Real-Time Scoring in Practical Applications (Greg Makowski)
http://www.meetup.com/SF-Bay-ACM/events/227480571/
(see also YouTube for a recording of the presentation)
The talk will cover a brief review of neural network basics and the following types of neural network deep learning:
* autocorrelational - unsupervised learning for extracting features. He will describe how additional layers build complexity in the feature extraction.
* convolutional - how to detect shift invariant patterns in various data sources. Horizontal shift invariant detection applies to signals like speech recognition or IoT data. Horizontal and vertical shift invariance applies to images or videos, for faces or self driving cars
* discuss details of applying deep net systems for continuous or real time scoring
* reinforcement learning or Q Learning - such as learning how to play Atari video games
* continuous space word models - such as word2vec, skipgram training, NLP understanding and translation
Adam Streck - Reinforcement Learning in Unity. Teach Your Monsters - Codemoti... (Codemotion)
With the advent of deep learning, many of the tasks in computer science that had been deemed impossible suddenly became only a few clicks away. One of the approaches made available is reinforcement learning - a method for solving problems by establishing an action-reward scheme. Combined with the power and availability of general-purpose game engines, anyone with a rudimentary knowledge of the topic can create and train their virtual creatures. In this talk we will use this power to solve one of the most frustratingly difficult (according to the internet) games of our era.
This document discusses adversarial search techniques used in artificial intelligence to model games as search problems. It introduces the minimax algorithm and alpha-beta pruning to determine optimal strategies by looking ahead in the game tree. These techniques allow computers to search deeper and play games like chess and Go at a world-champion level by evaluating board positions and pruning unfavorable branches in the search.
Showcase of My Research on Games & AI "till the end of Oct. 2014" (Mohammad Shaker)
A presentation showcasing my research on Games and Artificial Intelligence (till the end of Oct. 2014) at IT University of Copenhagen, Copenhagen, Denmark.
How we made a multiplayer browser game for HL++ with voxel graphics (Ontico)
HighLoad++ 2017
Moscow Hall, 7 November, 14:00
Abstract: http://www.highload.ru/2017/abstracts/2881.html
Ingram Micro Cloud has a booth at HighLoad++. There we are running TheURBN (urbn.odn.pw), a browser game with voxel graphics in which anyone can capture territory in a shared world and build skyscrapers out of blocks, and the process can be watched at the booth. On the screens we will show, in real time, the virtual 3D world in which participants build their skyscrapers. For deeper immersion, you can put on an Oculus Rift VR headset at our booth.
...
Novel machine learning techniques come from spending time with people who have distinct needs. This talk addresses how listening to end users can give rise to novel machine learning applications.
How games are driving advances in AI research - Unite Copenhagen 2019 (Unity Technologies)
Many recent advances in deep reinforcement and artificial intelligence learning have stemmed from video games. In this session, we'll explore a brief history of this relationship, looking specifically at how Unity is pushing the boundaries of AI research with the Obstacle Tower Challenge. We'll also show how Unity is leveraging cutting-edge research to solve gaming's biggest challenges with the Unity Machine Learning Agents Toolkit, one of the most popular open-source toolkits for deep learning.
Speaker: Matthew Crosby - Imperial College London
MongoDB.local Seattle 2019: Tips & Tricks for Avoiding Common Query Pitfalls (MongoDB)
Query performance can either be a constant headache or the unsung hero of an application. MongoDB provides extremely powerful querying capabilities when used properly. As a member of the support team, I will share common mistakes observed, as well as tips and tricks for avoiding them.
The document describes an event for learning artificial intelligence hands-on. It includes information about labs to choose from on day 2, including image classification, image segmentation, autoencoders, and generative adversarial networks. Participants will also engage in a hackathon to code solutions for social good problems. Hands-on coding examples are provided for tasks like a rock paper scissors game and image segmentation. Overall it provides an agenda and information for an event focused on practical deep learning experiences.
The document is a presentation about gaming programs at libraries. It discusses why libraries should offer gaming, how to create gaming experiences for patrons, popular games and gaming devices, and examples of successful gaming programs at other libraries. It provides guidance on collection development, programming, and next steps for starting a gaming program.
Devoxx 2017 - AI Self-learning Game Playing (Richard Abbuhl)
This document provides an overview of the history of AI self-learning game playing and machine learning. It discusses early work using search trees and perceptrons in the 1950s-1970s. Reinforcement learning techniques like TD-Gammon and Q-Learning are explained. Landmark projects including Deep Blue, AlphaGo, and AlphaGo Zero using neural networks and reinforcement learning to master challenging games like chess and Go are summarized. The document provides high-level descriptions of machine learning basics and techniques demonstrated through examples like Tic-Tac-Toe.
In this talk we discuss the application of Reinforcement Learning to games. Recently, OpenAI created an algorithm capable of beating a human team in DOTA, considered a game with a great amount of complexity and strategy. In this talk, we'll evaluate the role Reinforcement Learning plays in the world of games, taking a look at some of the main achievements and what they look like in terms of implementation. We'll also take a look at some of the history of AI applied to games and how things evolved over time.
Yuandong Tian at AI Frontiers: AI in Games: Achievements and Challenges (AI Frontiers)
Recently, substantial progress in AI has been made in applications that require advanced pattern reading, including computer vision, speech recognition and natural language processing. However, it remains an open problem whether AI will make the same level of progress in tasks that require sophisticated reasoning, planning and decision making in complicated game environments similar to the real world. In this talk, I present the state-of-the-art approaches to building such an AI, our recent contributions in terms of designing more effective algorithms and building extensive and fast general environments and platforms, as well as issues and challenges.
J-Fall 2017 - AI Self-learning Game Playing (Richard Abbuhl)
The document discusses AI self-learning game playing, providing an overview of machine learning and reinforcement learning techniques used in game playing such as backpropagation, Q-learning, TD-Gammon, and AlphaGo. It reviews the history of machine learning in game playing from the 1950s to modern implementations, and discusses concepts like weak and strong AI as well as skills needed for the future of employment with advances in AI.
Last.fm provides a public API that allows developers to access information about music and social features on Last.fm. The API includes over 100 methods to retrieve data on artists, albums, tracks, tags, users and more. It outputs data in XML or JSON formats and supports authentication for private data. Common methods include artist.getImages, album.getTags, and track.search. Developers need an API key to access the API.
The document outlines the rules and structure of the Infoyage 2014 Senior Quiz. The quiz consists of 6 rounds with varying rules for scoring and question format. Round 1 involves direct questions to teams with points for correct answers. Round 2 also involves direct questions but with higher point values and partial credit for passed questions. Round 3 poses connect-the-dot style questions worth points for linking answers. Round 4 consists of rapid-fire sets of questions to teams with a time limit and penalty for incorrect answers. Round 5 provides hints for a direct question with increasing/decreasing point values. The final round tests identification of companies, people, and their products within time limits.
AI is used to create parts of our games. It provides intelligent enemy behavior, supplies techniques such as pathfinding, and can generate in-game content procedurally. AI can also play our games. The idea of training computers to beat humans in game-like environments such as Jeopardy!, chess, or soccer is not a new one. But can AI also design our games? The role of Artificial Intelligence in the game development process is constantly expanding. In this talk, Dr. Pirker will talk about the importance of AI in the past, the present, and especially the future of game development.
Using (Free!) App Annie data to optimize your next game (Eric Seufert)
This presentation describes a framework for using the free data indexed by App Annie to make an informed decision about which audience segments represent the largest opportunities in the mobile marketplace.
Using (Free!) AppAnnie Data to Optimize Your Next Game | Eric Seufert (Jessica Tams)
Delivered at Casual Connect USA 2016. In this talk, Eric will present a framework for using AppAnnie’s free data to conduct market research before a title goes into development. It will also walk through the different aspects that should be considered when beginning development, such as: theme, aesthetic, IP, and genre saturation. And finally, Eric will introduce a tool for using AppAnnie’s data to provide a quantitative measure of each of those aspects.
8. Applications of AI
spam filters
recommender systems (Netflix, YouTube)
predictive text (SwiftKey)
audio recognition (Shazam, SoundHound)
music generation (DeepHear - Composing and harmonizing music with neural networks)
self-driving cars
13. Baby Names Generated Character by Character
Baby Killiel Saddie Char Ahbort With
Rudi Levette Berice Lussa Hany Mareanne Chrestina Carissy
Karpathy 2015
31. Supervised versus Unsupervised Learning
Supervised learning:
data set must be labelled
e.g. which e-mail is regular/spam, which image is duck/face, ...
Unsupervised learning:
data set is not labelled
it can try to cluster the data into different groups
e.g. grouping similar news, ...
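To make the labelled/unlabelled contrast above concrete, here is a minimal sketch; scikit-learn and the toy points are my own choices, since the slides name no particular library:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6]], dtype=float)  # two blobs

# Supervised: every point carries a label (e.g. regular=0 / spam=1).
y = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, y)
print(clf.predict([[4.5, 5.5]]))  # -> [1]

# Unsupervised: no labels; the algorithm groups similar points itself.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)  # e.g. [0 0 1 1] (cluster ids, in arbitrary order)
```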
38. Supervised Learning
1. data collection: Google Search, Facebook “Likes”, Siri, Netflix, YouTube views, LHC collisions, KGS Go Server...
2. training on training set
3. testing on testing set
4. deployment
http://www.nickgillian.com/
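Steps 2 and 3 of this recipe can be sketched with scikit-learn on a bundled dataset; the dataset and model here are illustrative choices, not anything from the slides:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)            # 1. data already collected
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)
model = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)   # 2. training
print(f"test accuracy: {model.score(X_te, y_te):.2f}")      # 3. testing
# 4. deployment: persist `model` and serve its predictions in production
```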
45. Underfitting and Overfitting
Beware of overfitting!
It is like learning for a math exam by memorizing proofs.
https://www.researchgate.net/post/How_to_Avoid_Overfitting
55. Tree Search
Optimal value v∗(s) determines the outcome of the game:
from every board position or state s
under perfect play by all players.
It is computed by recursively traversing a search tree containing approximately b^d possible sequences of moves, where
b is the game's breadth (number of legal moves per position)
d is its depth (game length)
Silver et al. 2016
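A runnable miniature of this recursive traversal: Go itself is far too large, so the sketch below uses a toy subtraction game of my own choosing (take 1 or 2 stones; whoever takes the last stone wins), with breadth b = 2 and depth d of roughly s moves:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def v_star(s):
    """Optimal value of state s for the player to move: +1 win, -1 loss."""
    if s == 0:
        return -1                      # previous player took the last stone
    # try every legal move; the opponent's value is negated for us
    return max(-v_star(s - m) for m in (1, 2) if m <= s)

print([v_star(s) for s in range(1, 8)])  # [1, 1, -1, 1, 1, -1, 1]
```

Multiples of 3 are losing positions, which the exhaustive traversal discovers on its own.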
62. Game tree of Go
Sizes of trees for various games:
chess: b ≈ 35, d ≈ 80
Go: b ≈ 250, d ≈ 150 ⇒ more positions than atoms in the universe!
That makes Go a googol times more complex than chess.
https://deepmind.com/alpha-go.html
How to handle the size of the game tree?
for the breadth: a neural network to select moves
for the depth: a neural network to evaluate the current position
for the tree traversal: Monte Carlo tree search (MCTS)
Allis et al. 1994 18
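A quick sanity check of these numbers in Python (b^d is a rough count of move sequences, not an exact state-space size):

# Rough size of the game trees, using the b^d approximation from the slide.
from math import log10

chess = 80 * log10(35)    # ≈ 123, i.e. chess tree ~ 10^123 sequences
go    = 150 * log10(250)  # ≈ 360, i.e. Go tree ~ 10^360 sequences
print(f"chess ~ 10^{chess:.0f}, Go ~ 10^{go:.0f}, ratio ~ 10^{go - chess:.0f}")
# The ratio alone far exceeds a googol (10^100), and ~10^360 dwarfs
# the ~10^80 atoms in the observable universe.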
69. Neural Network: Inspiration
inspired by the neuronal structure of the mammalian cerebral cortex
but on much smaller scales
suited to problems with a high tolerance to error
e.g. audio or image recognition
http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 20
73. Neural Network: Modes
Two modes
feedforward for making predictions
backpropagation for learning
Dieterle 2003 21
74. Neural Network: an example of feedforward
http://stevenmiller888.github.io/mind-how-to-build-a-neural-network/ 22
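A minimal feedforward pass in Python with numpy, for an arbitrary tiny 2-3-1 network with sigmoid activations (the architecture and random weights are illustrative, not taken from the linked tutorial):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer: 2 inputs -> 3 hidden units -> 1 output.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)

x = np.array([1.0, 0.5])        # input features
h = sigmoid(W1 @ x + b1)        # hidden activations
y = sigmoid(W2 @ h + b2)        # prediction
print(y)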
77. Gradient Descent in Neural Networks
Motto: “Learn by mistakes!”
However, error functions are not necessarily convex or so “smooth”.
http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 23
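A bare-bones gradient-descent loop on a toy one-dimensional error function (the function and learning rate are illustrative; on the non-convex error surfaces mentioned above, such a loop only finds a local minimum):

# Gradient descent: repeatedly step against the gradient of the error.
def error(w):            # toy, conveniently convex error function
    return (w - 3.0) ** 2

def grad(w):             # its derivative dE/dw
    return 2.0 * (w - 3.0)

w, learning_rate = 0.0, 0.1
for _ in range(100):
    w -= learning_rate * grad(w)   # "learn by mistakes"
print(w, error(w))       # w converges to the minimum at 3.0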
78. Deep Neural Network: Inspiration
The hierarchy of concepts is captured in the number of layers (the deep in “deep learning”)
http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html 24
91. Rules of Go
Black versus White. Black starts the game.
the rule of liberty
the “ko” rule
Handicap for difference in ranks: Black can place 1 or more stones in advance (compensation for White's greater strength). 28
95. Scoring Rules: Area Scoring
A player's score is:
the number of stones that the player has on the board
plus the number of empty intersections surrounded by that player's stones
plus komi(dashi) points for the White player
(compensation for the Black player's first-move advantage)
https://en.wikipedia.org/wiki/Go_(game) 29
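The rule translates directly into a formula. A minimal sketch in Python; the 6.5-point komi is a common modern value and an assumption here, since the slide does not fix it:

# Area scoring: stones on the board + surrounded empty territory (+ komi for White).
def area_score(stones, territory, komi=0.0):
    return stones + territory + komi

black = area_score(stones=181, territory=0)            # illustrative counts
white = area_score(stones=180, territory=0, komi=6.5)  # komi only for White
print("Black wins" if black > white else "White wins")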
101. Training the (Deep Convolutional) Neural Networks
Silver et al. 2016 32
111. SL Policy Networks (1/3)
13-layer deep convolutional neural network
goal: to predict expert human moves
task of classification
trained on 30 million positions from the KGS Go Server
stochastic gradient ascent:
Δσ ∝ ∂ log pσ(a|s) / ∂σ
(to maximize the likelihood of the human move a selected in state s)
Results:
44.4% accuracy (the state of the art from other groups)
55.7% accuracy (raw board position + move history as input)
57.0% accuracy (all input features)
Silver et al. 2016 33
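In code, this update is a standard log-likelihood gradient-ascent step. A minimal sketch in Python with numpy, using a linear softmax policy as an illustrative stand-in for the 13-layer CNN (the feature sizes and learning rate are assumptions):

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Toy "policy network": logits are a linear function of board features.
n_features, n_moves = 8, 5
sigma = np.zeros((n_moves, n_features))          # weights σ

def sl_update(sigma, s, a, lr=0.1):
    """One step of stochastic gradient ascent on log pσ(a|s)."""
    p = softmax(sigma @ s)                       # pσ(·|s)
    grad = np.outer(np.eye(n_moves)[a] - p, s)   # ∂ log pσ(a|s) / ∂σ
    return sigma + lr * grad                     # Δσ ∝ gradient

s = np.random.default_rng(0).normal(size=n_features)  # features of one position
sigma = sl_update(sigma, s, a=2)    # push the probability of expert move 2 up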
112. SL Policy Networks (2/3)
Small improvements in accuracy led to large improvements in playing strength (see the next slide)
Silver et al. 2016 34
113. SL Policy Networks (3/3)
move probabilities taken directly from the SL policy network pσ (reported as a percentage if above 0.1%).
Silver et al. 2016 35
114. Training the (Deep Convolutional) Neural Networks
Silver et al. 2016 36
117. Rollout Policy
Rollout policy pπ(a|s) is faster but less accurate than the SL policy network:
accuracy of 24.2%
it takes 2 µs to select an action, compared to 3 ms for the SL policy network.
Silver et al. 2016 37
118. Training the (Deep Convolutional) Neural Networks
Silver et al. 2016 38
128. RL Policy Networks (1/2)
identical in structure to the SL policy network
goal: to win in the games of self-play
task of classification
weights ρ initialized to the same values, ρ := σ
games of self-play
between the current RL policy network and a randomly selected previous iteration
to prevent overfitting to the current policy
stochastic gradient ascent:
Δρ ∝ (∂ log pρ(at|st) / ∂ρ) · zt
at time step t, where the reward zt is +1 for winning and −1 for losing.
Silver et al. 2016 39
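This is the REINFORCE-style policy-gradient update: the same log-likelihood gradient as in the SL sketch above, scaled by the game outcome zt. A minimal sketch under the same toy linear-softmax assumptions:

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def rl_update(rho, trajectory, z, lr=0.01):
    """REINFORCE: scale each log-likelihood gradient by the game outcome z."""
    for s, a in trajectory:                       # (state features, action) pairs
        p = softmax(rho @ s)
        grad = np.outer(np.eye(len(p))[a] - p, s) # ∂ log pρ(a|s) / ∂ρ
        rho = rho + lr * z * grad                 # Δρ ∝ gradient · z
    return rho

# After a self-play game: z = +1 if this policy won, −1 if it lost.
rng = np.random.default_rng(1)
rho = np.zeros((5, 8))
game = [(rng.normal(size=8), rng.integers(5)) for _ in range(10)]
rho = rl_update(rho, game, z=+1)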
134. RL Policy Networks (2/2)
Results (by sampling each move at ∼ pρ(·|st)):
80% win rate against the SL policy network
85% win rate against the strongest open-source Go program, Pachi (Baudiš and Gailly 2011)
The previous state of the art, based only on SL of CNNs:
11% “win” rate against Pachi
Silver et al. 2016 40
135. Training the (Deep Convolutional) Neural Networks
Silver et al. 2016 41
140. Value Network (1/2)
similar architecture to the policy network, but outputs a single prediction instead of a probability distribution
goal: to estimate a value function
v^p(s) = E[zt | st = s, at...T ∼ p]
that predicts the outcome from position s (of games played by using policy pρ)
Specifically, vθ(s) ≈ v^pρ(s) ≈ v∗(s).
task of regression
stochastic gradient descent:
Δθ ∝ (∂vθ(s) / ∂θ) · (z − vθ(s))
(to minimize the mean squared error (MSE) between the predicted vθ(s) and the true outcome z)
Silver et al. 2016 42
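A minimal sketch of this regression step in Python with numpy, using a linear value function as an illustrative stand-in for the value CNN (features and learning rate are assumptions):

import numpy as np

def value_update(theta, s, z, lr=0.01):
    """One SGD step on the squared error (z − vθ(s))² / 2."""
    v = theta @ s                       # vθ(s): linear toy value function
    grad = s                            # ∂vθ(s)/∂θ for a linear model
    return theta + lr * (z - v) * grad  # Δθ ∝ ∂vθ(s)/∂θ · (z − vθ(s))

rng = np.random.default_rng(2)
theta = np.zeros(8)
s = rng.normal(size=8)                  # features of a sampled position
theta = value_update(theta, s, z=+1.0)  # the game from s was eventually won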
145. Value Network (2/2)
Beware of overfitting!
Successive positions are strongly correlated: the value network memorized the game outcomes, rather than generalizing to new positions.
Solution: generate 30 million (new) positions, each sampled from a separate game
almost the accuracy of Monte Carlo rollouts (using pρ), but with 15,000 times less computation!
Silver et al. 2016 43
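The de-correlation trick itself is simple to express. A sketch in Python, where self_play_game is a hypothetical helper returning the list of positions of one game together with its outcome z:

import random

def build_value_dataset(n_games, self_play_game):
    """One training example per game: a single random position plus the outcome.
    Sampling only one position per game breaks the strong correlation
    between successive positions within a single game."""
    dataset = []
    for _ in range(n_games):
        positions, z = self_play_game()   # hypothetical: (list of states, outcome)
        s = random.choice(positions)      # one position per separate game
        dataset.append((s, z))
    return dataset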
146. Selection of Moves by the Value Network
evaluation of all successors s′ of the root position s, using vθ(s′)
Silver et al. 2016 44
149. Evaluation accuracy in various stages of a game
Move number is the number of moves played to reach the given position.
Each position evaluated by:
forward pass of the value network vθ
100 rollouts, played out using the corresponding policy
Silver et al. 2016 45
150. Training the (Deep Convolutional) Neural Networks
Silver et al. 2016 46
151. Elo Ratings for Various Combinations of Networks
Silver et al. 2016 47
162. MCTS Algorithm
The next action is selected by lookahead search, using simulation:
1. selection phase
2. expansion phase
3. evaluation phase
4. backup phase (at the end of the simulation)
Each edge (s, a) keeps:
action value Q(s, a)
visit count N(s, a)
prior probability P(s, a) (from the SL policy network pσ)
The tree is traversed by simulation (descending the tree) from the root state.
Silver et al. 2016 48
165. MCTS Algorithm: Selection
At each time step t, an action at is selected from state st:
at = argmax_a (Q(st, a) + u(st, a))
where the bonus is
u(st, a) ∝ P(s, a) / (1 + N(s, a))
Silver et al. 2016 49
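A minimal sketch of this selection rule in Python; the per-edge record with fields P, N, Q is a hypothetical representation (made concrete in the expansion sketch below), and the constant c is an assumed scaling for the ∝ sign:

def select_action(node, c=1.0):
    """Pick the action maximizing Q(s,a) + u(s,a), with u ∝ P(s,a)/(1 + N(s,a))."""
    def score(a):
        edge = node.edges[a]             # hypothetical edge record
        u = c * edge.P / (1 + edge.N)    # prior-weighted exploration bonus
        return edge.Q + u
    return max(node.edges, key=score)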
168. MCTS Algorithm: Expansion
A leaf position may be expanded (just once) by the SL policy network pσ.
The output probabilities are stored as priors P(s, a) := pσ(a|s).
Silver et al. 2016 50
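A sketch of the expansion step under the same hypothetical edge representation, with policy_net standing in for pσ:

from dataclasses import dataclass

@dataclass
class Edge:
    P: float   # prior probability from the SL policy network
    N: int     # visit count
    Q: float   # action value

def expand(leaf, policy_net):
    """Expand a leaf once: store the network's output probabilities as priors."""
    priors = policy_net(leaf.state)     # pσ(·|s) as a dict: action -> probability
    leaf.edges = {a: Edge(P=p, N=0, Q=0.0) for a, p in priors.items()}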
173. MCTS: Evaluation
evaluation from the value network vθ(s)
evaluation by the outcome z using the fast rollout policy pπ until the end of the game
Using a mixing parameter λ, the final leaf evaluation V(s) is
V(s) = (1 − λ) vθ(s) + λ z
Silver et al. 2016 51
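The mixing itself is a one-liner; Silver et al. 2016 report λ = 0.5 (equal weighting) working best, so the sketch below defaults to that. The value_net and rollout helpers are hypothetical:

def evaluate_leaf(state, value_net, rollout, lam=0.5):
    """Leaf evaluation V(s) = (1 − λ)·vθ(s) + λ·z."""
    v = value_net(state)      # vθ(s): value-network prediction
    z = rollout(state)        # z: outcome of a fast pπ rollout to the game end
    return (1 - lam) * v + lam * z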
174. Tree Evaluation from Value Network
action values Q(s, a) for each tree-edge (s, a) from root position s (averaged over value network evaluations only)
Silver et al. 2016 52
175. Tree Evaluation from Rollouts
action values Q(s, a), averaged over rollout evaluations only
Silver et al. 2016 53
177. MCTS: Backup
At the end of the simulation, each traversed edge is updated by accumulating:
the action values Q
the visit counts N
Silver et al. 2016 54
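A sketch of the backup pass, keeping Q(s, a) as the running mean of the evaluations of all simulations that traversed the edge (same hypothetical edge record as above):

def backup(path, V):
    """Update every edge traversed in this simulation with leaf evaluation V."""
    for edge in path:
        edge.N += 1
        edge.Q += (V - edge.Q) / edge.N   # incremental mean of evaluations

# Once all simulations finish, play the most visited move at the root:
# best_move = max(root.edges, key=lambda a: root.edges[a].N)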
178. Once the search is complete, the algorithm chooses the most visited move from the root position.
Silver et al. 2016 54
183. Principal Variation (Path with Maximum Visit Count)
The moves are presented in a numbered sequence.
AlphaGo selected the move indicated by the red circle;
Fan Hui responded with the move indicated by the white square;
in his post-game commentary, he preferred the move (labelled 1) predicted by AlphaGo.
Silver et al. 2016 56
204. Fan Hui
professional 2 dan
European Go Champion in 2013, 2014 and 2015
European Professional Go Champion in 2016
biological neural network:
100 billion neurons
100 to 1,000 trillion neuronal connections
https://en.wikipedia.org/wiki/Fan_Hui 60
207. AlphaGo versus Fan Hui
AlphaGo won 5-0 in a formal match in October 2015.
[AlphaGo] is very strong and stable, it seems like a wall. ... I know AlphaGo is a computer, but if no one told me, maybe I would think the player was a little strange, but a very strong player, a real person.
Fan Hui 61
213. Lee Sedol “The Strong Stone”
professional 9 dan
2nd in the number of international titles held
the 5th youngest (12 years 4 months) to become a professional Go player in South Korean history
Lee Sedol would win 97 out of 100 games against Fan Hui.
biological neural network, comparable to Fan Hui's (in number of neurons and connections)
https://en.wikipedia.org/wiki/Lee_Sedol 62
216. I heard Google DeepMind’s AI is surprisingly strong and getting stronger, but I am confident that I can win, at least this time.
Lee Sedol
...even beating AlphaGo by 4-1 may allow the Google DeepMind team to claim its de facto victory and the defeat of him [Lee Sedol], or even humankind.
interview in JTBC Newsroom
62
222. AlphaGo versus Lee Sedol
In March 2016, AlphaGo won 4-1 against the legendary Lee Sedol.
AlphaGo won all but the 4th game; all games were won by resignation.
The winner of the match was slated to win $1 million. Since AlphaGo won, Google DeepMind stated that the prize would be donated to charities, including UNICEF, and to Go organisations.
Lee received $170,000 ($150,000 for participating in all five games, and an additional $20,000 for each game won).
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol 63
226. Difficulties of Go
challenging decision-making
intractable search space
complex optimal solution
It appears infeasible to approximate the optimal solution directly with a policy or value function!
Silver et al. 2016 64
237. AlphaGo: summary
Monte Carlo tree search
effective move selection and position evaluation through deep convolutional neural networks
trained by a novel combination of supervised and reinforcement learning
new search algorithm combining
neural network evaluation
Monte Carlo rollouts
scalable implementation
multi-threaded simulations on CPUs
parallel GPU computations
distributed version over multiple machines
Silver et al. 2016 65
245. Novel approach
During the match against Fan Hui, AlphaGo evaluated thousands of times fewer positions than Deep Blue did against Kasparov.
It compensated for this by:
selecting those positions more intelligently (policy network)
evaluating them more precisely (value network)
Deep Blue relied on a handcrafted evaluation function. AlphaGo, by contrast, was trained directly and automatically from gameplay, using general-purpose learning.
This approach is not specific to the game of Go. The algorithm can be used for a much wider class of (so far seemingly) intractable problems in AI!
Silver et al. 2016 66
256. AlphaGo versus Lee Sedol: Game 1
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
257. AlphaGo versus Lee Sedol: Game 2 (1/2)
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
258. AlphaGo versus Lee Sedol: Game 2 (2/2)
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
259. AlphaGo versus Lee Sedol: Game 3
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
260. AlphaGo versus Lee Sedol: Game 4
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
261. AlphaGo versus Lee Sedol: Game 5 (1/2)
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
262. AlphaGo versus Lee Sedol: Game 5 (2/2)
https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
263. Further Reading I
AlphaGo:
Google Research Blog
http://googleresearch.blogspot.cz/2016/01/alphago-mastering-ancient-game-of-go.html
an article in Nature
http://www.nature.com/news/google-ai-algorithm-masters-ancient-game-of-go-1.19234
a reddit article claiming that AlphaGo is even stronger than it appears to be:
“AlphaGo would rather win by less points, but with higher probability.”
https://www.reddit.com/r/baduk/comments/49y17z/the_true_strength_of_alphago/
Articles by Google DeepMind:
Atari player: a DeepRL system which combines Deep Neural Networks with Reinforcement Learning (Mnih
et al. 2015)
Neural Turing Machines (Graves, Wayne, and Danihelka 2014)
Artificial Intelligence:
Artificial Intelligence course at MIT
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-034-artificial-intelligence-fall-2010/index.htm
Introduction to Artificial Intelligence at Udacity
https://www.udacity.com/course/intro-to-artificial-intelligence--cs271
264. Further Reading II
General Game Playing course https://www.coursera.org/course/ggp
Singularity http://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html + Part 2
The Singularity Is Near (Kurzweil 2005)
Combinatorial Game Theory (founded by John H. Conway to study endgames in Go):
Combinatorial Game Theory course https://www.coursera.org/learn/combinatorial-game-theory
On Numbers and Games (Conway 1976)
Machine Learning:
Machine Learning course https://www.coursera.org/learn/machine-learning/
Reinforcement Learning http://reinforcementlearning.ai-depot.com/
Deep Learning (LeCun, Bengio, and Hinton 2015)
Deep Learning course https://www.udacity.com/course/deep-learning--ud730
Two Minute Papers https://www.youtube.com/user/keeroyz
Applications of Deep Learning https://youtu.be/hPKJBXkyTKM
Neuroscience:
http://www.brainfacts.org/
265. References I
Allis, Louis Victor et al. (1994). Searching for solutions in games and artificial intelligence. Ponsen & Looijen.
Baudiš, Petr and Jean-loup Gailly (2011). “Pachi: State of the art open source Go program”. In: Advances in Computer Games. Springer, pp. 24–38.
Bowling, Michael et al. (2015). “Heads-up limit hold'em poker is solved”. In: Science 347.6218, pp. 145–149. url: http://poker.cs.ualberta.ca/15science.html.
Conway, John Horton (1976). “On Numbers and Games”. In: London Mathematical Society Monographs 6.
Corrado, Greg (2015). Computer, respond to this email. url:
http://googleresearch.blogspot.cz/2015/11/computer-respond-to-this-email.html#1 (visited on
03/31/2016).
Dieterle, Frank Jochen (2003). “Multianalyte quantifications by means of integration of artificial neural networks, genetic algorithms and chemometrics for time-resolved analytical data”. PhD thesis. Universität Tübingen.
Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge (2015). “A Neural Algorithm of Artistic Style”. In:
CoRR abs/1508.06576. url: http://arxiv.org/abs/1508.06576.
Graves, Alex, Greg Wayne, and Ivo Danihelka (2014). “Neural turing machines”. In: arXiv preprint
arXiv:1410.5401.
Hayes, Bradley (2016). url: https://twitter.com/deepdrumpf.
266. References II
Karpathy, Andrej (2015). The Unreasonable Effectiveness of Recurrent Neural Networks. url:
http://karpathy.github.io/2015/05/21/rnn-effectiveness/ (visited on 04/01/2016).
Kurzweil, Ray (2005). The singularity is near: When humans transcend biology. Penguin.
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton (2015). “Deep learning”. In: Nature 521.7553, pp. 436–444.
Li, Chuan and Michael Wand (2016). “Combining Markov Random Fields and Convolutional Neural Networks for
Image Synthesis”. In: CoRR abs/1601.04589. url: http://arxiv.org/abs/1601.04589.
Mnih, Volodymyr et al. (2015). “Human-level control through deep reinforcement learning”. In: Nature 518.7540,
pp. 529–533. url:
https://storage.googleapis.com/deepmind-data/assets/papers/DeepMindNature14236Paper.pdf.
Munroe, Randall. Game AIs. url: https://xkcd.com/1002/ (visited on 04/02/2016).
Silver, David et al. (2016). “Mastering the game of Go with deep neural networks and tree search”. In: Nature
529.7587, pp. 484–489.
Sun, Felix. DeepHear - Composing and harmonizing music with neural networks. url:
http://web.mit.edu/felixsun/www/neural-music.html (visited on 04/02/2016).