IaGo: an Othello AI
inspired by AlphaGo
Shion HONDA
@DSP
Overview
• I implemented an Othello AI (IaGo) inspired
by the AlphaGo algorithm
• AlphaGo is composed of 3 parts:
• SL policy network: predict next action
• Value network: evaluate board state
• MCTS: choose action using 2 networks
Background
Game      Search space   AI                 Year
Othello   10^60          NEC Logistello     1997
Go        10^360         DeepMind AlphaGo   2016
• Go has an extremely large search space: 10^360
• cf. the estimated number of atoms in the universe: 10^80
• Before AlphaGo, it was thought that Go AIs would need about 10 more years to beat human professionals because of this huge search space
• Since I don't have enough machine resources to replicate AlphaGo, I made an Othello version
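Figures like 10^360 and 10^60 come from the rough estimate b^d, where b is the typical number of legal moves per turn and d is the typical game length. A short Python sketch, assuming the commonly quoted Go values b ≈ 250, d ≈ 150 and, purely as an illustration, b ≈ 10, d ≈ 60 for Othello (these Othello values are an assumption, not from the slides):

```python
import math


def log10_game_tree_size(branching: float, depth: int) -> float:
    """Return log10(b ** d), the classic b^d estimate of a game tree's size."""
    return depth * math.log10(branching)


# (branching factor b, game length d). The Go pair is the widely quoted one;
# the Othello pair is an assumption chosen for illustration.
GAMES = {"Go": (250, 150), "Othello": (10, 60)}

for name, (b, d) in GAMES.items():
    print(f"{name}: about 10^{log10_game_tree_size(b, d):.0f} move sequences")
```

Working in log10 avoids building the huge integers 250^150 and 10^60 just to compare their sizes.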
Dataset
[Figure: board state and place of next stone; 6 million -> 48 million samples]
• Data came from online Othello game records
• 6 million pairs of board state & place of the next stone
• Augmented them 8-fold using rotation & transposition symmetry
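Rotating the 8×8 board by 0, 90, 180 and 270 degrees, with and without transposing it, gives the 8 symmetric copies; the place of the next stone has to be transformed together with the board. A minimal NumPy sketch, assuming the board is stored as a (2, 8, 8) array of stone planes and the move as a (row, col) pair (the function name `dihedral_augment` and the channel layout are hypothetical):

```python
from typing import Tuple

import numpy as np


def dihedral_augment(board: np.ndarray, move: Tuple[int, int]):
    """Yield the 8 symmetric copies of (board, move).

    board: array of shape (C, N, N), e.g. C = 2 stone planes (assumed layout).
    move:  (row, col) of the next stone.
    """
    n = board.shape[-1]
    # Encode the move as a one-hot plane so it is transformed exactly like the board.
    move_plane = np.zeros((n, n), dtype=np.int8)
    move_plane[move] = 1
    for transpose in (False, True):
        b = board.transpose(0, 2, 1) if transpose else board
        m = move_plane.T if transpose else move_plane
        for k in range(4):  # rotate by k * 90 degrees
            rb = np.rot90(b, k, axes=(1, 2))
            rm = np.rot90(m, k)
            r, c = np.unravel_index(np.argmax(rm), rm.shape)
            yield rb.copy(), (int(r), int(c))


# Usage: the Othello starting position (channel 0 = black, 1 = white; assumed).
start = np.zeros((2, 8, 8), dtype=np.int8)
start[0, 3, 4] = start[0, 4, 3] = 1
start[1, 3, 3] = start[1, 4, 4] = 1
samples = list(dihedral_augment(start, (2, 3)))
print(len(samples))  # 8
```

Transforming a one-hot move plane, rather than hand-deriving coordinate formulas for each of the 8 symmetries, keeps the board and the label consistent by construction.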
SL policy network (classification)
• Input: 2-channel 8×8 matrices of the board state
• Output: probability distribution over the next move
• Network: 9 convolutional layers with a softmax output layer
• 57% accuracy in predicting human moves
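A forward pass of such a network can be sketched in plain NumPy. This is a hypothetical, untrained stand-in, not IaGo's code: the filter count (32), 3×3 kernels, ReLU activations and He initialization are assumptions, since the slide only specifies 9 convolutional layers and a softmax output. The last layer maps to a single 8×8 plane whose 64 values become move probabilities, and `top1_accuracy` shows how a figure like 57% is measured.

```python
import numpy as np


def conv3x3_same(x, w, b):
    """Zero-padded 3x3 convolution (cross-correlation) that keeps the spatial size.

    x: (C_in, H, W), w: (C_out, C_in, 3, 3), b: (C_out,)
    """
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out + b[:, None, None]


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def init_policy_params(rng, channels=32, n_layers=9, in_ch=2):
    """He-initialized weights; the last layer outputs one 8x8 logit plane."""
    params, c_prev = [], in_ch
    for layer in range(n_layers):
        c_out = 1 if layer == n_layers - 1 else channels
        w = rng.normal(0.0, np.sqrt(2.0 / (c_prev * 9)), size=(c_out, c_prev, 3, 3))
        params.append((w, np.zeros(c_out)))
        c_prev = c_out
    return params


def policy_forward(params, board):
    """board: (2, 8, 8) -> 64 move probabilities, one per square."""
    h = board.astype(float)
    for k, (w, b) in enumerate(params):
        h = conv3x3_same(h, w, b)
        if k < len(params) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers only
    return softmax(h.reshape(-1))


def top1_accuracy(prob_rows, target_indices):
    """Fraction of positions where the most probable move is the move actually played."""
    preds = np.argmax(prob_rows, axis=1)
    return float(np.mean(preds == np.asarray(target_indices)))
```

A real implementation would train these weights with cross-entropy on the 48 million augmented samples, typically in a deep learning framework rather than NumPy.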
RL policy network
• Refined the SL policy network with policy gradients
-> Reinforcement Learning (RL) policy network
• After training, used it to generate training data for the value network
• Played games between RL policy networks
-> 1.25 million pairs of board state and result
• Augmented them 8-fold -> 10 million
[Figure: SL policy network vs. SL policy network (opponent). WIN -> encourage its plays; LOSE -> discourage its plays. Repeated 32 × 400 = 12,800 times]
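The "encourage on a win, discourage on a loss" rule is the REINFORCE form of the policy gradient: after each game, the log-probability of every move the learner played is pushed up if it won and down if it lost. A minimal sketch, assuming a linear softmax policy `softmax(W @ s)` over hand-made state features instead of IaGo's convolutional network; `reinforce_update` and its arguments are hypothetical names:

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def reinforce_update(W, episode, z, lr=0.01):
    """One REINFORCE step for a linear softmax policy pi(a|s) = softmax(W @ s).

    episode: list of (state_features, action_index) for the learner's own moves.
    z: final game result for the learner, +1 win / -1 loss (0 for a draw).
    """
    grad = np.zeros_like(W)
    for s, a in episode:
        p = softmax(W @ s)
        # Gradient of log pi(a|s) w.r.t. W is (onehot(a) - p) outer s.
        d = -p
        d[a] += 1.0
        grad += np.outer(d, s)
    # z > 0 raises the probability of the played moves; z < 0 lowers it.
    return W + lr * z * grad
```

In practice the gradient is usually averaged over a batch of games before each update, which reduces the variance of this estimate.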
Value network (regression)
• Input: 2-channel 8×8 matrices of the board state
• Output: value of the board state
(win: +1, loss: -1, draw: 0)
• Network: 9 convolutional layers (similar to the SL policy network)
[Figure: prediction examples]
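The regression target and a tanh output head can be sketched as follows. This assumes the result is decided by disc count and labeled from White's point of view (as described in the speaker notes), and it replaces the convolutional trunk with a generic feature matrix; `value_head`, `sgd_step` and the learning rate are illustrative assumptions:

```python
import numpy as np


def result_to_target(white_discs: int, black_discs: int) -> float:
    """Label a finished game from White's view: +1 win, -1 loss, 0 draw."""
    return float(np.sign(white_discs - black_discs))


def value_head(features, w, b):
    """Linear readout of trunk features squashed into [-1, 1] by tanh."""
    return np.tanh(features @ w + b)


def mse_loss(pred, target):
    return float(np.mean((pred - target) ** 2))


def sgd_step(features, targets, w, b, lr=0.1):
    """One gradient-descent step on the mean squared error."""
    pred = value_head(features, w, b)
    # d/dz (tanh(z) - t)^2 = 2 * (tanh(z) - t) * (1 - tanh(z)^2), averaged over the batch
    g = 2.0 * (pred - targets) * (1.0 - pred ** 2) / len(targets)
    return w - lr * (features.T @ g), b - lr * float(g.sum())


# Usage on synthetic data (hypothetical features and labels).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
t = np.sign(X @ np.array([1.0, -2.0, 0.5, 0.0, 1.0]))
w, b = np.zeros(5), 0.0
print("loss before:", mse_loss(value_head(X, w, b), t))
for _ in range(200):
    w, b = sgd_step(X, t, w, b)
print("loss after:", mse_loss(value_head(X, w, b), t))
```

The tanh keeps predictions in the same [-1, 1] range as the labels, so a value near 0 reads naturally as "hard to tell".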
Monte Carlo tree search
• Rollout policy: a simplified SL policy network that runs faster
• MCTS: searches deeper to find a good path
1. Expand the current node, creating its child nodes with the SL policy network
2. Evaluate the new node using the value network and the result of a self-play game with the rollout policy
3. Update the values of its ancestor nodes
4. Choose the most visited node
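The four steps above can be sketched in Python. This is a minimal, hypothetical sketch, not IaGo's code: it adds an explicit selection step using the PUCT rule described for AlphaGo, mixes the two evaluations as (1 - λ)·v + λ·z with an assumed λ = 0.5, and takes the networks and game rules as plain callables (`policy_fn`, `value_fn`, `rollout_fn` and `game.next_state` are hypothetical names). It assumes passes are modeled as ordinary actions so the side to move alternates every ply, and that the root has at least one legal move.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float
    visits: int = 0
    value_sum: float = 0.0  # from the view of the player who moved INTO this node
    children: dict = field(default_factory=dict)  # action -> Node

    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node, c_puct):
    """PUCT: mean value plus a prior-weighted exploration bonus."""
    sqrt_n = math.sqrt(node.visits)
    return max(node.children.items(),
               key=lambda kv: kv[1].q() + c_puct * kv[1].prior * sqrt_n / (1 + kv[1].visits))


def mcts(root_state, game, policy_fn, value_fn, rollout_fn,
         n_sim=200, lam=0.5, c_puct=1.5):
    """Return the most visited root action.

    policy_fn(s)  -> list of (action, prior) for legal moves (empty if the game is over)
    value_fn(s)   -> value in [-1, 1] for the player to move at s
    rollout_fn(s) -> result in [-1, 1] of a fast self-play game, for the player to move at s
    game.next_state(s, a) -> successor state
    """
    root = Node(prior=1.0)
    for _ in range(n_sim):
        node, state, path = root, root_state, [root]
        # Selection: descend while the node is already expanded.
        while node.children:
            action, node = select_child(node, c_puct)
            state = game.next_state(state, action)
            path.append(node)
        # 1. Expansion: create child nodes with policy-network priors.
        for action, p in policy_fn(state):
            node.children[action] = Node(prior=p)
        # 2. Evaluation: mix the value network with the rollout result.
        v = (1 - lam) * value_fn(state) + lam * rollout_fn(state)
        # 3. Backup: v is for the player to move at the leaf; flip the sign each ply.
        for n in reversed(path):
            v = -v
            n.visits += 1
            n.value_sum += v
    # 4. Choose the most visited child of the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]


# Usage with a toy take-1-or-2 game (whoever takes the last stone wins) standing in for Othello.
class TakeAway:
    @staticmethod
    def next_state(s, a):
        return s - a


def toy_policy(s):
    moves = [a for a in (1, 2) if a <= s]
    return [(a, 1.0 / len(moves)) for a in moves]


def toy_value(s):
    # Exact value for the player to move: multiples of 3 are lost positions.
    return -1.0 if s % 3 == 0 else 1.0


print(mcts(5, TakeAway, toy_policy, toy_value, toy_value))
```

Picking the most visited child rather than the one with the highest mean value is the more robust choice, because visit counts reflect many simulations while a rarely visited child's mean can be noisy.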
Results
• The complete IaGo beat the plain SL policy network in approx. 90% of games!
• Still, there is room for improvement…
• Computation takes too long
• IaGo seems to have a weak point
• Training data came from games between amateurs
• An objective, quantitative evaluation is needed
• Graphical user interface -> upload to the web!
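One way to make a result like "approx. 90% of games" more quantitative is to report a confidence interval for the win rate. A minimal sketch using the Wilson score interval; the 90-out-of-100 record below is hypothetical, since the slides do not give the number of games, and draws are assumed to be excluded or counted separately:

```python
import math


def wilson_interval(wins: int, games: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a win rate."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return center - half, center + half


lo, hi = wilson_interval(90, 100)  # hypothetical: 90 wins out of 100 games
print(f"win rate 95% CI: {lo:.3f} to {hi:.3f}")
```

The Wilson interval stays inside [0, 1] and behaves better than the simple normal approximation when the win rate is close to 0% or 100%.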
Summary
• IaGo is composed of 3 parts:
• SL policy network: predict next action
• Value network: evaluate board state
• MCTS: choose action using 2 networks
• IaGo became a good player through training

Editor's Notes

  1. Thank you, Mr. Bayne. Good afternoon! Recently I learned about AlphaGo, an AI that plays the game of Go, and implemented its algorithm for Othello. Let me tell you how I made it and how it works.
  2. AlphaGo is composed of these 3 parts: first, the policy network, which predicts the next action; second, the value network, which evaluates the board state; and third, Monte Carlo tree search, which chooses an action using the two networks. I'll now explain each of them in a little more detail.
  3. First of all, let me mention that Go has an extremely large search space: 10 to the 360th power. That's hard to imagine, so here's a comparison: the estimated number of atoms in the universe is 10 to the 80th power. The search space of Go, 10 to the 360th power, is far, far bigger than the number of atoms in the universe. Because of this huge search space, before AlphaGo it was thought that Go AIs would need another 10 years to beat human professionals. Imagine what a big achievement AlphaGo made! But since I don't have enough machine resources to replicate AlphaGo, I made an Othello version. The search space of Othello is just 10 to the 60th power.
  4. I've now told you about the background, so I'll move on to the dataset I used for training IaGo. The data came from online Othello game records that you can get for free on the internet. They include 6 million pairs of a board state and the place of the next stone. I then augmented them 8 times using rotation and transposition symmetry, so in the end I had 48 million pairs of board state and next-stone position.
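The 8-fold augmentation can be sketched in Python as follows. This is a minimal illustration, not IaGo's actual code: the board encoding (an 8x8 list with 0 = empty, 1 = black, -1 = white) and all helper names are hypothetical. The 8 variants are the 4 rotations of the board, each with and without a transposition, and the next-stone position is transformed exactly like the board so every pair stays consistent.

```python
from typing import List, Tuple

Board = List[List[int]]  # hypothetical encoding: 0 = empty, 1 = black, -1 = white
Move = Tuple[int, int]   # (row, col) of the next stone
N = 8


def rotate(board: Board) -> Board:
    """Rotate the board 90 degrees clockwise."""
    return [[board[N - 1 - c][r] for c in range(N)] for r in range(N)]


def rotate_move(move: Move) -> Move:
    """Where a square lands after the same clockwise rotation."""
    r, c = move
    return (c, N - 1 - r)


def transpose(board: Board) -> Board:
    """Mirror the board across its main diagonal."""
    return [list(col) for col in zip(*board)]


def transpose_move(move: Move) -> Move:
    r, c = move
    return (c, r)


def augment(board: Board, move: Move) -> List[Tuple[Board, Move]]:
    """Return the 8 symmetric copies of one (board, next stone) training pair."""
    samples = []
    b, m = board, move
    for _ in range(4):
        samples.append((b, m))                              # rotation only
        samples.append((transpose(b), transpose_move(m)))   # rotation + transposition
        b, m = rotate(b), rotate_move(m)
    return samples
```

Applied to each of the 6 million pairs, a function like this yields 8 pairs per record, which matches the 48 million total.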
  5. The first part of IaGo is the supervised learning (SL) policy network. It takes a 2-channel matrix of the board state as input and outputs a probability distribution over the next choice, that is, the next action. The network has 9 convolutional layers with a softmax output layer. After training, it predicted human moves with 57% accuracy.
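A minimal sketch of the input and output sides of such a network, assuming the 2 channels are "stones of the player to move" and "opponent's stones" (the slides do not say how the channels are split) and that the 64 output scores are indexed as row * 8 + col. The 9 convolutional layers themselves are omitted. The accuracy function shows what a figure like 57% measures: how often the top-scoring square matches the human move. All names are hypothetical.

```python
import math
from typing import List

N = 8
Board = List[List[int]]  # hypothetical encoding: 0 = empty, 1 = black, -1 = white


def encode(board: Board, player: int) -> List[List[List[float]]]:
    """Return two 8x8 binary planes: the mover's stones and the opponent's stones."""
    own = [[1.0 if board[r][c] == player else 0.0 for c in range(N)] for r in range(N)]
    opp = [[1.0 if board[r][c] == -player else 0.0 for c in range(N)] for r in range(N)]
    return [own, opp]


def softmax(logits: List[float]) -> List[float]:
    """Turn the 64 output scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top1_accuracy(batch_logits: List[List[float]], targets: List[int]) -> float:
    """Fraction of positions where the highest-scoring square equals the human move."""
    hits = 0
    for logits, target in zip(batch_logits, targets):
        predicted = max(range(len(logits)), key=lambda i: logits[i])
        hits += int(predicted == target)
    return hits / len(targets)
```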
  6. Next, I polished the SL policy network with a policy gradient algorithm. The polished network is called the reinforcement learning policy network, or RL policy network for short. During reinforcement learning, the SL policy network played games against an opponent SL policy network. The network's parameters were updated so that good actions were encouraged and bad actions were discouraged, according to the result of each game. I repeated this more than 12,000 times. After training, the RL policy network generated teacher data for the value network: 2 RL policy networks played games against each other, which gave me 1.25 million pairs of board state and result. Again I augmented them 8 times, so in the end I had 10 million pairs of board state and result.
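The "encourage if it wins, discourage if it loses" rule corresponds to the REINFORCE form of policy gradient. Below is a sketch on a hypothetical tabular softmax policy (one list of logits per state key), a simplification of the real network, where the same gradient would be backpropagated into the convolution weights. For softmax logits, the gradient of log pi(a) is onehot(a) - pi, scaled here by the game result z = +1 for a win and -1 for a loss. The learning rate, state keys, and function names are assumptions, and no baseline is used.

```python
import math
from typing import Dict, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def reinforce_update(logits: List[float], action: int, z: float, lr: float) -> List[float]:
    """One REINFORCE step: theta += lr * z * (onehot(action) - pi)."""
    probs = softmax(logits)
    return [
        theta + lr * z * ((1.0 if i == action else 0.0) - p)
        for i, (theta, p) in enumerate(zip(logits, probs))
    ]


def update_from_game(policy: Dict[str, List[float]],
                     trajectory: List[Tuple[str, int]],
                     z: float, lr: float = 0.1) -> None:
    """Apply the final result z (+1 win, -1 loss) to every move the learner played."""
    for state_key, action in trajectory:
        policy[state_key] = reinforce_update(policy[state_key], action, z, lr)
```

After a won game every move the learner played becomes more likely in its state; after a lost game every one becomes less likely.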
  7. Next I'll talk about the value network. This network is very similar to the SL policy network in structure. What's the difference? While the SL policy network classifies the next action, the value network regresses the game result. The value network takes a 2-channel matrix of the board state and outputs the value of that board state. I defined the value of the board state as +1 for a win, -1 for a loss, and 0 for a draw, so the value indicates how likely the white player is to win. Look at the example pictures. In the left one, the white player is almost winning, so the value is 0.67, leaning toward +1. In the center one, the white player is almost losing, so the value is nearly -1. And in the right one, the outcome is hard to predict, so the value is around 0.
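A sketch of how the regression targets and loss could be set up, assuming the winner is decided by disc count and the label is taken from White's perspective as described above. The tanh output and mean squared error loss are assumptions borrowed from AlphaGo's value network, not details confirmed by the slides; the board encoding and helper names are hypothetical.

```python
import math
from typing import List, Tuple

Board = List[List[int]]  # hypothetical encoding: 0 = empty, 1 = black, -1 = white


def white_result(final_board: Board) -> float:
    """+1 if White has more discs at the end, -1 if fewer, 0 for a draw."""
    white = sum(row.count(-1) for row in final_board)
    black = sum(row.count(1) for row in final_board)
    if white > black:
        return 1.0
    if white < black:
        return -1.0
    return 0.0


def label_game(positions: List[Board], final_board: Board) -> List[Tuple[Board, float]]:
    """Every position from one self-play game gets the same regression target."""
    z = white_result(final_board)
    return [(pos, z) for pos in positions]


def value_output(raw: float) -> float:
    """tanh keeps the prediction inside [-1, 1], the same range as the targets."""
    return math.tanh(raw)


def mse(predictions: List[float], targets: List[float]) -> float:
    """Mean squared error between predicted values and game results."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
```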
  8. Let's move on to the final part of the algorithm, Monte Carlo tree search. First I made a rollout policy, which is a simplified SL policy network. Its prediction accuracy was lower than the SL policy network's, but it worked much faster. In MCTS I have to run many, many simulations, so I need a predictor that works fast. MCTS, in short, is an algorithm that searches deeper for a good path in the game tree using self-play simulations. It is composed of four steps. Step 1: make a child node with the SL policy network. Step 2: evaluate the current node with the value network and the result of a rollout-policy self-play. Step 3: update the ancestor nodes' values according to the rollout-policy self-play. Step 4: choose the most visited node.
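The four steps can be sketched in Python as below, with a selection phase added so the search can descend into the tree. To keep it self-contained and runnable, a toy take-away game (remove 1 or 2 stones; whoever takes the last stone wins) stands in for Othello, and simple hypothetical functions stand in for the networks: uniform priors for the SL policy network, a constant 0 for the value network, and random moves for the rollout policy. The PUCT selection rule and the mixing weight lam between the value estimate and the rollout result follow AlphaGo's published formulation; whether IaGo used exactly these is an assumption.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical toy game: state = (stones left, player to move as +1/-1).
State = Tuple[int, int]


def legal_moves(s: State) -> List[int]:
    return [m for m in (1, 2) if m <= s[0]]


def play(s: State, m: int) -> State:
    return (s[0] - m, -s[1])


def is_terminal(s: State) -> bool:
    return s[0] == 0


def terminal_value(s: State) -> float:
    # The opponent took the last stone, so the player to move has lost.
    return -1.0


def policy_priors(s: State) -> Dict[int, float]:  # stand-in for the SL policy network
    moves = legal_moves(s)
    return {m: 1.0 / len(moves) for m in moves}


def value_estimate(s: State) -> float:  # stand-in for the value network
    return 0.0


def rollout(s: State, rng: random.Random) -> float:  # stand-in for the rollout policy
    sign = 1.0
    while not is_terminal(s):
        s = play(s, rng.choice(legal_moves(s)))
        sign = -sign
    return sign * terminal_value(s)  # result for the player to move at the start


class Node:
    def __init__(self, state: State, prior: float):
        self.state = state
        self.prior = prior
        self.children: Dict[int, "Node"] = {}
        self.visits = 0
        self.value_sum = 0.0  # from the view of the player who moved into this node

    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def simulate(root: Node, rng: random.Random, c_puct: float, lam: float) -> None:
    node, path = root, [root]
    # Selection: follow the PUCT rule down to a leaf.
    while node.children:
        sqrt_n = math.sqrt(node.visits)
        node = max(node.children.values(),
                   key=lambda ch: ch.q() + c_puct * ch.prior * sqrt_n / (1 + ch.visits))
        path.append(node)
    if is_terminal(node.state):
        v = terminal_value(node.state)
    else:
        # Step 1: make child nodes using the policy priors.
        for m, p in policy_priors(node.state).items():
            node.children[m] = Node(play(node.state, m), p)
        # Step 2: mix the value estimate with a rollout result.
        v = (1 - lam) * value_estimate(node.state) + lam * rollout(node.state, rng)
    # Step 3: update ancestors; v starts from the leaf mover's view and flips each ply.
    for n in reversed(path):
        v = -v
        n.visits += 1
        n.value_sum += v


def mcts_move(state: State, n_simulations: int = 1000, c_puct: float = 1.5,
              lam: float = 0.5, seed: int = 0) -> Tuple[int, Dict[int, int]]:
    rng = random.Random(seed)
    root = Node(state, 1.0)
    for _ in range(n_simulations):
        simulate(root, rng, c_puct, lam)
    # Step 4: choose the most visited child of the root.
    visits = {m: ch.visits for m, ch in root.children.items()}
    return max(visits, key=visits.get), visits
```

For example, mcts_move((4, 1)) settles on taking 1 stone, which leaves the opponent a losing pile of 3. Choosing by visit count rather than by average value is the more robust choice, because a rarely visited child can have a high average by chance.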
  9. I've told you about the algorithm of IaGo, so now I'll talk about its performance. IaGo played games against the simple SL policy network and won approximately 90% of them. Still, there is room for improvement. First, computation takes too long. If I can make it faster, IaGo can run more simulations and will become stronger. Second, IaGo seems to have a weak point. The picture on the right side was taken when I beat the complete version of IaGo: I captured all of its stones, so the game ended partway through. I'm not sure about the cause, but I guess one reason is that the teacher data came from games between amateur players, not professionals. Third, I couldn't really evaluate IaGo's performance in an objective or quantitative way, so a more appropriate evaluation is needed. And finally, I'd like to develop a sophisticated graphical user interface and upload it to the web so that everyone can play it easily just by clicking.
  10. Let me summarize my presentation. I've explained IaGo's algorithm and its performance. IaGo is composed of three parts: the SL policy network, which predicts the next action; the value network, which evaluates the board state; and Monte Carlo tree search, which chooses an action using these two networks. And IaGo became a good player through training on a huge dataset. That's it for my presentation. Do you have any questions?