Study on Evaluation Function Design of Mahjong
using Supervised Learning
Hokkaido University
Graduate School of Information Science and Technology
Harmonious Systems Engineering Laboratory
Yeqin Zheng
Background
• Perfect information games
– 1997 -- Deep Blue vs. the world champion at chess
– 2007 -- Quackle vs. the world champion at Scrabble
– 2016 -- AlphaGo vs. the world champion at Go
• Monte Carlo tree search
• Deep learning to pre-train the network
– AlphaGo Zero vs. AlphaGo on Go
• Deep learning method
• Reinforcement learning
• Imperfect information games
– Uncertainty
– Randomness
– Complex rules
– Difficult for simulation
Previous Research Model
• Naoki Mizukami and Yoshimasa Tsuruoka. Building a Computer Mahjong Player Based on Monte Carlo Simulation and Opponent Models. Proceedings of the 2015 IEEE Conference on Computational Intelligence and Games (CIG 2015), pp. 275-283, Aug. 2015.
• Monte Carlo tree search to simulate opponents' moves
• Prediction of game states
Purpose
• This study applies supervised learning and deep learning to an imperfect information game: Mahjong.
• Improvements:
– New feature engineering
• Improves the training results of the networks
– Discard method
• Increases aggressiveness during games
Introduction of Mahjong Rule
• Mahjong tiles come in 4 types and 34 different kinds. Each kind has 4 copies, for a total of 136 tiles.
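The tile counts can be checked with a short sketch. The slide only gives the totals (4 types, 34 kinds, 136 tiles); the breakdown into three numbered suits plus honor tiles is the standard Mahjong set, and every name below (`SUITS`, `HONORS`, `tile_kinds`) is hypothetical and chosen for illustration.

```python
# Hypothetical names; the source gives only the totals (4 types, 34 kinds, 136 tiles).
SUITS = {"man": 9, "pin": 9, "sou": 9}  # three numbered suits, ranks 1-9
HONORS = ["East", "South", "West", "North", "White", "Green", "Red"]  # fourth type
COPIES = 4  # each kind appears four times


def tile_kinds():
    """Return the 34 distinct tile kinds, e.g. '1m', '9p', '4s', 'North'."""
    numbered = [
        f"{rank}{suit[0]}"
        for suit, count in SUITS.items()
        for rank in range(1, count + 1)
    ]
    return numbered + HONORS


KINDS = tile_kinds()
TOTAL_TILES = len(KINDS) * COPIES

print(len(SUITS) + 1, len(KINDS), TOTAL_TILES)
```

The three printed values are the number of types, the number of kinds, and the total number of tiles.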
Hand: the tiles a player holds
River: discarded tiles
Dora tile
Mountain: face-down tiles to be drawn
Meld: open (exposed) part of a hand
Goal of Mahjong
• The goal of Mahjong is to complete a hand in one of the winning patterns.
• There are two ways of earning points:
Tsumo: draw the last tile from the mountain and earn points from all other players
Ron: use another player's last discarded tile and earn points from that player
Difficulty & Approach
• Difficulty
– An imperfect information game has many more states than a perfect information game.
• It is almost impossible to meet the same game state twice in all the games a player has ever played.
– Randomness and uncertainty are present throughout the entire game.
• Approach
– Divide the moves made during a game into several types.
– Use multiple networks and methods to make different moves in different states.
Introduction of Tenhou.net
Tenhou is one of the most popular online Mahjong services in Japan.
• 4,870,311 users in total.
• About 5,000 players online at the same time.
• All of our training data come from the “houou” table.
[Diagram: Tenhou.net sends game states to the model; the model returns a decision to Tenhou.net]
Introduction of Tenhou.net's API
Data from Tenhou.net | Meaning | Example
T/U/V/W (+ ID) | T/U/V/W: player's position; ID: tile ID from 0 to 135 | T123 (the dealer draws a North); V (the player in the west position draws a tile)
D/E/F/G + ID | D/E/F/G: player's position; ID: tile ID from 0 to 135 | E123 (the player in the south position discards a North)
Reach who="player's position" | The player who calls riichi | Reach who="2" (the player in the west position calls riichi)
N who="player's position" m="meld" | The player who calls a meld | N who="3" m="34567" (the player in the north position calls a meld)
Agari | The player who declares a win, with the hand, point changes, waiting tile, yaku, and who loses points |
Ryuukyoku | The round ends without a winner; includes the point changes |

Data to Tenhou.net | Meaning | Example
T + ID | Discard a tile, given by its tile ID | T123 (you discard a North)
Reach who="0" | Call riichi |
N who="0" m="meld" | Call a meld | N who="0" m="34567"
Agari | Declare a win |
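As an illustration of the draw and discard messages, the sketch below splits a tag such as T123 or E123 into action, seat, and tile kind. Several points are assumptions rather than statements from the slides. The seat order for U, W, D, F and G is inferred from the three examples given (T = dealer, V = west, E = south). The mapping "tile kind = ID // 4" with honors last is assumed; it agrees with the slide's example that ID 123 is a North. The function name `parse_tag` is hypothetical.

```python
import re

# Assumed seat order: T/D = east (dealer), U/E = south, V/F = west, W/G = north.
SEATS = ["east (dealer)", "south", "west", "north"]

# Assumed kind order: 1-9 man, 1-9 pin, 1-9 sou, then the seven honor tiles.
KIND_NAMES = (
    [f"{r}m" for r in range(1, 10)]
    + [f"{r}p" for r in range(1, 10)]
    + [f"{r}s" for r in range(1, 10)]
    + ["East", "South", "West", "North", "White", "Green", "Red"]
)


def parse_tag(tag):
    """Split a draw/discard tag into (action, seat, tile kind or None)."""
    match = re.fullmatch(r"([TUVWDEFG])(\d{1,3})?", tag)
    if match is None:
        raise ValueError(f"not a draw/discard tag: {tag!r}")
    letter, digits = match.groups()
    if letter in "TUVW":
        action, seat = "draw", SEATS["TUVW".index(letter)]
    else:
        action, seat = "discard", SEATS["DEFG".index(letter)]
    if digits is None:  # e.g. "V": the drawn tile is hidden from us
        return action, seat, None
    tile_id = int(digits)
    if not 0 <= tile_id <= 135:
        raise ValueError(f"tile ID out of range: {tile_id}")
    return action, seat, KIND_NAMES[tile_id // 4]  # four copies per kind


for tag in ("T123", "V", "E123"):
    print(tag, parse_tag(tag))
```

The three tags in the loop are the examples from the table above.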
Process of Decision Making
• Last player's turn
• Player: draw a tile
• System: win check
– Yes: Player: call win
• Player: decide a tile to discard
• System: riichi check
– Yes: Player: call riichi and discard
– Otherwise: Player: discard
• System: win check
– Yes: Opponent: call win
• Next player's turn
Introduction of Related Terminology
• Waiting/Tenpai: a player's hand needs only one more tile to become a winning hand, and the player is waiting for that last tile to earn points. One or more players can be waiting at the same time.
• N shanten: the hand needs n more effective tiles to be drawn before it enters the waiting state.
Aggressive Move
• Two types of game states
– No one is waiting (attack route): discard a tile that brings the hand closer to a winning hand and to earning points, which may decrease the shanten number.
– Someone may be waiting (defense route):
• Aggressive move: the player discards a tile that may decrease the shanten number but is unsafe in the current game state. It may also cost the player points, because other players may already be waiting for this tile.
• Safe move: the player discards a tile with less danger of losing points and gives up on winning, which may move the hand away from a winning hand and increase the shanten number.
In this case, player D has called riichi (someone is waiting):
- Aggressive move: discard a tile to enter the waiting state, which may lead to losing points.
- Safe move: discard a tile that player D has already discarded, which will increase the shanten number.
Without aggressive moves:
- The player always folds
- It is difficult to complete a winning hand
Model Details -- Networks
Input: 6*6*107(108) feature map
• Waiting-or-not network (WR): opponents' waiting probability
• WR > threshold: defense/fold route
– Waiting-tile network (WTR): probability of each of the 34 tiles being waited for
– Discard network (DR): probability of each of the 34 tiles being discarded
– Lose-point network (LP): probability of the points that may be lost for the tiles in hand
• WR ≤ threshold: attack route
– Discard network (DR): probability of each of the 34 tiles being discarded
• Output: choose a tile to discard
Model Details -- Networks
• No one is waiting
– Choose the tile with the maximum output from the discard network
• Someone may be waiting
– Choose the tile with the minimum lose point expert (LPE)
– LPE_i = WR * (DR_i *) WTR_i * LP_i,
where i is the ID of a tile in hand. To increase aggressive moves, the output of the discard network is multiplied into LPE.
• The threshold for switching modes
– Collect game states in which a player is waiting.
– Use the waiting-or-not network to calculate the probability for these game states.
– Take the average of the outputs, which is 0.245.
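The selection rule above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the network outputs are already available as plain lists indexed by tile kind (0 to 33), and that LP has been reduced to one expected-loss value per tile. The names `mode_threshold`, `choose_discard` and `use_dr` are hypothetical.

```python
def mode_threshold(wr_outputs):
    """Average WR output over game states in which a player is waiting."""
    return sum(wr_outputs) / len(wr_outputs)


def choose_discard(hand, wr, dr, wtr=None, lp=None, threshold=0.245, use_dr=True):
    """Pick the tile kind to discard.

    hand: tile-kind indices (0-33) currently in hand
    wr:   WR output, probability that someone is waiting
    dr, wtr, lp: 34-element sequences indexed by tile kind (assumed layout)
    """
    kinds = sorted(set(hand))
    if wr <= threshold:
        # Attack route: maximum of the discard network output.
        return max(kinds, key=lambda i: dr[i])

    def lpe(i):
        # LPE_i = WR * (DR_i *) WTR_i * LP_i; the DR factor is optional.
        value = wr * wtr[i] * lp[i]
        return value * dr[i] if use_dr else value

    # Defense route: minimum LPE among the tiles in hand.
    return min(kinds, key=lpe)


# Toy numbers, invented for illustration only.
dr = [0.0] * 34
wtr = [0.0] * 34
lp = [1.0] * 34
dr[0], dr[5], dr[9] = 0.1, 0.7, 0.2
wtr[0], wtr[5], wtr[9] = 0.5, 0.1, 0.01
print(choose_discard([0, 5, 9], wr=0.10, dr=dr))
print(choose_discard([0, 5, 9], wr=0.60, dr=dr, wtr=wtr, lp=lp))
```

With these toy values the first call takes the attack route and the second takes the defense route, since 0.60 exceeds the 0.245 threshold.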
Model Details -- Feature Engineering
• A convolutional neural network (CNN) performs better on a matrix whose adjacent cells are strongly related.
• Map each distinct tile into a vector space.
• Arrange the vector space into a 6 * 6 matrix base.
Features in Feature Map
Feature map:
• hands, 4 layers
• river, 4 layers
• turn's movement, 24 * 4 layers
• dora tiles, 1 layer
• invisible tiles, 1 layer
• close hand, 1 layer
• (discard tile, 1 layer)
The 107-layer feature map does not include the discard tile feature; the 108-layer feature map adds it.
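The layer counts add up as follows. The sketch only does the arithmetic from the list above; the dictionary keys repeat the slide's feature names, and the helper `feature_map_shape` is a hypothetical name.

```python
# Layer counts copied from the feature list; names are the slide's own labels.
FEATURE_LAYERS = {
    "hands": 4,
    "river": 4,
    "turn's movement": 24 * 4,
    "dora tiles": 1,
    "invisible tiles": 1,
    "close hand": 1,
}
GRID = (6, 6)   # 36 cells hold the 34 tile kinds, so 2 cells stay unused
TILE_KINDS = 34


def feature_map_shape(include_discard_tile=False):
    """Return (rows, cols, layers) of the input feature map."""
    depth = sum(FEATURE_LAYERS.values())
    if include_discard_tile:
        depth += 1  # the optional discard tile layer
    return (*GRID, depth)


print(feature_map_shape())
print(feature_map_shape(include_discard_tile=True))
print(GRID[0] * GRID[1] - TILE_KINDS)
```

The last line prints how many cells of the 6 * 6 grid are left over after placing the 34 tile kinds.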
Networks Details
Network | Content | Output | Data amount
WR (waiting-or-not network) | Predicts the probability that other players are waiting | A probability (from 0 to 1) of whether other players are waiting | 300,000
WTR (waiting-tiles network) | Predicts the probabilities of the tiles that others may be waiting for | A list of 34 probabilities indicating how dangerous each of the 34 tiles is | 4,000*34
DR (discard network) | Predicts which tile in hand a high-level Mahjong player would discard | A list of 34 probabilities, one per tile, of being discarded | 100,000*34
LP (lose-point network) | Predicts how many points will be lost if a tile is discarded | A list of 6 probabilities for how many han the other player's hand has if he wins this round | 16,500*6

Training data:
Waiting-or-not network:
• Input: 107-layer feature map
• Output: 1 if someone is waiting, 0 if no one is waiting
Waiting-tiles network:
• Input: 107-layer feature map
• Output: 1 for tiles being waited for, 0 for other tiles
• Example: a hand in waiting, waiting for 1s and 4s
Lose-point network:
• Input: 108-layer feature map
• Output: the loss for the discarded tile
Networks Details
Input layer: 6*6*107(108) feature map
Hidden layers (7 in total):
Number of convolutional kernels | Size of convolutional kernels | Edge processing (padding) | Activation function
512 | 4*4 | same | relu
512 | 3*3 | same | none
512 | 2*2 | same | relu
Dropout
256 | 2*2 | same | none
256 | 3*3 | same | relu
Dropout
128 | 3*3 | same | none
128 | 2*2 | same | relu
Dropout
Output layer: fully connected layer
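To give a feel for the size of this stack, the sketch below counts the trainable weights of the seven convolutional layers. It is plain arithmetic, not the authors' code, and it rests on assumptions the slides do not state: stride 1, one bias per kernel, and "same" padding keeping the 6 * 6 grid unchanged. The names `CONV_STACK`, `conv_params` and `flattened_size` are hypothetical.

```python
# (number of kernels, kernel size) for the seven layers in the table above.
CONV_STACK = [(512, 4), (512, 3), (512, 2), (256, 2), (256, 3), (128, 3), (128, 2)]


def conv_params(in_channels, stack=CONV_STACK):
    """Weights plus biases per convolutional layer (assumes one bias per kernel)."""
    counts = []
    for kernels, size in stack:
        counts.append(size * size * in_channels * kernels + kernels)
        in_channels = kernels  # the next layer sees this layer's output depth
    return counts


def flattened_size(rows=6, cols=6, stack=CONV_STACK):
    """Inputs to the fully connected layer; 'same' padding keeps rows x cols."""
    return rows * cols * stack[-1][0]


per_layer = conv_params(in_channels=107)
print(per_layer)
print(sum(per_layer), flattened_size())
```

Calling `conv_params(in_channels=108)` gives the counts for the lose-point network, which differs only in its first layer.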
Final Accuracy of Each Network
Network | Accuracy
Waiting-or-not network | 82.7%
Waiting-tiles network | 40.2%
Lose-point network | 88.7%
Discard network | 88.4%
The waiting-tiles network reaches only 40.2% because its accuracy counts a prediction as correct only when the tile with the maximum output is actually being waited for.
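The stricter metric used for the waiting-tiles network can be written out as below. This is a sketch of the metric as the note describes it (a prediction counts only if the highest-scoring tile is really being waited for). The function name `top1_hit_rate` and the toy data are hypothetical.

```python
def top1_hit_rate(predictions, waited_sets):
    """Share of samples whose highest-scoring tile is actually being waited for.

    predictions: list of 34-element probability lists (one per sample)
    waited_sets: list of sets with the tile-kind indices truly being waited for
    """
    hits = 0
    for probs, waited in zip(predictions, waited_sets):
        best = max(range(len(probs)), key=probs.__getitem__)
        if best in waited:
            hits += 1
    return hits / len(predictions)


# Two invented samples: the first guess is right, the second is not.
sample_a = [0.0] * 34
sample_a[0], sample_a[3] = 0.6, 0.3      # waits on kinds 0 and 3
sample_b = [0.0] * 34
sample_b[5], sample_b[7] = 0.5, 0.4      # waits on kind 7 only
print(top1_hit_rate([sample_a, sample_b], [{0, 3}, {7}]))
```

In the second sample the true waiting tile receives the second-highest score, yet the sample still counts as a miss, which is why this metric reads lower than a per-tile accuracy would.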
Experiment and Result
• Comparison of the three models in our experiment
Model | Game state | Attack route | Defense route
Best choice algorithm (BCA) | Someone calls riichi or opens a hand with over three melds | Choose the tile that brings the hand closer to a winning hand | Choose a tile that the in-waiting player has discarded
BCA's attack mode combined with the deep model for defense | Predict that someone may be waiting | Choose the tile that brings the hand closer to a winning hand | Choose the tile that leads to the least loss
Deep model | Predict that someone may be waiting | Imitate expert players' discards based on the current game state | Choose the tile that leads to the least loss
Experiment and Result
• We played 60 games with each model at the “Ippan” table, in which every player can participate.
Ippan table (avg. lv. 1.5) | Top | 2nd | 3rd | 4th | Win rate | Feed rate | Aggressive move
BCA | 27% | 30% | 25% | 18% | 24% | 11% | 14%
BCA + defense model | 17% | 28% | 45% | 10% | 18% | 8% | 0%
Deep model | 22% | 27% | 33% | 18% | 20% | 9% | 8%
Players' average (Tenhou) | 20% | 23% | 27% | 30% | 20% | 19% | -
Color legend of the original slide: green marks the worst performance, red marks the best performance.
Experiment and Result
• We played 100 games with each model at the “Joukyuu” table.
Joukyuu table (avg. lv. 11.75) | Top | 2nd | 3rd | 4th | Win rate | Feed rate | Aggressive move
BCA | 19% | 23% | 30% | 28% | 16% | 18% | 12%
BCA with deep model | 22% | 28% | 33% | 17% | 17% | 8% | 1%
Deep model | 24% | 29% | 27% | 20% | 21% | 11% | 7%
Players' average (Tenhou) | 25% | 25% | 25% | 25% | 23% | 15% | 17%
Color legend of the original slide: green marks the worst performance, red marks the best performance.
Competition Between Each Model
We played 20 games for each pairing in a 1 vs 3 setup.
1st/2nd/3rd/4th | 1 BCA | 1 BCA + defense model | 1 Deep model
3 BCA | - | 2/6/9/3 | 3/7/6/4
3 BCA + defense model | 6/4/5/5 | - | 5/6/5/4
3 Deep model | 4/5/5/6 | 4/7/7/2 | -
The result table shows that:
• BCA
• Good in attack
• Easy to defend against
• BCA + defense model
• Great in defense
• Fewer aggressive moves
• Deep model
• Good in defense
• Balanced between defense and offense
Comparison Between Discard Methods
• The two discard methods performed differently during the experiment.
• We compare the two methods below.
• It is easier to guess the state of the non-deep-learning AI and which tiles it is waiting for.
• In attack, the deep model plays more like a human player than the non-deep-learning AI does, as the top rate and win rate show.
Discard method | Waiting | Waiting rate | Waiting prediction | Waiting tiles prediction
BCA | 438 | 53.94% | 91.32% | 57.53%
Discard model | 411 | 49.58% | 83.43% | 39.90%
Conclusion
• The deep model in this study performs well in Mahjong games.
– High 2nd-place rate.
– Makes aggressive moves.
• The new feature engineering performs well.
• When the model predicts that someone is waiting, its performance is better than the human players' average.
• It is possible to build a better multi-network model based on this experiment.
Thank you for listening.
Research performance
・Information Processing Society of Japan
1) Yeqin Zheng, Soichiro Yokoyama, Tomohisa Yamashita, Hidenori Kawamura: Study on Evaluation Function Design of Mahjong using Supervised Learning, Special Interest Groups (SIG), Vol 194, Hokkaido (2019)

Hostel management system project report..pdf
 

Study on Evaluation Function Design of Mahjong using Supervised Learning

  • 1. Study on Evaluation Function Design of Mahjong using Supervised Learning
      Hokkaido University, Graduate School of Information Science and Technology, Harmonious Systems Engineering Laboratory
      Yeqin Zheng
  • 2. Background
      • Perfect information games
          – 1997: Deep Blue vs. the world champion at chess
          – 2007: Quackle vs. the world champion at Scrabble
          – 2016: AlphaGo vs. the world champion at Go
              • Monte Carlo tree search
              • Deep learning to pre-train the network
          – AlphaGo Zero vs. AlphaGo at Go
              • Deep learning
              • Reinforcement learning
      • Imperfect information games
          – Uncertainty
          – Randomness
          – Complex rules
          – Difficult to simulate
  • 3. Previous research's model
      • Naoki Mizukami and Yoshimasa Tsuruoka. Building a Computer Mahjong Player Based on Monte Carlo Simulation and Opponent Models, Proceedings of the 2015 IEEE Conference on Computational Intelligence and Games (CIG 2015), pp. 275-283, Aug. 2015.
      • Monte Carlo tree search to simulate opponents' moves
      • Prediction of game states
  • 4. Purpose
      • This study applies supervised learning and deep learning to an imperfect information game, Mahjong.
      • Improvements:
          – New feature engineering
              • Improves the training results of the networks
          – Discard method
              • Improves aggressiveness during games
  • 5. Introduction of Mahjong Rules
      • Mahjong tiles come in 4 types with 34 distinct tiles; each tile has 4 copies, for 136 tiles in total.
      • Figure labels:
          – Hand: the tiles a player holds
          – River: discarded tiles
          – Dora tile
          – Mountain: hidden tiles that players draw from
          – Meld: open part of a hand
  • 6. Goal of Mahjong
      • The goal of Mahjong is to arrange the hand into one of the winning patterns.
      • There are two ways to earn points:
          – Tsumo: draw the last tile from the mountain and earn points from all other players
          – Ron: take the tile another player has just discarded as the last tile and earn points from that player
  • 7. Difficulty & Approach
      • Difficulty
          – An imperfect information game has far more states than a perfect information game.
              • It is almost impossible to meet the same game state again in any game you have ever played.
          – Randomness and uncertainty run through the entire game.
      • Approach
          – Divide the moves made during a game into several types.
          – Use multiple networks and methods to make different moves in different states.
  • 8. Introduction of Tenhou.net
      Tenhou is one of the most popular online Mahjong services in Japan.
      • 4,870,311 users in total.
      • About 5,000 players online at the same time.
      • Our training data all come from the “houou” table.
      • Figure labels: Tenhou.net, Model, Game states, Decision
  • 9. Introduction of Tenhou.net's API
      Data from Tenhou.net:
      Data | Meaning | Example
      T/U/V/W (+ ID) | T/U/V/W: player's position; ID: tile ID from 0 to 135 | T123 (the dealer draws a North); V (the player in position west draws a tile)
      D/E/F/G + ID | D/E/F/G: player's position; ID: tile ID from 0 to 135 | E123 (the player in position south discards a North)
      Reach who="player's position" | Who calls riichi | Reach who="2" (the player in position west calls riichi)
      N who="player's position" m="meld" | Who calls a meld | N who="3" m="34567" (the player in position north calls a meld)
      Agari | Who calls a win, with the hand, point changes, waiting tile, yaku, and who loses points |
      Ryuukyoku | The round ends without a winner, with the point changes |
      Data to Tenhou.net:
      Data | Meaning | Example
      T + ID | Discard a tile, with the tile's ID | T123 (you discard a North)
      Reach who="0" | Call riichi |
      N who="0" m="meld" | Call a meld | N who="0" m="34567"
      Agari | Call a win |
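The tile IDs and position letters on this slide can be decoded with a few lines of code. The sketch below is only an illustration: the tile ordering (1-9 man, 1-9 pin, 1-9 sou, then East, South, West, North, White, Green, Red, four copies each) is an assumption that the slide does not state, although it agrees with the slide's example that ID 123 is a North. The names `tile_name` and `parse_event` are hypothetical helpers, not part of any Tenhou client.

```python
import re

# Assumed ordering (not stated on the slide): 1-9 man, 1-9 pin, 1-9 sou,
# then East, South, West, North, White, Green, Red. Each kind has 4 copies.
HONORS = ["East", "South", "West", "North", "White", "Green", "Red"]
DRAW_TAGS = "TUVW"      # one letter per seat, as listed on the slide
DISCARD_TAGS = "DEFG"   # one letter per seat, as listed on the slide


def tile_name(tile_id: int) -> str:
    """Map a tile ID (0..135) to a readable name under the assumed ordering."""
    if not 0 <= tile_id <= 135:
        raise ValueError(f"tile ID out of range: {tile_id}")
    kind = tile_id // 4  # 0..33, because every kind has four copies
    if kind < 27:
        return f"{kind % 9 + 1}{'mps'[kind // 9]}"
    return HONORS[kind - 27]


def parse_event(tag: str):
    """Parse a draw/discard tag such as 'T123' or 'V' into (action, seat, tile)."""
    match = re.fullmatch(r"([TUVWDEFG])(\d+)?", tag)
    if match is None:
        raise ValueError(f"not a draw/discard tag: {tag!r}")
    letter, digits = match.groups()
    if letter in DRAW_TAGS:
        action, seat = "draw", DRAW_TAGS.index(letter)
    else:
        action, seat = "discard", DISCARD_TAGS.index(letter)
    # Draws by other players carry no ID because the tile is hidden.
    tile = tile_name(int(digits)) if digits is not None else None
    return action, seat, tile


print(parse_event("T123"))
print(parse_event("E123"))
print(parse_event("V"))
```

Seats are returned as indices 0 to 3 in the letter order given on the slide; mapping them to wind names is left out because the slide's examples are the only source for it.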
  • 10. Process of Decision Making
      Flowchart, from the last player's turn to the next player's turn:
      • Player: draw a tile
      • System: win check
          – Yes: Player calls a win
      • Player: decide which tile to discard
      • System: riichi check
          – Yes: Player calls riichi and discards
          – Otherwise: Player discards
      • System: win check
          – Yes: Opponent calls a win
      • Next player's turn
  • 11. Introduction of Related Terminology
      • Waiting/Tenpai: a player's hand needs only one more tile to become a winning hand, and the player is waiting for that tile to score.
      • N shanten: the hand needs n more effective tiles before it enters the waiting state.
  • 12. Aggressive Move
      • Two types of game states
          – No one is in waiting (attack route): discard a tile that brings the hand closer to a winning hand, which may decrease the shanten number.
          – Someone may be in waiting (defense route):
              • Aggressive move: the player discards a tile that may decrease the shanten number but is unsafe in the current game state. It may cost the player points, because another player may already be waiting for this tile.
              • Safe move: discard a tile with less danger of losing points and give up on winning, which may move the hand away from a winning hand and increase the shanten number.
      • Example: player D has called riichi (someone is in waiting):
          – Aggressive move: discard a tile to enter the waiting state, at the risk of losing points.
          – Safe move: discard a tile that player D has already discarded, which increases the shanten number.
      • Without aggressive moves:
          – The player always folds.
          – It is difficult to complete a winning hand.
  • 13. Model Details -- Networks
      • Input: 6*6*107(108) feature map
      • Waiting-or-not network (WR): opponents' waiting probability
      • WR ≤ threshold: attack route
          – Discard network (DR): probabilities of each of the 34 tiles being discarded
      • WR > threshold: defense/fold route
          – Waiting-tile network (WTR): probabilities of each of the 34 tiles being waited on
          – Discard network (DR): probabilities of each of the 34 tiles being discarded
          – Lose-point network (LP): probability of the points that may be lost for the tiles in hand
      • Both routes end by choosing a tile to discard.
  • 14. Model Details -- Networks
      • No one is in waiting
          – Choose the maximum of the discard network's output.
      • Someone may be in waiting
          – Choose the minimum of the lose point expert (LPE).
          – $LPE_i = WR \times (DR_i \times)\, WTR_i \times LP_i$, where $i$ is the ID of a tile in hand. To increase aggressive moves, the output of the discard network is multiplied into LPE.
      • The threshold for switching modes
          – Collect the game states in which a player is in waiting.
          – Use the waiting-or-not network to calculate the probability for these game states.
          – Take the average of the outputs, which is 0.245.
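The route selection on this slide can be sketched as a single function. This is a minimal illustration under stated assumptions, not the authors' implementation: the network outputs are passed in as plain length-34 lists, the lose-point output is assumed to be already reduced to one value per tile (the slides do not say how the six han probabilities are combined), and the optional `DR_i` factor follows the formula literally as written on the slide. The name `choose_discard` and its parameters are hypothetical.

```python
THRESHOLD = 0.245  # average WR output on in-waiting states, as given on the slide


def choose_discard(hand, wr, dr, wtr=None, lp=None, use_dr=True):
    """Return the tile kind (0..33) to discard.

    hand: tile kinds currently in hand
    wr:   waiting-or-not probability (a single number)
    dr, wtr, lp: length-34 sequences with one value per tile kind;
        reducing the lose-point output to one value per tile is an assumption.
    """
    candidates = sorted(set(hand))
    if wr <= THRESHOLD:
        # Attack route: take the tile the discard network rates highest.
        return max(candidates, key=lambda i: dr[i])

    # Defense route: minimise LPE_i = WR * (DR_i *) WTR_i * LP_i.
    def lpe(i):
        value = wr * wtr[i] * lp[i]
        return value * dr[i] if use_dr else value

    return min(candidates, key=lpe)


# Toy values for a three-tile hand; all numbers are made up for illustration.
dr = [0.0] * 34
wtr = [0.0] * 34
lp = [1.0] * 34
dr[0], dr[5], dr[30] = 0.1, 0.7, 0.2
wtr[0], wtr[5], wtr[30] = 0.05, 0.4, 0.01

print(choose_discard([0, 5, 30], wr=0.10, dr=dr))                   # attack route
print(choose_discard([0, 5, 30], wr=0.60, dr=dr, wtr=wtr, lp=lp))   # defense route
```

Keeping the `DR_i` factor behind a flag mirrors the parentheses in the slide's formula, so both variants can be compared on the same inputs.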
  • 15. Model Details -- Feature Engineering
      • A convolutional neural network (CNN) performs better on a matrix whose adjacent entries are strongly related.
      • Model each distinct tile in a vector space.
      • Turn the vector space into a 6 * 6 matrix base.
  • 16. Features in Feature Map
      • Feature map:
          – hands, 4 layers
          – river, 4 layers
          – turn's movement, 24 * 4 layers
          – dora tiles, 1 layer
          – invisible tiles, 1 layer
          – close hand, 1 layer
          – (discard tile, 1 layer)
      • The 107-layer feature map does not include the discard tile feature.
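The layer counts on this slide add up to the 107 (or 108) channels of the feature map, and the hand layers can be sketched as 6 x 6 grids. The code below is a hedged illustration: the row-major placement of the 34 tile kinds on the grid and the "layer k is set when more than k copies are held" encoding are assumptions, since the slides do not give the exact layout. `cell` and `hand_layers` are hypothetical names.

```python
from collections import Counter

# Layer counts as listed on the slide.
LAYERS = {
    "hands": 4,
    "river": 4,
    "turn's movement": 24 * 4,
    "dora tiles": 1,
    "invisible tiles": 1,
    "close hand": 1,
}
assert sum(LAYERS.values()) == 107  # 108 once the discard-tile layer is added

SIDE = 6  # 34 tile kinds fit in a 6 x 6 grid, leaving 2 cells unused


def cell(kind: int):
    """Assumed row-major placement of tile kind 0..33 on the 6 x 6 grid."""
    return divmod(kind, SIDE)


def hand_layers(hand_kinds):
    """Encode a hand as 4 binary 6 x 6 layers (assumed count encoding):
    layer k holds a 1 where the hand has more than k copies of that kind."""
    layers = [[[0] * SIDE for _ in range(SIDE)] for _ in range(4)]
    for kind, count in Counter(hand_kinds).items():
        row, col = cell(kind)
        for k in range(min(count, 4)):
            layers[k][row][col] = 1
    return layers


layers = hand_layers([0, 0, 0, 7, 30])
print(layers[0][0][0], layers[2][0][0], layers[3][0][0])
```

Stacking the remaining feature groups in the same way would give the 6*6*107 input; the discard-tile layer is appended only for the lose-point network.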
  • 17. Networks Details
      Network | Content | Output | Data amount
      WR, waiting-or-not network | Predicts the probability that other players are waiting | A probability (from 0 to 1) that other players are in waiting | 300,000
      WTR, waiting-tiles network | Predicts the probabilities of the tiles that others may be waiting for | A list of 34 probabilities of how dangerous each of the 34 tiles is | 4,000*34
      DR, discard network | Predicts which tile in hand a high-level Mahjong player would discard | A list of 34 probabilities, one per tile, of being discarded | 100,000*34
      • Training data:
          – Waiting-or-not network: the input is the 107-layer feature map; the output is 1 if someone is in waiting and 0 if no one is.
          – Waiting-tiles network: the input is the 107-layer feature map; the output is 1 for tiles being waited on and 0 for other tiles.
      • Figure example: a hand in waiting, waiting for 1s and 4s.
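The training targets described on this slide can be written out for the "1s and 4s" example. This is a small sketch under the assumption that tile kinds are indexed 1-9 man, 1-9 pin, 1-9 sou (so 1s and 4s fall at indices 18 and 21); the slides do not state the ordering. The function names are hypothetical and only suited tiles are handled.

```python
def kind_index(tile: str) -> int:
    """Index of a suited tile such as '1s' under the assumed m/p/s ordering."""
    number, suit = int(tile[0]), tile[1]
    if not 1 <= number <= 9:
        raise ValueError(f"bad tile number in {tile!r}")
    return "mps".index(suit) * 9 + number - 1


def waiting_tiles_label(waits):
    """34-entry target for the waiting-tiles network: 1 for waited tiles, else 0."""
    label = [0] * 34
    for tile in waits:
        label[kind_index(tile)] = 1
    return label


def waiting_or_not_label(opponent_waits):
    """Target for the waiting-or-not network: 1 if any opponent is in waiting."""
    return int(any(opponent_waits))


label = waiting_tiles_label(["1s", "4s"])
print([i for i, v in enumerate(label) if v])
print(waiting_or_not_label([[], ["1s", "4s"], []]))
```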
  • 18. Networks Details
      Network | Content | Output | Data amount
      LP, lose-point network | Predicts how many points will be lost if a given tile is discarded | A list of 6 probabilities for the number of han in the other player's hand if that player wins this round | 16,500*6
      • Training data:
          – Lose-point network: the input is the 108-layer feature map; the output is the loss for the discarded tile.
  • 19. Networks Details
      • Input layer: 6*6*107(108) feature map
      • Hidden layers (7 in total):
      Number of convolutional kernels | Size of convolutional kernels | Edge padding | Activation function
      512 | 4*4 | same | relu
      512 | 3*3 | same | none
      512 | 2*2 | same | relu
      Dropout
      256 | 2*2 | same | none
      256 | 3*3 | same | relu
      Dropout
      128 | 3*3 | same | none
      128 | 2*2 | same | relu
      Dropout
      • Output layer: fully connected
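To make the layer table concrete, the sketch below walks through the stack and computes each layer's output shape and parameter count. It is an illustration under assumptions the slide does not confirm: stride 1, one bias per kernel, a 107-layer input, and a single fully connected layer mapping the flattened map directly to 34 outputs. `CONV_LAYERS` and `summarize` are hypothetical names.

```python
# (kernels, kernel size, activation, dropout afterwards) as listed on the slide.
CONV_LAYERS = [
    (512, 4, "relu", False),
    (512, 3, None, False),
    (512, 2, "relu", True),
    (256, 2, None, False),
    (256, 3, "relu", True),
    (128, 3, None, False),
    (128, 2, "relu", True),
]


def summarize(in_channels=107, side=6, outputs=34):
    """Return (name, output shape, parameter count) for every layer."""
    rows, channels = [], in_channels
    for n, (kernels, size, _activation, _dropout) in enumerate(CONV_LAYERS, start=1):
        # Weights plus one bias per kernel (assumed).
        params = size * size * channels * kernels + kernels
        # 'same' padding with stride 1 (assumed) keeps the 6 x 6 spatial size.
        rows.append((f"conv{n}", (side, side, kernels), params))
        channels = kernels
    flat = side * side * channels
    # Assumption: one fully connected layer maps the flattened map to the outputs.
    rows.append(("dense", (outputs,), flat * outputs + outputs))
    return rows


for name, shape, params in summarize():
    print(f"{name:6s} {str(shape):14s} {params:>9,d}")
```

Because every convolution uses "same" padding, the spatial size stays 6 x 6 throughout and only the channel count changes, which is why the flattened input to the last layer has 6 * 6 * 128 entries.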
  • 20. Final Accuracy of Each Network
      Network | Accuracy
      Waiting-or-not network | 82.7%
      Waiting-tiles network | 40.2%
      Lose-point network | 88.7%
      Discard network | 88.4%
      • The waiting-tiles network reaches only 40.2% because the metric counts only whether the tile with the maximum output is actually being waited on.
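The metric described for the waiting-tiles network, counting a prediction as correct only when its highest-scored tile is a real wait, can be sketched as follows. This is one reading of the slide's sentence, not the authors' evaluation code; `top1_hit_rate` is a hypothetical name and the toy data uses three tiles instead of 34.

```python
def top1_hit_rate(predictions, labels):
    """Fraction of samples whose highest-scored tile is actually waited on."""
    hits = 0
    for scores, label in zip(predictions, labels):
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += label[best]
    return hits / len(predictions)


# Toy data: the first sample's top tile is a real wait, the second one's is not.
predictions = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
labels = [[0, 1, 1], [0, 1, 0]]
print(top1_hit_rate(predictions, labels))
```

Under this metric a network can rank every true wait near the top and still score a miss whenever a non-waited tile gets the single highest value, which helps explain a figure lower than the other networks' accuracies.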
  • 21. Experiment and Result
      • Comparison of the three models in our experiment
      Model | Game state | Attack route | Defense route
      Best choice algorithm (BCA) | A player calls riichi or opens a hand with over three melds | Choose the tile that brings the hand closer to a winning hand | Choose a tile that the in-waiting player has discarded
      BCA's attack mode combined with the deep model for defense | The model predicts that someone may be in waiting | Choose the tile that brings the hand closer to a winning hand | Choose the tile that leads to the least loss
      Deep model | The model predicts that someone may be in waiting | Imitate expert players' discards based on the current game state | Choose the tile that leads to the least loss
  • 22. Experiment and Result
      • We played 60 games with each model on the “Ippan” table, which is open to every player.
      Ippan table (avg. lv. 1.5) | Top | 2nd | 3rd | 4th | Win rate | Feed rate | Aggressive move
      BCA | 27% | 30% | 25% | 18% | 24% | 11% | 14%
      BCA + defense model | 17% | 28% | 45% | 10% | 18% | 8% | 0%
      Deep model | 22% | 27% | 33% | 18% | 20% | 9% | 8%
      Players' average (Tenhou) | 20% | 23% | 27% | 30% | 20% | 19% | -
      • In the original slides, green marks the worst performance and red the best.
  • 23. Experiment and Result
      • We played 100 games with each model on the “Joukyuu” table.
      Joukyuu table (avg. lv. 11.75) | Top | 2nd | 3rd | 4th | Win rate | Feed rate | Aggressive move
      BCA | 19% | 23% | 30% | 28% | 16% | 18% | 12%
      BCA with deep model | 22% | 28% | 33% | 17% | 17% | 8% | 1%
      Deep model | 24% | 29% | 27% | 20% | 21% | 11% | 7%
      Players' average (Tenhou) | 25% | 25% | 25% | 25% | 23% | 15% | 17%
      • In the original slides, green marks the worst performance and red the best.
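One way to compare the placement columns of the Joukyuu table at a glance is the average finishing place, where lower is better. The slides do not report this figure; the sketch below only derives it from the percentages shown on the slide, and `average_placement` is a hypothetical helper.

```python
def average_placement(shares):
    """Mean finishing place from percentage shares of 1st to 4th place."""
    if sum(shares) != 100:
        raise ValueError("placement shares must sum to 100")
    return sum(place * share for place, share in enumerate(shares, start=1)) / 100


# Placement percentages copied from the Joukyuu table on the slide.
JOUKYUU = {
    "BCA": (19, 23, 30, 28),
    "BCA with deep model": (22, 28, 33, 17),
    "Deep model": (24, 29, 27, 20),
    "Players' average (Tenhou)": (25, 25, 25, 25),
}

for name, shares in JOUKYUU.items():
    print(f"{name}: {average_placement(shares):.2f}")
```

With only 100 games per model, small differences in this derived number should be read with caution.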
  • 24. Competition Between Each Model
      • We played 20 games for each pairing in a 1 vs. 3 format.
      1st/2nd/3rd/4th | 1 BCA | 1 BCA + defense model | 1 Deep model
      3 BCA | - | 2/6/9/3 | 3/7/6/4
      3 BCA + defense model | 6/4/5/5 | - | 5/6/5/4
      3 Deep model | 4/5/5/6 | 4/7/7/2 | -
      • The result table shows that:
          – BCA
              • Good in attack
              • Easy to defend against
          – BCA + defense model
              • Great in defense
              • Fewer aggressive moves
          – Deep model
              • Good in defense
              • Balanced between defense and offense
  • 25. Comparison Between Discard Methods
      • The two discard methods performed differently during the experiment, so we compare them.
      • It is easier to infer the non-deep-learning AI's state and which tiles it is waiting for.
      • In attack, the deep model plays more like a human player than the non-deep-learning AI does, as the top rate and win rate show.
      Discard method | Waiting | Waiting rate | Waiting prediction | Waiting tiles prediction
      BCA | 438 | 53.94% | 91.32% | 57.53%
      Discard model | 411 | 49.58% | 83.43% | 39.90%
  • 26. Conclusion
      • The deep model in this study performs well in Mahjong games.
          – High 2nd-place rate.
          – Aggressive moves.
      • The new feature engineering performs well.
      • When the model predicts that someone is in waiting, its performance is better than the human players' average.
      • It is possible to build a better multi-network model based on this experiment.
      Thank you for listening.
  • 27. Research performance
      • Information Processing Society of Japan
          1) Yeqin Zheng, Soichiro Yokoyama, Tomohisa Yamashita, Hidenori Kawamura: Study on Evaluation Function Design of Mahjong using Supervised Learning, Special Interest Groups (SIG), Vol. 194, Hokkaido (2019)