These are slides about the 2021 Fighting Game Artificial Intelligence Competition virtually presented at the 2021 IEEE Conference on Games (CoG), August 16-20, 2021. (updated on August 28, 2021)
1. 2021 Fighting Game AI Competition
Keita Fujimaki, lead programmer
Xincheng Dai, co-lead programmer
Roman Savchyn, tester, etc.
Hideyasu Inoue, advisor
Pujana Paliyawan, vice director
Ruck Thawonmas, director
Team FightingICE
Intelligent Computer Entertainment Laboratory
Ritsumeikan University
Japan
Game resources are from The Rumble Fish 2, courtesy of Dimps Corporation.
http://www.ice.ci.ritsumei.ac.jp/~ftgaic/
CoG 2021: Aug 16-20, 2021 Updated on August 28, 2021
3. FightingICE
A fighting game AI platform that a small team can develop for, written in Java and also wrapped for Python
The first of its kind, available since 2013 and used in competition since CIG 2014; developed from scratch without using game ROM data
Aims: research on general fighting game AIs that are strong against any unseen opponents (AIs or players), character types, and play modes
http://www.ice.ci.ritsumei.ac.jp/~ftgaic/
Game resources are from The Rumble Fish 2, courtesy of Dimps Corporation.
4. FightingICE's Main Features
Gives the agent a 16.67 ms response time (one frame at 60 FPS) to choose its action out of 40 actions
Provides the latest game state with a delay of 15 frames, to simulate human response time
Equipped with a forward model, a method for accessing the screen information, and an OpenAI Gym API
Why FightingICE?
Generalization against different opponents with unknown behaviors is challenging for DRL
60 FPS plus the introduced delay is challenging for tree search
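As a rough illustration of the Gym API mentioned above, here is a minimal interaction-loop sketch. The package import, environment id, and reset argument are assumptions based on the Gym-FightingICE project (linked later in these slides), not an official snippet, and a running FightingICE Java instance is also required.

```python
# Minimal sketch of an agent loop through the Gym wrapper. The package name,
# environment id, and reset keyword are assumptions; check the Gym-FightingICE
# README for the exact names. A FightingICE Java process must be reachable.
import gym
import gym_fightingice  # assumed import that registers the FightingICE envs

env = gym.make("FightingiceDataNoFrameskip-v0")  # assumed env id

obs = env.reset(p2="MctsAi")  # opponent name passed at reset (assumed keyword)
done = False
while not done:
    # The observation already lags the true game state by 15 frames, and the
    # action has to be produced within one 16.67 ms frame at 60 FPS.
    action = env.action_space.sample()  # replace with a trained policy
    obs, reward, done, info = env.step(action)
env.close()
```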
5. Recent Publications Using FightingICE by
Other Groups since CoG 2020
Rongqin Liang, Yuanheng Zhu, Zhentao Tang, Mu Yang, and Xiaolong Zhu, "Proximal Policy Optimization with Elo-based Opponent Selection and Combination with Enhanced Rolling Horizon Evolution Algorithm," 2021 IEEE Conference on Games, August 17-20, 2021.
Tianyu Chen, Florian Richoux, Javier M. Torres, Katsumi Inoue, "Interpretable Utility-based Models Applied to the FightingICE Platform," 2021 IEEE Conference on Games, August 17-20, 2021.
Man-Je Kim, Jun Suk Kim, Sungjin James Kim, Min-jung Kim, Chang Wook Ahn, "Genetic state-grouping algorithm for deep reinforcement learning," Expert Systems with Applications, 15 December 2020.
Xenija Neufeld, "Long-Term Planning and Reactive Execution in Highly Dynamic Environments," Doctoral thesis, Otto-von-Guericke-Universität Magdeburg, Dec. 2020.
Zhentao Tang, Yuanheng Zhu, Dongbin Zhao, and Simon M. Lucas, "Enhanced Rolling Horizon Evolution Algorithm with Opponent Model Learning," IEEE Transactions on Games, 2020.
Deng Shida, Takeshi Ito, "Fighting game AI with dynamic difficulty adjustment to make it fun to play against," Proc. of the 25th Game Programming Workshop 2020, pp. 58-61, Nov. 2020. (in Japanese)
Yuanheng Zhu, Dongbin Zhao, "Online Minimax Q Network Learning for Two-Player Zero-Sum Markov Games," IEEE Transactions on Neural Networks and Learning Systems, Nov. 2020. (Early Access)
Mohammad Farhan Ferdous, "Privacy Preservation Algorithms on Cryptography for AI as Human-like Robotic Player for Fighting Game Using Rule-Based Method," Cyber Defense Mechanisms, pp. 185-196, Sep. 2020.
MJ Kim, JH Lee, CW Ahn, "Genetic Optimizing Method for Real-time Monte Carlo Tree Search Problem," Proc. of the 9th International Conference on Smart Media and Applications, Sep. 2020.
7. Contest Rules
Standard and Speedrunning leagues, each using three characters: ZEN, GARNET, and LUD (GARNET's and LUD's character data are not revealed in advance: unknown characters)
Standard: the winner of a round is the AI whose HP is still above zero when its opponent's HP reaches zero (all AIs' initial HP = 400)
Speedrunning: the winner for a given character type is the AI with the shortest average time to beat our sample MctsAi (all AIs' initial HP = 400)
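As a small illustration of these rules, the sketch below scores a Standard round and a Speedrunning character from hypothetical match data; it is not the organizers' scoring code.

```python
# Sketch of the two leagues' scoring rules described above (illustrative only;
# the match data here is hypothetical, not the organizers' code).

def standard_round_winner(p1_hp, p2_hp):
    """Standard League: the round winner is the AI whose HP is still above zero
    when its opponent's HP reaches zero (both start at 400)."""
    if p1_hp > 0 and p2_hp <= 0:
        return "P1"
    if p2_hp > 0 and p1_hp <= 0:
        return "P2"
    return "draw"  # e.g. both reach zero on the same frame

def speedrunning_score(times_to_beat_mcts):
    """Speedrunning League: an AI's score for a character is its average time
    (in seconds or frames) needed to beat the sample MctsAi; lower is better."""
    return sum(times_to_beat_mcts) / len(times_to_beat_mcts)

# Example: P2 ends the round with 57 HP left while P1 has been reduced to 0.
print(standard_round_winner(p1_hp=0, p2_hp=57))   # -> "P2"
print(speedrunning_score([38.2, 41.5, 36.9]))     # -> average time to beat MctsAi
```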
8. Summary of 10 Entries
AI | Affiliation | Language | Description
BlackMamba | Researcher team from Netease Games AI Lab, China | Java | PPO trained against a weakened MctsAi in the Speedrunning League and against self-play or previous entries with added noise in the character data in the Standard League
EggTart | Student from KMUTT, Thailand | Java | Rule-based AI
ERHEA_PPO_PG | Student team from University of Chinese Academy of Sciences, China | Java | Enhanced Rolling Horizon Evolution Algorithm combined with Proximal Policy Optimization (PPO) with Elo-based opponent selection
IBM_AI | Haripur University graduate, Pakistan | Java | Rule-based AI
Thunder2021 | Individual developer, Japan | Java | 1. Prioritize certain actions in advance. 2. Predict the opponent's three most likely actions. 3. Select the AI's best action against those three actions. 4. Limited actions for the ZEN Speedrunning League.
DQAI | Individual developer, Vietnam | Python | Duel Q-network reinforcement learning AI
LTAI | Individual developer, China | Python | Dual-clip PPO with a novel opponent sampling algorithm based on a payoff matrix
Ruba | Student from Kyoto Sangyo University, Japan | Python | Rule-based + Genetic Algorithm AI
SummerAI | Researcher team from ETRI, Korea | Python | PPO
WinOrGoHome | Individual researcher from Netease Games AI Lab, China | Python | PPO trained against MctsAi in the Speedrunning League and against self-play in the Standard League
• 5 Java entries, 5 Python entries; 4 student entries, 4 individual developer/researcher entries, 2 researcher team entries
• 4 entries from China, 2 entries from Japan, 1 entry from Korea, Pakistan, Thailand, and Vietnam, respectively
• PPO used in 5 entries, EA in 2 entries
10. Results
• Winner AI: BlackMamba by Peng ZHANG, Guanghao ZHANG, Xuechun WANG, Sijia XU, Shuo SHEN, and Weidong ZHANG (Netease Games AI Lab, China)
• Proximal Policy Optimization (PPO) trained against a weakened MctsAi in the Speedrunning League and against self-play or previous entries with added noise in the character data in the Standard League.
• Runner-up AI: WinOrGoHome by Weijun Hong (Netease Games AI Lab, China)
• PPO trained against MctsAi in the Speedrunning League and against self-play in the Standard League.
• 3rd Place AI: Thunder2021 by Eita Aoki, an individual developer, Japan (2020 runner-up; winner of the 2016, 2017, 2018, and 2019 competitions)
• 1. Prioritize certain actions in advance. 2. Predict the opponent's three most likely actions. 3. Select the AI's best action against those three actions. 4. Limited actions for the ZEN Speedrunning League.
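As an illustration of steps 2-3, here is a minimal sketch of a "best response against the opponent's top-3 predicted actions" choice. The simulate function and the score table are hypothetical stand-ins for FightingICE's forward model, not Thunder2021's actual code.

```python
# Sketch of steps 2-3: predict the opponent's three most likely actions and pick
# our action whose worst-case outcome against those three is best (a maximin
# choice). `simulate` is a hypothetical stand-in for the forward model.
from collections import Counter

def top3_opponent_actions(opponent_history):
    """Step 2: take the opponent's three most frequent recent actions."""
    return [a for a, _ in Counter(opponent_history).most_common(3)]

def best_response(my_actions, opp_top3, simulate):
    """Step 3: choose the action with the best worst-case score."""
    def worst_case(my_a):
        return min(simulate(my_a, opp_a) for opp_a in opp_top3)
    return max(my_actions, key=worst_case)

# Toy usage with a made-up score table standing in for the forward model.
scores = {("STAND_FB", "CROUCH_GUARD"): -5, ("STAND_FB", "STAND_FA"): 10,
          ("STAND_FB", "FOR_JUMP"): 3, ("AIR_DB", "CROUCH_GUARD"): 2,
          ("AIR_DB", "STAND_FA"): 4, ("AIR_DB", "FOR_JUMP"): 6}
simulate = lambda mine, theirs: scores.get((mine, theirs), 0)
history = ["STAND_FA", "CROUCH_GUARD", "STAND_FA", "FOR_JUMP", "STAND_FA"]
print(best_response(["STAND_FB", "AIR_DB"], top3_opponent_actions(history), simulate))
# -> "AIR_DB" (its worst case, 2, beats STAND_FB's worst case, -5)
```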
11. Sample Fights
BlackMamba (P1) vs WinOrGoHome (P2)
BlackMamba on GARNET tends to use a kick action when facing the opponent. It does not defend against the opponent's attacks, but fights back with attacks of its own.
BlackMamba on ZEN tends to use a jump action while looking for the opponent's weakness, and strings together more continuous attacks when pushing the opponent to the edge.
BlackMamba on LUD tends to look for a chance to hit the opponent in the air. It also uses a jump action to break a deadlock.
12. Thank you and see you at
CoG 2022 in China
(We plan to add human players for the assessment of AI performance)
http://www.ice.ci.ritsumei.ac.jp/~ftgaic/
13. BlackMamba
An intelligent fighter based on reinforcement learning
Developers: Guanghao ZHANG, Xuechun WANG, Peng ZHANG, Sijia XU, Shuo SHEN, Weidong ZHANG
Affiliation: Netease Games AI Lab
14. Outline
BlackMamba is an RL agent trained with Proximal Policy Optimization (PPO). To meet the demand for diverse and rich sampled data, our AI is trained by fighting against past opponents released for FightingICE and through self-play.
The policy network is a simple six-layer MLP, and its weights are ultimately saved in CSV files.
To improve exploration and balance the convergence speed across different opponents, we add an opponent selection mechanism to the training process.
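As an illustration of a CSV-backed MLP policy like the one described above, here is a minimal numpy sketch; the file naming scheme, layer sizes, and the ReLU/softmax choices are assumptions, not BlackMamba's actual format.

```python
# Minimal numpy sketch of an MLP policy whose weights live in CSV files.
# File names, layer sizes, and activations are illustrative assumptions.
import numpy as np

def load_mlp(layer_count, prefix="policy"):
    """Load one weight matrix and one bias vector per layer from CSV files
    such as policy_w0.csv / policy_b0.csv, ... (hypothetical naming)."""
    layers = []
    for i in range(layer_count):
        w = np.loadtxt(f"{prefix}_w{i}.csv", delimiter=",")
        b = np.loadtxt(f"{prefix}_b{i}.csv", delimiter=",")
        layers.append((w, b))
    return layers

def policy_forward(layers, obs):
    """Forward pass: ReLU hidden layers, softmax over the 40 actions at the end."""
    x = obs
    for w, b in layers[:-1]:
        x = np.maximum(0.0, x @ w + b)
    w, b = layers[-1]
    logits = x @ w + b
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

# action = np.argmax(policy_forward(load_mlp(6), observation_vector))
```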
15. Training
For the Speedrunning League, we train the model by fighting against MctsAi. Considering the organizers' lower-performance machines, we also let our agent fight against a weakened MctsAi whose search time is constrained.
[Diagram: workers run Agent-vs-MctsAi fights and push rollout data to a data buffer; the learner updates the policy and sends the latest policy back to the workers.]
16. Training
For the Standard League, we train the model by fighting against historical participants and through self-play. To cope with changes to GARNET's and LUD's motion data, we randomly modify the motion data when training the GARNET and LUD models.
[Diagram: historical workers run Agent vs. historical participants and self-play workers run Agent vs. Agent; rollout data goes to a data buffer, and the learner updates the policy and distributes the latest policy to all workers.]
18. Fighting Game AI
Competition 2021
AI name: EGGTART
Developer name: Gunt CHANMAS
Affiliation: School of Information Technology, KMUTT
19. Outline
Rule-based AI:
Move forward if distance X > 200
Perform “CROUCH_FB” when distance X < 250 and distance Y <= 20
Dodge by
1. “FORWARD_WALK” when distance Y > 40
2. “BACK_STEP” when distance Y > 20
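A minimal Python sketch of the rules above as a single decision function (EGGTART itself is a Java entry); the rule priority and the fallback action are assumptions.

```python
# The rules above written as one decision function. The priority order and the
# fallback action are assumptions; EGGTART itself is written in Java.
def eggtart_action(distance_x, distance_y):
    if distance_x > 200:
        return "FORWARD_WALK"          # close the gap
    if distance_x < 250 and distance_y <= 20:
        return "CROUCH_FB"             # attack when roughly on the same level
    if distance_y > 40:
        return "FORWARD_WALK"          # dodge rule 1
    if distance_y > 20:
        return "BACK_STEP"             # dodge rule 2
    return "STAND_GUARD"               # assumed fallback when no rule fires

print(eggtart_action(distance_x=300, distance_y=10))   # -> FORWARD_WALK
print(eggtart_action(distance_x=120, distance_y=5))    # -> CROUCH_FB
```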
THANK YOU
20. Enhanced Rolling Fighting Bot -
ERHEA_PPO_PG
Rongqin Liang (student)
Affiliation: University of Chinese Academy of Sciences
Yuanheng Zhu
Affiliation: Chinese Academy of Sciences, Institute of
Automation
Dongbin Zhao
Affiliation: Chinese Academy of Sciences, Institute of
Automation
21. Enhanced Rolling Fighting Bot
• Rolling Fighting Bot is based on the Enhanced Rolling Horizon Evolution Algorithm combined with Proximal Policy Optimization with Elo-based opponent selection. It uses Thunder Bot as a reference, with that bot's valid action set as the candidate actions.
• Base: ERHEA_PI, which we made in 2020.
• New approach:
* Add the PPO algorithm
* Modify ZEN's action set in Speed mode
23. AI For FTG AI Competition
AI Name : IBM_AI
Developer’s Name: Ibrahim Khan
24. Affiliation
Incoming master's student at the Intelligent Computer Entertainment Laboratory, Ritsumeikan University.
BSCS (Computer Science) from Haripur University, Pakistan.
25. AI Outline
The AI is inspired by the MCTS AI and Zone AI (a previous entry in the competition).
A simple and straightforward AI with a lot of room for improvement.
It chooses attacks and movements at random, guided by a few parameters.
No use of machine learning.
27. Outline
Base: ReiwaThunder, which I made in 2020.
New approach
・Limited actions for ZEN in Speed mode
Test
・Generate 30 Motion.csv files for GARNET and LUD.
・Using the generated Motion.csv files, play against other AIs and adjust the jump timing and the filter on the moves used.
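A hedged sketch of generating perturbed Motion.csv variants like the ones described above; the file path, the presence of a header row, and the noise range are assumptions, not Thunder2021's actual procedure.

```python
# Sketch of generating perturbed copies of a character's Motion.csv for testing.
# Which columns are numeric, whether the first row is a header, and how much to
# perturb them are assumptions; check the real FightingICE data files.
import csv
import random

def perturb_motion_csv(src, dst, scale=0.1):
    with open(src, newline="") as f:
        rows = list(csv.reader(f))
    header, body = rows[0], rows[1:]          # assumes a header row
    for row in body:
        for i, cell in enumerate(row):
            try:
                value = float(cell)
            except ValueError:
                continue                      # leave non-numeric fields untouched
            row[i] = str(round(value * (1.0 + random.uniform(-scale, scale)), 3))
    with open(dst, "w", newline="") as f:
        csv.writer(f).writerows([header] + body)

# Hypothetical path; the real layout of the character data should be checked.
# for k in range(30):
#     perturb_motion_csv("data/characters/GARNET/Motion.csv", f"variants/Motion_{k}.csv")
```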
29. Introduction
• AI Name: DQAI
• Duel Q-network Reinforcement Learning AI
• Developers & Affiliation
• Thai Nguyen Van (nguyenvanthai0212@gmail.com)
• AI Development Language
• Python 3.5
30. AI Outline
• Method: Double Q network Reinforcement Learning
• RL Configuration
• Duel Q network Learning Algorithm
• Trained with MCTS AI
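The slides mention both "Duel" and "Double" Q networks; as one possible illustration, here is a minimal numpy forward pass of a dueling Q-network head. This is an assumption about the architecture for illustration, not DQAI's code; the 40-action output matches FightingICE, while the other sizes are arbitrary.

```python
# Minimal numpy forward pass of a dueling Q-network head: Q(s, a) is split into a
# state value V(s) and advantages A(s, a). Layer sizes are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, HIDDEN, N_ACTIONS = 143, 64, 40

W1 = rng.normal(scale=0.1, size=(OBS_DIM, HIDDEN));   b1 = np.zeros(HIDDEN)
Wv = rng.normal(scale=0.1, size=(HIDDEN, 1));         bv = np.zeros(1)
Wa = rng.normal(scale=0.1, size=(HIDDEN, N_ACTIONS)); ba = np.zeros(N_ACTIONS)

def dueling_q(obs):
    h = np.maximum(0.0, obs @ W1 + b1)        # shared hidden layer
    v = h @ Wv + bv                            # state value V(s)
    a = h @ Wa + ba                            # advantages A(s, a)
    return v + a - a.mean()                    # Q(s, a) = V + A - mean(A)

q_values = dueling_q(rng.normal(size=OBS_DIM))
best_action = int(np.argmax(q_values))
```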
32. AI OUTLINE
• Based on SpringAI
• Reinforcement learning and sampling from an opponent pool:
• Uses an improved version of PPO, dual-clip PPO
• PPO: Proximal Policy Optimization
• The opponent pool consists of two parts:
• Some Java-based AIs: HaibuAI, JayBot_GM, MctsAi, UtalFighter
• Historical versions of the training model
• A novel opponent sampling algorithm based on a payoff matrix
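For reference, a minimal numpy sketch of the dual-clip PPO objective (Ye et al., 2020): on top of PPO's usual clipping, a second clip bounds the objective when the advantage is negative, so one sample with a huge probability ratio cannot dominate the update. The epsilon and dual-clip constant here are illustrative, not LTAI's settings.

```python
# Minimal numpy sketch of the dual-clip PPO objective (hyperparameters illustrative).
import numpy as np

def dual_clip_ppo_objective(ratio, adv, eps=0.2, c=3.0):
    """ratio = pi_new(a|s) / pi_old(a|s); adv = advantage estimate."""
    ratio, adv = np.asarray(ratio), np.asarray(adv)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    standard = np.minimum(ratio * adv, clipped * adv)       # vanilla PPO term
    dual = np.maximum(standard, c * adv)                     # extra lower bound
    return np.where(adv < 0.0, dual, standard).mean()        # objective to maximize

print(dual_clip_ppo_objective(ratio=[0.9, 5.0], adv=[1.0, -1.0]))
# second sample: min(-5, -1.2) = -5, then max(-5, -3) = -3, so the huge ratio is bounded
```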
33. AI TEST (on Windows 10)
• Extract the zip file and copy the folder "LTAI" into ${FTG4.50}/python
• Open a new terminal, make sure the current path is ${FTG4.50}, and run:
• java -Xms1024m -Xmx1024m -cp "./FightingICE.jar;./lib/lwjgl/*;./lib/natives/windows/*;./lib/*;./data/ai/*" Main --py4j --limithp 400 400
• Open another terminal, make sure the current path is ${FTG4.50}/python/LTAI, and run:
• python Main_PyAIvsJavaAI.py
36. Rule
Rule 1: AIR or GROUND
I divided the states of P1 and P2 into four categories.
Rule 2: My energy level
0, 0~50, 50~150, 150~
Rule 3: Distance between P1 and P2
~100, ~150, ~200, ~400
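The three rules above, written as a small state-discretization sketch; the exact boundary handling in Ruba may differ.

```python
# Sketch of Rules 1-3 as a state-discretization function (boundary handling assumed).
def discretize(p1_air, p2_air, my_energy, distance):
    # Rule 1: AIR or GROUND for both players -> four categories
    air_state = (p1_air, p2_air)              # e.g. (True, False)

    # Rule 2: my energy level -> 0, 0~50, 50~150, 150~
    if my_energy == 0:
        energy_bin = 0
    elif my_energy <= 50:
        energy_bin = 1
    elif my_energy <= 150:
        energy_bin = 2
    else:
        energy_bin = 3

    # Rule 3: distance between P1 and P2 -> ~100, ~150, ~200, ~400
    # (distances beyond 400 fall into the last bin)
    for distance_bin, upper in enumerate((100, 150, 200, 400)):
        if distance <= upper:
            break

    return air_state, energy_bin, distance_bin

print(discretize(p1_air=False, p2_air=True, my_energy=70, distance=130))
# -> ((False, True), 2, 1)
```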
39. Details
• AI Name : SummerAI
• Developers & Affiliation
• Dae-Wook Kim (dooroomie@etri.re.kr) and Teammates
• Electronics and Telecommunications Research Institute (ETRI)
• Daejeon, Korea
• AI Development Language
• Python 3.6
41. Network Structure
• Self-attention
[Architecture diagram: the HP/Energy, movement, 56-dim action, state/frame, and projectile features are embedded separately for the agent and the opponent (x2), together with the game time, before the self-attention block.]
42. Network Structure
• Self-attention
[Architecture diagram: the embedded features (my/opponent movement, action, state, projectile, HP/E, and game time) are projected to query, key, and value vectors; attention is computed as softmax(QKᵀ/√d)·V, and the output feeds the action and value heads.]
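A minimal numpy sketch of the scaled dot-product self-attention shown above, softmax(QKᵀ/√d)·V; the dimensions are illustrative, not SummerAI's actual sizes.

```python
# Scaled dot-product self-attention over the embedded input features (illustrative sizes).
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d = 11, 32                 # e.g. 5 features per player + game time
x = rng.normal(size=(n_tokens, d))   # embedded input features

Wq, Wk, Wv = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))

def self_attention(x):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)                       # (n_tokens, n_tokens)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ v

attended = self_attention(x)   # same shape as x; feeds the action and value heads
```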
43. How to Test
• After extracting the zip file, you will see the files below
44. How to Test
• Copy them into the FTG4.50/python directory
45. How to Test
• Open a terminal and run the FTG simulator
46. How to Test
• Go to the python directory
• Open a new terminal and run the Python file
48. Overview
• WinOrGoHome is a Python agent built entirely with deep reinforcement learning and self-play.
• Only numpy & py4j are required during inference, where the policy is modeled as a simple 3-layer MLP.
• It uses a slightly modified gym API based on [1], with a reduced action space, an enlarged 282-dim observation space, a more training-friendly API, and some other fault-tolerant mechanisms for distributed training.
• It is trained by a distributed asynchronous version of PPO [2].
• Six stand-alone models are trained, one for each character in each track (i.e., league in FTGAIC):
• We use self-play to train the models for the standard track, with league training to enhance the diversity of opponent strategies [3].
• The models for the speed-run track are trained entirely against MctsAi (LUD is fine-tuned from the self-play model).
[1] https://github.com/TeamFightingICE/Gym-FightingICE
[2] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint
arXiv:1707.06347.
[3] Vinyals, O., Babuschkin, I., Czarnecki, W.M. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 2019.
49. Training Framework
• Our training framework is designed after SEED [4], which features centralized inference.
• It is an asynchronous architecture with great flexibility for large-scale training.
• The controller collects outcomes such as win rates from all environments, and periodically switches the training opponents or saves the current model as a new opponent.
[4] Espeholt, L., Marinier, R., Stanczyk, P., Wang, K., & Michalski, M. (2019). SEED RL: Scalable and efficient deep-RL with accelerated central inference. arXiv preprint arXiv:1910.06591.
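A minimal sketch of the controller behavior described above (win-rate tracking, opponent switching, and model snapshots); the thresholds and snapshot interval are assumptions, not WinOrGoHome's actual values.

```python
# Sketch of a controller that tracks win rates from all environments, switches the
# training opponent once it is beaten reliably, and periodically snapshots the
# current model into the opponent pool. All constants are assumptions.
import random

class Controller:
    def __init__(self, opponent_pool, win_threshold=0.8, snapshot_every=50):
        self.pool = list(opponent_pool)
        self.win_threshold = win_threshold
        self.snapshot_every = snapshot_every
        self.results = []          # 1 = win, 0 = loss, collected from all envs
        self.episodes = 0

    def report(self, won, current_model):
        self.results.append(1 if won else 0)
        self.episodes += 1
        if self.episodes % self.snapshot_every == 0:
            self.pool.append(current_model)        # save the model as a new opponent
        if len(self.results) >= 20:
            win_rate = sum(self.results[-20:]) / 20
            if win_rate >= self.win_threshold:
                self.results.clear()
                return random.choice(self.pool)    # switch the training opponent
        return None                                # keep the current opponent
```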
50. Feature Engineering
• We extend the original 143-dim vector in the Gym-FightingICE env with some more features:
• Relative speed/position/HP
• Projectile info such as speed, hit energy, impact distance, etc.
• The opponent's action distribution within a round
• The action space is also changed:
• Only keep the 41~42 useful actions
• Extend the effect frames of STAND_GUARD and CROUCH_GUARD
• Reward:
• The HP difference of both players between the previous and current frames is used for the standard track
• Only the self_hp diff and an additional reward w.r.t. the remaining time at the end of each game are used for the speed-run track
• A multi-head value [5] is introduced to reduce the variance of value estimation, but all heads use the same discount factor
[5] Ye, D., Chen, G., Zhang, W., Chen, S., Yuan, B., Liu, B., ... & Liu, W. (2020). Towards playing full MOBA games with deep reinforcement learning. arXiv preprint arXiv:2011.12692.
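The two reward schemes above as small functions. This is a sketch of the description only, not WinOrGoHome's code; the sign convention for "self_hp diff" and the time-bonus scale are our assumptions.

```python
# Sketch of the two reward schemes described above (assumed signs and scale).
def standard_reward(prev_self_hp, self_hp, prev_opp_hp, opp_hp):
    """Standard track: HP change of both players between consecutive frames,
    i.e. damage dealt minus damage taken."""
    return (prev_opp_hp - opp_hp) - (prev_self_hp - self_hp)

def speedrun_reward(prev_self_hp, self_hp, done, remaining_time, scale=0.01):
    """Speed-run track: only the agent's own HP change per frame, plus a terminal
    bonus proportional to the time remaining when the game ends."""
    reward = self_hp - prev_self_hp
    if done:
        reward += scale * remaining_time
    return reward
```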
51. Opponent Pool
• First, we want to say thanks to the other teams from past years' competitions, including TeraThunder, ButcherPudge, EmcmAi, SpringAI, CYR_AI, ReiwaThunder, Thunder, FalzAI, MctsAi, SimpleAI, LGIST_Bot, and Machete. For the sake of opponent diversity, during self-play we add these AIs to our initial opponent pool with heuristic sampling rates according to both their strength and style.
• During self-play, for each character, we train our AI against agents sampled from the opponent pool as well as WinOrGoHome's past generations. Each generation is trained until convergence and then added to the pool as a new opponent.
• After the first 3~5 generations, we train an exploiter for every second generation; it plays against the previous generation in order to find its weaknesses.
• The whole training procedure ends after around 10 generations, and the final opponent pool contains about 12 past AIs, 6 or 7 self-play AIs, and 3 or 4 exploiter AIs. The last self-play generation is chosen for submission. (GARNET is trained for fewer than 10 generations because it is harder to converge and we did not have enough time.)