Presentation on “Multiagent Bidirectional-
Coordinated Nets for Learning to Play
StarCraft Combat Games”
Kiho Suh

Modulabs, June 22nd 2017
About Paper
• Published on March 29th 2017
(v1)

• Updated on June 20th 2017 (v3)

• Alibaba, University College
London

• https://arxiv.org/pdf/1703.10069.pdf
Motivation
• Single-agent AI has already achieved remarkable success (Atari, Baduk/Go, Texas Hold'em, etc.).
• But is that enough for Artificial General Intelligence? Real intelligence is rarely exercised alone.
• Multiple AI agents need to learn to coordinate and communicate with each other.
• The real-time strategy (RTS) game "StarCraft" is a natural testbed for such multi-agent learning.
• In particular, playing "StarCraft" combat well requires many units to act in a coordinated way.
• The paper proposes a joint learning approach in a shared parameter space.
How can multiple agents learn to collaborate?
• The agents need some form of communication to coordinate their actions.
• Rather than being hand-designed, the communication protocol should itself be learned.
• Proposed answer: a multi-agent bidirectionally-coordinated network (BiCNet) with a vectorized extension of the actor-critic formulation.
What does BiCNet provide?
• BiCNet can control an arbitrary number of agents, including heterogeneous unit types.
• Coordination emerges from a bidirectional evaluation-decision-making process among the agents.
• Parameter sharing enables dynamic grouping and lets the model scale with the number of agents.
• The resulting AI agents exhibit coordination strategies similar to those of experienced human players.
• BiCNet agents learn these strategies without any labelled data or human demonstrations.
Demo video: https://www.youtube.com/watch?v=kW2q15MNFug
Related works
• Jakob Foerster, Yannis M Assael, Nando de Freitas, and
Shimon Whiteson. Learning to communicate with deep
multi-agent reinforcement learning. NIPS 2016.
• Sainbayar Sukhbaatar, Rob Fergus, et al. Learning
multiagent communication with backpropagation. NIPS
2016.
Differentiable Inter-Agent Learning (Jakob Foerster
et al. 2016)
• Each agent has a recurrent Q-network; the message it produces at one time-step is transferred to the other agents at the next time-step.
• Because the communication channel is differentiable, gradients flow back across time-steps from the receiving agent to the sending agent.
• Each agent conditions on its own observation and the received messages when selecting an action.
Differentiable Inter-Agent Learning & Reinforced
Inter-Agent Learning (Jakob Foerster et al. 2016)
• Designed to cope with the non-stationary environments that arise when multiple agents learn at the same time.
• However, these methods have not been demonstrated on a large-scale real-time strategy (RTS) game such as StarCraft.
CommNet (Sainbayar Sukhbaatar et al. 2016)
• A single network learns multi-agent communication end-to-end (sketched below).
• Communication works by passing the averaged message over the agent modules between layers.
• The model is fully symmetric, so it lacks the ability to handle heterogeneous agent types.
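To make the averaged-message idea concrete, here is a minimal sketch of one CommNet-style layer. This is not the authors' code; the weight matrices W_h and W_c and all sizes are hypothetical.

```python
import numpy as np

def commnet_layer(h, W_h, W_c):
    """One CommNet-style layer: each agent's hidden state is updated from its own
    state plus the mean of the other agents' states (the 'averaged message')."""
    # h: (n_agents, hidden_dim) hidden states of all agent modules at this layer
    n = h.shape[0]
    # mean of the other agents' hidden states, computed for every agent
    c = (h.sum(axis=0, keepdims=True) - h) / max(n - 1, 1)
    return np.tanh(h @ W_h + c @ W_c)

# toy usage with hypothetical sizes
rng = np.random.default_rng(0)
h = rng.normal(size=(5, 8))            # 5 agents, hidden size 8
W_h = rng.normal(size=(8, 8)) * 0.1
W_c = rng.normal(size=(8, 8)) * 0.1
h_next = commnet_layer(h, W_h, W_c)    # same shape; note the update is symmetric across agents
```

The symmetry visible in the averaging step is exactly what the last bullet criticizes: every agent is treated identically, so heterogeneous unit types cannot be distinguished.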
BiCNet
Stochastic Game of N agents and M opponents
• S: the state space shared by all units
• Ai: the action space of controlled agent i, i ∈ [1, N]
• Bj: the action space of enemy j, j ∈ [1, M]
• T : S × A^N × B^M → S: the deterministic transition function of the environment
• Ri : S × A^N × B^M → ℝ: the reward function of agent/enemy i, i ∈ [1, N+M]
* All units (controlled agents and enemies) are assumed to share the same action space (interface sketched below).
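As a minimal sketch, these definitions can be written as the following Python interface. The names and types are placeholders for illustration only; the paper's actual environment is the StarCraft game engine, not this API.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

State = Sequence[float]    # s ∈ S
Action = Sequence[float]   # an element of the shared continuous action space

@dataclass
class StochasticGame:
    n_agents: int                                                        # N controlled agents
    m_enemies: int                                                       # M opponents
    transition: Callable[[State, List[Action], List[Action]], State]     # T : S x A^N x B^M -> S
    rewards: List[Callable[[State, List[Action], List[Action]], float]]  # R_i, i ∈ [1, N+M]

    def step(self, s: State, a: List[Action], b: List[Action]):
        """Apply the deterministic transition and compute every unit's reward."""
        assert len(a) == self.n_agents and len(b) == self.m_enemies
        s_next = self.transition(s, a, b)
        r = [R(s, a, b) for R in self.rewards]
        return s_next, r
```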
Global Reward
• A continuous action space is used to reduce the redundancy in modeling the large discrete action space.
• Reward shaping is used to guide the agents during learning.
• Global reward: a single reward shared by all controlled agents, built from the individual units' rewards.
Definition of Reward Function
• Eq. (1) defines the global reward of the controlled agents; the enemies' global reward is its negative, so the rewards of the two sides sum to 0. A zero-sum game!
• The reward is built from the reduced health level of each unit j between consecutive time steps: health lost by the enemies counts for the controlled agents, health lost by the controlled agents counts against them (a sketch follows below).
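The exact form of Eq. (1) is in the paper; as a hedged sketch, a zero-sum reward built from reduced health levels between two consecutive time steps could look like the following. The sign convention and the normalization by the number of controlled agents are assumptions, not the paper's definition.

```python
def global_reward(prev_health, cur_health, n_controlled):
    """Zero-sum global reward sketch. Units [0, n_controlled) are the controlled
    agents, the rest are enemies; health values are lists of floats per unit."""
    reduced = [p - c for p, c in zip(prev_health, cur_health)]  # > 0 means damage taken
    ours = sum(reduced[:n_controlled])        # health our side lost
    enemies = sum(reduced[n_controlled:])     # health the enemies lost
    r_controlled = (enemies - ours) / max(n_controlled, 1)
    return r_controlled, -r_controlled        # the enemies receive the negative reward
```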
Minimax Game
• The controlled agents learn a joint policy that maximizes the expected sum of discounted rewards.
• The enemies' joint policy tries to minimize that same expected sum.
• The solution of this minimax game defines the optimal action-state value function (written out below).
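Written out generically, with θ parameterizing the controlled agents' joint policy a_θ and φ the enemies' joint policy b_φ (the paper's Eq. (2) may use slightly different notation):

```latex
J(\theta,\phi) \;=\; \mathbb{E}\Big[\textstyle\sum_{t\ge 0}\gamma^{t}\,
r\big(s_t,\,a_\theta(s_t),\,b_\phi(s_t)\big)\Big],
\qquad
\theta^{*} \;=\; \arg\max_{\theta}\,\min_{\phi}\, J(\theta,\phi).
```

The optimal action-state value function is then the Q-function evaluated at this minimax solution.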
Sampled historical state-action pairs (s, b) of the
enemies
• Minimax Q-learning would require modelling the Q-function of Eq. (2) directly, which is hard with continuous actions.
• Instead, following the idea of fictitious play, the enemies' policy bφ is estimated.
 - In StarCraft the enemies are driven by the built-in AI, so fictitious play fits naturally: the controlled agents treat the enemies as fixed opponent players and plug their estimated policy into the Q-function of Eq. (2).
 - Concretely, a deterministic policy bφ is fitted by supervised learning.
• The policy network bφ is trained on sampled historical state-action pairs (s, b) of the enemies.
Simpler MDP problem
Once the enemies' policy is fixed at bφ, Eq. (2) reduces the Stochastic Game to a (simpler) MDP.
Eq. (1)
• The global reward of Eq. (1) defines a zero-sum game, but on its own it does not model local collaboration; a local collaboration reward function is needed to capture team collaboration.
• With only one shared reward, each agent cannot tell how much its own behaviour contributed to the collaboration.
• Therefore Eq. (1) is extended so that each agent also receives a local reward that depends on the agents around it.
Extension of formulation of Eq. (1)
• For each agent i, consider only its top-K(i) neighbouring units.
• Agent i's local reward is computed over these K neighbours (see the sketch below).
• Every agent thus gets its own top-K local reward instead of only the global one.
• When the top-K set covers all units, the formulation reduces back to Eq. (1).
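A hedged sketch of the top-K idea follows. The paper's precise definitions of K(i) and of the local reward are not reproduced here; selecting neighbours by Euclidean distance and averaging per-unit rewards are assumptions made for illustration.

```python
import numpy as np

def topk_neighbors(positions, i, k):
    """Indices of the K units closest to agent i (excluding i itself)."""
    d = np.linalg.norm(positions - positions[i], axis=1)
    d[i] = np.inf
    return np.argsort(d)[:k]

def local_reward(i, positions, unit_rewards, k):
    """Agent i's local reward: the average per-unit reward of itself and its
    top-K neighbours. With K covering all units this degenerates to a global reward."""
    idx = list(topk_neighbors(positions, i, k)) + [i]
    return float(np.mean([unit_rewards[j] for j in idx]))
```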
Bellman equation for agent i
• There are N such equations, one per controlled agent: i ∈ {1, ..., N} (a generic form is sketched below).
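With the enemies following the fixed policy bφ and a deterministic transition T, the Bellman equation for agent i takes the usual form. This is a generic rendering consistent with the slides, not copied from the paper:

```latex
Q_i^{\pi}(s, a) \;=\; R_i\big(s, a, b_\phi(s)\big)
\;+\; \gamma\, Q_i^{\pi}\big(s',\, \pi(s')\big),
\qquad s' = T\big(s, a, b_\phi(s)\big), \quad i \in \{1,\dots,N\}.
```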
Objective as an expectation
• Because the action space is continuous, exact model-free policy iteration over all actions is infeasible.
• Instead, the gradient of Qi with respect to the policy is used: a vectorized version of the deterministic policy gradient (DPG), written out below.
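Schematically, the vectorized deterministic policy gradient described above can be written as follows; the double sum reflects "gradients from all agents' rewards flow into every agent's action." The paper's Eq. (9) should be consulted for the exact form.

```latex
\nabla_{\theta} J(\theta)\;\approx\;
\mathbb{E}_{s}\!\left[\;\sum_{i=1}^{N}\sum_{j=1}^{N}
\nabla_{\theta}\, a^{j}_{\theta}(s)\;
\nabla_{a^{j}} Q_i\big(s,\, a_{\theta}(s)\big)\right].
```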
Final Equation (Actor)
• The gradients coming from all agents' rewards are backpropagated to every agent's action, and from the actions further back into the shared policy parameters.
Final Equation (Critic)
• Off-policy deterministic actor-critic.
• The critic estimates the off-policy action-value function.
Actor-Critic networks
• Ready to use SGD to compute the updates for both the actor and critic networks.
• Both networks are trained end-to-end by backpropagation (a compact sketch follows).
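Below is a compact PyTorch sketch of one off-policy actor-critic update step consistent with the slides. It is a generic DDPG-style update, not the authors' code: the network shapes are hypothetical, the paper's vectorized critic and bi-directional RNN (shown later) are replaced by plain MLPs, and target networks are omitted. The Nadam optimizer with learning rate 0.002 matches the Training slide later in the deck.

```python
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM, HID = 5, 16, 3, 64   # hypothetical sizes

class Actor(nn.Module):
    """Maps the joint observation of all agents to one continuous action per agent."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(N_AGENTS * OBS_DIM, HID), nn.ReLU(),
                                 nn.Linear(HID, N_AGENTS * ACT_DIM), nn.Tanh())
    def forward(self, obs):                        # obs: (batch, N_AGENTS * OBS_DIM)
        return self.net(obs).view(-1, N_AGENTS, ACT_DIM)

class Critic(nn.Module):
    """Scores a joint state-action pair with a single scalar Q value."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(N_AGENTS * (OBS_DIM + ACT_DIM), HID), nn.ReLU(),
                                 nn.Linear(HID, 1))
    def forward(self, obs, act):
        return self.net(torch.cat([obs, act.flatten(1)], dim=1))

actor, critic = Actor(), Critic()
actor_opt = torch.optim.NAdam(actor.parameters(), lr=0.002)
critic_opt = torch.optim.NAdam(critic.parameters(), lr=0.002)
gamma = 0.99

def update(batch):
    """One SGD step for the critic (TD regression) and the actor (deterministic policy gradient)."""
    obs, act, rew, obs_next = batch
    with torch.no_grad():                          # TD target bootstrapped from the next state
        q_target = rew + gamma * critic(obs_next, actor(obs_next))
    critic_loss = nn.functional.mse_loss(critic(obs, act), q_target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    actor_loss = -critic(obs, actor(obs)).mean()   # push actions toward higher Q
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

# toy usage with random data standing in for a replay-buffer sample
B = 32
batch = (torch.randn(B, N_AGENTS * OBS_DIM), torch.randn(B, N_AGENTS, ACT_DIM),
         torch.randn(B, 1), torch.randn(B, N_AGENTS * OBS_DIM))
update(batch)
```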
BiCNet
• Both the actor and the critic are built on a bi-directional RNN that runs over the agents.
Design of the two networks
• Parameters are shared across agents, so the parameterization does not depend on how many agents there are; the model scales to large numbers of agents.
• The number of agents can even differ between training and test.
• The bi-directional RNN serves as the communication channel among the agents (sketched below).
• Full dependency among agents, because the gradients from all the actions in Eq. (9) are efficiently propagated through the entire network.
• Not fully symmetric: fixing the order in which the agents join the RNN maintains certain social conventions and roles, and resolves any possible tie between multiple optimal joint actions.
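A minimal PyTorch sketch of the bi-directional recurrence in the actor is shown below. Layer sizes are hypothetical, and the real BiCNet also conditions on a shared global state and pairs this actor with a similarly structured critic; this only illustrates how parameter sharing over the agent dimension makes the network independent of the number of agents.

```python
import torch
import torch.nn as nn

class BiCNetActor(nn.Module):
    """Per-agent features flow through a bidirectional GRU over the *agent* dimension,
    so every agent's action depends on every other agent's hidden state, while all
    parameters are shared and independent of the number of agents."""
    def __init__(self, obs_dim=16, act_dim=3, hid=64):
        super().__init__()
        self.embed = nn.Linear(obs_dim, hid)
        self.rnn = nn.GRU(hid, hid, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hid, act_dim)

    def forward(self, obs):                           # obs: (batch, n_agents, obs_dim)
        h, _ = self.rnn(torch.relu(self.embed(obs)))  # scan across the agents in both directions
        return torch.tanh(self.head(h))               # (batch, n_agents, act_dim), actions in [-1, 1]

# the same module handles any number of agents, e.g. 5 at training time and 15 at test time
actor = BiCNetActor()
print(actor(torch.randn(2, 5, 16)).shape)    # torch.Size([2, 5, 3])
print(actor(torch.randn(2, 15, 16)).shape)   # torch.Size([2, 15, 3])
```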
Experiments
• BiCNet-controlled agents fight against StarCraft's built-in AI.
• Combat scenarios of different scales and unit compositions are used (listed on the next slide).
• BiCNet is compared against several multi-agent baselines.
Experiments
• easy combats
- {3 Marines vs. 1 Super Zergling}
- {3 Wraiths vs. 3 Mutalisks}
• difficult combats
- {5 Marines vs. 5 Marines}
- {15 Marines vs. 16 Marines}
- {20 Marines vs. 30 Zerglings}
- {10 Marines vs. 13 Zerglings}
- {15 Wraiths vs. 17 Wraiths}
• heterogeneous combats
- {2 Dropships and 2 Tanks vs. 1 Ultralisk}
Unit types shown: Marine, Zergling, Wraith, Mutalisk, Dropship, Ultralisk, Siege Tank (all images are from http://starcraft.wikia.com/wiki/)
Baselines
• Independent controller (IND): each agent is controlled independently, with no communication between agents.
• Fully-connected (FC): the agents communicate through fully-connected layers, i.e. every agent is wired to every other agent.
• CommNet: multi-agent communication by passing the averaged message among the agents.
• GreedyMDP with Episodic Zero-Order Optimization (GMEZO): conducting collaborations through a greedy update over MDP agents, as well as adding episodic noise in the parameter space for exploration.
Action space for each individual agent
• A 3-dimensional real vector
• 1st dimension: ranging from -1 to 1
- if greater than or equal to 0, the agent attacks
- otherwise, the agent moves
• 2nd and 3rd dimensions: degree and distance, collectively indicating the destination to which the agent should move or attack, relative to its current location (decoding sketched below)
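A hedged sketch of decoding this 3-dimensional action into a game command: the attack/move threshold at 0 comes from the slide, while the exact scaling of degree and distance into map coordinates is an assumption made for illustration.

```python
import math

def decode_action(action, x, y, max_range=10.0):
    """action = (a0, a1, a2) in [-1, 1]^3, agent currently at (x, y).
    a0 >= 0 -> attack, a0 < 0 -> move; a1 encodes the angle, a2 the distance."""
    a0, a1, a2 = action
    angle = a1 * math.pi                    # [-1, 1] -> [-pi, pi]
    dist = (a2 + 1.0) / 2.0 * max_range     # [-1, 1] -> [0, max_range]
    tx, ty = x + dist * math.cos(angle), y + dist * math.sin(angle)
    cmd = "attack" if a0 >= 0 else "move"
    return cmd, (tx, ty)

print(decode_action((0.3, 0.5, -0.2), x=100.0, y=50.0))   # ('attack', (target x, target y))
```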
Training
• Nadam optimizer
• learning rate = 0.002
• 800 episodes (more than 700k steps)
Simple Experiment
• tested on 100 independent games
• skip frame: how many frames we should skip between controlling the agents' actions
• batch_size = 32 (highest mean Q-value after 600k training steps) combined with skip_frame = 2 (highest mean Q-value between 300k and 600k steps) gives the highest winning rate.
Simple Experiment
• Letting 4~6 agents work together as a group makes it possible to control the individual agents efficiently while maximizing damage output.
• Fig. 3: a group size of 4~5 gives the best performance.
• Fig. 4 shows the convergence speed by plotting the winning rate against the number of training episodes.
Performance Comparison
• BiCNet is trained for over 100k steps
• measuring the performance as the average winning rate on 100 test
games
• when the number of agents goes beyond 10, the margin of
performance between BiCNet and the second best starts to increase
Performance Comparison
• In "5M vs. 5M", the key factor to win is to "focus fire" on the weak.
• As BiCNet has a built-in design for dynamic grouping, a small number of agents (such as "5M vs. 5M") does not suffice to show the advantages of BiCNet on large-scale collaborations.
• For "5M vs. 5M", BiCNet only needs 10 combats before learning the idea of "focus fire," achieving an 85% win rate, whereas CommNet needs at least 50 episodes and reaches a much lower winning rate.
Visualization
• The "3 Marines vs. 1 Super Zergling" scenario, after the coordinated cover attack has been learned.
• Values were collected from the last hidden layer of the well-trained critic network over 10k steps.
• The collected activations are embedded in 2-D with t-SNE for visualization (sketched below).
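The visualization step can be reproduced in outline with scikit-learn's t-SNE. This is only a sketch: the random features below stand in for the critic's last-hidden-layer activations collected over 10k steps, and the dimensionality is an assumption.

```python
import numpy as np
from sklearn.manifold import TSNE

# stand-in for the last-hidden-layer activations of the trained critic (10k steps x 64 units)
features = np.random.RandomState(0).randn(10_000, 64)

embedding = TSNE(n_components=2, perplexity=30, init="pca",
                 random_state=0).fit_transform(features)
print(embedding.shape)   # (10000, 2): one 2-D point per recorded step, ready to scatter-plot
```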
Strategies to Experiment
• Move without collision
• Hit and run
• Cover attack
• Focus fire without overkill
• Collaboration between heterogeneous agents
Each learned strategy is illustrated on the following slides.
Demo video: https://www.youtube.com/watch?v=kW2q15MNFug
Coordinated moves without collision (3 Marines
(ours) vs. 1 Super Zergling)
• Early in training the agents move in a rather uncoordinated way; in particular, when two agents are close to each other, one agent may unintentionally block the other's path.
• After 40k steps, in around 50 episodes, the number of collisions drops dramatically.
Winning rate against difficult settings
Hit and Run tactics (3 Marines (ours) vs. 1 Zealot)
Move the agents away when they are under attack, and fight back when they feel safe again.
Coordinated Cover Attack (4 Dragoons (ours) vs. 2
Ultralisks)
• Let one agent draw fire or attention from the enemies.
• In the meantime, the other agents take advantage of the time or distance gap to cause more harm.
Coordinated Cover Attack (3 Marines (ours) vs. 1
Zergling)
Focus fire without overkill (15 Marines (ours) vs. 16
Marines)
• How to efficiently allocate the attacking resources becomes important.
• The grouping design in the policy network is the key factor that lets BiCNet learn "focus fire without overkill."
• Even as the number of our units decreases, each group is dynamically re-assigned so that 3~5 units keep focusing their attack on the same enemy.
Collaborations between heterogeneous agents (2
Dropships and 2 tanks vs. 1 Ultralisk)
• StarCraft has a wide variety of unit types.
• Collaboration between heterogeneous units can be easily handled by BiCNet.
Further to Investigate after this paper
• Strong correlation between the specified reward and the
learned policies
• How the policies are communicated over the networks
among agents
• Whether there is a specific language that may have
emerged
• Whether a Nash Equilibrium is reached when both sides are played by deep multi-agent models.
– YouTube comments on the demo video