This document summarizes approaches to multi-agent deep reinforcement learning. It discusses training multiple independent agents concurrently, centralized control, parameter sharing, and multi-agent deep deterministic policy gradient (MADDPG), as well as approaches in which agents learn to communicate. MADDPG allows each agent to have its own reward function, training critics centrally while executing policies in a decentralized manner. The document applies these methods to problems such as predator-prey and uses the switch riddle to illustrate how agents can learn communication protocols.
2. Papers
[1] Foerster, J. N., Assael, Y. M., de Freitas, N., Whiteson, S. “Learning to Communicate with Deep Multi-Agent Reinforcement Learning.” NIPS 2016.
[2] Gupta, J. K., Egorov, M., Kochenderfer, M. “Cooperative Multi-Agent Control Using Deep Reinforcement Learning.” Adaptive Learning Agents (ALA) 2017.
[3] Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., Mordatch, I. “Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments.” NIPS 2017.
[4] Hausknecht, M. J. “Cooperation and Communication in Multiagent Deep Reinforcement Learning.” 2016.
3. Motivation
We live in a multi-agent world.
Multi-agent RL:
– Cooperative behaviors of multiple agents are not easily learned by a single agent.
– How, then, can we make multiple agents cooperate through RL?
6. How to Run Multiple Agents
– Multiple single agents (baseline)
– Centralized (baseline)
– Multi-agent RL with communication
– Distributed multi-agent RL
– Ad hoc teamwork
7. Multiple Single Agents (naïve approach)
– Training: a single agent trains in the environment.
– Execution: multiple (identical) copies of the trained single agent run in the same environment.
[Diagram: during training, each agent i has its own environment loop (state, reward, action) and updates its own actor and critic networks; at execution, agents 1 … i act in a shared environment using only their actor networks.]
8. Centralized
– Multiple agents are controlled by a single controller (agent).
– The state and action spaces of all agents are concatenated.
– The resulting large state and action spaces make learning challenging.
[Diagram: the controller receives the concatenated states and a shared reward from the environment, outputs the concatenated actions, and updates a single actor network and critic network.]
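To make the concatenation concrete, here is a minimal sketch of a centralized controller, assuming PyTorch; the class name, layer sizes, and interface are illustrative, not from any of the cited papers.

```python
import torch
import torch.nn as nn

class CentralizedController(nn.Module):
    """Sketch: one network controls all agents over concatenated spaces."""
    def __init__(self, n_agents: int, obs_dim: int, act_dim: int):
        super().__init__()
        self.act_dim = act_dim
        self.actor = nn.Sequential(
            nn.Linear(n_agents * obs_dim, 128),  # joint (concatenated) state
            nn.ReLU(),
            nn.Linear(128, n_agents * act_dim),  # joint (concatenated) action
            nn.Tanh(),
        )

    def forward(self, observations):
        # observations: list of per-agent tensors, concatenated into one state.
        joint_obs = torch.cat(observations, dim=-1)
        joint_action = self.actor(joint_obs)
        # Split the joint action back into one action per agent.
        return joint_action.split(self.act_dim, dim=-1)
```

Both spaces grow linearly with the number of agents, which is why this baseline becomes hard to train as agents are added.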
9. Multi-agent RL with Communication
– Two players form one team and share the reward (score).
– How can they cooperate? Through communication.
[Diagram: two agents each receive an observation and the shared reward and choose an action; agent 1 sends a message (“Pass me!”), agent 2 sends no message.]
10. Distributed Multi-agent RL
– Two players form one team with a shared reward.
– They cannot communicate with each other.
– If they practice together for a long time, they can learn to cooperate without communication.
[Diagram: two agents each receive an observation and the shared reward and act, with no message channel between them.]
11. Ad Hoc Teamwork
– Two players form one team with a shared reward.
– They do not know each other.
– Pre-coordinating a team may not always be possible.
[Diagram: two mutually unknown agents each receive an observation and the shared reward and act.]
13. Distributed Multi-agent RL
Naïve approach: concurrent learning
– Centralized training with decentralized (distributed) execution.
– Each agent’s policy is independent.
– Each agent maintains its own actor and critic networks.
– All agents share the reward.
[Diagram: during training, each agent receives its own state and the shared reward and updates its own actor and critic networks; at execution, each agent acts using only its actor network.]
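As a concrete illustration, here is a minimal sketch of one concurrent learner, assuming PyTorch and a hypothetical environment interface; each agent updates its own DDPG-style actor and critic from the shared reward, with nothing else coupling the learners.

```python
import torch
import torch.nn as nn

class IndependentAgent:
    """Sketch: one concurrent learner with its own actor and Q-critic."""
    def __init__(self, obs_dim, act_dim, lr=1e-3):
        self.actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                   nn.Linear(64, act_dim), nn.Tanh())
        self.critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                                    nn.Linear(64, 1))
        self.actor_opt = torch.optim.Adam(self.actor.parameters(), lr=lr)
        self.critic_opt = torch.optim.Adam(self.critic.parameters(), lr=lr)

    def update(self, obs, action, shared_reward, next_obs, gamma=0.99):
        # Critic: one-step TD target built from the *shared* reward
        # (termination masking omitted for brevity).
        with torch.no_grad():
            next_q = self.critic(torch.cat([next_obs, self.actor(next_obs)], -1))
            target = shared_reward + gamma * next_q
        q = self.critic(torch.cat([obs, action], -1))
        critic_loss = (q - target).pow(2).mean()
        self.critic_opt.zero_grad(); critic_loss.backward(); self.critic_opt.step()

        # Actor: ascend the agent's own critic (DDPG-style).
        actor_loss = -self.critic(torch.cat([obs, self.actor(obs)], -1)).mean()
        self.actor_opt.zero_grad(); actor_loss.backward(); self.actor_opt.step()
```

Each agent sees only its own observation; the shared reward is the only coupling between the learners, which is what lets one agent free-ride on the other’s progress (the failure mode shown on the next slide).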
14. Result: Concurrent and Centralized
One agent learns well, but the other agent shows no ability.
– Concurrent: one agent always learns to perform the task before the other, and the other then has less chance to learn.
– Centralized: the centralized controller learns to use one agent exclusively for scoring goals, and learns to walk the second agent away from the ball entirely.
[Videos: concurrent vs. centralized behavior.]
15. Two Approaches
1. Parameter sharing [2]
– Agents share the weights of the actor and critic networks.
– This also updates the network of the agent that has less chance to learn.
2. Multi-agent DDPG (MADDPG) [3]
– Why should agents share a reward? Agents can have arbitrary reward structures, including conflicting rewards in a competitive setting.
– Observations are shared during training.
[2] Gupta, J. K., Egorov, M., Kochenderfer, M. “Cooperative Multi-Agent Control Using Deep Reinforcement Learning.” Adaptive Learning Agents (ALA) 2017.
[3] Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., Mordatch, I. “Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments.” NIPS 2017.
16. Parameter Sharing
Share the weights of the actor and critic networks between agents:
– Produces similar behaviors across agents.
– Encourages both agents to participate even though the goal is achievable by a single agent.
– Reduces the total number of parameters.
[Diagram: agents 1 and 2 share actor-network and critic-network parameters; each receives its own state and the shared reward from the environment.]
17. Parameter Sharing: Design Choices
– Share the lower-layer parameters: all agents apply the same low-level processing to state features, while specialization in the higher layers of the network allows each agent to develop a unique policy.
– Share both the critic and the actor networks.
– How many layers to share? Two layers in this case.
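The layer-sharing design might look like the following minimal sketch, assuming PyTorch; the module names and sizes are illustrative, not taken from the paper’s code.

```python
import torch.nn as nn

OBS_DIM, ACT_DIM, N_AGENTS = 16, 4, 2     # illustrative sizes

# The two lower layers are a single shared module: the same low-level
# processing of state features for every agent.
shared_trunk = nn.Sequential(
    nn.Linear(OBS_DIM, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
)

class SharedTrunkActor(nn.Module):
    """Actor with a shared lower trunk and an agent-specific head."""
    def __init__(self, trunk, act_dim):
        super().__init__()
        self.trunk = trunk                    # the same object for every agent
        self.head = nn.Linear(128, act_dim)   # higher layer: per-agent policy

    def forward(self, obs):
        return self.head(self.trunk(obs))

actors = [SharedTrunkActor(shared_trunk, ACT_DIM) for _ in range(N_AGENTS)]
```

Because every agent’s loss backpropagates into the same trunk parameters, an agent that rarely gets to act still benefits from its teammate’s updates, which is the mechanism behind the result on the next slide.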
18. Result: Parameter Sharing
– Shows cooperative behaviors.
– The weights shared from the first agent allow the second agent to learn to score.
19. Multi-agent RL: MADDPG
Proposes a general-purpose multi-agent learning algorithm:
1. Agents use local information (i.e., their own observations) at execution time: centralized training with decentralized execution.
2. Applicable not only to cooperative interaction but also to competitive or mixed interaction: each agent has its own reward, and the observations of all agents are shared during training.
[Diagram: during training, each agent receives its own state and its own reward (rewards are not shared) and updates its actor and critic networks; at execution, each agent acts using only its actor network.]
20. Model
– $N$: number of agents
– $\boldsymbol{x}$: state
– $o_i$: observation of agent $i$
– $a_i$: action of agent $i$
– $r_i$: reward of agent $i$
One sample (a transition stored in the replay buffer) is the tuple $(\boldsymbol{x}, a_1, \dots, a_N, r_1, \dots, r_N, \boldsymbol{x}')$, where $\boldsymbol{x}'$ is the next state.
21. Decentralized Actor Network
Policies of the $N$ agents are parameterized by $\theta = \{\theta_1, \dots, \theta_N\}$; let $\boldsymbol{\mu} = \{\mu_1, \dots, \mu_N\}$ be the continuous (deterministic) policies.
Goal: find $\theta_i$ maximizing the expected return $J(\theta_i) = \mathbb{E}[R_i]$ for agent $i$.
Gradient of $J(\theta_i)$ (as given in [3]):
$$\nabla_{\theta_i} J(\mu_i) = \mathbb{E}_{\boldsymbol{x}, a \sim \mathcal{D}}\!\left[ \nabla_{\theta_i} \mu_i(a_i \mid o_i)\, \nabla_{a_i} Q_i^{\boldsymbol{\mu}}(\boldsymbol{x}, a_1, \dots, a_N) \big|_{a_i = \mu_i(o_i)} \right]$$
[Diagram: the actor network for agent $i$ (parameters $\theta_i$) maps the agent’s observation to its action; the centralized critic supplies $\nabla_{a_i} Q_i$.]
22. Centralized Critic Network
The centralized action-value function for agent $i$, $Q_i^{\boldsymbol{\mu}}(\boldsymbol{x}, a_1, \dots, a_N)$, takes the state and the actions of all agents as input.
It is updated by minimizing (as given in [3]):
$$\mathcal{L}(\theta_i) = \mathbb{E}_{\boldsymbol{x}, a, r, \boldsymbol{x}'}\!\left[ \left( Q_i^{\boldsymbol{\mu}}(\boldsymbol{x}, a_1, \dots, a_N) - y \right)^2 \right], \qquad y = r_i + \gamma\, Q_i^{\boldsymbol{\mu}'}(\boldsymbol{x}', a_1', \dots, a_N') \big|_{a_j' = \mu_j'(o_j)}$$
where $\boldsymbol{\mu}'$ is the set of target policies with delayed parameters $\theta_j'$.
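Putting the actor gradient and critic loss together, one MADDPG update step for agent $i$ might look like the following sketch, assuming PyTorch; the function signature, batch layout, and target networks are illustrative placeholders, not the authors’ implementation.

```python
import torch

def maddpg_update(i, batch, actors, critics, target_actors, target_critics,
                  actor_opts, critic_opts, gamma=0.99):
    """One sketched MADDPG update for agent i.

    batch["obs"], batch["act"], batch["next_obs"]: lists of per-agent tensors;
    batch["rew"]: list of per-agent reward tensors. The critic input plays the
    role of x by concatenating all observations with all actions.
    """
    obs, actions = batch["obs"], batch["act"]
    rewards, next_obs = batch["rew"], batch["next_obs"]

    # Centralized critic: y = r_i + gamma * Q'_i(x', a'_1, ..., a'_N).
    with torch.no_grad():
        next_actions = [mu(o) for mu, o in zip(target_actors, next_obs)]
        y = rewards[i] + gamma * target_critics[i](
            torch.cat(next_obs + next_actions, dim=-1))
    q = critics[i](torch.cat(obs + actions, dim=-1))
    critic_loss = (q - y).pow(2).mean()
    critic_opts[i].zero_grad(); critic_loss.backward(); critic_opts[i].step()

    # Decentralized actor: ascend Q_i with respect to agent i's own action.
    actions_i = list(actions)
    actions_i[i] = actors[i](obs[i])       # differentiable through actor i only
    actor_loss = -critics[i](torch.cat(obs + actions_i, dim=-1)).mean()
    actor_opts[i].zero_grad(); actor_loss.backward(); actor_opts[i].step()
```

Note how the critic sees every agent’s observation and action during training, while the actor consumes only $o_i$, so at execution time the critics can be discarded entirely.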
23. Result: MADDPG
Predator-prey experiment: average number of prey touches by predators per episode with $N = L = 3$, where the prey (adversaries) are slightly (30%) faster.
24. Multi-agent RL with Communication [1]
– Multiple agents run in an environment.
– Goal: maximize their shared utility.
– Can communication improve performance?
– What kind of information should the agents exchange?
– What if there is limited channel capacity among the agents?
[Diagram: two agents each receive an observation and the shared reward; each has an action-select and a message-select output, and messages pass between agents over a limited-capacity channel.]
[1] Foerster, J. N., Assael, Y. M., de Freitas, N., Whiteson, S. “Learning to Communicate with Deep Multi-Agent Reinforcement Learning.” NIPS 2016.
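In the spirit of the architecture in [1] (though not reproducing it), an agent with separate action-select and message-select heads might be sketched as follows; the class name, sizes, and 1-bit message encoding are assumptions.

```python
import torch
import torch.nn as nn

class CommAgent(nn.Module):
    """Sketch: an agent that picks an environment action and a 1-bit message."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        # Input: own observation plus the teammate's previous 1-bit message.
        self.body = nn.Sequential(nn.Linear(obs_dim + 1, hidden), nn.ReLU())
        self.action_head = nn.Linear(hidden, n_actions)   # action select
        self.message_head = nn.Linear(hidden, 2)          # message select (1 bit)

    def forward(self, obs, incoming_message):
        h = self.body(torch.cat([obs, incoming_message], dim=-1))
        return self.action_head(h), self.message_head(h)
```

The limited channel capacity is enforced structurally: whatever the agent wants to say must be squeezed through the single discrete message output.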
25. Switch Riddle Problem
– $n$ prisoners are newly ushered into prison, and each is placed in an isolated cell.
– Each day, the warden chooses one prisoner uniformly at random:
– The chosen prisoner has the chance to toggle the light bulb (communication).
– He also has the option of announcing that he believes all prisoners have been chosen at some point in time (action).
– If he is right, everybody goes home; otherwise, all die.
26. Multi-agent RL with Communication: Formalization
– Multi-agent: $n$ agents with a 1-bit communication channel
– State: an $n$-bit array recording whether the $i$-th prisoner has been chosen
– Action: “Announce” / “None”
– Reward: +1 (freedom) / 0 (episode expires) / −1 (all die)
– Observation: “None”
– Communication: the switch (1 bit)
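Under this formalization, a minimal environment sketch might look like the following; the interface, episode limit, and reward handling are illustrative assumptions, not the paper’s code.

```python
import random

class SwitchRiddleEnv:
    """Sketch of the switch riddle: each day one prisoner, chosen uniformly at
    random, observes the 1-bit switch and may toggle it and/or announce."""
    def __init__(self, n, max_days=None):
        self.n = n
        self.max_days = max_days if max_days is not None else 4 * n  # assumed limit

    def reset(self):
        self.visited = [False] * self.n   # hidden state: n-bit "has been chosen"
        self.switch = 0                   # 1-bit communication channel
        self.day = 0
        return self._choose()

    def _choose(self):
        agent = random.randrange(self.n)  # warden's uniform choice
        self.visited[agent] = True
        return agent, self.switch         # chosen prisoner sees only the switch

    def step(self, announce, new_switch):
        self.switch = new_switch          # toggle (or leave) the light bulb
        self.day += 1
        if announce:                      # "Announce" action
            reward = 1.0 if all(self.visited) else -1.0
            return None, reward, True
        if self.day >= self.max_days:     # episode expires
            return None, 0.0, True
        return self._choose(), 0.0, False
```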
27. Result
– Agents can discover communication protocols through deep RL.
– The learned protocols can be extracted and understood.
Motivation. All of you probably know single-agent RL well, such as Q-learning and policy gradient methods. However, single-agent RL can be used only in specific areas, such as games. We live in a multi-agent world: all of these settings have multiple agents, and they interact with each other. So today I’ll explain reinforcement learning when there are multiple agents.
Before explaining multi-agent RL, I would like to explain five classes of running multiple agents. It is hard to say the first two are multi-agent RL, but I’ll explain them for comparison. I’ll briefly explain each of these and look at their features.