Local Coordination in Online Distributed Constraint Optimization Problems
Antonio Maria Fiscarelli¹, Robert Vanden Eynde¹ and Erman Loci²
¹ Ecole Polytechnique, Universite Libre de Bruxelles, Avenue Franklin Roosevelt 50, 1050 Bruxelles
² Artificial Intelligence Lab, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
afiscare@ulb.ac.be
Abstract
For agents to achieve a common goal in multi-agent systems, they often need to coordinate. One way to achieve coordination is to let agents learn in a joint action space. Joint Action Learning allows agents to take into account the actions of other agents, but the action space grows exponentially with the number of agents. If coordination between some agents is more important than between others, local coordination allows the agents to coordinate while keeping the complexity low. In this paper we investigate local coordination in which agents learn the problem structure, resulting in better group performance.
Introduction
In multi-agent systems, agents must coordinate to achieve a jointly optimal payoff. One way to achieve this coordination is to let agents see the actions that other agents chose and, based on those actions, choose an action that increases the total payoff. This method of learning is called Joint Action Learning (JAL). As the number of agents in JAL grows, the joint space in which the agents learn grows exponentially (Claus & Boutilier, 1998), because the agents have to see the actions of every other agent. So even though JALs find optimal solutions, they are expensive to compute. In this paper we introduce a method, Local Joint Action Learning (LJAL), that addresses this complexity problem by sacrificing some solution quality: we let agents see the actions of only some of the other agents, namely those that are important or necessary to coordinate with.
Local Joint Action Learners
The LJAL approach relies on the concept of a Coordination Graph (CG) (Guestrin, Lagoudakis, & Parr, 2002). A CG describes action dependencies between agents: vertices represent agents, and edges represent coordination between those agents (Fig. 1).
Fig.1. An example of a Coordination Graph (CG)
In LJAL the learning problem can be described as a
distributed n-armed bandit problem, where every agent
can choose among n actions, and the reward depends on
the combination of all chosen actions.
Agents estimate rewards according to the following
formula (Sutton & Barto, 1998):
𝑄𝑑+1 π‘Ž = 𝑄𝑑 + 𝛼[π‘Ÿ 𝑑 + 1 βˆ’ 𝑄𝑑]
LJALs also keep a probabilistic model of the other agents' action selection: they count the number of times C each action has been chosen by each agent. Agent i maintains the frequency F^i_{a_j} with which agent j selects action a_j from its action set A_j:

$$F^i_{a_j} = \frac{C^i_{a_j}}{\sum_{b_j \in A_j} C^i_{b_j}}$$
The expected value of selecting a specific action a_i is calculated as follows:

$$EV(a_i) = \sum_{a \in A^i} Q(a \cup \{a_i\}) \prod_{j \in N(i)} F^i_{a[j]}$$

where A^i = Γ—_{j∈N(i)} A_j and N(i) is the set of neighbors of agent i in the CG.
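A sketch of how this expectation could be computed by enumerating the joint actions of an agent's neighbors is given below (the data structures are hypothetical; itertools.product builds the cross product A^i):

```python
from itertools import product

def expected_value(own_action, neighbor_actions, freqs, q):
    """EV(a_i): sum over neighbor joint actions a in A^i of Q(a plus a_i),
    weighted by the product of the observed neighbor action frequencies.

    neighbor_actions: {j: list of agent j's actions}
    freqs:            {j: {a_j: estimated probability that agent j plays a_j}}
    q:                {(own_action, neighbor joint action): Q estimate}
    """
    neighbors = sorted(neighbor_actions)
    ev = 0.0
    for combo in product(*(neighbor_actions[j] for j in neighbors)):
        weight = 1.0
        for j, a_j in zip(neighbors, combo):
            weight *= freqs[j][a_j]
        ev += q.get((own_action, combo), 0.0) * weight
    return ev
```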
Following Sutton and Barto (1998), the probability that agent i chooses action a_i at time t is:

$$\Pr(a_i) = \frac{e^{EV(a_i)/\tau}}{\sum_{b_i=1}^{n} e^{EV(b_i)/\tau}}$$
The temperature parameter Ο„ controls how greedily actions are selected.
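A sketch of Boltzmann action selection with temperature Ο„ (using numpy; any standard softmax-sampling routine would do):

```python
import numpy as np

def boltzmann_choice(evs, tau):
    """Pick an action index with probability proportional to exp(EV / tau)."""
    logits = np.asarray(evs, dtype=float) / tau
    logits -= logits.max()              # subtract the max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return np.random.choice(len(evs), p=probs)

# High tau -> nearly uniform exploration, low tau -> nearly greedy choice.
action = boltzmann_choice([1.2, 0.8, 3.5, -0.4], tau=1000 * 0.94 ** 50)
```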
LJAL performance
We will compare IL, LJAL with randomly generated
CG with out-degree 1 (LJAL-1) for each agent, LJAL
with randomly generated CG with out-degree 2 (LJAL-
2), and LJAL with randomly generated CG with out-
degree 3 (LJAL-3). These were evaluated on randomly
generated distributed bandit problem, for every possible
joint action, a fixed global reward is drawn from a
Normal distribution N(0, 70) (70 = 10 * number of
agents). A single run of the experiment consist of 200
plays, in which 7 agents chose among 4 actions, and
receive a reward for the global joint action as
determined by the problem. Every run LJALs get a new
random graph with the correspoding out-degree. Agents
select their actions with temperature 𝜏 = 1000 βˆ—
0.94play
. The experiment is averaged over 200 runs
(Fig. 2).
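A sketch of how such a random problem and random coordination graphs could be generated (helper names are ours; we treat the 70 in N(0, 70) as the standard deviation of the draw):

```python
import itertools
import random

N_AGENTS, N_ACTIONS, SIGMA = 7, 4, 70   # 70 = 10 * number of agents

def random_bandit_problem():
    """A fixed global reward for each of the 4^7 joint actions."""
    return {ja: random.gauss(0, SIGMA)
            for ja in itertools.product(range(N_ACTIONS), repeat=N_AGENTS)}

def random_cg(out_degree):
    """Each agent picks `out_degree` distinct random coordination partners."""
    return {i: random.sample([j for j in range(N_AGENTS) if j != i], out_degree)
            for i in range(N_AGENTS)}
```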
Fig. 2. Comparison of IL and LJALs.
We can see that the solution quality of IL is the worst, and the reward improves with more coordination. This is because ILs only reason about themselves, while LJALs take the actions of other agents into consideration. LJALs achieve better solution quality, but their complexity also increases.
Distributed Constraint Optimization
A Constraint Optimization Problem (COP) is the problem of assigning values to a set of variables, subject to a number of soft constraints. Solving a COP means maximizing the sum of the rewards of all constraints, given the value assigned to each variable. A Distributed Constraint Optimization Problem (DCOP) is a tuple (A, X, D, C, f), where:
A = {a_1, a_2, …, a_l} is the set of agents.
X = {x_1, x_2, …, x_n} is the set of variables.
D = {D_1, D_2, …, D_n} is the set of domains; variable x_i can be assigned values from the finite domain D_i.
C = {c_1, c_2, …, c_m} is the set of constraints; constraint c_i is a function D_a Γ— D_b Γ— … Γ— D_k β†’ ℝ, with {a, b, …, k} βŠ† {1, 2, …, n}, projecting the domains of a subset of the variables onto a real number, the reward.
f: X β†’ A is a function mapping each variable onto a single agent.
The total reward of a variable assignment S, assigning value v(x_i) ∈ D_i to variable x_i, is:

$$C(S) = \sum_{i=1}^{m} c_i\bigl(v(x_a), v(x_b), \ldots, v(x_k)\bigr)$$
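A small sketch of this sum for a concrete assignment (the scope-plus-table representation of a constraint is our assumption, not the paper's):

```python
def total_reward(assignment, constraints):
    """C(S): sum over constraints c_i of c_i applied to the values of its scope.

    assignment:  {variable: chosen value}
    constraints: list of (scope, table), where scope is a tuple of variables and
                 table maps a tuple of their values to a real-valued reward.
    """
    return sum(table[tuple(assignment[x] for x in scope)]
               for scope, table in constraints)

# Example with two binary constraints over variables x1, x2, x3:
constraints = [(("x1", "x2"), {(0, 0): 5.0, (0, 1): -2.0, (1, 0): 1.0, (1, 1): 3.0}),
               (("x2", "x3"), {(0, 0): 0.5, (0, 1): 4.0, (1, 0): -1.0, (1, 1): 2.0})]
print(total_reward({"x1": 0, "x2": 1, "x3": 1}, constraints))  # -2.0 + 2.0 = 0.0
```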
DCOPs are used to model a variety of real problems, ranging from disaster response scenarios (Chapman et al., 2009) and distributed sensor network management (Kho, Rogers, & Jennings, 2009) to traffic management in congested networks (van Leeuwen, Hesselink, & Rohling, 2002).
In a DCOP each constraint has its own reward function, and since the total reward of a solution is the sum of all rewards, some constraints can have a larger impact on the solution quality than others. Coordination between specific agents can therefore be more important than between others. We investigate the performance of LJALs on DCOPs where some constraints are more important than others. We generate random, fully connected DCOPs, drawing the rewards of every constraint function from different normal distributions. We attach a weight w_i ∈ [0, 1] to each constraint c_i; the problem's variance Οƒ is multiplied by this weight when the reward function for constraint c_i is generated, so the rewards for constraint c_i are drawn from the distribution

$$N(0, \sigma w_i)$$
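A sketch of generating one such weighted constraint (again treating Οƒw_i as the standard deviation of the draw, which is our reading of the notation):

```python
import itertools
import random

def weighted_constraint(domain_sizes, weight, sigma=70):
    """Reward table for one constraint, drawn from N(0, sigma * weight)."""
    domains = [range(n) for n in domain_sizes]
    return {values: random.gauss(0, sigma * weight)
            for values in itertools.product(*domains)}

important = weighted_constraint((4, 4), weight=0.9)    # a black edge in Fig. 3
unimportant = weighted_constraint((4, 4), weight=0.1)  # a gray edge in Fig. 3
```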
In our experiment we compare different LJALs solving the problem structure given in Fig. 3. The black edges in Fig. 3 correspond to weights of 0.9, while the gray edges correspond to weights of 0.1.
Fig. 3. A weighted CG: darker edges correspond to more important constraints, lighter edges to less important ones.
In addition to IL and LJAL with random out-degree 2 (LJAL-1), we compare an LJAL whose CG matches the problem structure (LJAL-2) and an LJAL with the same structure as the problem but with an added edge between agents 1 and 5 (LJAL-3). From the results shown below (Fig. 4) we can see that LJAL-2 performs better than LJAL-1, meaning that an LJAL with a CG that corresponds to the problem structure gives better solutions than an LJAL with a randomly generated CG. We can also see that the added coordination between agents 1 and 5 in LJAL-3 does not improve the solution quality. This happens because the extra information about an unimportant constraint complicates the coordination on important constraints. As Taylor et al. (2011) note, an increase in teamwork is not necessarily beneficial to solution quality.
Fig. 4. Comparing IL and LJAL on a distributed constraint
optimization problem.
We run another experiment to test the effect this extra coordination edge has on solution quality. We modify LJAL-3 by adding an extra coordination edge between agents 4 and 7 and removing the edge between agents 1 and 5 (Fig. 5). We can see that the extra coordination between agents 4 and 7 improves the solution quality, because agents 4 and 7 were not involved in any important constraint.
Fig. 5. The effect of an extra coordination edge on solution quality
Learning Coordination Graphs
In the previous experiment we showed that LJALs with the same CG as the problem structure perform better than LJALs with randomly generated CGs. In the next experiment we make the LJALs learn the optimal CG.
The problem of learning a CG is itself encoded as a distributed n-armed bandit problem. Each agent can choose at most one or two coordination partners. We map the two-partner selection to an n-armed bandit problem by making actions represent pairs of agents instead of single agents. The coordination partners are initially chosen randomly, and once they are chosen the LJALs solve the learning problem using that graph. The resulting reward is used as feedback for choosing the next coordination partners. This is one play at the meta-learning level. The process is repeated until the CG converges. The agents in this meta-bandit problem are independent learners.
In our experiment we make the agents learn a CG for the problem proposed in Fig. 3, so that we can compare the learned CG with the known problem structure. One meta-bandit run consists of 500 plays. In each play the chosen CG is evaluated in 10 runs of 200 plays, and the average reward over those 10 runs is the estimated reward for the chosen CG.
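A high-level sketch of this meta-learning loop for the single-partner case is shown below; `evaluate_cg` stands in for running 10 LJAL runs of 200 plays with the chosen graph and returning their average reward, and all names are ours:

```python
import math
import random

def evaluate_cg(graph):
    """Placeholder for 10 LJAL runs of 200 plays with `graph`; returns their
    average reward. Here it just returns a random number."""
    return random.gauss(0, 1)

def softmax_pick(values, tau):
    weights = [math.exp(v / tau) for v in values]
    return random.choices(range(len(values)), weights=weights)[0]

def learn_coordination_graph(meta_plays=500, n_agents=7):
    """Meta-bandit: each agent independently learns one coordination partner."""
    partners = {i: [j for j in range(n_agents) if j != i] for i in range(n_agents)}
    q = {i: [0.0] * len(partners[i]) for i in range(n_agents)}    # value per partner
    counts = {i: [0] * len(partners[i]) for i in range(n_agents)}
    for play in range(meta_plays):
        tau = 1000 * 0.994 ** play
        choice = {i: softmax_pick(q[i], tau) for i in range(n_agents)}
        graph = {i: [partners[i][choice[i]]] for i in range(n_agents)}
        reward = evaluate_cg(graph)              # one play at the meta level
        for i, k in choice.items():              # sample-average Q update
            counts[i][k] += 1
            q[i][k] += (reward - q[i][k]) / counts[i][k]
    return q

partner_values = learn_coordination_graph()
```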
Fig. 6 shows a CG that the agents learned. The temperature is decreased as Ο„ = 1000 Β· 0.994^play. The results are averaged over 1000 runs.
Fig. 6. A coordination graph learned by the agents.
This shows that agents can determine which agents are more important to coordinate with, but we still have to explain why the agents that learn the graph perform better than those given the same graph as the problem structure. Agents that do not coordinate directly are independent learners relative to each other. Such agents are able to find the optimal reward by climbing, that is, each agent in turn changes its action while the others keep theirs fixed (Guestrin, Lagoudakis, & Parr, 2002). The starting point is the joint action with the highest average reward, and if a globally optimal reward can be reached by climbing from that point, then independent learning is enough to find the optimal reward.
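A sketch of this climbing procedure on a joint-action reward table (the table and starting point below are hypothetical):

```python
def climb(reward_table, start, n_actions):
    """Let each agent in turn switch to a better action while the others hold
    still; repeat until no single-agent change improves the reward."""
    current = list(start)
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            for a in range(n_actions):
                candidate = current[:i] + [a] + current[i + 1:]
                if reward_table[tuple(candidate)] > reward_table[tuple(current)]:
                    current, improved = candidate, True
    return tuple(current)

# Tiny example with 2 agents and 2 actions each:
table = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 2.0, (1, 1): 5.0}
print(climb(table, (0, 0), n_actions=2))  # -> (1, 1)
```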
Conclusion
Given a CG, we implemented a distributed Q-learning algorithm in which the agents find the best actions to maximize the total reward. The only information each agent has is the actions taken by the agents it coordinates with and the total reward of the joint action.
For learning the CG we implemented a Q-learning algorithm in which agents learn the best coordination graph. In this case, since it is not distributed, the only information the agents have is the total reward they get by playing with the current coordination graph.
References
Chapman, A. C., Micillo, R. A., Kota, R., & Jennings, N. R. (2009, May). Decentralised dynamic task allocation: a practical game-theoretic approach. In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 2 (pp. 915-922). International Foundation for Autonomous Agents and Multiagent Systems.
Claus, C., & Boutilier, C. (1998, July). The dynamics of
reinforcement learning in cooperative multiagent systems. In
AAAI/IAAI (pp. 746-752).
Guestrin, C., Lagoudakis, M., & Parr, R. (2002, July).
Coordinated reinforcement learning. In ICML (Vol. 2, pp.
227-234).
Kho, J., Rogers, A., & Jennings, N. R. (2009). Decentralized
control of adaptive sampling in wireless sensor networks.
ACM Transactions on Sensor Networks (TOSN), 5(3), 19.
Van Leeuwen, P., Hesselink, H., & Rohling, J. (2002).
Scheduling aircraft using constraint satisfaction. Electronic
notes in theoretical computer science, 76, 252-268.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. MIT Press.
Taylor, M. E., Jain, M., Tandon, P., Yokoo, M., & Tambe, M.
(2011). Distributed on-line multi-agent optimization under
uncertainty: Balancing exploration and exploitation. Advances
in Complex Systems, 14(03), 471-528.