IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 3, Ver. III (May-Jun. 2016), PP 18-25
www.iosrjournals.org
DOI: 10.9790/0661-1803031825
Enhancement in Decision Making with Improved Performance by
Multiagent Learning Algorithms
Deepak A. Vidhate¹, Dr. Parag Kulkarni²
¹(Research Scholar, Department of Computer Engineering, College of Engineering, Pune, India)
²(EKLaT Research, Shivajinagar, Pune, Maharashtra, India)
Abstract: In some applications, the output of the system is a sequence of actions. There is no such measure as the best action in any intermediate state; an action is good if it is part of a good policy. A single action is not important; what matters is the policy, that is, the sequence of correct actions to reach the goal. In such a case, the machine learning program should be able to assess the goodness of policies and learn from past good action sequences in order to generate a policy. A multi-agent environment is one in which there is more than one agent, where they interact with one another, and further, where there are restrictions on that environment such that agents may not at any given time know everything about the world that other agents know. Two features of multi-agent learning establish its study as a separate field from ordinary machine learning. Parallelism, scalability, simpler construction and cost effectiveness are the main characteristics of multi-agent systems. A multiagent learning model is given in this paper. Two multiagent learning algorithms, i.e. the Strategy Sharing and Joint Rewards algorithms, are implemented. In the Strategy Sharing algorithm, a simple average of Q-tables is taken: each Q-learning agent learns from all of its teammates by averaging their Q-tables. The Joint Rewards learning algorithm combines Q-learning with the idea of joint rewards. The paper presents results and a performance comparison of the two multiagent learning algorithms.
Keywords: Joint Rewards, Multiagent, Q Learning, Reinforcement Learning, Strategy Sharing
I. Introduction
Consider, for example, a market chain that has hundreds of stores all over a country selling thousands of
goods to millions of customers. The point-of-sale terminals record the details of each transaction, i.e. date,
customer identification code, goods bought and their amount, total money spent, and so forth. This typically
generates gigabytes of data every day. What the market chain wants is to be able to predict who the likely
customers for a product are. The algorithm for this is not evident; it changes over time and by geographic
location. If the stored data is analyzed and turned into information, it becomes useful, because we can then use it
to make predictions. We do not know exactly which people are likely to buy this product or
another product. We would not need any analysis of the data if we knew it already. But because we do not
know, we can only collect data and hope to extract the answers to our questions from it.
We do believe that there is a process that explains the data we observe. Though we do not know the
details of the process underlying the generation of data – for example, customer behavior – we know that it is not
completely random. People do not go to markets and buy things at random. When they buy beer, they buy chips;
they buy ice cream in summer and spices for wine in winter. There are certain patterns in the data. We may not
be able to identify the process completely, but we can still construct a good and useful approximation. That
approximation may not explain everything, but it may still account for some part of the data. Though
identifying the complete process may not be possible, patterns or regularities can still be detected.
Such patterns may help us to understand the process, or make predictions. Assuming that the near
future will not be much different from the past, future predictions can also be expected to be right. There are
many real-world problems that involve more than one entity in the maximization of an outcome. For example,
consider a scenario of retail shops in which shop A sells clothes, shop B sells jewelry, shop C sells footwear,
and shop D is a wedding house. To build a single system that automates (certain aspects of) the marketing process,
the internals of all shops A, B, C, and D would have to be modeled. The only feasible solution is to allow the various stores
to create their own policies that accurately represent their goals and interests; these policies must then be combined into
the system with the aid of suitable techniques. The goal of each shop is to maximize profit through an increase
in sale, i.e. yield maximization. Different parameters need to be considered in this: variation in seasons, the
dependency of items, special schemes, discounts, market conditions, etc. Different shops can cooperate with each
other for yield maximization in different situations. Several independent tasks that can be handled by separate
agents could benefit from the cooperative nature of agents.
Another example of a domain that requires cooperative learning is hospital scheduling. It requires
different agents to represent the interests of different people within the hospital. Hospital employees have
different outlooks. X-ray operators may want to maximize the throughput on their machines. Nurses in the
hospital may want to minimize the patient's time in the hospital. Since different people evaluate candidate schedules with
different criteria, they must be represented by cooperative agents. The output of the system is a sequence of
actions in some applications. There is no such measure as the best action in any intermediate state; an action is
good if it is part of a good policy. A single action is not important; what matters is the policy, that is, the
sequence of correct actions to reach the goal. To be able to generate a policy, the machine learning program
should be able to assess the quality of policies and learn from past good action sequences.
This paper is organized as follows: Section II gives the concept of multi-agent learning, and Section III
describes the multi-agent model. The Joint Rewards algorithm is given in Section IV and the Strategy Sharing algorithm is
given in Section V. Section VI gives the experimental setup and Section VII presents the results comparison of
both algorithms, with final concluding remarks and future scope.
II. Multi-Agent Learning
An agent is a computational mechanism that reveals a high degree of autonomy. Based on information
received from the environment, the agent performs actions in its environment. A multi-agent environment is one
in which there is more than one agent, where they interact with one another, and further, where there are
restrictions on that environment such that agents may not at any given time know everything about the world
that other agents know[1]. Two features of multi-agent learning establish its study as a separate field from
ordinary machine learning. First, multi-agent learning addresses problem domains involving
multiple agents, so the search space considered is extraordinarily large. Small changes in learned behaviors can
often result in unpredictable changes in the resultant macro-level properties of the multi-agent group as a whole, due to
the interaction of those agents. Second, multi-agent learning involves multiple learners, each learning and
adapting in the context of the others; this introduces complex issues to the learning process which are not yet fully
understood[2].
Parallelism, scalability, simpler construction and cost effectiveness are the main characteristics of multi-
agent systems. Having these qualities, multiagent systems are used to resolve complex problems, search in large
domains, execute sophisticated tasks, and build more fault-tolerant and reliable systems. In most of the existing
systems, agents' behavior and coordination schemes are designed and fixed by the designer. But an agent with
incomplete and fixed knowledge and behavior cannot be adequately efficient in a dynamic, complex or
changing environment. Therefore, to obtain all the benefits of applying a multi-agent system, the agent team must learn
to manage new, hidden and dynamic situations[3].
In almost all of the present multi-agent teams, agents learn independently. However, agents are not
required to learn everything from their own experiences. Each agent can observe the others and learn from their
situation and behavior. Also, agents can consult more expert agents or get guidance from them. Agents can
also share their information and learn from it, i.e. the agents can cooperate in learning.
In a single-agent system, only one agent interacts with the environment. A multiagent system (MAS)
consists of multiple agents. These agents all carry out actions and influence their environment. Each agent selects
actions individually, but it is the resulting joint action which manipulates the environment and generates the
reward for the agents. This has severe consequences for the characteristics and the complexity of the
problem. This work focuses on cooperative MASs in which the agents have to optimize a shared performance
measure[4].
III. Multi-Agent Model
Parameters for Multiagent Model
General model parameters for MAS are described. Most model parameters extend the parameters from
single-agent systems[5]. A Multiagent System can be described using the following model parameters:
 A discrete time step t = 0, 1, 2, 3, . . .
 A group of n agents A = { A1, A2, . . . , An }.
 A finite set of environment states S. A state st ∈ S describes the state of the system at time step t.
 A finite set of actions Ai for every agent i. The action selected by agent i at time step t is denoted by at,i ∈ Ai. The joint action a ∈ A = A1 × . . . × An is the vector of all individual actions.
 A reward function R : S × A → R which provides agent i with a reward rt+1 ∈ R(st, at) based on the joint action at taken in state st.
 A state transition function T : S × A × S → [0, 1] which gives the transition probability p(st | at−1, st−1) that the system moves to state st when the joint action at−1 is performed in state st−1.
These parameters are very similar to the ones in the single-agent case. However, difficulties arise due
to the decentralized nature of the problem. Each agent selects actions independently, but it is the resulting joint
action that manipulates the environment and produces the reward.
Stochastic games (SGs)
Stochastic games are a very natural extension of MDPs to multiple agents.
Definition 1: A stochastic game is a tuple (n, S, A1…n, T, R1…n), where n is the number of agents, S is a set of
states, Ai is the set of actions available to agent i with A being the joint action space A1 × … × An, T is a
transition function S × A × S → [0, 1], and Ri is a reward function for the i-th agent, S × A → R [6].
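For concreteness, the tuple in Definition 1 can be written down as a small data structure. The following Python sketch is illustrative only: the class name, the callable-based transition and reward functions, and the two-agent single-state example are assumptions made for this illustration and are not part of the paper.

from dataclasses import dataclass
from typing import Callable, List, Tuple

State = int
JointAction = Tuple[int, ...]            # one action index per agent

@dataclass
class StochasticGame:
    n: int                               # number of agents
    states: List[State]                  # finite state set S
    actions: List[List[int]]             # actions[i] is the action set Ai of agent i
    transition: Callable[[State, JointAction, State], float]   # T(s, a, s') in [0, 1]
    rewards: List[Callable[[State, JointAction], float]]       # Ri(s, a) for each agent i

# Toy 2-agent, single-state game matching the 2x2 reward tables A and B used below.
A = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.0, 1.0], [1.0, 0.0]]
game = StochasticGame(
    n=2,
    states=[0],
    actions=[[0, 1], [0, 1]],
    transition=lambda s, a, s2: 1.0,     # only one state, so it is always reached
    rewards=[lambda s, a: A[a[0]][a[1]],
             lambda s, a: B[a[0]][a[1]]],
)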
Model of Multiagent Q-learning
A model of multiagent Q-learning with ε-greedy exploration is presented here. The effect of the ε-greedy
mechanism and of the presence of other agents on the learning process of one agent is studied to develop the
model. First, a continuous time equation for the Q-learning rule is derived. Then the
limits of this equation for the case of a single learner are analysed, and it is indicated how these limits change dynamically
when multiple learners are considered. Finally, it is shown that the ε-greedy mechanism affects the shape of the
modeled function[7].
Consider a situation composed of 2 agents with 2 actions each and a single state. The reward
functions of the agents in this case can be described using tables of the form:
A = [a11 a12; a21 a22],  B = [b11 b12; b21 b22]
where A describes the rewards for the first agent and B the rewards for the second agent. The Q-learning update
rule can be simplified for only one state as:
Qai := Qai + α(rai − Qai) ..........(1)
where Qai is the Q-value of agent a for action i, and rai is the reward that agent a receives for executing action i.
Analysis
The update rule for the first agent can be rewritten as:
Qai(k + 1) − Qai(k) = α(rai(k + 1) − Qai(k)) ..........(2)
This difference equation gives the absolute growth in Qai between times k and k + 1. To obtain its
continuous time version, consider Δt ∈ [0, 1] to be a small amount of time and take
Qai(k + Δt) − Qai(k) ≈ Δt · α(rai(k + Δt) − Qai(k)) ..........(3)
to be the approximate growth in Qai during Δt.
If Δt = 0 the equation becomes an identity. If Δt = 1 it becomes Qai(k + 1) − Qai(k) = α(rai(k + 1) − Qai(k)), i.e. equation (2). If Δt ∈ (0, 1) it is a linear approximation.
Dividing both sides of the equation by Δt,
(Qai(k + Δt) − Qai(k)) / Δt ≈ α(rai(k + Δt) − Qai(k)) ..........(4)
and taking the limit Δt → 0 on both sides,
lim Δt→0 (Qai(k + Δt) − Qai(k)) / Δt ≈ lim Δt→0 α(rai(k + Δt) − Qai(k))
dQai(k)/dt ≈ α(rai(k) − Qai(k)) ..........(5)
This is an approximation of the continuous time version of Equation (2). Its solution is found by integration as
Qai(t) = C·e^(−αt) + rai ..........(6)
where C is the constant of integration. As e^(−x) is a monotonic function and lim x→∞ e^(−x) = 0, the limit of Equation (6)
when t → ∞ is rai:
lim t→∞ Qai(t) = C·0 + rai = rai ..........(7)
If only the first agent is learning and the second agent is using a pure strategy,
then the second agent will always generate the same reward for the first agent. In this case, the derivation above is sufficient to
prove that Qai will monotonically increase or decrease towards rai, for any initial value of Qai. More particularly,
the function is monotonically increasing if Qai(0) < rai and monotonically decreasing if Qai(0) > rai. The model
built here will be used to determine how one learning strategy affects the action selection of another learning
agent.
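The monotone convergence of Qai towards rai can be checked numerically. A minimal sketch follows, assuming a fixed reward (i.e. the second agent plays a pure strategy, as in the derivation); the learning rate, initial value and iteration count are illustrative choices, not values from the paper.

def q_update(q, r, alpha=0.1):
    # Single-state Q-learning update, equation (1): Q := Q + alpha * (r - Q)
    return q + alpha * (r - q)

r_ai = 1.0   # constant reward generated by the opponent's pure strategy
q = 5.0      # initial Q-value above r_ai, so the sequence should decrease monotonically
trace = [q]
for k in range(100):
    q = q_update(q, r_ai)
    trace.append(q)

# Q decays exponentially towards r_ai, matching Q(t) = C*exp(-alpha*t) + r_ai (equation 6)
print(trace[1], trace[11], trace[-1])   # 4.6, about 2.26, about 1.0001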
IV. Joint Rewards Algorithm
To extend Q-learning to a multi-agent system, two main challenges exist. One is how to deal with the
huge Q-table as the dimension increases in a multi-agent system. The other challenge is how to utilize the
cooperative behavior among agents to obtain more efficient learning results. Joint rewards enable agents to learn
in a multiagent environment, and the experimental results show the efficiency and good convergence of the algorithm.
It is not enough for each agent to proceed selfishly in order to reach a globally optimal strategy in a multi-agent
environment, and a task often cannot be accomplished by one agent alone.
This section presents a joint reward learning algorithm which combines Q-learning with the idea of
joint rewards to partly meet the above two challenges. In a multi-agent system, every agent needs to maintain a Q-table
which contains information about its own and the other agents' states and actions, i.e. the situation of the whole
environment. In order to encourage cooperative behavior among agents and obtain globally optimal rewards, the update should
take account of the other agents' actions. Here, a simplified form of vicarious rewards is used, which is feasible to realize.
We call it the 'joint reward'[9]:
jri = b·pri + (1 − b)·Σj≠i prj ..........(8)
where pri is the personal reward of agent i, Σj≠i prj is the sum of the rewards of the other agents except i, and
0 < b ≤ 1 is the personal weight, denoting how much importance is given to the agent's personal reward
compared to that of the other agents. The improved update rule of the Q-learning values of agent i can be
formulated as
Qi_new(xi, ai) := (1 − α) Qi_old(xi, ai) + α (jri + β maxa Q(s, a)) ..........(9)
Algorithm 1: Multiagent learning using Joint Reward Algorithm
1. for each agent i (0 < i < m) the learning procedure is:
2. for all a ∈ A and s ∈ S initialize Q(s, a) = 0
3. let t = 0
4. loop:
5. select the action ai which has the maximum Q value
6. execute action ai
7. receive an immediate reward pri
8. observe the new state s
9. calculate the joint reward as
10. jri = b·pri + (1 − b)·Σj≠i prj   // sum of rewards of other agents except i
11. update the Q-learning values of agent i as
    Qi_new(xi, ai) := (1 − α) Qi_old(xi, ai) + α (jri + β maxa Q(s, a))
12. end
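A minimal Python sketch of Algorithm 1 is given below. The environment interface (reset/step/actions), the ε-greedy exploration and the parameter values are assumptions introduced for the illustration; only the joint-reward computation (equation 8) and the update rule (equation 9) follow the algorithm as stated.

import random
from collections import defaultdict

def joint_reward_q_learning(env, n_agents, episodes=1000,
                            b=0.7, alpha=0.1, beta=0.9, epsilon=0.1):
    # One Q-table per agent, indexed by (state, action); missing entries default to 0.
    Q = [defaultdict(float) for _ in range(n_agents)]

    for _ in range(episodes):
        states = env.reset()                       # assumed: one local state per agent
        done = False
        while not done:
            actions = []
            for i, s in enumerate(states):
                if random.random() < epsilon:      # exploration step (assumption)
                    a = random.choice(env.actions(i))
                else:                              # greedy choice, as in step 5
                    a = max(env.actions(i), key=lambda act: Q[i][(s, act)])
                actions.append(a)

            next_states, personal_rewards, done = env.step(actions)

            for i, (s, a) in enumerate(zip(states, actions)):
                others = sum(personal_rewards) - personal_rewards[i]
                jr = b * personal_rewards[i] + (1 - b) * others        # equation (8)
                best_next = max(Q[i][(next_states[i], act)] for act in env.actions(i))
                # equation (9): blend the old value with the joint-reward target
                Q[i][(s, a)] = (1 - alpha) * Q[i][(s, a)] + alpha * (jr + beta * best_next)
            states = next_states
    return Q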
V. Strategy Sharing Algorithm
One way for knowledge sharing in multi-agents is knowledge averaging. We can divide averaging in
two general categories, simple averaging and weighted averaging. In simple averaging, which is called strategy
sharing, each Q-learning agent learns from all of its teammates by taking the average of Q-tables[8].
= ……………………………(10)
The SS treats all agents similarly, ignoring their level of knowledge. This method does not consider agents‟
different expertise levels. SS algorithm is described as:
Algorithm 2: Strategy Sharing Algorithm
1. initialize
2. while not EndOfLearning do
3. begin
4. if InIndividualLearningMode then
5. begin {Individual Learning}
6. xi := FindCurrentState()
7. ai := SelectAction()
8. DoAction(ai)
9. ri := GetReward()
10. yi := GoToNextState()
11. V(yi) := maxb∈actions Q(yi, b)
12. Qi_new(xi, ai) := (1 − βi) Qi_old(xi, ai) + βi (ri + γi V(yi))
13. end
14. else {Multiagent Learning}
15. begin
16. for j := 1 to n do
17. Qi_new := 0
18. for j := 1 to n do
19. begin
20. Qi_new := Qi_new + (1/n) Qj_old   // strategy sharing by simple averaging of Q-tables
21. end
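The multiagent phase of Algorithm 2 reduces to equation (10): every agent replaces its Q-table with the simple average of all teammates' Q-tables. A short sketch follows, assuming each Q-table is stored as a NumPy array of identical shape (the storage layout is an assumption, not specified in the paper).

import numpy as np

def strategy_sharing(q_tables):
    # Equation (10): Qi_new = (1/n) * sum_j Qj_old, identical for every agent.
    avg = sum(q_tables) / len(q_tables)
    return [avg.copy() for _ in q_tables]

# Example: three shop agents with 108 states and 10 product actions each.
q_tables = [np.random.rand(108, 10) for _ in range(3)]
q_tables = strategy_sharing(q_tables)
assert np.allclose(q_tables[0], q_tables[1])   # after sharing, all agents hold the same strategy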
VI. Experimental Setup
Model design:
The objective is to maximize the sale of products, which depends on the price of the product, the customer age and the period of sale. This
is the information available to each agent, i.e. shop, and it forms the state of the environment. The final goal is to
maximize profit by increasing the total sale of products[10].
Input Data set:
The action set is defined as the sale of a possible product, i.e. A = {p1, p2, p3, ..., p10}; hence an action a ∈ A.
The state of the system is the customer queue in a particular month for the given shop agent, and can be
described as
X(t) = { x1(t), x2(t), m }
where
x1 is the customer queue with age group ∈ { Y, M, O }, i.e. young, middle and old age customers,
x2 is the price band of the product ∈ { H, M, L }, i.e. high, medium, low,
m is the month of product sale ∈ { 1, 2, 3, 4, ..., 12 }.
A minimum of 108 states and corresponding actions are therefore possible in the system. The number of state-action pairs increases as the
number of transactions increases. For simplicity, a single state is assumed for each transaction; otherwise the state
space becomes infinitely large. The shop agent observes the queue and decides on a product, i.e. an action, for each
customer/state. After every sale a reward is given to the agent. Table 1 shows a snapshot of the dataset
generated for a single shop agent.
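Since a state is the triple (customer age group, price band, month), the 3 × 3 × 12 = 108 states can be enumerated with a simple index function. The following sketch only makes the state-space size concrete; the encoding order is an assumption made for illustration.

AGES = ["Y", "M", "O"]            # young, middle, old age group
PRICES = ["H", "M", "L"]          # high, medium, low price band
MONTHS = range(1, 13)             # months 1..12

def state_index(age, price, month):
    # Map an (age, price, month) observation to a state id in 0..107.
    return (AGES.index(age) * len(PRICES) + PRICES.index(price)) * len(MONTHS) + (month - 1)

print(len(AGES) * len(PRICES) * len(MONTHS))   # 108, the minimum state count noted above
print(state_index("Y", "L", 1))                # first row of Table 1 maps to state 24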
In a particular season, the sale of one shop increases. With the help of cooperative learning, the other shops
learn about the increase in sale and can take the necessary actions for their own profit maximization[11]. At time
0, the process X(t) is observed and classified into one of the states in the possible set of states (denoted by S).
After identification of the state, the agent chooses a product action from A. If the process is in state i and the agent
chooses a ∈ A, then
i. the process transitions into state j ∈ S with probability Pij(a), and
ii. conditional on the event that the next state is j, the time until the next transition is a random variable with
probability distribution Fij(·|a).
After the transition occurs, a product sale action is chosen again by the agent and (i) and (ii) are repeated.
Table 1: Snapshot of Dataset used
TID | Age | Price | Month | Action Selected (Product)
1   | Y   | L     | 1     | P1, P2, P4
2   | Y   | M     | 1     | P2, P3
3   | Y   | H     | 1     | P3, P4
4   | M   | L     | 1     | P1, P2
5   | M   | M     | 1     | P1, P2, P3
6   | M   | H     | 1     | P4, P2
7   | O   | L     | 1     | P1, P3
VII. Results
The simulation results show the efficiency of the learning algorithms. The multiagent learning algorithms
are applied to the shop datasets of the cloth, jewellery and footwear shops, and the result analysis is done over a year:
for each customer age group, a specific number of products is purchased. The shop agent thus learns how many products
are sold in a year to customers of different age groups. Figure 1 shows the results of the
Strategy Sharing Algorithm for Products vs. Customer Age Count. Figure 2 gives the results of the Joint Reward
Algorithm for Products vs. Customer Age Count. Figure 3 shows the results of the Strategy Sharing Algorithm for
Products vs. Quantity and Figure 4 gives the results of the Joint Rewards Algorithm for Products vs. Quantity.
Figure 1: Strategy Sharing Algorithm Products Vs Customer Age Count
Figure 2: Joint Reward Algorithm Products Vs Customer Age Count
Figure 3: Strategy Sharing Algorithm Products Vs Quantity
Figure 4: Joint Rewards Algorithm Products Vs Quantity
The Joint Reward Learning algorithm gives more precise results than the Strategy Sharing learning algorithm and
gives good predictions of the products. It gives the pattern of product sale with customer age group over a period.
The Q-function values are tabulated to obtain some insight: the Q-tables show the best action (that is, the optimal
product) for different individual states. By knowing the Q-function, the shop agent can compute the best
possible product for a given state, i.e. the one that gives it maximum profit. Single-agent learning, multi-agent learning,
cooperative learning and improved cooperative learning algorithms are implemented and the results are compared.
It has been shown how a shop agent can effectively use reinforcement learning to set products dynamically so as
to maximize its profit. This is believed to be a promising approach for profit maximization in retail
market environments with limited information. In multiagent learning the result analysis is done for a year, in which a
specific number of products is purchased by each customer age group, so the shop agent learns how many products are
to be sold in a year to customers of different age groups. The Joint Reward Learning
algorithm gives more precise results than the Strategy Sharing learning algorithm and gives good predictions of the
products; these products can be combined together to increase sale.
VIII. Conclusion
Learning algorithms are well suited for decision making. Multiagent learning has more knowledge
and information available, and sharing of information and policies is possible. Multiagent learning
performs better compared to single-agent learning. However, multiagent learning still lacks
proper communication between agents. Sharing more knowledge and information, using all agents'
knowledge equally, and solving the problem jointly and cooperatively form the future scope of this paper.
References
Journal Papers:
[1] Adnan M. Al-Khatib “Cooperative Machine Learning Method” World of Computer Science and Information Technology Journal
(WCSIT) ISSN: 2221-0741 Vol.1, No.9, 380-383, 2011.
[2] Babak Nadjar Araabi, Sahar Mastoureshgh, and Majid Nili Ahmadabadi “A Study on Expertise of Agents and Its Effects on
Cooperative Q-Learning” IEEE Transactions on Evolutionary Computation, vol:14, pp:23-57, 2010
[3] Liviu Panait & Sean Luke "Cooperative Multi-Agent Learning: The State of the Art", published in Journal of Autonomous Agents and
Multi-Agent Systems, Volume 11, Issue 3, pp. 387 – 434, 2005.
[4] M.V. Nagendra Prasad & Victor R. Lesser “Learning Situation-Specific Coordination in Cooperative Multi-agent Systems” in
Journal of Autonomous Agents and Multi-Agent Systems, Volume 2 Issue 2, pp. 173 – 207, 1999.
[5] Michael Kinney & Costas Tsatsoulis “Learning Communication Strategies in Multiagent Systems”, in Journal of Applied
Intelligence, Volume 9 Issue 1, pp 71-91, 1998.
[6] Ronen Brafman & Moshe Tennenholtz “Learning to Coordinate Efficiently: A Model-based Approach”, in Journal of Artificial
Intelligence Research, Volume 19 Issue 1, pp. 11-23, 2003.
Books:
[7] Ethem Alpaydin “Introduction to Machine Learning” Second Edition, MIT Press by PHI.
[8] Tom Mitchell “Machine Learning” McGraw Hill International Edition.
Proceedings Papers:
[9] Dr. Hamid R. Berenji David Vengerov “Learning, Cooperation, and Coordination in Multi-Agent Systems”, in Proceedings of 9th
IEEE International Conference on Fuzzy Systems, 2000.
[10] Jun-Yuan Tao, De-Sheng Li “Cooperative Strategy Learning In Multi-Agent Environment With Continuous State Space”, IEEE
International Conference on Machine Learning and Cybernetics, pp.2107 – 2111, 2006.
[11] La-mei GAO, Jun ZENG, Jie WU, Min LI “Cooperative Reinforcement Learning Algorithm to Distributed Power System based on
Multi-Agent” 2009 3rd International Conference on Power Electronics Systems and Applications Digital Reference: K210509035

More Related Content

What's hot

A model for profit pattern mining based on genetic algorithm
A model for profit pattern mining based on genetic algorithmA model for profit pattern mining based on genetic algorithm
A model for profit pattern mining based on genetic algorithm
eSAT Journals
 
Internship project report,Predictive Modelling
Internship project report,Predictive ModellingInternship project report,Predictive Modelling
Internship project report,Predictive Modelling
Amit Kumar
 

What's hot (17)

Analytic network process
Analytic network processAnalytic network process
Analytic network process
 
Data Science Methodology for Analytics and Solution Implementation
Data Science Methodology for Analytics and Solution ImplementationData Science Methodology for Analytics and Solution Implementation
Data Science Methodology for Analytics and Solution Implementation
 
Reduction in customer complaints - Mortgage Industry
Reduction in customer complaints - Mortgage IndustryReduction in customer complaints - Mortgage Industry
Reduction in customer complaints - Mortgage Industry
 
Classes of Model
Classes of ModelClasses of Model
Classes of Model
 
How ml can improve purchase conversions
How ml can improve purchase conversionsHow ml can improve purchase conversions
How ml can improve purchase conversions
 
Online Assignment- SIMULATION
 Online Assignment- SIMULATION Online Assignment- SIMULATION
Online Assignment- SIMULATION
 
Online Assignment - SIMULATION
Online Assignment - SIMULATIONOnline Assignment - SIMULATION
Online Assignment - SIMULATION
 
A model for profit pattern mining based on genetic algorithm
A model for profit pattern mining based on genetic algorithmA model for profit pattern mining based on genetic algorithm
A model for profit pattern mining based on genetic algorithm
 
K1802056469
K1802056469K1802056469
K1802056469
 
Internship project report,Predictive Modelling
Internship project report,Predictive ModellingInternship project report,Predictive Modelling
Internship project report,Predictive Modelling
 
Introductionedited
IntroductioneditedIntroductionedited
Introductionedited
 
Introduction to Business Anlytics and Strategic Landscape
Introduction to Business Anlytics and Strategic LandscapeIntroduction to Business Anlytics and Strategic Landscape
Introduction to Business Anlytics and Strategic Landscape
 
Eckovation Machine Learning
Eckovation Machine LearningEckovation Machine Learning
Eckovation Machine Learning
 
IRJET- Fatigue Analysis of Offshore Steel Structures
IRJET- Fatigue Analysis of Offshore Steel StructuresIRJET- Fatigue Analysis of Offshore Steel Structures
IRJET- Fatigue Analysis of Offshore Steel Structures
 
The multifaceted role_of_the_analyst
The multifaceted role_of_the_analystThe multifaceted role_of_the_analyst
The multifaceted role_of_the_analyst
 
Predictive data analytics models and their applications
Predictive data analytics models and their applicationsPredictive data analytics models and their applications
Predictive data analytics models and their applications
 
Risk assessment of information production using extended risk matrix approach
Risk assessment of information production using extended risk matrix approachRisk assessment of information production using extended risk matrix approach
Risk assessment of information production using extended risk matrix approach
 

Viewers also liked

Spectral studies of 5-({4-amino-2-[(Z)-(2-hydroxybenzylidene) amino] pyrimidi...
Spectral studies of 5-({4-amino-2-[(Z)-(2-hydroxybenzylidene) amino] pyrimidi...Spectral studies of 5-({4-amino-2-[(Z)-(2-hydroxybenzylidene) amino] pyrimidi...
Spectral studies of 5-({4-amino-2-[(Z)-(2-hydroxybenzylidene) amino] pyrimidi...
IOSR Journals
 
Impact of Hepatitis B Virus (HBV) Vaccination in Childrens Born to HBV Positi...
Impact of Hepatitis B Virus (HBV) Vaccination in Childrens Born to HBV Positi...Impact of Hepatitis B Virus (HBV) Vaccination in Childrens Born to HBV Positi...
Impact of Hepatitis B Virus (HBV) Vaccination in Childrens Born to HBV Positi...
IOSR Journals
 

Viewers also liked (20)

Cryptography On Android Message Application Using Look Up Table And Dynamic ...
Cryptography On Android Message Application Using Look  Up Table And Dynamic ...Cryptography On Android Message Application Using Look  Up Table And Dynamic ...
Cryptography On Android Message Application Using Look Up Table And Dynamic ...
 
C018111623
C018111623C018111623
C018111623
 
Spectral studies of 5-({4-amino-2-[(Z)-(2-hydroxybenzylidene) amino] pyrimidi...
Spectral studies of 5-({4-amino-2-[(Z)-(2-hydroxybenzylidene) amino] pyrimidi...Spectral studies of 5-({4-amino-2-[(Z)-(2-hydroxybenzylidene) amino] pyrimidi...
Spectral studies of 5-({4-amino-2-[(Z)-(2-hydroxybenzylidene) amino] pyrimidi...
 
Impact of Hepatitis B Virus (HBV) Vaccination in Childrens Born to HBV Positi...
Impact of Hepatitis B Virus (HBV) Vaccination in Childrens Born to HBV Positi...Impact of Hepatitis B Virus (HBV) Vaccination in Childrens Born to HBV Positi...
Impact of Hepatitis B Virus (HBV) Vaccination in Childrens Born to HBV Positi...
 
B0160709
B0160709B0160709
B0160709
 
B013150813
B013150813B013150813
B013150813
 
F0213137
F0213137F0213137
F0213137
 
K013166871
K013166871K013166871
K013166871
 
I0554754
I0554754I0554754
I0554754
 
I010246467
I010246467I010246467
I010246467
 
H1303055462
H1303055462H1303055462
H1303055462
 
C010111117
C010111117C010111117
C010111117
 
Bridgeless CUK Rectifier with Output Voltage Regulation using Fuzzy controller
Bridgeless CUK Rectifier with Output Voltage Regulation using Fuzzy controllerBridgeless CUK Rectifier with Output Voltage Regulation using Fuzzy controller
Bridgeless CUK Rectifier with Output Voltage Regulation using Fuzzy controller
 
Design of de-coupler for an interacting tanks system
Design of de-coupler for an interacting tanks systemDesign of de-coupler for an interacting tanks system
Design of de-coupler for an interacting tanks system
 
D017612529
D017612529D017612529
D017612529
 
An Enhanced ILD Diagnosis Method using DWT
An Enhanced ILD Diagnosis Method using DWTAn Enhanced ILD Diagnosis Method using DWT
An Enhanced ILD Diagnosis Method using DWT
 
P1303019093
P1303019093P1303019093
P1303019093
 
Stable Multi Optimized Algorithm Used For Controlling The Load Shedding Probl...
Stable Multi Optimized Algorithm Used For Controlling The Load Shedding Probl...Stable Multi Optimized Algorithm Used For Controlling The Load Shedding Probl...
Stable Multi Optimized Algorithm Used For Controlling The Load Shedding Probl...
 
E0432933
E0432933E0432933
E0432933
 
H0964752
H0964752H0964752
H0964752
 

Similar to C1803031825

Running Head Data Mining in The Cloud .docx
Running Head Data Mining in The Cloud                            .docxRunning Head Data Mining in The Cloud                            .docx
Running Head Data Mining in The Cloud .docx
healdkathaleen
 
Applied_Data_Science_Presented_by_Yhat
Applied_Data_Science_Presented_by_YhatApplied_Data_Science_Presented_by_Yhat
Applied_Data_Science_Presented_by_Yhat
Charlie Hecht
 
Article 1Question1.  What is that project Provide complete .docx
Article 1Question1.  What is that project Provide complete .docxArticle 1Question1.  What is that project Provide complete .docx
Article 1Question1.  What is that project Provide complete .docx
davezstarr61655
 
FOUR TYPES OF BUSINESS ANALYTICS TO KNOWBUSINESS ANALYTICSby A
FOUR TYPES OF BUSINESS ANALYTICS TO KNOWBUSINESS ANALYTICSby AFOUR TYPES OF BUSINESS ANALYTICS TO KNOWBUSINESS ANALYTICSby A
FOUR TYPES OF BUSINESS ANALYTICS TO KNOWBUSINESS ANALYTICSby A
JeanmarieColbert3
 

Similar to C1803031825 (20)

Innovation and System Design
Innovation and System DesignInnovation and System Design
Innovation and System Design
 
MIS 05 Decision Support Systems
MIS 05  Decision Support SystemsMIS 05  Decision Support Systems
MIS 05 Decision Support Systems
 
Introduction to Information Systems Supporting 4th Edition Rainer Solutions M...
Introduction to Information Systems Supporting 4th Edition Rainer Solutions M...Introduction to Information Systems Supporting 4th Edition Rainer Solutions M...
Introduction to Information Systems Supporting 4th Edition Rainer Solutions M...
 
Running Head Data Mining in The Cloud .docx
Running Head Data Mining in The Cloud                            .docxRunning Head Data Mining in The Cloud                            .docx
Running Head Data Mining in The Cloud .docx
 
Information systems strategy formulation
Information systems strategy formulationInformation systems strategy formulation
Information systems strategy formulation
 
machine learning.docx
machine learning.docxmachine learning.docx
machine learning.docx
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Machine Learning for Business - Eight Best Practices for Getting Started
Machine Learning for Business - Eight Best Practices for Getting StartedMachine Learning for Business - Eight Best Practices for Getting Started
Machine Learning for Business - Eight Best Practices for Getting Started
 
journalism research
journalism researchjournalism research
journalism research
 
journalism research
journalism researchjournalism research
journalism research
 
Management ( Six Business Objectives)
Management ( Six Business Objectives)Management ( Six Business Objectives)
Management ( Six Business Objectives)
 
Mis unit iv-converted
Mis unit  iv-convertedMis unit  iv-converted
Mis unit iv-converted
 
Applied_Data_Science_Presented_by_Yhat
Applied_Data_Science_Presented_by_YhatApplied_Data_Science_Presented_by_Yhat
Applied_Data_Science_Presented_by_Yhat
 
Ravindra industrial training
Ravindra industrial trainingRavindra industrial training
Ravindra industrial training
 
Your Cognitive Future
Your Cognitive FutureYour Cognitive Future
Your Cognitive Future
 
Weak Signals White Paper
Weak Signals White PaperWeak Signals White Paper
Weak Signals White Paper
 
Article 1Question1.  What is that project Provide complete .docx
Article 1Question1.  What is that project Provide complete .docxArticle 1Question1.  What is that project Provide complete .docx
Article 1Question1.  What is that project Provide complete .docx
 
Standards For Wright Aircraft Corp
Standards For Wright Aircraft CorpStandards For Wright Aircraft Corp
Standards For Wright Aircraft Corp
 
Big Data Analytics for Predicting Consumer Behaviour
Big Data Analytics for Predicting Consumer BehaviourBig Data Analytics for Predicting Consumer Behaviour
Big Data Analytics for Predicting Consumer Behaviour
 
FOUR TYPES OF BUSINESS ANALYTICS TO KNOWBUSINESS ANALYTICSby A
FOUR TYPES OF BUSINESS ANALYTICS TO KNOWBUSINESS ANALYTICSby AFOUR TYPES OF BUSINESS ANALYTICS TO KNOWBUSINESS ANALYTICSby A
FOUR TYPES OF BUSINESS ANALYTICS TO KNOWBUSINESS ANALYTICSby A
 

More from IOSR Journals

More from IOSR Journals (20)

A011140104
A011140104A011140104
A011140104
 
M0111397100
M0111397100M0111397100
M0111397100
 
L011138596
L011138596L011138596
L011138596
 
K011138084
K011138084K011138084
K011138084
 
J011137479
J011137479J011137479
J011137479
 
I011136673
I011136673I011136673
I011136673
 
G011134454
G011134454G011134454
G011134454
 
H011135565
H011135565H011135565
H011135565
 
F011134043
F011134043F011134043
F011134043
 
E011133639
E011133639E011133639
E011133639
 
D011132635
D011132635D011132635
D011132635
 
C011131925
C011131925C011131925
C011131925
 
B011130918
B011130918B011130918
B011130918
 
A011130108
A011130108A011130108
A011130108
 
I011125160
I011125160I011125160
I011125160
 
H011124050
H011124050H011124050
H011124050
 
G011123539
G011123539G011123539
G011123539
 
F011123134
F011123134F011123134
F011123134
 
E011122530
E011122530E011122530
E011122530
 
D011121524
D011121524D011121524
D011121524
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 

C1803031825

  • 1. IOSR Journal of Computer Engineering (IOSR-JCE) e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 3, Ver. III (May-Jun. 2016), PP 18-25 www.iosrjournals.org DOI: 10.9790/0661-1803031825 www.iosrjournals.org 18 | Page Enhancement in Decision Making with Improved Performance by Multiagent Learning Algorithms Deepak A. Vidhate1 , Dr. Parag Kulkarni2 1 (Research Scholar, Department of Computer Engineering, College of Engineering, Pune, India) 2 (EKLaT Research, Shivajinagar, Pune, Maharashtra, India) Abstract:The output of the system is a sequence of actions in some applications. There is no such measure as the best action in any in-between state; an action is excellent if it is part of a good policy. A single action is not important; the policy is important that is the sequence of correct actions to reach the goal. In such a case, machine learning program should be able to assess the goodness of policies and learn from past good action sequences to be able to generate a policy. A multi-agent environment is one in which there is more than one agent, where they interact with one another, and further, where there are restrictions on that environment such that agents may not at any given time know everything about the world that other agents know. Two features of multi-agent learning which establish its study as a separate field from ordinary machine learning. Parallelism, scalability, simpler construction and cost effectiveness are main characteristics of multi-agent systems. Multiagent learning model is given in this paper. Two multiagent learning algorithms i. e. Strategy Sharing & Joint Rewards algorithm are implemented. In Strategy Sharing algorithm simple averaging of Q tables is taken. Each Q-learning agent learns from all of its teammates by taking the average of Q-tables. Joint reward learning algorithm combines the Q learning with the idea of joint rewards. Paper shows result and performance comparison of the two multiagent learning algorithms. Keywords: Joint Rewards, Multiagent, Q Learning, Reinforcement Learning, Strategy Sharing I. Introduction Consider the example market chain that has hundreds of stores all over a country selling thousands of goods to millions of customers. The point of sale terminals record the details of each transaction i.e. date, customer identification code, goods bought and their amount, total money spent and so forth. This typically generates gigabytes of data every day. What the mark et chain wants is to be able to predict who are the likely customers for a product. Again, the algorithm for this is not evident; it changes over time and by geographic location. If stored data is analyzed and turned into information then it becomes useful so that we can make use of an example to make predictions. We do not know exactly which people are likely to buy this product, or another product. We would not need any analysis of the data if we know it already. But because we do not know, we can only collect data and hope to extract the answers to questions from data. We do believe that there is a process that explains the data we observe. Though we do not know the details of the process underlying the generation of data – for example, customer behavior - we know that it is not completely random. People do not go to markets and buy things at random. When they buy beer, they buy chips; they buy ice cream in summer and spices for Wine in winter. There are certain patterns in the data. 
We may not be able to recognize the process completely, but still we can construct a good and useful approximation. That approximation may not explain everything, but may still be able to account for some part of the data. Though identifying the complete process may not be possible, but still patterns or regularities can be detected. Such patterns may help us to understand the process, or make predictions. Assuming that the near future will not be much different from the past and future predictions can also be expected to be right. There are many real world problems that involve more than one entity for maximization of an outcome. For example, consider a scenario of retail shops in which shop A sales clothes, shop B sales jewelry, shop C sales footwear and wedding house D. In order to build a single system to automate (certain aspects of) the marketing process, the internals of all shops A, B, C, and D can be modeled. The only feasible solution is to allow the various stores to create their own policies that accurately represent their goals and interests. They must then be combined into the system with the aid of some of the techniques. The goal of each shop is to maximize the profit by an increase in sale i.e. yield maximization. Different parameters need to be considered in this: variation in seasons, the dependency of items, special schemes, discount, market conditions etc. Different shops can cooperate with each other for yield maximization in different situations. Several independent tasks that can be handled by separate agents could benefit from cooperative nature of agents. Another example of a domain that requires cooperative learning is hospital scheduling. It requires different agents to represent the regard of different people within the hospital. Hospital employees have a different outlook. X-ray operators may want to maximize the throughput on their machines. Nurses in the
  • 2. Enhancement in Decision Making with Improved Performance by Multiagent Learning Algorithms DOI: 10.9790/0661-1803031825 www.iosrjournals.org 19 | Page hospital may want to minimize the patient‟s time in the hospital. Since different people examine candidate with different criteria, they must be represented by cooperative agents. The output of the system is a sequence of actions in some applications. There is no such measure as the best action in any in-between state; an action is excellent if it is part of a good policy. A single action is not important; the policy is important that is the sequence of correct actions to reach the goal. To be able to generate a policy the machine learning programs should able to assess the quality of policies and learn from past good action sequences. This paper is organized as: Section II gives the concept of multi-agent agent learning, Section III describes multi-agent model. Joint Rewards algorithm is given in section IV and Strategy Sharing algorithm is given in Section IV. Section VI gives experimental setup and Section VII put up the results comparisons of both algorithms with final concluding remark and future scope. II. Multi-Agent Learning An agent is a computational mechanism that reveals a high degree of autonomy. Based on information received from the environment, the agent performs actions in its environment. A multi-agent environment is one in which there is more than one agent, where they interact with one another, and further, where there are restrictions on that environment such that agents may not at any given time know everything about the world that other agents know[1]. Two features of multi-agent learning which establish its study as a separate field from ordinary machine learning. First, because multi-agent learning addresses the problem domains involving multiple agents. The search space considered is extraordinarily huge. Small changes in learned behaviors can often result in random changes in the resultant macro-level properties of the multi-agent group as a whole due to the communication of those agents. Second, multi-agent learning involves multiple learners, each learning and adapting in the context of others; this introduces complex issues to the learning process which are not yet fully understood[2]. Parallelism, scalability, simpler construction and cost effectiveness are main characteristics of multi- agent systems. Having these qualities, multiagent systems are used to resolve complex problems, search in large domains, execute sophisticated tasks, and make more fault-tolerant and reliable systems. In most of the existing systems, agents‟ behavior and coordination schemes are designed and fixed by the designer. But, an agent with incomplete and fixed knowledge and behavior cannot be adequately efficient in a dynamic, complex or changing environment. Therefore, to have all benefits of applying a multi-agent system, agent team must learn to manage the fresh, hidden and dynamic situations[3]. In approximately all of the present multi-agent teams, agents learn independently. Agents are not required to learn all things from their own experiences. Each agent observes the others and learns from their situation and behavior. Also, agents can check with more expert agents or get guidance from them. Agents can also share their information and learn from this information, i.e. the agent can cooperate in learning. In the single-agent system, only one agent interacts with the environment. 
Multiagent system (MAS) consists of multiple agents. These agents all carry out actions and control their environment. Each agent selects actions individually, but it is the resulting joint action which manipulates the environment and generates the reward for the agents. This leads to severe consequences on the characteristics and the complexity of the problem. Work focused on cooperative MASs in which the agents have to optimize a shared performance measure[4]. III. Multi-Agent Model Parameters for Multiagent Model General model parameters for MAS are described. Most model parameters extend the parameters from single-agent systems[5]. A Multiagent System can be described using the following model parameters:  A discrete time step t = 0, 1, 2, 3, . . . .  A group of n agents A = { A1, A2, . . . ,An }.  A finite set of environment states S. A state st є S describes the state of the system at time step t.  A finite set of actions Ai for every agent i. The action selected by agent i, Ai at time step t is denoted by at є A. The joint action a ∈ A = A1 × . . . × An is the vector of all individual actions.  A reward function R : S × A → R which provides the agent i with a reward rt+1 ∈ R(st , at ) based on the joint action at taken in state st .  A state transition function T: S×A×S → [0, 1] which gives the transition probability p(st |at−1 , st−1 ) that the system moves to state st when the joint action at−1 is performed in state st−1 . These parameters are very similar to the ones in the single-agent case. However, difficulties arise due to the decentralized nature of the problem. Each agent selects actions independently, but it is the resulting joint action that manipulates the environment and produces the reward.
  • 3. Enhancement in Decision Making with Improved Performance by Multiagent Learning Algorithms DOI: 10.9790/0661-1803031825 www.iosrjournals.org 20 | Page Stochastic games (SGs) Stochastic games are a very natural extension of MDPs to multiple agents. Definition 1: A stochastic game is a tuple (n, S, A1…n ,T, R1…n ),where n is the number of agents, S is a set of states, Ai is the set of actions available to agent i with A being the joint action space A1×…×An, T is a transition function S×A×S→[0,1], and Ri is a reward function for the ith agent S×A→R. SGs are a very natural extension of MDPs to multiple agents[6]. Model of Multiagent Q-learning Model of multiagent Q-learning with ε-greedy exploration is presented here. Effect of ε -greedy mechanism and the presence of other agents on learning the process of one agent are studied to develop the model. The derivation of a continuous time equation for the Q-learning rule is firstly demonstrated. Then the limits of this equation for the case of a single learner are analysed. It indicates how they change dynamically when multiple learners are considered. Finally, it is proved that the ε-greedy mechanism affects the shape of the modeled function[7]. Consider the situation composed of 2 agents with 2 actions each and a single state. The reward functions of the agents, in this case can be described using tables of the form: A= 𝑎11 𝑎12 𝑎21 𝑎22 B= 𝑏11 𝑏12 𝑏21 𝑏22 where A describes the rewards, for the first agent and B the rewards for the second agent. Q-learning update rule can be simplified for only one state as: Qai := Qai + α(rai − Qai )………………..........……..(1) where Qai Q-value of agent a for action i rai  reward that agent a receives for executing action i. a  agent and i  action Analysis The update rule for the first agent can be rewritten as: Qai (k + 1) − Qai (k) = α(rai (k + 1) − Qai(k))…........(2) This difference equation explains the absolute growth in Qai between times k and k + 1. To obtain its continuous time version, consider Δt ∈ [0, 1] to be a small amount of time. Qai (k + Δt) − Qai (k) ≈ Δt × α(rai(k+Δt) − Qai (k))..(3) to be the approximate growth in Qai during Δt. if Δt = 0 then the equation becomes identity. If Δt = 1 then it becomes Qai (k + 1) − Qai (k) = α(rai (k + 1) − Qai(k)) i.e. equation 5. If Δt = {0,1} then it becomes linear approximation. Divide both sides of the equation by Δt Qai (k + Δt) − Qai (k) Δt ≈ α(rai(k+Δt) − Qai (k))………....(4) taking limits Δt→0 on both sides lim Δt→0 Qai (k + Δt) − Qai (k) Δt ≈ lim Δt→0 α ( rai ( k + Δt ) − Qai (k)) lim Δt→0 Qai (k + Δt) − Qai (k) Δt ≈ α ( rai (k) − Qai (k)) dQai (k) dt ≈ α ( rai (k) − Qai (k))……….. ………....…(5) It is an approximation for the continuous time version of Equation 2. Solution to this equation found by integration as Qai(k)=C𝑒−𝛼𝑡 + rai…………...………………........(6) where C is the constant of integration. As 𝑒−𝑥 is a monotonic function and limx→∞𝑒−𝑥 =0, it is easy to observe that the limit of Equation 3 when t→ is rai: ai(K) = C + rai = rai…...……..(7) It is considered that only the first agent is learning and that the second agent is using a pure strategy then it will always generate the same reward for the first agent. In this case, the derivation above is sufficient to prove that Qai will monotonically increase or decrease towards rai, for any initial value of Qai. More particularly,
  • 4. Enhancement in Decision Making with Improved Performance by Multiagent Learning Algorithms DOI: 10.9790/0661-1803031825 www.iosrjournals.org 21 | Page the function is monotonically increasing if Qai (0) < rai and monotonically decreasing if Qai (0) > rai. The model build here will be used to determine how one learning strategy affects the action selection of another learning agent. IV. Joint Rewards Algorithm To extend Q-learning into a multi-agent system, two main challenges exist. One is how to deal with the huge Q-table with the increment of dimension in a multi-agent system. The other challenge is how to utilize the cooperative behavior among agents to obtain more efficient learning results. Joint rewards ensure agents to learn in a multiagent environment. The experiment results show the efficiency and well convergence of the algorithm. It is not enough for each agent to proceed selfishly in order to reach a globally optimal strategy in a multi-agent environment. It is not possible to accomplish with only one agent to accomplish a task. The section presents a joint reward learning algorithm which combined the Q-learning with the idea of joint rewards to meet above two challenges partly. In a multi-agent system, every agent needs to maintain a Q table which contains the information about its and others agent‟ states and actions, i.e. the situation of the whole environment. In order to encourage cooperative behavior among agents to get global optimal rewards, it should take account of the other agents‟ actions. Here, we use a simplified of vicarious rewards for feasible to realize. We call „joint reward‟[9], jri = b.pri + (1 - b). ……………………………………………………….(8) where pr is the personal reward of agent i, and the sum of rewards of other agents except i and 0 < b ≤ 1 is the personal weight, denoting how much importance is given to agent‟s personal reward compared to that of other agents. The improved update rule of Q-learning values of agent i can be formulated as Qi new (xi, ai) := (1 - α) Qi Old (xi, ai) + α (jri + β maxa Q(s, a))......................................(9) Algorithm 1: Multiagent learning using Joint Reward Algorithm 1. For each agent i (0 < i < m) the learning procedure as 2.for all a ∈ A and s ∈S initialize Q(s,a) = 0 3.let t = 0, = 0 4.Loop: 5.select action ai which has max Q value 6.execute action ai 7.receive an immediate reward 8.observe the new state s 9.calculate joint rewards as 10. jri = b.pri + (1 - b). // sum of rewards of other agents except i 11. update Q learning values of agent i as Qi new (xi, ai) := (1 - α) Qi Old (xi, ai) + α (jri + β maxa Q(s, a)) 12. end V. Strategy Sharing Algorithm One way for knowledge sharing in multi-agents is knowledge averaging. We can divide averaging in two general categories, simple averaging and weighted averaging. In simple averaging, which is called strategy sharing, each Q-learning agent learns from all of its teammates by taking the average of Q-tables[8]. = ……………………………(10) The SS treats all agents similarly, ignoring their level of knowledge. This method does not consider agents‟ different expertise levels. SS algorithm is described as: Algorithm 2 : Strategy Sharing Algorithm 1. initialize 2. while not EndOfLearning do 3. begin 4. if InIndividualLearning Mode then
V. Strategy Sharing Algorithm
One way of sharing knowledge among multiple agents is knowledge averaging. Averaging can be divided into two general categories: simple averaging and weighted averaging. In simple averaging, which is called strategy sharing (SS), each Q-learning agent learns from all of its teammates by taking the average of their Q-tables [8]:

Q_i^new = (1/n) Σ_{j=1}^{n} Q_j^old                                        (10)

SS treats all agents similarly, ignoring their level of knowledge; the method does not consider the agents' different expertise levels. The SS algorithm is described as follows (a sketch of the averaging step is given after the experimental setup below):

Algorithm 2: Strategy Sharing Algorithm
1.  initialize
2.  while not EndOfLearning do
3.  begin
4.    if InIndividualLearningMode then
5.    begin {Individual Learning}
6.      x_i := FindCurrentState()
7.      a_i := SelectAction()
8.      DoAction(a_i)
9.      r_i := GetReward()
10.     y_i := GoToNextState()
11.     V(y_i) := max_{b ∈ actions} Q(y_i, b)
12.     Q_i^new(x_i, a_i) := (1 − β_i) Q_i^old(x_i, a_i) + β_i (r_i + γ_i V(y_i))
13.   end
14.   else {Multiagent Learning}
15.   begin
16.     for j := 1 to n do
17.       Q_j^new := 0
18.     for j := 1 to n do
19.     begin
20.       Q_i^new := Q_i^new + (1/n) Q_j^old   {strategy sharing by simple averaging of Q-tables}
21.     end

VI. Experimental Setup
Model design: The objective is to maximize the sale of products, which depends on the price of the product, the customer's age and the period of sale. This is the information available to each agent, i.e. each shop, and it forms the state of the environment. The final goal is to maximize profit by increasing the total sale of products [10].

Input data set: The action set is defined as the sale of a possible product, i.e. A = {p1, p2, p3, …, p10}; hence an action a ∈ A. The state of the system is the customer queue in a particular month for the given shop agent, so a state can be described as X(t) = {x1(t), x2(t), m}, where
x1 → customer queue by age ⇒ {Y, M, O}, i.e. young, middle-aged and old customers
x2 → product price queue ⇒ {H, M, L}, i.e. high, medium and low price
m → month of product sale ⇒ {1, 2, 3, …, 12}

A minimum of 108 states and corresponding actions is therefore possible in the system, and the number of state-action pairs increases as the number of transactions increases. For simplicity, a single state is assumed per transaction, since otherwise the state space becomes infinitely large. The shop agent observes the queue and decides on a product, i.e. an action, for each customer/state. After every sale a reward is given to the agent. Table 1 shows a snapshot of the dataset generated for a single shop agent.

Table 1: Snapshot of the dataset used
TID   Age   Price   Month   Action Selected (Product)
 1     Y     L       1      P1, P2, P4
 2     Y     M       1      P2, P3
 3     Y     H       1      P3, P4
 4     M     L       1      P1, P2
 5     M     M       1      P1, P2, P3
 6     M     H       1      P4, P2
 7     O     L       1      P1, P3

In a particular season the sale of one shop increases. With the help of cooperative learning, the other shops learn about the increase in sale and can take the necessary actions for their own profit maximization [11]. At time 0 the process X(t) is observed and classified into one of the states in the possible set of states S. After identification of the state, the agent chooses a product action from A. If the process is in state i and the agent chooses a ∈ A, then
(i) the process transitions into state j ∈ S with probability P_ij(a);
(ii) conditional on the event that the next state is j, the time until the next transition is a random variable with probability distribution F_ij(·|a).
After the transition occurs, a product sale action is chosen again by the agent and (i) and (ii) are repeated.
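The following is a minimal sketch, under stated assumptions, of the simple averaging step of Equation 10 applied to Q-tables defined over the state encoding of Section VI; the shop names, reward values, helper function and dictionary representation are illustrative and not taken from the paper.

```python
from collections import defaultdict
from itertools import product

# State space from Section VI: customer age x product price x month of sale.
AGES, PRICES, MONTHS = ["Y", "M", "O"], ["H", "M", "L"], range(1, 13)
ACTIONS = [f"P{k}" for k in range(1, 11)]        # products P1..P10
STATES = list(product(AGES, PRICES, MONTHS))     # 3 * 3 * 12 = 108 states, as noted above

def average_q_tables(q_tables):
    """Equation (10): Q_i^new = (1/n) * sum_j Q_j^old for every (state, action).
    Every agent receives the same averaged table (simple strategy sharing)."""
    n = len(q_tables)
    shared = defaultdict(float)
    for q in q_tables:
        for key, value in q.items():
            shared[key] += value / n
    return [dict(shared) for _ in q_tables]      # one copy of the shared table per agent

# Hypothetical Q-tables of three shop agents after individual learning.
q_cloth, q_jewellery, q_footwear = defaultdict(float), defaultdict(float), defaultdict(float)
q_cloth[(("Y", "L", 1), "P1")] = 4.0
q_jewellery[(("Y", "L", 1), "P1")] = 1.0
q_footwear[(("Y", "L", 1), "P1")] = 1.0

shared_tables = average_q_tables([q_cloth, q_jewellery, q_footwear])
print(shared_tables[0][(("Y", "L", 1), "P1")])   # 2.0 = (4.0 + 1.0 + 1.0) / 3
```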
VII. Results
The simulation results show the efficiency of the learning algorithms. The multiagent learning algorithms are applied to the shop datasets of cloth, jewellery and footwear shops, and the result analysis is done over one year: for each customer age group, the number of specific products purchased is obtained, so a shop agent learns how many of which products are sold over a year to customers of each age group. Figure 1 shows the results of the Strategy Sharing algorithm for products vs. customer age count, and Figure 2 gives the results of the Joint Reward algorithm for products vs. customer age count. Figure 3 shows the results of the Strategy Sharing algorithm for products vs. quantity, and Figure 4 gives the results of the Joint Rewards algorithm for products vs. quantity.

Figure 1: Strategy Sharing Algorithm, Products vs. Customer Age Count
Figure 2: Joint Reward Algorithm, Products vs. Customer Age Count
Figure 3: Strategy Sharing Algorithm, Products vs. Quantity
Figure 4: Joint Rewards Algorithm, Products vs. Quantity

The Joint Reward learning algorithm gives more precise results than the Strategy Sharing learning algorithm and gives good predictions of the products: it yields the pattern of product sale by customer age group over a period. The Q-function values are tabulated to obtain some insight; the Q-tables show the best action (that is, the optimal product) for the different individual states. By knowing the Q-function, the shop agent can compute the best possible product for a given state, i.e. the one that gives it maximum profit. Single-agent learning, multi-agent learning, cooperative learning and improved cooperative learning algorithms are implemented and their results compared. It has been shown how a shop agent can effectively use reinforcement learning to set products dynamically so as to maximize its profit. This is believed to be a promising approach for profit maximization in retail market environments with limited information. Products that sell together for a particular age group can be combined to increase the overall sale.

VIII. Conclusion
Learning algorithms are well suited to decision making. In multiagent learning, more knowledge and information are available, and sharing of information and policies is possible, so multiagent learning performs better than single-agent learning. However, multiagent learning still lacks proper communication between agents. Enabling agents to share more knowledge and information, to use every agent's knowledge equally, and to solve the problem jointly and cooperatively is the future scope of this paper.
References
Journal Papers:
[1] Adnan M. Al-Khatib, "Cooperative Machine Learning Method", World of Computer Science and Information Technology Journal (WCSIT), ISSN: 2221-0741, Vol. 1, No. 9, pp. 380-383, 2011.
[2] Babak Nadjar Araabi, Sahar Mastoureshgh, and Majid Nili Ahmadabadi, "A Study on Expertise of Agents and Its Effects on Cooperative Q-Learning", IEEE Transactions on Evolutionary Computation, Vol. 14, pp. 23-57, 2010.
[3] Liviu Panait and Sean Luke, "Cooperative Multi-Agent Learning: The State of the Art", Journal of Autonomous Agents and Multi-Agent Systems, Vol. 11, Issue 3, pp. 387-434, 2005.
[4] M. V. Nagendra Prasad and Victor R. Lesser, "Learning Situation-Specific Coordination in Cooperative Multi-agent Systems", Journal of Autonomous Agents and Multi-Agent Systems, Vol. 2, Issue 2, pp. 173-207, 1999.
[5] Michael Kinney and Costas Tsatsoulis, "Learning Communication Strategies in Multiagent Systems", Journal of Applied Intelligence, Vol. 9, Issue 1, pp. 71-91, 1998.
[6] Ronen Brafman and Moshe Tennenholtz, "Learning to Coordinate Efficiently: A Model-based Approach", Journal of Artificial Intelligence Research, Vol. 19, Issue 1, pp. 11-23, 2003.
Books:
[7] Ethem Alpaydin, Introduction to Machine Learning, Second Edition, MIT Press (PHI).
[8] Tom Mitchell, Machine Learning, McGraw-Hill International Edition.
Proceedings Papers:
[9] Hamid R. Berenji and David Vengerov, "Learning, Cooperation, and Coordination in Multi-Agent Systems", Proceedings of the 9th IEEE International Conference on Fuzzy Systems, 2000.
[10] Jun-Yuan Tao and De-Sheng Li, "Cooperative Strategy Learning in Multi-Agent Environment with Continuous State Space", IEEE International Conference on Machine Learning and Cybernetics, pp. 2107-2111, 2006.
[11] La-mei Gao, Jun Zeng, Jie Wu, and Min Li, "Cooperative Reinforcement Learning Algorithm to Distributed Power System based on Multi-Agent", 3rd International Conference on Power Electronics Systems and Applications, Digital Reference: K210509035, 2009.