ICE-B 2008 
Proceedings of the 
International Conference on 
e-Business 
Porto, Portugal 
July 26 – 29, 2008 
Organized by 
INSTICC – Institute for Systems and Technologies of Information, Control 
and Communication 
Co-Sponsored by 
WfMC – Workflow Management Coalition – Process Thought Leadership 
Technically Co-Sponsored by 
IEEE SMC – Systems, Man, and Cybernetics Society
Copyright © 2008 INSTICC – Institute for Systems and Technologies of 
Information, Control and Communication 
All rights reserved 
Edited by Joaquim Filipe, David A. Marca, Boris Shishkov and Marten van Sinderen 
Printed in Portugal 
ISBN: 978-989-8111-58-6 
Depósito Legal: 279019/08 
http://www.ice-b.org 
secretariat@ice-b.org
A NEW REINFORCEMENT SCHEME FOR STOCHASTIC 
LEARNING AUTOMATA 
Application to Automatic Control 
Florin Stoica, Emil M. Popa 
Computer Science Department, “Lucian Blaga” University, Str. Dr. Ion Ratiu 5-7, Sibiu, Romania 
florin.stoica@ulbsibiu.ro, emilm.popa@ulbsibiu.ro 
Iulian Pah 
Department of Sociology, “Babes-Bolyai” University, Bd.21 decembrie 1989, no.128-130, Cluj-Napoca, Romania 
i_pah@yahoo.com 
Keywords: Stochastic Learning Automata, Reinforcement Learning, Intelligent Vehicle Control, agents. 
Abstract: A Learning Automaton is a learning entity that learns the optimal action to use from its set of possible
actions. It does this by performing actions on an environment and analyzing the resulting response. The
response, which may be favorable or unfavorable, causes a change in the behaviour of the automaton (the
automaton learns from this response). The rule governing this behaviour change is called a reinforcement
scheme. The term stochastic emphasizes the adaptive nature of the automaton: the environment output is
stochastically related to the automaton action. The reinforcement scheme presented in this paper is shown to
satisfy all necessary and sufficient conditions for absolute expediency in a stationary environment. An
automaton using this scheme is guaranteed to "do better" at every time step than at the previous step. Some
simulation results are presented, which show that our algorithm converges to a solution faster than the one
previously defined in (Ünsal, 1999). Using Stochastic Learning Automata techniques, we introduce a
decision/control method for intelligent vehicles, in an infrastructure-managed architecture. The aim is to design
an automata system that can learn the best possible action based on the data received from on-board sensors or
from the localization system of the highway infrastructure. A multi-agent approach is used for effective
implementation. Each vehicle has an associated "driver" agent, hosted on a JADE platform.
1 INTRODUCTION 
The past and present research on vehicle control 
emphasizes the importance of new methodologies in 
order to obtain stable longitudinal and lateral 
control. In this paper, we consider stochastic
learning automata as intelligent controllers within our
model for an Intelligent Vehicle Control System.
An automaton is a machine or control 
mechanism designed to automatically follow a 
predetermined sequence of operations or respond to 
encoded instructions. The term stochastic
emphasizes the adaptive nature of the automaton we
describe here: it does not follow predetermined
rules, but adapts to changes in its environment.
This adaptation is the
result of the learning process (Barto, 2003). 
Learning is defined as any permanent change in 
behavior as a result of past experience, and a 
learning system should therefore have the ability to 
improve its behavior with time, toward a final goal. 
The stochastic automaton attempts a solution of 
the problem without any information on the optimal 
action (initially, equal probabilities are attached to 
all the actions). One action is selected at random, the 
response from the environment is observed, action 
probabilities are updated based on that response, and 
the procedure is repeated. A stochastic automaton 
acting as described to improve its performance is 
called a learning automaton. The algorithm that 
guarantees the desired learning process is called a 
reinforcement scheme (Moody, 2004). 
Mathematically, the environment is defined by a
triple {α, β, c}, where α = {α_1, α_2, ..., α_r} represents
a finite set of actions, being the input to the
environment, β = {β_1, β_2} represents a binary
response set, and c = {c_1, c_2, ..., c_r} is a set of penalty
probabilities, where c_i is the probability that action α_i
will result in an unfavorable response. Given that
β(n) = 0 is a favorable outcome and β(n) = 1 is an
unfavorable outcome at time instant
n (n = 0, 1, 2, ...), the element c_i of c is defined
mathematically by:

c_i = P(β(n) = 1 | α(n) = α_i), i = 1, 2, ..., r
The environment can further be split up into two
types, stationary and nonstationary. In a stationary
environment the penalty probabilities will never 
change. In a nonstationary environment the penalties 
will change over time. 
In order to describe the reinforcement schemes,
we define p(n), the vector of action probabilities:

p_i(n) = P(α(n) = α_i), i = 1, ..., r
Updating the action probabilities can be represented
as follows:

p(n+1) = T[p(n), α(n), β(n)]

where T is a mapping. This formula says that the next
action probability vector p(n+1) is computed from the
current vector p(n), the chosen action α(n) and the
environment response β(n). If p(n+1) is a
linear function of p(n), the reinforcement scheme is
said to be linear; otherwise it is termed nonlinear.
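As a concrete illustration of this interaction cycle, the following minimal Java sketch samples an action according to p(n), draws a binary response from assumed penalty probabilities c_i, and leaves the mapping T as a placeholder; all names (AutomatonLoop, Updater, sampleAction) and the numeric values are ours, for illustration only.

import java.util.Random;

// Minimal sketch of the automaton/environment interaction loop described above.
public class AutomatonLoop {
    interface Updater {                      // the mapping T[p(n), alpha(n), beta(n)]
        void update(double[] p, int action, int beta);
    }

    static int sampleAction(double[] p, Random rnd) {
        double u = rnd.nextDouble(), acc = 0.0;
        for (int i = 0; i < p.length; i++) {
            acc += p[i];
            if (u < acc) return i;
        }
        return p.length - 1;                 // guard against rounding
    }

    public static void main(String[] args) {
        double[] c = {0.7, 0.2, 0.9, 0.5};   // example penalty probabilities c_i
        double[] p = {0.25, 0.25, 0.25, 0.25};
        Random rnd = new Random(1);
        Updater t = (prob, a, beta) -> { /* reinforcement scheme goes here */ };
        for (int n = 0; n < 10; n++) {
            int a = sampleAction(p, rnd);                  // alpha(n)
            int beta = rnd.nextDouble() < c[a] ? 1 : 0;    // beta(n): 1 = unfavorable
            t.update(p, a, beta);                          // p(n+1) = T[p(n), alpha(n), beta(n)]
        }
    }
}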
2 REINFORCEMENT SCHEMES 
2.1 Performance Evaluation 
Consider a stationary random environment with 
penalty probabilities {c1,c2 ,...,cr} defined above. 
We define a quantity M(n) as the average 
penalty for a given action probability vector: 
M(n) = Σ_{i=1}^{r} c_i ∗ p_i(n)
An automaton is absolutely expedient if the
expected value of the average penalty at one
iteration step is less than it was at the previous step
for all steps: E[M(n+1) | p(n)] < M(n) for all n (Rivero,
2003).
The algorithm which we will present in this 
paper is derived from a nonlinear absolutely 
expedient reinforcement scheme presented by 
(Ünsal, 1999). 
2.2 Absolutely Expedient 
Reinforcement Schemes 
The reinforcement scheme is the basis of the 
learning process for learning automata. The general 
solution for absolutely expedient schemes was found 
by (Lakshmivarahan, 1973). 
A learning automaton may send its action α_i to
multiple environments (or "teachers") at the same time.
In that case, the action of the automaton results in a
vector of responses from the environments. In a
stationary N-teacher environment, if the automaton
produced the action α_i and the environment
responses are β_i^j, j = 1, ..., N, at time instant n, then
the vector of action probabilities p(n) is updated as
follows (Ünsal, 1999):
p_i(n+1) = p_i(n) + [ (1/N) Σ_{k=1}^{N} β_i^k ] ∗ Σ_{j=1, j≠i}^{r} φ_j(p(n)) −
                    [ 1 − (1/N) Σ_{k=1}^{N} β_i^k ] ∗ Σ_{j=1, j≠i}^{r} ψ_j(p(n))
                                                                                   (1)
p_j(n+1) = p_j(n) − [ (1/N) Σ_{k=1}^{N} β_i^k ] ∗ φ_j(p(n)) +
                    [ 1 − (1/N) Σ_{k=1}^{N} β_i^k ] ∗ ψ_j(p(n))

for all j ≠ i, where the functions φ_j and ψ_j satisfy
the following conditions:
φ_1(p(n)) / p_1(n) = φ_2(p(n)) / p_2(n) = ... = φ_r(p(n)) / p_r(n) = λ(p(n))       (2)
ψ_1(p(n)) / p_1(n) = ψ_2(p(n)) / p_2(n) = ... = ψ_r(p(n)) / p_r(n) = μ(p(n))

p_i(n) + Σ_{j=1, j≠i}^{r} φ_j(p(n)) > 0                                            (3)

p_i(n) − Σ_{j=1, j≠i}^{r} ψ_j(p(n)) < 1                                            (4)

p_j(n) + ψ_j(p(n)) > 0                                                             (5)

p_j(n) − φ_j(p(n)) < 1                                                             (6)

for all j ∈ {1, ..., r} \ {i}.
The conditions (3)-(6) ensure that
0 < p_k < 1, k = 1, ..., r (Stoica, 2007).
Theorem. If the functions λ ( p(n)) and μ ( p(n)) 
satisfy the following conditions: 
λ(p(n)) ≤ 0
μ(p(n)) ≤ 0                                  (7)
λ(p(n)) + μ(p(n)) < 0
then the automaton with the reinforcement scheme 
in (1) is absolutely expedient in a stationary 
environment. 
The proof of this theorem can be found in (Baba, 
1984). 
3 A NEW NONLINEAR 
REINFORCEMENT SCHEME 
Because the above theorem is also valid for a single-teacher 
model, we can define a single environment 
response that is a function f of many teacher 
outputs. 
Thus, we can update the above algorithm as 
follows: 
p_i(n+1) = p_i(n) − f ∗ θ ∗ δ ∗ H(n) ∗ [1 − p_i(n)] + (1 − f) ∗ θ ∗ [1 − p_i(n)]
                                                                                   (8)
p_j(n+1) = p_j(n) + f ∗ θ ∗ δ ∗ H(n) ∗ p_j(n) − (1 − f) ∗ θ ∗ p_j(n)

for all j ≠ i, i.e.:

φ_k(p(n)) = −θ ∗ δ ∗ H(n) ∗ p_k(n)
ψ_k(p(n)) = −θ ∗ p_k(n)

where the learning parameters θ and δ are real values
which satisfy:
0 < θ < 1 and 0 < θ ∗ δ < 1.
The function H is defined as follows:

H(n) = min{ 1; max{ min( p_i(n) / (θ ∗ δ ∗ (1 − p_i(n))) − ε ;
            min_{j=1,...,r; j≠i} [ (1 − p_j(n)) / (θ ∗ δ ∗ p_j(n)) − ε ] ); 0 } }

Parameter ε is an arbitrarily small positive real
number.
Our reinforcement scheme differs from the one
given in (Ünsal, 1999) by the definition of these
two functions: H and φ_k.
The proof that all the conditions of the 
reinforcement scheme (1) and theorem (7) are 
satisfied can be found in (Stoica, 2007). 
In conclusion, we state that the algorithm given in
equations (8) is absolutely expedient in a stationary
environment.
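To make the scheme concrete, the following Java sketch implements the update (8) and the function H(n) above; the class and method names are illustrative (not the authors' implementation), and the convention is f = 1 for a penalty and f = 0 for a reward, matching the combined environment response.

// Sketch of the nonlinear reinforcement scheme (8); names and structure are illustrative.
public class NonlinearScheme {
    private final double theta;    // 0 < theta < 1
    private final double delta;    // 0 < theta * delta < 1
    private final double eps;      // arbitrarily small positive constant

    public NonlinearScheme(double theta, double delta, double eps) {
        this.theta = theta; this.delta = delta; this.eps = eps;
    }

    // H(n) keeps every probability strictly inside (0, 1) after a penalty update.
    private double h(double[] p, int i) {
        double bound = p[i] / (theta * delta * (1.0 - p[i])) - eps;
        for (int j = 0; j < p.length; j++) {
            if (j == i) continue;
            bound = Math.min(bound, (1.0 - p[j]) / (theta * delta * p[j]) - eps);
        }
        return Math.min(1.0, Math.max(bound, 0.0));
    }

    // i = index of the chosen action; f = 1 penalty, f = 0 reward.
    public void update(double[] p, int i, int f) {
        double H = h(p, i);
        for (int j = 0; j < p.length; j++) {
            if (j == i) {
                p[j] = p[j] - f * theta * delta * H * (1.0 - p[j])
                            + (1 - f) * theta * (1.0 - p[j]);
            } else {
                p[j] = p[j] + f * theta * delta * H * p[j]
                            - (1 - f) * theta * p[j];
            }
        }
    }
}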
4 EXPERIMENTAL RESULTS 
4.1 Problem Formulation 
To show that our algorithm converges to a solution 
faster than the one given in (Ünsal, 1999), let us 
consider a simple example. Figure 1 illustrates a grid 
world in which a robot navigates. Shaded cells 
represent barriers. 
Figure 1: A grid world for robot navigation. 
The current position of the robot is marked by a 
circle. Navigation is done using four actions 
α = {N, S, E,W} , the actions denoting the four 
possible movements along the coordinate directions. 
Because in the given situation there is a single
optimal action, we stop the execution when the
probability of the optimal action reaches a certain 
value (0.9999). 
4.2 Comparative Results 
We compared two reinforcement schemes using 
these four actions and two different initial 
conditions. 
Table 1: Convergence rates for a single optimal action of a 
4-action automaton (200 runs for each parameter set). 
                    Average number of steps to reach p_opt = 0.9999

                    4 actions with                  4 actions with
                    p_i(0) = 1/4, i = 1,...,4       p_opt(0) = 0.0005, p_{i≠opt}(0) = 0.9995/3
 θ      δ           Ünsal's Alg.    New alg.        Ünsal's Alg.    New alg.
 0.01   1           644.84          633.96          921.20          905.18
        25          62.23           56.64           205.56          194.08
        50          11.13           8.73            351.67          340.27
 0.05   1           136.99          130.41          202.96          198.25
        5           74.05           63.93           88.39           79.19
        10          24.74           20.09           103.21          92.83
 0.1    1           70.81           63.09           105.12          99.20
        2.5         59.48           50.52           71.77           65.49
        5           23.05           19.51           59.06           54.08
The data shown in Table 1 are the results of two
different initial conditions: in the first case all
probabilities are initially equal, and in the second
case the optimal action initially has a small
probability value (0.0005), with only one action
(the optimal one) receiving reward.
Comparing values from corresponding columns, 
we conclude that our algorithm converges to a 
solution faster than the one given in (Ünsal, 1999). 
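As a rough illustration of how such a comparison can be run, the sketch below averages, over 200 runs, the number of steps until the optimal action's probability reaches 0.9999. The stationary environment is modelled here simply by a vector of penalty probabilities, which is our simplification of the grid-world setup, and NonlinearScheme is the sketch from Section 3; none of this is the authors' experiment code.

import java.util.Random;

// Illustrative convergence experiment: average steps until p_opt >= 0.9999.
public class ConvergenceExperiment {
    public static void main(String[] args) {
        double theta = 0.05, delta = 5.0, eps = 1e-6;
        double[] c = {0.9, 0.9, 0.0, 0.9};           // action 2 is the single optimal action
        Random rnd = new Random(42);
        long totalSteps = 0;
        int runs = 200;
        for (int run = 0; run < runs; run++) {
            double[] p = {0.25, 0.25, 0.25, 0.25};   // first initial condition of Table 1
            NonlinearScheme scheme = new NonlinearScheme(theta, delta, eps);
            int steps = 0;
            while (p[2] < 0.9999) {
                int a = sample(p, rnd);
                int f = rnd.nextDouble() < c[a] ? 1 : 0;   // combined response, 1 = penalty
                scheme.update(p, a, f);
                steps++;
            }
            totalSteps += steps;
        }
        System.out.println("average steps: " + (double) totalSteps / runs);
    }

    static int sample(double[] p, Random rnd) {
        double u = rnd.nextDouble(), acc = 0.0;
        for (int i = 0; i < p.length; i++) { acc += p[i]; if (u < acc) return i; }
        return p.length - 1;
    }
}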
5 USING STOCHASTIC 
LEARNING AUTOMATA FOR 
INTELLIGENT VEHICLE 
CONTROL 
The task of creating intelligent systems that we can 
rely on is not trivial. In this section, we present a 
method for intelligent vehicle control, having as 
theoretical background Stochastic Learning 
Automata. We visualize the planning layer of an 
intelligent vehicle as an automaton (or automata 
group) in a nonstationary environment. We attempt 
to find a way to make intelligent decisions here, 
having as objectives conformance with traffic 
parameters imposed by the highway infrastructure 
(management system and global control), and 
improved safety by minimizing crash risk. 
The aim here is to design an automata system 
that can learn the best possible action based on the 
data received from on-board sensors, or from
roadside-to-vehicle communications. For our model, 
we assume that an intelligent vehicle is capable of 
two sets of actions, lateral and longitudinal. Lateral
actions are LEFT (shift to left lane), RIGHT (shift to 
right lane) and LINE_OK (stay in current lane). 
Longitudinal actions are ACC (accelerate), DEC 
(decelerate) and SPEED_OK (keep current speed). 
An autonomous vehicle must be able to “sense” the 
environment around itself. Therefore, we assume 
that there are four different sensor modules on
board the vehicle (the headway module, two side 
modules and a speed module), in order to detect the 
presence of a vehicle traveling in front of the vehicle 
or in the immediately adjacent lane and to know the 
current speed of the vehicle. 
These sensor modules evaluate the information 
received from the on-board sensors or from the 
highway infrastructure in the light of the current 
automata actions, and send a response to the 
automata. Our basic model for planning and 
coordination of lane changing and speed control is 
shown in Figure 2. 
Figure 2: The model of the Intelligent Vehicle Control System.
The response from the physical environment is a
combination of outputs from the sensor modules.
Because an input parameter for the decision blocks
is the action chosen by the stochastic automaton, it is
necessary to use two distinct functions, F1 and F2,
to map the outputs of the decision blocks into inputs
for the two learning automata, namely the
longitudinal automaton and the lateral automaton.
After updating the action probability vectors in
both learning automata, using the nonlinear
reinforcement scheme presented in section 3, the
outputs from the stochastic automata are transmitted to
the regulation layer. The regulation layer handles
the actions received from the two automata in a
distinct manner, using for each of them a regulation
buffer. If a received action was rewarded, it is
introduced in the regulation buffer of the
corresponding automaton; otherwise, a special value
denoting an action penalized by the physical
environment is introduced in the buffer. The regulation
layer does not carry out the chosen action
immediately; instead, it carries out an action only if
it is recommended k times consecutively by the
automaton, where k is the length of the regulation
buffer. After an action is executed, the action
probability vector is initialized to 1/r, where r is the
number of actions, and the regulation buffer is also
reset.
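A minimal sketch of this regulation rule is shown below; the class RegulationBuffer, the marker PENALIZED and the push/reset methods are our own illustrative names, not the simulator's actual API.

import java.util.Arrays;

// Illustrative regulation buffer: an action is executed only after it has been
// recommended (and rewarded) k consecutive times.
public class RegulationBuffer {
    private static final int PENALIZED = -1;   // marker for a penalized action
    private final int[] buffer;                // length k
    private int filled = 0;

    public RegulationBuffer(int k) {
        buffer = new int[k];
        Arrays.fill(buffer, PENALIZED);
    }

    // Record the latest recommendation; return the action to execute,
    // or -1 if no action should be carried out yet.
    public int push(int action, boolean rewarded) {
        System.arraycopy(buffer, 1, buffer, 0, buffer.length - 1);
        buffer[buffer.length - 1] = rewarded ? action : PENALIZED;
        filled = Math.min(filled + 1, buffer.length);
        if (filled < buffer.length) return -1;
        for (int v : buffer) {
            if (v == PENALIZED || v != buffer[0]) return -1;
        }
        int decided = buffer[0];
        reset();                               // buffer is reset after execution
        return decided;
    }

    public void reset() {
        Arrays.fill(buffer, PENALIZED);
        filled = 0;
    }
}

The caller is then responsible for re-initializing the action probability vector to 1/r once the returned action has actually been executed.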
6 SENSOR MODULES 
The four teacher modules mentioned above are 
decision blocks that calculate the response 
(reward/penalty), based on the last chosen action of
the automaton. Table 2 describes the output of the decision
blocks for side sensors. 
Table 2: Outputs from the Left/Right Sensor Module.

Actions      Vehicle in sensor range        No vehicle in sensor range
             or no adjacent lane            and adjacent lane exists
LINE_OK      0/0                            0/0
LEFT         1/0                            0/0
RIGHT        0/1                            0/0
Table 3: Outputs from the Headway Sensor Module.

Actions      Vehicle in range                  No vehicle in range
             (at a close frontal distance)
LINE_OK      1                                 0
LEFT         0                                 0
RIGHT        0                                 0
SPEED_OK     1                                 0
ACC          1                                 0
DEC          0*                                0
As seen in Table 2, a penalty response is 
received from the left sensor module when the 
action is LEFT and there is a vehicle in the left lane or
the vehicle is already traveling on the leftmost lane. 
There is a similar situation for the right sensor 
module. 
The Headway (Frontal) Module is defined as
shown in Table 3. If there is a vehicle at a close
distance (less than the admissible distance), a penalty
response is sent to the automaton for the actions
LINE_OK, SPEED_OK and ACC. All other actions
(LEFT, RIGHT, DEC) are encouraged, because they
may serve to avoid a collision.
The Speed Module compares the actual speed
with the desired speed and, based on the chosen
action, sends feedback to the longitudinal
automaton.
Table 4: Outputs from the Speed Module. 
Speed Sensor Module 
Actions Speed: 
too slow 
Acceptable 
speed 
Speed: 
too fast 
SPEED_OK 1 0 1 
ACC 0 0 1 
DEC 1 0 0 
The reward response indicated by 0* (from the 
Headway Sensor Module) is different from the
normal reward response, indicated by 0: this reward 
response has a higher priority and must override a 
possible penalty from other modules. 
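As an illustration of how such a decision block can be coded, the sketch below implements the Speed Module of Table 4; the action encoding, the desiredSpeed/tolerance fields and the method names are our assumptions, not the simulator's actual classes.

// Illustrative Speed Module implementing Table 4 (0 = reward, 1 = penalty).
public class SpeedModule {
    static final int LINE_OK = 0, LEFT = 1, RIGHT = 2, SPEED_OK = 3, ACC = 4, DEC = 5;

    private final double desiredSpeed;
    private final double tolerance;

    public SpeedModule(double desiredSpeed, double tolerance) {
        this.desiredSpeed = desiredSpeed;
        this.tolerance = tolerance;
    }

    public int response(int action, double actualSpeed) {
        boolean tooSlow = actualSpeed < desiredSpeed - tolerance;
        boolean tooFast = actualSpeed > desiredSpeed + tolerance;
        switch (action) {
            case SPEED_OK: return (tooSlow || tooFast) ? 1 : 0;
            case ACC:      return tooFast ? 1 : 0;
            case DEC:      return tooSlow ? 1 : 0;
            default:       return 0;   // lateral actions are not penalized by this module
        }
    }
}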
7 A MULTI-AGENT SYSTEM 
FOR INTELLIGENT VEHICLE 
CONTROL 
This section describes an implementation of a
simulator for the Intelligent Vehicle Control System,
using a multi-agent approach. The entire system was
implemented in Java and is based on the JADE
platform (Bigus, 2001).
Figure 3 shows the class diagram of the
simulator. Each vehicle has an associated JADE agent
(DriverAgent), responsible for the intelligent control.
"Driving" means a continuous learning process,
sustained by the two stochastic learning automata,
namely the longitudinal automaton and the lateral
automaton.
The response of the physical environment is a
combination of the outputs of all four sensor
modules. The implementation of this combination
for each automaton (longitudinal and lateral,
respectively) is shown in Figure 4 (the value 0* was
substituted by 2).
Figure 3: The class diagram of the simulator. 
// Longitudinal Automaton
public double reward(int action) {
    int combine;
    // combine the speed and headway responses (0 = reward, 1 = penalty, 2 = 0*)
    combine = Math.max(speedModule(action), frontModule(action));
    // the special response 0* (coded as 2) overrides a possible penalty
    if (combine == 2) combine = 0;
    return combine;
}

// Lateral Automaton
public double reward(int action) {
    int combine;
    combine = Math.max(leftRightModule(action), frontModule(action));
    return combine;
}
Figure 4: The physical environment response. 
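The driving loop itself can be hosted in a JADE behaviour. The following schematic sketch shows what a DriverAgent's cyclic behaviour might look like; apart from the JADE base classes (jade.core.Agent, jade.core.behaviours.CyclicBehaviour), everything here, including the commented steps, is an illustrative placeholder rather than the simulator's actual code.

import jade.core.Agent;
import jade.core.behaviours.CyclicBehaviour;

// Schematic DriverAgent: one sense-learn-act cycle per behaviour activation.
public class DriverAgent extends Agent {
    @Override
    protected void setup() {
        addBehaviour(new CyclicBehaviour(this) {
            @Override
            public void action() {
                // 1. each automaton (longitudinal, lateral) chooses an action
                // 2. the sensor modules compute the combined response f
                // 3. the probability vectors are updated with the scheme of Section 3
                // 4. the regulation layer executes an action recommended k times in a row
            }
        });
    }
}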
8 CONCLUSIONS 
Reinforcement learning has attracted rapidly 
increasing interest in the machine learning and 
artificial intelligence communities. Its promise is 
beguiling - a way of programming agents by reward 
and punishment without needing to specify how the 
task (i.e., behavior) is to be achieved. Reinforcement
learning makes it possible, at least in principle, to bypass
the problems of building an explicit model of the
behavior to be synthesized and of its counterpart in
supervised learning, a meaningful learning base.
The reinforcement scheme presented in this 
paper satisfies all necessary and sufficient conditions 
for absolute expediency in a stationary environment 
and the nonlinear algorithm based on this scheme is 
found to converge to the "optimal" action faster than
nonlinear schemes previously defined in (Ünsal, 
1999). 
Using this new reinforcement scheme, we
developed a simulator for an Intelligent Vehicle
Control System, following a multi-agent approach. The
entire system was implemented in Java and is based
on the JADE platform.
REFERENCES 
Baba, N., 1984. New Topics in Learning Automata:
Theory and Applications, Lecture Notes in Control
and Information Sciences, Berlin, Germany: Springer-
Verlag.
Barto, A., Mahadevan, S., 2003. Recent advances in 
hierarchical reinforcement learning, Discrete-Event 
Systems journal, Special issue on Reinforcement 
Learning. 
Bigus, J. P., Bigus, J., 2001. Constructing Intelligent 
Agents using Java, 2nd ed., John Wiley & Sons, Inc. 
Buffet, O., Dutech, A., Charpillet, F., 2001. Incremental 
reinforcement learning for designing multi-agent 
systems, In J. P. Müller, E. Andre, S. Sen, and C. 
Frasson, editors, Proceedings of the Fifth 
International Conference on Autonomous Agents, pp.
31–32, Montreal, Canada, ACM Press. 
Lakshmivarahan, S., Thathachar, M.A.L., 1973. 
Absolutely Expedient Learning Algorithms for 
Stochastic Automata, IEEE Transactions on Systems, 
Man and Cybernetics, vol. SMC-6, pp. 281-286. 
Moody, J., Liu, Y., Saffell, M., Youn, K., 2004. Stochastic 
direct reinforcement: Application to simple games 
with recurrence, In Proceedings of Artificial 
Multiagent Learning. Papers from the 2004 AAAI Fall 
Symposium,Technical Report FS-04-02. 
Narendra, K. S., Thathachar, M. A. L., 1989. Learning 
Automata: an introduction, Prentice-Hall. 
Rivero, C., 2003. Characterization of the absolutely 
expedient learning algorithms for stochastic automata 
in a non-discrete space of actions, ESANN'2003
proceedings - European Symposium on Artificial
Neural Networks, Bruges (Belgium), ISBN 2-930307-
03-X, pp. 307-312.
Stoica, F., Popa, E. M., 2007. An Absolutely Expedient 
Learning Algorithm for Stochastic Automata, WSEAS 
Transactions on Computers, Issue 2, Volume 6, ISSN 
1109-2750, pp. 229-235. 
Sutton, R., Barto, A., 1998. Reinforcement learning: An 
introduction, MIT-press, Cambridge, MA. 
Ünsal, C., Kachroo, P., Bay, J. S., 1999. Multiple 
Stochastic Learning Automata for Vehicle Path 
Control in an Automated Highway System, IEEE 
Transactions on Systems, Man, and Cybernetics - Part
A: Systems and Humans, vol. 29, no. 1, January 1999.

Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Heredity: Inheritance and Variation of Traits
Heredity: Inheritance and Variation of TraitsHeredity: Inheritance and Variation of Traits
Heredity: Inheritance and Variation of Traits
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
Welcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work DayWelcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work Day
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physics
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutions
 

A new Reinforcement Scheme for Stochastic Learning Automata

A learning system should have the ability to improve its behavior with time, toward a final goal. The stochastic automaton attempts to solve the problem without any information on the optimal action: initially, equal probabilities are attached to all the actions. One action is selected at random, the response from the environment is observed, the action probabilities are updated based on that response, and the procedure is repeated. A stochastic automaton that acts in this way to improve its performance is called a learning automaton. The algorithm that guarantees the desired learning process is called a reinforcement scheme (Moody, 2004).

Mathematically, the environment is defined by a triple $\{\alpha, \beta, c\}$, where $\alpha = \{\alpha_1, \alpha_2, \dots, \alpha_r\}$ represents the finite set of actions forming the input to the environment, $\beta = \{\beta_1, \beta_2\}$ represents a binary response set, and $c = \{c_1, c_2, \dots, c_r\}$ is a set of penalty probabilities, where $c_i$ is the probability that action $\alpha_i$ will result in an unfavorable response.
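To make the environment model concrete, the following is a minimal Java sketch (not part of the paper; the class name and constructor are illustrative) of a stationary single-teacher environment in which action i draws the unfavorable response beta = 1 with the fixed probability c_i.

import java.util.Random;

// Sketch of a stationary environment: action i is penalized (beta = 1)
// with the fixed probability c[i], and rewarded (beta = 0) otherwise.
public class StationaryEnvironment {
    private final double[] c;                  // penalty probabilities c_1 ... c_r
    private final Random random = new Random();

    public StationaryEnvironment(double[] penaltyProbabilities) {
        this.c = penaltyProbabilities.clone();
    }

    // Returns the binary response for the chosen action index.
    public int response(int action) {
        return (random.nextDouble() < c[action]) ? 1 : 0;
    }
}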
Given that $\beta(n) = 0$ is a favorable outcome and $\beta(n) = 1$ is an unfavorable outcome at time instant $n$ ($n = 0, 1, 2, \dots$), the element $c_i$ of $c$ is defined mathematically by:

$c_i = P(\beta(n) = 1 \mid \alpha(n) = \alpha_i), \quad i = 1, 2, \dots, r$

The environment can further be split into two types, stationary and nonstationary. In a stationary environment the penalty probabilities never change, while in a nonstationary environment the penalties change over time. In order to describe the reinforcement schemes, we define $p(n)$, the vector of action probabilities:

$p_i(n) = P(\alpha(n) = \alpha_i), \quad i = \overline{1, r}$

Updating the action probabilities can be represented as follows:

$p(n+1) = T[p(n), \alpha(n), \beta(n)]$

where $T$ is a mapping. This formula says that the next action probability vector $p(n+1)$ is computed from the current vector $p(n)$, the chosen action and the resulting response from the environment. If $p(n+1)$ is a linear function of $p(n)$, the reinforcement scheme is said to be linear; otherwise it is termed nonlinear.

2 REINFORCEMENT SCHEMES

2.1 Performance Evaluation

Consider a stationary random environment with the penalty probabilities $\{c_1, c_2, \dots, c_r\}$ defined above. We define the quantity $M(n)$ as the average penalty for a given action probability vector:

$M(n) = \sum_{i=1}^{r} c_i \, p_i(n)$

An automaton is absolutely expedient if the expected value of the average penalty at one iteration step is less than it was at the previous step, for all steps: $E[M(n+1) \mid p(n)] < M(n)$ for all $n$ (Rivero, 2003). The algorithm we present in this paper is derived from a nonlinear absolutely expedient reinforcement scheme presented by (Ünsal, 1999).

2.2 Absolutely Expedient Reinforcement Schemes

The reinforcement scheme is the basis of the learning process for learning automata. The general solution for absolutely expedient schemes was found by (Lakshmivarahan, 1973). A learning automaton may send its action to multiple environments at the same time; in that case, the action of the automaton results in a vector of responses from the environments (or "teachers"). In a stationary N-teacher environment, if the automaton produced the action $\alpha_i$ and the environment responses are $\beta^k$, $k = 1, \dots, N$, at time instant $n$, then the vector of action probabilities $p(n)$ is updated as follows (Ünsal, 1999):

$p_i(n+1) = p_i(n) + \left[\frac{1}{N}\sum_{k=1}^{N}\beta^k\right]\sum_{\substack{j=1 \\ j \neq i}}^{r}\phi_j(p(n)) - \left[1 - \frac{1}{N}\sum_{k=1}^{N}\beta^k\right]\sum_{\substack{j=1 \\ j \neq i}}^{r}\psi_j(p(n))$

$p_j(n+1) = p_j(n) - \left[\frac{1}{N}\sum_{k=1}^{N}\beta^k\right]\phi_j(p(n)) + \left[1 - \frac{1}{N}\sum_{k=1}^{N}\beta^k\right]\psi_j(p(n)) \quad$ for all $j \neq i$   (1)

where the functions $\phi_j$ and $\psi_j$ satisfy the following conditions:

$\frac{\phi_1(p(n))}{p_1(n)} = \dots = \frac{\phi_r(p(n))}{p_r(n)} = \lambda(p(n))$
$\frac{\psi_1(p(n))}{p_1(n)} = \dots = \frac{\psi_r(p(n))}{p_r(n)} = \mu(p(n))$   (2)

$p_i(n) + \sum_{\substack{j=1 \\ j \neq i}}^{r}\phi_j(p(n)) > 0$   (3)

$p_i(n) - \sum_{\substack{j=1 \\ j \neq i}}^{r}\psi_j(p(n)) < 1$   (4)

$p_j(n) + \psi_j(p(n)) > 0$   (5)

$p_j(n) - \phi_j(p(n)) < 1$   (6)

for all $j \in \{1, \dots, r\} \setminus \{i\}$. The conditions (3)-(6) ensure that $0 < p_k < 1$, $k = \overline{1, r}$ (Stoica, 2007).
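As a small illustration of how these quantities are used in a simulation loop, the sketch below (not from the paper; names are illustrative) shows the two basic operations: drawing an action from the probability vector p(n) and evaluating the average penalty M(n) whose expected decrease defines absolute expediency.

import java.util.Random;

// Helper routines for a learning automaton simulation.
public class AutomatonMath {

    // Samples an action index according to the probability vector p(n).
    public static int sampleAction(double[] p, Random random) {
        double u = random.nextDouble();
        double cumulative = 0.0;
        for (int i = 0; i < p.length; i++) {
            cumulative += p[i];
            if (u < cumulative) {
                return i;
            }
        }
        return p.length - 1;   // guard against floating-point rounding
    }

    // Computes the average penalty M(n) = sum_i c_i * p_i(n).
    public static double averagePenalty(double[] p, double[] c) {
        double m = 0.0;
        for (int i = 0; i < p.length; i++) {
            m += c[i] * p[i];
        }
        return m;
    }
}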
Theorem. If the functions $\lambda(p(n))$ and $\mu(p(n))$ satisfy the following conditions:

$\lambda(p(n)) \le 0$
$\mu(p(n)) \le 0$   (7)
$\lambda(p(n)) + \mu(p(n)) < 0$

then the automaton with the reinforcement scheme in (1) is absolutely expedient in a stationary environment. The proof of this theorem can be found in (Baba, 1984).

3 A NEW NONLINEAR REINFORCEMENT SCHEME

Because the above theorem is also valid for a single-teacher model, we can define a single environment response $f$ that is a function of the individual teacher outputs. Substituting $f$ for the averaged teacher responses in (1), our algorithm updates the action probabilities as follows:

$p_i(n+1) = p_i(n) + f \cdot (-\theta \cdot \delta \cdot H(n)) \cdot [1 - p_i(n)] - (1 - f) \cdot (-\theta) \cdot [1 - p_i(n)]$

$p_j(n+1) = p_j(n) - f \cdot (-\theta \cdot \delta \cdot H(n)) \cdot p_j(n) + (1 - f) \cdot (-\theta) \cdot p_j(n) \quad$ for all $j \neq i$   (8)

i.e.:

$\psi_k(p(n)) = -\theta \cdot p_k(n)$
$\phi_k(p(n)) = -\theta \cdot \delta \cdot H(n) \cdot p_k(n)$

where the learning parameters $\theta$ and $\delta$ are real values satisfying $0 < \theta < 1$ and $0 < \theta \cdot \delta < 1$. The function $H$ is defined as follows:

$H(n) = \min\left\{1;\; \max\left\{\min\left\{\frac{p_i(n)}{\theta\,\delta\,(1 - p_i(n))} - \varepsilon\,;\;\; \min_{j \neq i}\left(\frac{1 - p_j(n)}{\theta\,\delta\,p_j(n)} - \varepsilon\right)\right\};\; 0\right\}\right\}$

The parameter $\varepsilon$ is an arbitrarily small positive real number. Our reinforcement scheme differs from the one given in (Ünsal, 1999) by the definition of these two functions, $H$ and $\phi_k$. The proof that all the conditions of the reinforcement scheme (1) and of theorem (7) are satisfied can be found in (Stoica, 2007). In conclusion, the algorithm given in equations (8) is absolutely expedient in a stationary environment.
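To illustrate how a single update step of this scheme might be implemented, the sketch below (not the authors' code; the class and method names are ours) applies equations (8) together with the clamp H(n) defined above. The combined environment response f is 0 for a favorable outcome and 1 for an unfavorable one, as in Section 1.

// Sketch of one update step of the nonlinear scheme in equations (8).
public class NonlinearScheme {
    private final double theta;    // 0 < theta < 1
    private final double delta;    // 0 < theta * delta < 1
    private final double epsilon;  // arbitrarily small positive constant

    public NonlinearScheme(double theta, double delta, double epsilon) {
        this.theta = theta;
        this.delta = delta;
        this.epsilon = epsilon;
    }

    // The clamp H(n) for the chosen action i, keeping all p_k inside (0, 1).
    private double h(double[] p, int i) {
        double bound = p[i] / (theta * delta * (1.0 - p[i])) - epsilon;
        for (int j = 0; j < p.length; j++) {
            if (j != i) {
                bound = Math.min(bound, (1.0 - p[j]) / (theta * delta * p[j]) - epsilon);
            }
        }
        return Math.min(1.0, Math.max(bound, 0.0));
    }

    // Updates the action probability vector p in place, given the chosen
    // action i and the combined environment response f (0 = reward, 1 = penalty).
    public void update(double[] p, int i, double f) {
        double hn = h(p, i);
        for (int j = 0; j < p.length; j++) {
            if (j == i) {
                p[j] += f * (-theta * delta * hn) * (1.0 - p[j])
                        - (1.0 - f) * (-theta) * (1.0 - p[j]);
            } else {
                p[j] += -f * (-theta * delta * hn) * p[j]
                        + (1.0 - f) * (-theta) * p[j];
            }
        }
    }
}

On a reward (f = 0) the chosen action gains $\theta(1 - p_i)$ probability mass and every other action loses the fraction $\theta$ of its own mass, so the vector stays normalized; on a penalty the roles are reversed, scaled by $\delta \cdot H(n)$.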
4 EXPERIMENTAL RESULTS

4.1 Problem Formulation

To show that our algorithm converges to a solution faster than the one given in (Ünsal, 1999), let us consider a simple example. Figure 1 illustrates a grid world in which a robot navigates; shaded cells represent barriers and the current position of the robot is marked by a circle.

Figure 1: A grid world for robot navigation.

Navigation is done using four actions, $\alpha = \{N, S, E, W\}$, denoting the four possible movements along the coordinate directions. Because in the given situation there is a single optimal action, we stop the execution when the probability of the optimal action reaches a threshold value (0.9999).

4.2 Comparative Results

We compared the two reinforcement schemes using these four actions and two different initial conditions.

Table 1: Convergence rates for a single optimal action of a 4-action automaton (200 runs for each parameter set). Entries are the average number of steps to reach p_opt = 0.9999.

                    p_i(0) = 1/4, i = 1..4          p_opt(0) = 0.0005, p_(i!=opt)(0) = 0.9995/3
theta   delta    Ünsal's alg.   New alg.         Ünsal's alg.   New alg.
0.01    1        644.84         633.96           921.20         905.18
0.01    25       62.23          56.64            205.56         194.08
0.01    50       11.13          8.73             351.67         340.27
0.05    1        136.99         130.41           202.96         198.25
0.05    5        74.05          63.93            88.39          79.19
0.05    10       24.74          20.09            103.21         92.83
0.1     1        70.81          63.09            105.12         99.20
0.1     2.5      59.48          50.52            71.77          65.49
0.1     5        23.05          19.51            59.06          54.08

The data shown in Table 1 correspond to two different initial conditions: in the first case all action probabilities are initially equal, while in the second case the optimal action initially has a very small probability (0.0005), with only the optimal action receiving reward. Comparing the values in the corresponding columns, we conclude that our algorithm converges to a solution faster than the one given in (Ünsal, 1999).

5 USING STOCHASTIC LEARNING AUTOMATA FOR INTELLIGENT VEHICLE CONTROL

The task of creating intelligent systems that we can rely on is not trivial. In this section, we present a method for intelligent vehicle control, having Stochastic Learning Automata as its theoretical background. We visualize the planning layer of an intelligent vehicle as an automaton (or a group of automata) operating in a nonstationary environment. We attempt to make intelligent decisions at this layer, with two objectives: conformance with the traffic parameters imposed by the highway infrastructure (management system and global control), and improved safety by minimizing crash risk. The aim is to design an automata system that can learn the best possible action based on the data received from on-board sensors or from roadside-to-vehicle communications.

For our model, we assume that an intelligent vehicle is capable of two sets of actions, lateral and longitudinal. The lateral actions are LEFT (shift to the left lane), RIGHT (shift to the right lane) and LINE_OK (stay in the current lane). The longitudinal actions are ACC (accelerate), DEC (decelerate) and SPEED_OK (keep the current speed).

An autonomous vehicle must be able to "sense" the environment around itself. Therefore, we assume that there are four different sensor modules on board the vehicle (the headway module, two side modules and a speed module), in order to detect the presence of a vehicle traveling in front of the vehicle or in the immediately adjacent lanes, and to know the current speed of the vehicle. These sensor modules evaluate the information received from the on-board sensors or from the highway infrastructure in the light of the current automata actions, and send a response to the automata. Our basic model for planning and coordination of lane changing and speed control is shown in Figure 2.

Figure 2: The model of the Intelligent Vehicle Control System (localization system; frontal, left, right and speed detection modules; mapping functions F1 and F2; the longitudinal and lateral automata in the planning layer; and the regulation buffers).

The response from the physical environment is a combination of the outputs of the sensor modules. Because an input parameter for the decision blocks is the action chosen by the stochastic automaton, it is necessary to use two distinct functions, F1 and F2, to map the outputs of the decision blocks into inputs for the two learning automata, namely the longitudinal automaton and the lateral automaton. After updating the action probability vectors in both learning automata, using the nonlinear reinforcement scheme presented in Section 3, the outputs of the stochastic automata are transmitted to the regulation layer. The regulation layer handles the actions received from the two automata separately, using a distinct regulation buffer for each of them. If a received action was rewarded, it is introduced into the regulation buffer of the corresponding automaton; otherwise, a special value denoting an action penalized by the physical environment is introduced into the buffer. The regulation layer does not carry out the chosen action immediately; instead, it carries out an action only if it has been recommended k times consecutively by the automaton, where k is the length of the regulation buffer. After an action is executed, the action probability vector is re-initialized to 1/r, where r is the number of actions, and the regulation buffer is re-initialized as well.
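As an illustration of the regulation-layer rule described above, the following sketch (not the authors' code; the class and method names are ours) executes an action only after it has been recommended, and rewarded, k consecutive times.

// Sketch of the regulation-buffer rule: an action reaches the vehicle only
// after k consecutive rewarded recommendations from the automaton.
public class RegulationBuffer {
    private final int k;          // length of the regulation buffer
    private int lastAction = -1;  // last rewarded action recorded
    private int streak = 0;       // current number of consecutive occurrences

    public RegulationBuffer(int k) {
        this.k = k;
    }

    // Records the latest recommendation; returns true when it should be executed.
    public boolean record(int action, boolean rewarded) {
        if (!rewarded || action != lastAction) {
            streak = 0;                      // a penalty or a new action breaks the run
        }
        lastAction = rewarded ? action : -1;
        if (rewarded) {
            streak++;
        }
        if (streak == k) {
            streak = 0;                      // buffer is re-initialized after execution
            return true;
        }
        return false;
    }
}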
6 SENSOR MODULES

The four teacher modules mentioned above are decision blocks that calculate the response (reward/penalty) based on the last action chosen by the automaton. Table 2 describes the output of the decision blocks for the side sensors.

Table 2: Outputs from the Left/Right Sensor Module.

Action      Vehicle in sensor range,      No vehicle in sensor range,
            or no adjacent lane           and adjacent lane exists
LINE_OK     0 / 0                         0 / 0
LEFT        1 / 0                         0 / 0
RIGHT       0 / 1                         0 / 0

As seen in Table 2, a penalty response is received from the left sensor module when the action is LEFT and there is a vehicle on the left, or the vehicle is already traveling in the leftmost lane. The situation is similar for the right sensor module.

The Headway (Frontal) Module is defined as shown in Table 3. If there is a vehicle at a close frontal distance (smaller than the admissible distance), a penalty response is sent to the automaton for the actions LINE_OK, SPEED_OK and ACC. All other actions (LEFT, RIGHT, DEC) are encouraged, because they may serve to avoid a collision.

Table 3: Outputs from the Headway Module.

Action      Vehicle in range                No vehicle in range
            (at a close frontal distance)
LINE_OK     1                               0
LEFT        0                               0
RIGHT       0                               0
SPEED_OK    1                               0
ACC         1                               0
DEC         0*                              0

The Speed Module compares the actual speed with the desired speed and, based on the chosen action, sends a feedback to the longitudinal automaton.

Table 4: Outputs from the Speed Module.

Action      Speed too slow     Acceptable speed     Speed too fast
SPEED_OK    1                  0                    1
ACC         0                  0                    1
DEC         1                  0                    0

The reward response indicated by 0* (from the Headway Sensor Module) is different from the normal reward response, indicated by 0: this reward response has a higher priority and must override a possible penalty from the other modules.

7 A MULTI-AGENT SYSTEM FOR INTELLIGENT VEHICLE CONTROL

This section describes an implementation of a simulator for the Intelligent Vehicle Control System, in a multi-agent approach. The entire system was implemented in Java and is based on the JADE platform (Bigus, 2001). Figure 3 shows the class diagram of the simulator. Each vehicle has an associated JADE agent (DriverAgent), responsible for the intelligent control. "Driving" means a continuous learning process, sustained by the two stochastic learning automata, namely the longitudinal automaton and the lateral automaton. The response of the physical environment is a combination of the outputs of all four sensor modules. The implementation of this combination for each automaton (longitudinal and lateral, respectively) is shown in Figure 4 (the value 0* was substituted by 2).
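The sensor modules map naturally onto the helper methods referenced in the code of Figure 4 (speedModule, frontModule, leftRightModule). As an example, a minimal sketch of the Speed Module of Table 4 might look as follows; the action codes and the speed-related fields are our own assumptions, not taken from the paper.

// Sketch of the Speed Module (Table 4): returns 0 (reward) or 1 (penalty)
// for the chosen longitudinal action.
public class SpeedModule {
    static final int SPEED_OK = 0, ACC = 1, DEC = 2;  // assumed action codes

    private double currentSpeed;   // read from the on-board speed sensor
    private double desiredSpeed;   // imposed by the highway infrastructure
    private double tolerance;      // admissible deviation from the desired speed

    public int speedModule(int action) {
        boolean tooSlow = currentSpeed < desiredSpeed - tolerance;
        boolean tooFast = currentSpeed > desiredSpeed + tolerance;
        switch (action) {
            case SPEED_OK: return (tooSlow || tooFast) ? 1 : 0;
            case ACC:      return tooFast ? 1 : 0;
            case DEC:      return tooSlow ? 1 : 0;
            default:       return 0;   // lateral actions are not judged by this module
        }
    }
}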
Figure 3: The class diagram of the simulator.

// Longitudinal Automaton
public double reward(int action) {
    int combine;
    // Combine the speed and headway responses; 1 (penalty) dominates.
    combine = Math.max(speedModule(action), frontModule(action));
    // The high-priority reward 0* (encoded as 2) overrides any penalty.
    if (combine == 2) combine = 0;
    return combine;
}

// Lateral Automaton
public double reward(int action) {
    int combine;
    // Combine the side and headway responses; 1 (penalty) dominates.
    combine = Math.max(leftRightModule(action), frontModule(action));
    return combine;
}

Figure 4: The physical environment response.

8 CONCLUSIONS

Reinforcement learning has attracted rapidly increasing interest in the machine learning and artificial intelligence communities. Its promise is beguiling: a way of programming agents by reward and punishment, without needing to specify how the task (i.e., the behavior) is to be achieved. Reinforcement learning allows, at least in principle, the problem of building an explicit model of the behavior to be synthesized, and of building its counterpart, a meaningful learning base (as required in supervised learning), to be bypassed.

The reinforcement scheme presented in this paper satisfies all necessary and sufficient conditions for absolute expediency in a stationary environment, and the nonlinear algorithm based on this scheme is found to converge to the "optimal" action faster than the nonlinear scheme previously defined in (Ünsal, 1999). Using this new reinforcement scheme, we developed a simulator for an Intelligent Vehicle Control System in a multi-agent approach. The entire system was implemented in Java and is based on the JADE platform.

REFERENCES

Baba, N., 1984. New Topics in Learning Automata: Theory and Applications, Lecture Notes in Control and Information Sciences, Berlin, Germany: Springer-Verlag.

Barto, A., Mahadevan, S., 2003. Recent advances in hierarchical reinforcement learning, Discrete Event Dynamic Systems, Special issue on Reinforcement Learning.

Bigus, J. P., Bigus, J., 2001. Constructing Intelligent Agents Using Java, 2nd ed., John Wiley & Sons, Inc.

Buffet, O., Dutech, A., Charpillet, F., 2001. Incremental reinforcement learning for designing multi-agent systems, In J. P. Müller, E. Andre, S. Sen, and C. Frasson, editors, Proceedings of the Fifth International Conference on Autonomous Agents, pp. 31-32, Montreal, Canada, ACM Press.

Lakshmivarahan, S., Thathachar, M. A. L., 1973. Absolutely Expedient Learning Algorithms for Stochastic Automata, IEEE Transactions on Systems, Man and Cybernetics, vol. SMC-3, pp. 281-286.

Moody, J., Liu, Y., Saffell, M., Youn, K., 2004. Stochastic direct reinforcement: Application to simple games with recurrence, In Proceedings of Artificial Multiagent Learning, Papers from the 2004 AAAI Fall Symposium, Technical Report FS-04-02.

Narendra, K. S., Thathachar, M. A. L., 1989. Learning Automata: An Introduction, Prentice-Hall.

Rivero, C., 2003. Characterization of the absolutely expedient learning algorithms for stochastic automata in a non-discrete space of actions, ESANN'2003 Proceedings - European Symposium on Artificial Neural Networks, Bruges (Belgium), ISBN 2-930307-03-X, pp. 307-312.

Stoica, F., Popa, E. M., 2007. An Absolutely Expedient Learning Algorithm for Stochastic Automata, WSEAS Transactions on Computers, Issue 2, Volume 6, ISSN 1109-2750, pp. 229-235.

Sutton, R., Barto, A., 1998. Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA.

Ünsal, C., Kachroo, P., Bay, J. S., 1999. Multiple Stochastic Learning Automata for Vehicle Path Control in an Automated Highway System, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, vol. 29, no. 1, January 1999.