Reinforcement learning in neuro-fuzzy traffic signal control



  1. By: Abhishek Vishnoi (112501), NITTTR Chandigarh
  2. INTRODUCTION
     A fuzzy traffic signal controller uses simple "if–then" rules which involve linguistic concepts such as medium or long, represented as membership functions. In neuro-fuzzy traffic signal control, a neural network adjusts the fuzzy controller by fine-tuning the form and location of the membership functions. The learning algorithm of the neural network is reinforcement learning, which gives credit for successful system behaviour and punishes poor behaviour.
  3. Basics of Neuro-Fuzzy
     A combination of a neural network and a fuzzy system is called a neuro-fuzzy system. In neuro-fuzzy control, the parameters of the fuzzy controller are adjusted using a neural network. Neuro-fuzzy systems utilize both the linguistic, human-like reasoning of fuzzy systems and the powerful computing ability of neural networks, and can avoid some of the drawbacks of purely fuzzy or purely neural systems.
  4. Fuzzy system
     Fuzzy logic was brought to public attention by Zadeh. Fuzzy sets provide a mathematical interpretation for natural-language terms. A fuzzy set is a set without a crisp, clearly defined boundary: it can contain elements with only a partial degree of membership. Fuzzy control uses a rule base, where the rules are propositions of the form "if X is S, then Y is T". Here X and Y are linguistic variables, and S and T are linguistic values represented by fuzzy sets.
  5. Membership function
     A membership function (MF) is a curve that defines how each point in the input space is mapped to a membership value (or degree of membership) between 0 and 1. Each linguistic term is associated with a fuzzy set defined by a corresponding membership function.
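     To make the idea concrete, here is a minimal sketch of a triangular membership function in Python; the function name and the breakpoints in the example are illustrative assumptions, not values from the slides.

```python
def triangular_mf(x: float, left: float, peak: float, right: float) -> float:
    """Degree of membership of x in a triangular fuzzy set, in [0, 1]."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# A hypothetical "medium" set for approaching vehicles: full membership at
# 7 vehicles, partial membership on either side, zero outside (3, 11).
print(triangular_mf(6, left=3, peak=7, right=11))  # 0.75, a partial membership
```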
  6. Fuzzy traffic signal control
     The controller receives measurements of incoming traffic and chooses the length of the green signal accordingly. Advantages of fuzzy control systems over traditional ones: their ability to use expert knowledge in the form of fuzzy rules, and the small number of parameters needed.
  7. Fuzzy traffic signal control
     The traffic simulation environment used in this work is a two-phase controlled intersection of two-lane streets. In each approaching lane there are two traffic detectors, the first one upstream of the stop line and the other at the stop line. These detectors send input measurements of traffic to the fuzzy controller: APP, the number of approaching vehicles in the green direction, and QUE, the number of queuing vehicles in the red direction. Depending on the traffic situation, the green phase can be extended by one or several seconds. The output of the fuzzy controller is EXT, the green time extension (in seconds). The linguistic values of APP are zero, a few, medium and many; of QUE, a few, medium and too long; of EXT, zero, short, medium and long.
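     The resulting signal-timing loop can be sketched as follows; read_app, read_que and fuzzy_ext are hypothetical names standing in for the detector measurements and the rule-based controller, and the cap of five decision points simply mirrors the five rule sets described on the next slide.

```python
MIN_GREEN = 5  # seconds of minimum green, from the slides

def run_green_phase(read_app, read_que, fuzzy_ext, max_decisions=5):
    """After minimum green, repeatedly ask the fuzzy controller for a green
    extension EXT; terminate the green phase once EXT drops to zero."""
    green = MIN_GREEN
    for n_ext in range(max_decisions):
        app, que = read_app(), read_que()  # current detector measurements
        ext = fuzzy_ext(app, que, n_ext)   # rule set chosen by extension count
        if ext <= 0:
            return green                   # end green, switch phases
        green += ext
    return green
```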
  8. Fuzzy rules
     The rule base consists of five rule sets. The choice of the rule set depends on how many green extensions have already been given. The objective of the rules is to split the green time and find the right moment of green termination so that the delay of vehicles is minimized.

     After minimum green (5 seconds):
     • if APP is zero, then EXT is zero
     • if APP is a few and if QUE is less than medium, then EXT is short
     • if APP is more than a few, then EXT is medium
     • if APP is medium, then EXT is long
  9. After the first extension:
     • if APP is zero, then EXT is zero
     • if APP is a few and if QUE is less than medium, then EXT is short
     • if APP is medium, then EXT is medium
     • if APP is many, then EXT is long

     After the second extension:
     • if APP is zero, then EXT is zero
     • if APP is a few and if QUE is less than medium, then EXT is short
     • if APP is medium and if QUE is less than medium, then EXT is medium
     • if APP is many and if QUE is less than medium, then EXT is long
  10. After the third extension:
      • if APP is zero, then EXT is zero
      • if QUE is too long, then EXT is zero
      • if APP is more than a few and if QUE is less than medium, then EXT is short
      • if APP is medium and if QUE is less than medium, then EXT is medium
      • if APP is many and if QUE is less than a few, then EXT is long

      After the fourth extension:
      • if APP is zero, then EXT is zero
      • if QUE is too long, then EXT is zero
      • if APP is more than a few and if QUE is a few, then EXT is short
      • if APP is medium and if QUE is less than a few, then EXT is medium
      • if APP is many and if QUE is less than a few, then EXT is long

      (A minimal code sketch of evaluating rules of this form follows below.)
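      As referenced above, here is a minimal, self-contained sketch of evaluating one such rule set with a min-style fuzzy AND and a weighted average of output centres. The membership-function breakpoints, the output centres and the defuzzification method are illustrative assumptions (the slides do not specify them), and the hedge "QUE is less than medium" is crudely approximated here by the plain term "a few".

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical term definitions for APP (approaching) and QUE (queuing).
APP = {"zero":   lambda x: tri(x, -1, 0, 2),
       "a few":  lambda x: tri(x, 0, 3, 6),
       "medium": lambda x: tri(x, 4, 8, 12),
       "many":   lambda x: tri(x, 9, 14, 30)}
QUE = {"a few":  lambda x: tri(x, -1, 2, 5),
       "medium": lambda x: tri(x, 3, 6, 10)}
EXT_CENTERS = {"zero": 0.0, "short": 2.0, "medium": 4.0, "long": 6.0}

def infer_ext(app, que, rules):
    """rules: (app_term or None, que_term or None, ext_term) triples.
    A rule's firing strength is the min over its conditions (fuzzy AND);
    the crisp EXT is the strength-weighted average of the output centres."""
    num = den = 0.0
    for app_term, que_term, ext_term in rules:
        strengths = [APP[app_term](app)] if app_term else []
        if que_term:
            strengths.append(QUE[que_term](que))
        w = min(strengths) if strengths else 0.0
        num += w * EXT_CENTERS[ext_term]
        den += w
    return num / den if den > 0 else 0.0

# The rule set used after the first extension, transcribed from the slides.
after_first = [("zero", None, "zero"),
               ("a few", "a few", "short"),  # "QUE less than medium" ~ "a few"
               ("medium", None, "medium"),
               ("many", None, "long")]
print(infer_ext(app=7, que=2, rules=after_first))  # 4.0 with these assumptions
```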
  11. Reinforcement learning
      Why is reinforcement learning needed? The parameters of the fuzzy controller could be updated with the backpropagation algorithm common in supervised learning of neural networks only if the desired control actions were known in advance. Since no such target outputs are available here, backpropagation cannot be used; instead, a learning algorithm called reinforcement learning is used. In reinforcement learning, the system evaluates whether the previous control action was good or not. If the action had good consequences, the tendency to produce that action is strengthened, that is, reinforced.
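      The core of this reward/punish update can be sketched in a couple of lines; the names, shapes and learning rate here are assumptions, not the paper's notation.

```python
def reinforce(weights, contributions, r, lr=0.01):
    """Nudge each weight along its contribution to the previous action:
    a positive reinforcement r strengthens it, a negative r weakens it."""
    return [w + lr * r * g for w, g in zip(weights, contributions)]
```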
  12. Structure of the neuro-fuzzy control system
      The evaluation network gathers information about the decisions of the fuzzy controller and the delays of the vehicles. This reinforcement information is used in fine-tuning the membership functions of the fuzzy controller, which is also represented as a neural network. Thus there are actually two neural networks in the system: an evaluation network and a fuzzy controller network.
  13. Evaluation network
      The evaluation network evaluates the goodness of the actions of the fuzzy controller, based on information it has gathered by observing the process. The network fine-tunes the membership functions of the fuzzy controller by updating the parameters of the membership functions. It is a feedforward, multilayer perceptron-type network. The input variables of the network are APP and QUE, the measurements of incoming traffic in the green and red directions, respectively.
  14. The hidden-layer activation function is the sigmoid z_j(x_j) = 1 / (1 + exp(−x_j)), where x_j is the weighted sum of the network inputs APP and QUE. The size h of the hidden layer may vary, and there are no precise rules for determining how many cells it should contain: increasing the size gives a more powerful and flexible network but requires a longer learning time. In this work a size of h = 10 was found suitable.
  15. The network output v is a measure of the goodness of the state of the network, a prediction of future reinforcement:

      v = b1·APP + b2·QUE + Σ_{j=1}^{h} c_j·z_j

      The gradient descent algorithm is used in the learning phase of the evaluation network. If a positive reinforcement signal is received, the network weights are rewarded by being changed in the direction which increases their contribution to the total sum. If a negative signal is received, the weights are punished by being changed in the direction which decreases their contribution.
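      Putting the last two slides together, a minimal sketch of the evaluation network's forward pass might look as follows. The hidden-unit input weights are named a here purely for illustration (the slides only name b1, b2, the c_j and the z_j), with h = 10 as stated above.

```python
import math

H = 10  # hidden-layer size found suitable in this work

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def evaluate(app: float, que: float, a, b, c) -> float:
    """Prediction v of future reinforcement from the inputs APP and QUE.
    a: H pairs of hidden-unit input weights (an assumed name),
    b: the pair (b1, b2) of direct input-to-output weights,
    c: the H hidden-to-output weights c_j."""
    z = [sigmoid(aj[0] * app + aj[1] * que) for aj in a]  # hidden activations
    return b[0] * app + b[1] * que + sum(cj * zj for cj, zj in zip(c, z))
```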
  16. Fuzzy controller network
      Consider the first rule set, whose neural network representation is shown in the figure.
  17. Experimental results
      Fig: Average vehicular delays (in seconds) before (dashed line) and after (solid line) the learning at traffic volumes of 300, 500 and 1000 vehicles per hour. Left panel: the first traffic detector is 50 m from the stop line. Right panel: the first traffic detector is 100 m from the stop line.
  18. Volume (veh/h) | Det. (m) | d_ini | d_new | D
      500            | 50       | 9.99  | 9.42  | 0.57
      1000           | 50       | 14.78 | 13.96 | 0.81
      500            | 100      | 9.11  | 8.82  | 0.81
      1000           | 100      | 15.18 | 14.52 | 0.66

      Here d_ini and d_new are the average vehicular delays using the initial and the new membership functions, respectively, and D = d_ini − d_new is the difference of individual observations.
  19. Fig: Membership functions zero, a few, medium and many of APP before (dotted line) and after (solid line) the learning at a traffic volume of 500 vehicles per hour. The location of the first traffic detector is 50 m from the stop line. Horizontal axis: number of approaching vehicles. Vertical axis: value of membership function.
  20. Fig: Membership functions a few, medium and too long of QUE before (dotted line) and after (solid line) the learning at a traffic volume of 500 vehicles per hour. The location of the first traffic detector is 50 m from the stop line. Horizontal axis: number of queuing vehicles. Vertical axis: value of membership function.
  21. Fig: Membership functions zero, short, medium and long of EXT before (dotted line) and after (solid line) the learning at a traffic volume of 500 vehicles per hour. The location of the first traffic detector is 50 m from the stop line. Horizontal axis: green signal extension in seconds. Vertical axis: value of membership function.
