A temporal classifier system using spiking neural networks


David Howard, Larry Bull, Pier Luca Lanzi. "A temporal classifier system using spiking neural networks". IWLCS, 2011



1. A temporal classifier system using spiking neural networks
   Gerard David Howard, Larry Bull & Pier-Luca Lanzi
   {david4.howard, larry.bull}@uwe.ac.uk
   pierluca.lanzi@polimi.it
2. Contents
   - Intro & motivation
   - System architecture: spiking XCSF
   - Constructivism (nodes and connections)
   - Working in continuous space
   - Comparison to MLP / Q-learner
   - Taking time into consideration
   - Comparison to MLP
   - Simulated robotics
3. Motivation
   - Many real-world tasks involve continuous space and continuous time
   - Autonomous robotics remains an open question: robots will require some degree of knowledge "self-shaping", i.e. control over their internal knowledge representation
   - We introduce an LCS containing spiking networks and demonstrate the usefulness of the representation:
     - Handles continuous space and continuous time
     - Representation structure depends on the environment
4. XCSF
   - Includes computed prediction, calculated from the input state (augmented by a constant x0) and a weight vector; each classifier has its own weight vector
   - Weights are updated linearly using a modified delta rule
   - Main differences from the canonical system:
     - An SNN replaces the condition and calculates the action
     - Self-adaptive parameters give autonomous control over learning
     - Network topology is altered during the GA cycle
   - Generalisation from computed prediction, computed actions and network topologies
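The computed prediction and delta-rule update above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the correction rate ETA = 0.2 matches the β on the later parameters slide, and the normalised update form is an assumption.

```python
import numpy as np

X0 = 1.0    # constant augmenting the input state (slide's x0)
ETA = 0.2   # delta-rule correction rate (assumed; matches beta later)

def computed_prediction(weights, state):
    """Classifier prediction: dot product of the weight vector with
    the input state augmented by the constant x0."""
    return float(np.dot(weights, np.concatenate(([X0], state))))

def delta_update(weights, state, target):
    """Modified delta rule: move the weights toward the target payoff,
    normalised by the squared magnitude of the augmented input."""
    x = np.concatenate(([X0], state))
    error = target - np.dot(weights, x)
    return weights + ETA * error * x / np.dot(x, x)
```

Repeated updates against a fixed target shrink the prediction error geometrically (by a factor of 1 - ETA per step).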
5. Spiking networks
   - Spiking networks have temporal functionality
   - We use Integrate-and-Fire (IAF) neurons
   - Each neuron has a membrane potential (m) that varies through time
   - When m exceeds a threshold, the neuron sends a spike to every neuron it has a forward connection to, and resets m
   - Membrane potential is a way of implementing memory
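A minimal IAF neuron in the sense described above might look like the sketch below; the threshold and reset values are illustrative assumptions, not taken from the slides.

```python
class IAFNeuron:
    """Minimal integrate-and-fire neuron (illustrative sketch; the
    threshold and reset values are assumed, not from the slides)."""

    def __init__(self, threshold=1.0, reset=0.0):
        self.threshold = threshold
        self.reset = reset
        self.m = reset  # membrane potential, persists across inputs

    def step(self, input_current):
        """Integrate the input; emit a spike (1) and reset when the
        membrane potential exceeds the threshold, else emit 0."""
        self.m += input_current
        if self.m > self.threshold:
            self.m = self.reset
            return 1
        return 0
```

Because `m` carries over between calls, the neuron's output depends on input history, which is the "membrane potential as memory" point made on the slide.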
6. Spiking networks
   - An IAF spiking network replaces both condition and action: 2 input nodes (one per state element), 3 output nodes (L, R, and a "don't match" node !M)
   - Each input state is processed 5 times by the spiking network. Neural outputs are spike trains, read as high (>= 3 spikes) or low (< 3 spikes) within the 5-element output window (e.g. 01110 = HIGH; 00101 and 10001 = LOW)
   - A classifier does not match the state if its !M node is high
   - Generalisation from computed prediction, computed actions and network topologies
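The spike-train reading and matching rule above reduce to a couple of lines; this is a direct transcription of the slide's thresholds, with names chosen here for illustration.

```python
def decode(window):
    """Collapse a 5-element spike train to HIGH (>= 3 spikes) or LOW."""
    return "HIGH" if sum(window) >= 3 else "LOW"

def matches(not_m_window):
    """A classifier matches the current state unless its !M output
    node reads HIGH over the output window."""
    return decode(not_m_window) != "HIGH"
```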
7. Self-adaptive parameters
   - During a GA cycle, a parent's µ value is copied to its offspring and altered: µ ← µ · e^N(0,1)
   - The offspring then applies its own µ to itself (bounded to [0, 1]) before being inserted into the population
   - Similar to mutation-rate alteration in Evolution Strategies
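The log-normal alteration of µ can be sketched as below; the slides only say the result is bounded to [0, 1], so the small positive lower clamp is an assumption to keep the rate usable.

```python
import math
import random

def self_adapt(parent_mu):
    """Offspring copies the parent's mutation rate and perturbs it
    log-normally, ES-style: mu <- mu * e^N(0,1), kept within (0, 1].
    The lower bound of 1e-6 is an assumption, not from the slides."""
    mu = parent_mu * math.exp(random.gauss(0.0, 1.0))
    return min(max(mu, 1e-6), 1.0)
```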
8. Constructivism
   - Neural constructivism: interaction with the environment guides the learning process by growing/pruning dendritic connectivity
   - Constructivism can add or remove neurons from the hidden layer during a GA event (new nodes have randomly initialised weights)
   - Two new self-adaptive values control NC: ψ (probability of a constructivism event occurring) and ω (probability of adding rather than removing a node). These are modified during a GA cycle as with µ
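A constructivism event driven by ψ and ω can be sketched as follows; hidden nodes are modelled simply as lists of input weights, which is an illustrative simplification.

```python
import random

def constructivism_event(hidden, psi, omega, n_inputs):
    """With probability psi a constructivism event fires; with
    probability omega it adds a hidden node (randomly initialised
    weights), otherwise it removes one. Hidden nodes are modelled
    here as plain lists of input weights."""
    if random.random() >= psi:
        return hidden                                  # no structural change
    if random.random() < omega or not hidden:
        new_node = [random.uniform(-1.0, 1.0) for _ in range(n_inputs)]
        return hidden + [new_node]                     # grow: add a node
    return hidden[:-1]                                 # prune: remove a node
```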
9. Connection selection
   - Automatic feature selection is often used in conjunction with neural networks: it reduces the inputs to only the highest-utility features
   - We apply feature selection to every connection in a network: a connection is enabled/disabled on satisfaction of a new self-adaptive parameter τ
   - All connections are initially enabled; connections created via node addition are enabled with 50% probability per connection
   - Generalisation from computed prediction, computed actions and network topologies
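One reading of "enabled/disabled on satisfaction of τ" is that each connection's enabled flag flips with probability τ during a GA cycle; the sketch below assumes that interpretation.

```python
import random

def connection_selection(enabled, tau):
    """Flip each connection's enabled flag with probability tau
    (assumed interpretation of the slide's per-connection toggle)."""
    return [(not e) if random.random() < tau else e for e in enabled]
```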
10. Effects of SA, NC & CS
   - Self-adaptation allows the system to control the amount of search taking place in an environmental niche without having to predetermine suitable parameter values
   - Neural constructivism allows a classifier to automatically grow its network to match task complexity
   - Connection selection allows finer-grained search and tailoring of solutions, potentially reducing network size even further. It keeps only salient connections and can remove detrimental inputs/connections.
11. Continuous Grid World
   - Two-dimensional continuous grid environment; both x and y run from 0 to 1
   - The goal state is wherever x + y > 1.9; darker regions of the grid represent higher expected payoff. Reaching the goal returns a reward of 1000, else 0
   - The agent starts randomly anywhere in the grid except the goal state, and aims to reach the goal (moving 0.05 per step) in the fewest possible steps (average optimum 18.6)
   (figure: agent in the unit grid, payoff darkening toward the goal corner)
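The environment dynamics above fit in a few lines. This is a sketch from the slide's description; clamping the agent to the unit square is an assumption about boundary handling.

```python
import random

STEP = 0.05  # movement per discrete action
MOVES = {"N": (0.0, STEP), "S": (0.0, -STEP),
         "E": (STEP, 0.0), "W": (-STEP, 0.0)}

def reset():
    """Random start anywhere except the goal region (x + y > 1.9)."""
    while True:
        x, y = random.random(), random.random()
        if x + y <= 1.9:
            return x, y

def step(pos, action):
    """Apply one move, clamp to the unit square (assumed boundary
    rule), and return (new_pos, reward, done); reward is 1000 at the
    goal, else 0."""
    dx, dy = MOVES[action]
    x = min(max(pos[0] + dx, 0.0), 1.0)
    y = min(max(pos[1] + dy, 0.0), 1.0)
    done = x + y > 1.9
    return (x, y), (1000.0 if done else 0.0), done
```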
12. Discrete movement
   - The agent makes a single discrete movement (N, E, S, W): N = (HIGH, HIGH), E = (HIGH, LOW), etc.
   - Experimental parameters: N = 20000, γ = 0.95, β = 0.2, ε0 = 0.005, θGA = 50, θDEL = 50
   - Other XCSF parameters as normal
   - Initial prediction error in new classifiers = 0.01, initial fitness = 0.1
   - An additional trial from a fixed location lets us perform t-tests. "Stability" records the first step at which 50 consecutive trials reach the goal state from this location.
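The slide specifies only N = (HIGH, HIGH) and E = (HIGH, LOW); the remaining two assignments in this sketch are assumptions for illustration.

```python
# N and E are given on the slide; the W and S pairings below are
# assumed to complete the mapping for illustration.
ACTION_MAP = {
    ("HIGH", "HIGH"): "N",
    ("HIGH", "LOW"):  "E",
    ("LOW",  "HIGH"): "W",
    ("LOW",  "LOW"):  "S",
}

def select_action(left_window, right_window):
    """Map the two motor output spike trains (5-element windows,
    HIGH = >= 3 spikes) to one of the four compass moves."""
    level = lambda w: "HIGH" if sum(w) >= 3 else "LOW"
    return ACTION_MAP[(level(left_window), level(right_window))]
```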
13. Discrete movement (results)
   - Stability = 50 consecutive optimal trials from a fixed location
   - Fewer macroclassifiers = greater generalisation
   - Lower mutation rate = more stable evolutionary process

14. Taking time into account
   - Most real-world tasks have some temporal element
   - This describes the behaviour of the agent, or the state of the environment, across extended periods of time (a semi-MDP!)
   - Other LCSs have attempted to tackle semi-MDPs (CXCS, DACS, etc.)
   - The most recent is the Temporal Classifier System (TCS)
   - TCS has been shown to handle real and simulated robotics tasks under ZCS and XCS; this is the first implementation with XCSF
15. Continuous-duration actions
   - Reward was usually calculated with the standard discounted update; it is now calculated with two discount factors [the two equations appeared as images and are not preserved in this transcript]
   - The two discount factors favour overall effectiveness and efficient state transitions respectively (set to 0.05 and ρ = 0.1)
   - tt = total steps for the entire trial; ti = duration of a single action
   - Timeout = 20; the optimal number of (extended) steps to goal is now 1.5
16. Continuous Grid World TCS (results)
   - Fewer macroclassifiers = greater generalisation
   - Lower mutation rate = more stable evolutionary process

17. Continuous-duration actions (results)
   - Spiking networks are more inclined to switch actions within the same action set [A]!
   - Possibly important in scenarios requiring more disjointed action selection
18. Smaller step size
   - Step size reduced to 0.005, timeout = 200; optimal (extended) steps to goal is 1.5
   - A tabular Q-learner cannot learn: too many (s, a) combinations, and long action chains!
   - Spiking non-TCS cannot learn, for the same reasons
   - MLP TCS cannot learn: a lack of memory?
   - Spiking TCS can learn to solve this environment optimally, by extending an action set across multiple states and recalculating actions where necessary
   - This is aided by the temporal element of the networks

19. Comparing step sizes
   - Spiking TCS with step size 0.005 performs better than with step size 0.05! More steps = more opportunity to use temporal information

20. Smaller step size Q-learner
   (figure: steps to goal, spiking TCS vs. tabular Q-learner, both at step size 0.005)
21. Mountain-car
   - Multi-step reinforcement learning problem
   - Guide a car out of a valley, sometimes requiring non-obvious behaviour
   - State comprises position and velocity
   - Actions increase/decrease velocity: HIGH/HIGH = increase, LOW/LOW = decrease, anything else = no change
   - Noise! (±5% on both state elements)
   - TCS optimal steps to goal = 1
   - N = 1000
   - Results compare favourably to previous XCSF work (e.g. with tile coding)

22. Mountain-car results
   (results figure)
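The mountain-car slide gives the action coding and the noise level but not the dynamics; the sketch below pairs that action coding with the standard Sutton & Barto formulation of the dynamics, which is an assumption.

```python
import math
import random

def mc_action(left, right):
    """Decode the two output spike-train readings into a velocity
    change: HIGH/HIGH = increase, LOW/LOW = decrease, else no change."""
    if left == "HIGH" and right == "HIGH":
        return 1
    if left == "LOW" and right == "LOW":
        return -1
    return 0

def mc_step(pos, vel, action):
    """One step of mountain-car dynamics (standard Sutton & Barto
    formulation, assumed; the slides do not give the equations)."""
    vel = min(max(vel + 0.001 * action - 0.0025 * math.cos(3 * pos),
                  -0.07), 0.07)
    pos = min(max(pos + vel, -1.2), 0.6)
    return pos, vel, pos >= 0.6

def observe(pos, vel):
    """The agent receives each state element with +/-5% noise,
    as described on the slide."""
    noisy = lambda v: v * (1 + random.uniform(-0.05, 0.05))
    return noisy(pos), noisy(vel)
```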
23. Robotics
   - Webots robotics package, TCS spiking controller
   - Simulates a Khepera robot that uses 3 IR and 3 light sensors as its input state
   - Two bump sensors detect collisions; if a collision is detected, the robot reverses and the match set [M] is reformed
   - Task: similar to the grid world, but with an obstacle to avoid and a light source to reach
   - 3 possible actions
   - Problems: the state space is of higher dimensionality and not directly linked to prediction, increased noise levels, wheel slip, robot orientation, etc.

24. Robotics
   - Parameters: 500 trials, N = 3000, initially 6 hidden-layer nodes, all connected with 50% probability per connection; this jump-starts the evolutionary process and allows topological network/behaviour variations
   - The robot's start position is constrained so that the obstacle is always between it and the light source
   - Movement is much more granular than in the grid world (0.05)!
   - Self-adaptive parameters are initially constrained to 0 < (μ, ψ, τ) ≤ 0.02, with 0 < ω ≤ 1 as normal

25. Robotics
   (results figure)
26. Robotics
   (figure: steps to goal; connected hidden-layer nodes; percentage of enabled connections; self-adaptive parameters μ, ψ, τ all plotted on the right-hand axis)
27. Robotics
   Overall:
   - Parameters, neurons, and connections do not vary much
   - Initially seeding with 6 hidden-layer nodes still lets us use connection selection to generate behavioural variation in the networks
   - The temporal functionality of the networks is exploited, so that a single action set can:
     - Drop unwanted classifiers to change its favoured action at specific points (e.g. just before a collision)
     - Alter the advocated action of a majority of classifiers in [A] for the same effect
28. Thanks for your time!