Robot Swarms as Ensembles of Cooperating Components - Matthias Hölzl

Matthias Hölzl's case study from AWASS 2013.

  1. Robot Swarms as Ensembles of Cooperating Components. Matthias Hölzl, with contributions from Martin Wirsing and Annabelle Klarl. AWASS, Lucca, June 24, 2013. www.ascens-ist.eu
  2. The Task. Robots cleaning an exhibition area.
  3. marXbot. Miniature mobile robot developed by EPFL: rough-terrain mobility; robots can dock to other robots; a gripper for picking up items; an ARM-based Linux system; and many sensors (proximity sensors, gyroscope, 3D accelerometer, RFID reader, cameras, ...).
  4. Swarm Robotics.
  5. Problems: noise and limited sensor resolution; extracting information from sensor data; unforeseen situations; uncertainty about the environment; performing complex actions when intermediate results are uncertain; ...
  6. Action Logics. Logics that can represent change over time. Probabilistic behavior can be modeled, but doing so is cumbersome.
  7. Markov Decision Processes. [Figure: grid-world MDP over states pos = (x, y). Each action n, s, e, w moves to the intended neighboring cell with probability 0.9, slips to one of the two perpendicular neighbors with probability 0.025 each, and stays in place with probability 0.05; every step has reward −0.1.]
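
     One way to write the figure's transition structure down in Common Lisp (a sketch with hypothetical names, not from the talk):

     (defun move (pos dir)
       "Neighboring cell of POS = (x y) in direction DIR."
       (destructuring-bind (x y) pos
         (ecase dir
           (n (list x (1+ y)))
           (s (list x (1- y)))
           (e (list (1+ x) y))
           (w (list (1- x) y)))))

     (defun perpendicular (dir)
       (if (member dir '(n s)) '(e w) '(n s)))

     (defun transitions (pos action)
       "Return (next-pos probability reward) triples for one step:
        0.9 intended move, 0.025 per sideways slip, 0.05 stay put."
       (destructuring-bind (p1 p2) (perpendicular action)
         (list (list (move pos action) 0.9   -0.1)
               (list (move pos p1)     0.025 -0.1)
               (list (move pos p2)     0.025 -0.1)
               (list pos               0.05  -0.1))))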
  8. Markov Decision Processes. [Figure: MDP with states Decide Activity, Watching TV, In Club, Drinking, Dancing Alone, and Dancing With Partner; actions watchTV, goToClub, drinkBeer, dance (reaching Dancing Alone or Dancing With Partner with p = 0.5 each), and flirt, with outcomes "Oh, oh" (p = 0.05) and "Oh, no" (p = 0.95).]
  9. MDPs: Strategies. [Figure: the same MDP as on the previous slide.] Three strategies (TV, CDB, CDF) and the action each picks per state (DA = Decide Activity, IC = In Club, DWP = Dancing With Partner):

     State   | TV        | CDB        | CDF
     DA      | watchTV   | goToClub   | goToClub
     IC      | drinkBeer | drinkBeer  | dance
     DWP     | flirt     | flirt      | flirt
     Utility | 0.1       | −0.05 (∗)  | −1.975 (∗∗)

     (∗)  0.05 + (−0.1)
     (∗∗) 0.05 + (0.5 × 0.2) + 0.5 × (0.25 + (0.05 × 5) + (0.95 × −5))
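
     The starred utilities can be checked by evaluating the slide's arithmetic directly (a throwaway transcription, not project code):

     ;; (*)  strategy CDB:
     (+ 0.05 -0.1)                               ; => -0.05
     ;; (**) strategy CDF:
     (+ 0.05 (* 0.5 0.2)
        (* 0.5 (+ 0.25 (* 0.05 5) (* 0.95 -5)))) ; => -1.975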
  10. Reinforcement Learning. General idea: figure out the expected value of each action in each state; pick the action with the highest expected value (most of the time); update the expectations according to the actual rewards.
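
     A minimal tabular Q-learning sketch in Common Lisp that makes these three steps concrete (the helper names and the epsilon-greedy policy are hypothetical, not the Iliad machinery):

     ;; *q-table* maps (state action) pairs to estimated values.
     (defparameter *q-table* (make-hash-table :test #'equal))
     (defparameter *alpha* 0.1)  ; learning rate
     (defparameter *gamma* 0.9)  ; discount factor

     (defun q-value (state action)
       (gethash (list state action) *q-table* 0.0))

     (defun greedy-action (state actions)
       "Pick the action with the highest expected value."
       (reduce (lambda (a b)
                 (if (> (q-value state a) (q-value state b)) a b))
               actions))

     (defun choose-action (state actions &optional (epsilon 0.1))
       "Mostly greedy; explore a random action with probability EPSILON."
       (if (< (random 1.0) epsilon)
           (nth (random (length actions)) actions)
           (greedy-action state actions)))

     (defun update-q (state action reward next-state actions)
       "Nudge Q(s,a) toward reward + gamma * max_a' Q(s',a')."
       (let ((target (+ reward
                        (* *gamma*
                           (q-value next-state
                                    (greedy-action next-state actions))))))
         (incf (gethash (list state action) *q-table* 0.0)
               (* *alpha* (- target (q-value state action))))))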
  11. How well does this work? Rather well for small problems. But: state explosion.
  12. Solutions: decomposition; hierarchy; partial programs.
  13. POEM. An action language with: first-order reasoning; hierarchical reinforcement learning, which learns completions for partial programs; concurrency; reflection / meta-object protocols; ...
  14. Iliad: A POEM Implementation. A Common Lisp-based programming language. Full first-order reasoning: operations on logical theories (U(C)NA, domain closure, ...); resolution, hyperresolution, DPLL, etc.; conditional answer extraction; procedural attachment and constraint solving. Hierarchical reinforcement learning: based on Concurrent ALisp; partial programs; threadwise and temporal state abstraction; Hierarchically Optimal Yet Local (HOLY) Q-learning; Hierarchically Optimal Recursively Decomposed (HORD) Q-learning.
  15. Planned Contents. Introduction to CALisp/Poem; simple TD-learning: bandits; flat reinforcement learning: navigation; hierarchical reinforcement learning: collecting items individually; threadwise decomposition for hierarchical reinforcement learning: learning collaboratively.
  16. n-armed Bandits. [Figure: a single state S with two actions: search, reward ~ N(0.1, 1.0); coll-known, reward ~ N(0.3, 3.0).] Choice between n actions; the reward depends probabilistically on the action choice; no long-term consequences. This is the simplest form of TD-learning.
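
     A minimal sketch of the two-armed case from the figure, assuming the Gaussian rewards shown (the names and the epsilon-greedy scheme are illustrative, not Iliad code):

     ;; Each arm keeps an incremental sample average of its rewards.
     (defstruct arm name mean stddev (estimate 0.0) (pulls 0))

     (defun gaussian (mean stddev)
       "Box-Muller sample from N(mean, stddev^2)."
       (+ mean (* stddev
                  (sqrt (* -2 (log (- 1.0 (random 1.0)))))
                  (cos (* 2 pi (random 1.0))))))

     (defun pull (arm)
       "Sample a reward and move the arm's estimate toward it."
       (let ((reward (gaussian (arm-mean arm) (arm-stddev arm))))
         (incf (arm-pulls arm))
         (incf (arm-estimate arm)
               (/ (- reward (arm-estimate arm)) (arm-pulls arm)))
         reward))

     (defun run-bandit (&optional (steps 1000) (epsilon 0.1))
       "Epsilon-greedy over the two arms from the slide."
       (let ((arms (list (make-arm :name 'search     :mean 0.1 :stddev 1.0)
                         (make-arm :name 'coll-known :mean 0.3 :stddev 3.0))))
         (dotimes (i steps arms)
           (pull (if (< (random 1.0) epsilon)
                     (nth (random (length arms)) arms)
                     (reduce (lambda (a b)
                               (if (> (arm-estimate a) (arm-estimate b)) a b))
                             arms))))))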
  17. Flat Learning. [Figure: the exhibition-area maze; the robot RR navigates toward the target TT.] Target: (0 0). Choices: (N E S W). Q-values: #((N (Q -1.8)) (E (Q -1.8)) (S (Q -2.25)) (W (Q 2.76))). Recommended choice is W.
  18. Flat Learning.

     ;; Partial program: the robot simply navigates to the target.
     (defun simple-robot ()
       (call (nav (target-loc (robot-env)))))

     ;; The direction to take at each step is a choice point that the
     ;; learner fills in.
     (defun nav (loc)
       (until (equal (robot-loc) loc)
         (with-choice navigate-choice (dir '(N E S W))
           (action navigate-move dir))))
  19. Hierarchical Learning.

     ;; Top-level loop: the learner chooses among sleeping, picking up
     ;; waste, and dropping it off.
     (defun waste-removal ()
       (loop
         (choose choose-waste-removal-action
           (action sleep 'SLEEP)
           (call (pickup-waste))
           (call (drop-waste)))))

     (defun pickup-waste ()
       (call (nav (waste-source)))
       (action pickup-waste 'PICKUP))
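
     drop-waste is not shown in the deck; by symmetry with pickup-waste it would presumably look like the following (waste-target is a guessed accessor, analogous to waste-source):

     (defun drop-waste ()
       ;; Navigate to the (hypothetical) drop-off location, then drop.
       (call (nav (waste-target)))
       (action drop-waste 'DROP))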
  20. Multi-Robot Learning. [Figure: the same maze with multiple robots (RR, RW), waste items (WW), and the target TT.]
  21. Thank you! Any questions? matthias.hoelzl@ifi.lmu.de
