Robot Swarms as Ensembles of Cooperating Components - Matthias Hölzl

Matthias Hölzl's case study from AWASS 2013.

  1. Robot Swarms as Ensembles of Cooperating Components
     Matthias Hölzl
     With contributions from Martin Wirsing, Annabelle Klarl
     AWASS, Lucca, June 24, 2013
     www.ascens-ist.eu

  2. The Task
     Robots cleaning an exhibition area

  3. marXbot
     Miniature mobile robot developed by EPFL
     Rough-terrain mobility
     Robots can dock to other robots
     Many sensors: proximity sensors, gyroscope, 3D accelerometer,
     RFID reader, cameras, ...
     ARM-based Linux system
     Gripper for picking up items

  4. Swarm Robotics

  5. Problems
     Noise, sensor resolution
     Extracting information from sensor data
     Unforeseen situations
     Uncertainty about the environment
     Performing complex actions when intermediate results are uncertain
     ...

  6. Action Logics
     Logics that can represent change over time
     Probabilistic behavior can be modeled (but is cumbersome)

  7. Markov Decision Processes
     [Diagram: a grid-navigation MDP over the states pos = (x,y),
     pos = (x+1,y), pos = (x,y+1), and pos = (x+1,y+1). Edges are
     labeled action / probability / reward, e.g. s / 0.9 / -0.1,
     e,w / 0.025 / -0.1, n / 0.9 / -0.1, and a self-loop
     s,n,w,e / 0.05 / -0.1.]

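For reference, the standard definition that the diagram instantiates (not
spelled out on the slide): a Markov decision process is a tuple (S, A, P, R),
where S is the set of states, A the set of actions, P(s' | s, a) the
probability that action a taken in state s leads to state s', and R the
immediate reward. An edge label such as s / 0.9 / -0.1 reads: action s
(south) follows this edge with probability 0.9 and yields reward -0.1.
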
  8. Markov Decision Processes
     [Diagram: an MDP with states Decide Activity, Watching TV, In Club,
     Dancing Alone, and Dancing With Partner. Actions watchTV and goToClub
     are available in Decide Activity; drinkBeer and dance in the club,
     where dance leads to Dancing Alone or Dancing With Partner with
     p = 0.5 each; flirt, available while Dancing With Partner, ends in
     "Oh, oh" with p = 0.05 and "Oh, no" with p = 0.95.]

  9. MDPs: Strategies
     [Same diagram as on slide 8.]
     Three strategies, the action each prescribes per state, and the
     resulting utilities:

     State                        TV         CDB        CDF
     Decide Activity (DA)         watchTV    goToClub   goToClub
     In Club (IC)                 drinkBeer  drinkBeer  dance
     Dancing With Partner (DWP)   flirt      flirt      flirt
     Utility                      0.1        −0.05 (∗)  −1.975 (∗∗)

     (∗)  0.05 + (−0.1)
     (∗∗) 0.05 + (0.5 × 0.2) + 0.5 × (0.25 + (0.05 × 5) + (0.95 × −5))

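The utilities in the last row come from standard policy evaluation (the slide
shows only the unfolded arithmetic): with no discounting, the value of
following a strategy π from state s is

    U_π(s) = R(s, π(s)) + Σ over s' of P(s' | s, π(s)) × U_π(s')

Footnote (∗∗) is this recursion unfolded for the CDF strategy:
0.05 + 0.1 + 0.5 × (0.25 + 0.25 − 4.75) = 0.15 − 2.125 = −1.975.
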
 10. Reinforcement Learning
     General idea:
     Figure out the expected value of each action in each state
     Pick the action with the highest expected value (most of the time)
     Update the expectations according to the actual rewards

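In tabular form this loop is Q-learning. A minimal sketch in plain Common
Lisp (not the Iliad/CALisp machinery used later in the deck; the state and
action representations and all names here are assumptions):

    ;; Q-table: (state . action) -> estimated value, defaulting to 0.0.
    (defparameter *q* (make-hash-table :test #'equal))

    (defun q-value (state action)
      (gethash (cons state action) *q* 0.0))

    (defun best-action (state actions)
      ;; The action with the highest expected value in STATE.
      (first (sort (copy-list actions) #'>
                   :key (lambda (a) (q-value state a)))))

    (defun choose-action (state actions &key (epsilon 0.1))
      ;; Greedy "most of the time"; explore with probability EPSILON.
      (if (< (random 1.0) epsilon)
          (nth (random (length actions)) actions)
          (best-action state actions)))

    (defun update-q (state action reward next-state actions
                     &key (alpha 0.1) (gamma 0.9))
      ;; Move the expectation a fraction ALPHA toward the actual reward
      ;; plus the discounted value of the best next action.
      (let* ((old (q-value state action))
             (target (+ reward
                        (* gamma (q-value next-state
                                          (best-action next-state actions))))))
        (setf (gethash (cons state action) *q*)
              (+ old (* alpha (- target old))))))

The (- target old) term is the temporal-difference error: the expectation
moves toward what actually happened, one small step at a time.
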
 11. How well does this work?
     Rather well for small problems
     But: state explosion

 12. Solutions
     Decomposition
     Hierarchy
     Partial programs

 13. POEM
     Action language
     First-order reasoning
     Hierarchical reinforcement learning
     Learns completions for partial programs
     Concurrency
     Reflection / meta-object protocols
     ...

 14. Iliad: A POEM Implementation
     Common Lisp-based programming language
     Full first-order reasoning:
       Operations on logical theories: U(C)NA, domain closure, ...
       Resolution, hyperresolution, DPLL, etc.
       Conditional answer extraction
       Procedural attachment, constraint solving
     Hierarchical reinforcement learning:
       Based on Concurrent ALisp
       Partial programs
       Threadwise and temporal state abstraction
       Hierarchically Optimal Yet Local (HOLY) Q-learning
       Hierarchically Optimal Recursively Decomposed (HORD) Q-learning

 15. Planned Contents
     Introduction to CALisp/Poem
     Simple TD-learning: bandits
     Flat reinforcement learning: navigation
     Hierarchical reinforcement learning: collecting items individually
     Threadwise decomposition for hierarchical reinforcement learning:
     learning collaboratively

 16. n-armed Bandits
     [Diagram: a single state S with two actions, each labeled
     action / probability / reward distribution:
     search / 1.0 / N(0.1, 1.0) and coll-known / 1.0 / N(0.3, 3.0).]
     Choice between n actions
     Reward depends probabilistically on the action choice
     No long-term consequences
     Simplest form of TD-learning

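A minimal simulation of this setting in plain Common Lisp. This is a sketch:
the arm names and the N(mean, spread) reading of the edge labels follow the
diagram, while everything else (epsilon-greedy play, incremental sample
averages) is assumed rather than taken from the deck:

    ;; Rewards per arm, as in the diagram: name, mean, spread.
    (defparameter *arms*
      '((search     0.1 1.0)
        (coll-known 0.3 3.0)))

    (defun normal-random (mean stddev)
      ;; Box-Muller transform for a normally distributed sample.
      (+ mean (* stddev
                 (sqrt (* -2.0 (log (- 1.0 (random 1.0)))))
                 (cos (* 2.0 pi (random 1.0))))))

    (defun pull (arm)
      (destructuring-bind (mean stddev) (rest (assoc arm *arms*))
        (normal-random mean stddev)))

    (defun run-bandit (steps &key (epsilon 0.1))
      ;; One (arm average count) record per arm; play epsilon-greedily and
      ;; update the chosen arm's average: avg += (reward - avg) / count.
      (let ((stats (mapcar (lambda (arm) (list (first arm) 0.0 0)) *arms*)))
        (dotimes (i steps stats)
          (let* ((entry (if (< (random 1.0) epsilon)
                            (nth (random (length stats)) stats)
                            (first (sort (copy-list stats) #'> :key #'second))))
                 (reward (pull (first entry))))
            (incf (third entry))
            (incf (second entry)
                  (/ (- reward (second entry)) (third entry)))))))

After enough steps, (run-bandit 10000) should settle on coll-known, whose
mean reward is higher despite its larger spread.
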
 17. Flat Learning
     [ASCII grid world: XX marks walls, TT the target, RR the robot.]
     Target: (0 0)
     Choices: (N E S W)
     Q-values: #((N (Q -1.8))
                 (E (Q -1.8))
                 (S (Q -2.25))
                 (W (Q 2.76)))
     Recommended choice is W

 18. Flat Learning

     ;; Top level: navigate to the environment's target location.
     (defun simple-robot ()
       (call (nav (target-loc (robot-env)))))

     ;; Step toward LOC until the robot reaches it; the direction picked
     ;; at the choice point navigate-choice is what gets learned.
     (defun nav (loc)
       (until (equal (robot-loc) loc)
         (with-choice navigate-choice (dir '(N E S W))
           (action navigate-move dir))))

 19. Hierarchical Learning

     ;; Top level: repeatedly choose (a learned choice) between sleeping,
     ;; picking up waste, and dropping off waste.
     (defun waste-removal ()
       (loop
         (choose choose-waste-removal-action
           (action sleep 'SLEEP)
           (call (pickup-waste))
           (call (drop-waste)))))

     ;; Subtask: navigate to the waste source, then pick the waste up.
     (defun pickup-waste ()
       (call (nav (waste-source)))
       (action pickup-waste 'PICKUP))

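The slide calls (drop-waste) but never shows it. A plausible completion in
the same style, mirroring pickup-waste (waste-sink and 'DROP are assumed
names, not taken from the deck):

    ;; Hypothetical sketch: navigate to the drop-off point, then drop.
    (defun drop-waste ()
      (call (nav (waste-sink)))
      (action drop-waste 'DROP))
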
 20. Multi-Robot Learning
     [ASCII grid world as on slide 17, now with several robots: XX marks
     walls, TT the target, RR robots, WW waste, and RW a robot at a waste
     location.]

 21. Thank you!
     Any Questions?
     matthias.hoelzl@ifi.lmu.de