2. Outline
• Fuzzy Logic
• Learning Control
• Evolutionary Robotics
3. Types of Uncertainty
• Stochastic uncertainty
  • example: rolling a die
• Linguistic uncertainty
  • examples: low price, tall people, young age
• Informational uncertainty
  • examples: credit worthiness, honesty
4. Classical Set
young = { x ∈ P | age(x) ≤ 20 }
characteristic function:
µyoung(x) = { 1 : age(x) ≤ 20
            { 0 : age(x) > 20
[Figure: characteristic function µyoung(x) of A=“young” over x [years], a step between 1 and 0]
5. Fuzzy Set
Classical Logic
Element x belongs to set A or it does not:
µ(x)∈{0,1}
Fuzzy Logic
Element x belongs to set A with a certain degree of membership:
µ(x)∈[0,1]
[Figure: two plots of µA(x) over x [years] for A=“young”, one for the classical set and one for the fuzzy set]
6. Fuzzy Set
Definition :
Fuzzy Set A = {(x, µA(x)) : x ∈ X, µA(x) ∈ [0,1]}
• a universe of discourse X : 0 ≤ x ≤ 100
• a membership function µA : X → [0,1]
[Figure: membership function µA(x) of A=“young” over x [years]; at x=23 the membership is µ=0.8]
7. Types of Membership Functions
Trapezoid: <a,b,c,d>
Triangular: <a,b,b,d>
Gaussian: N(m,s)
Singleton: (a,1) and (b,0.5)
[Figure: four plots of µ(x) over x, one per type, with the parameters a, b, c, d, m and s marked]
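The parametric shapes above can be sketched in a few lines of Python. This is a minimal illustration, not taken from the slides: the function names are hypothetical, and the Gaussian is assumed to be an unnormalized bell curve with peak value 1 at m and width s.

```python
import math

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership <a,b,c,d>: 0 outside [a,d], 1 on [b,c]."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)   # rising edge between a and b
    return (d - x) / (d - c)       # falling edge between c and d

def triangle(x, a, b, d):
    """Triangular membership <a,b,b,d>: a trapezoid whose plateau has zero width."""
    return trapezoid(x, a, b, b, d)

def gaussian(x, m, s):
    """Bell-shaped membership with peak 1 at m (assumed: not normalized to area 1)."""
    return math.exp(-0.5 * ((x - m) / s) ** 2)
```

For example, `trapezoid(5, 0, 10, 20, 30)` lies halfway up the rising edge and evaluates to 0.5.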
8. The Extension Principle
Assume a fuzzy set A and a function f:
What does the fuzzy set f(A) look like?
For arbitrary functions f:
µf(A)(y) = max{µA(x) | y=f(x)}
[Figure: µA(x) over x is mapped through y=f(x) to µf(A)(y) over y, taking the max where several x map to the same y]
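For a fuzzy set with finitely many elements, the extension principle can be sketched directly. The dictionary representation and the example values below are assumptions made for illustration only.

```python
def extend(fuzzy_set, f):
    """Image f(A) of a discrete fuzzy set A given as {x: membership}.

    When several x map to the same y, the largest membership is kept,
    which is the max in the extension principle.
    """
    image = {}
    for x, mu in fuzzy_set.items():
        y = f(x)
        image[y] = max(image.get(y, 0.0), mu)
    return image

# Hypothetical fuzzy set "about zero" and the non-injective function f(x) = x^2
A = {-2: 0.2, -1: 0.7, 0: 1.0, 1: 0.6, 2: 0.1}
B = extend(A, lambda x: x * x)
```

Here both x = -1 and x = 1 map to y = 1, so µf(A)(1) = max{0.7, 0.6} = 0.7.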
9. Operators on Fuzzy Sets
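Union (max): µA∨B(x) = max{µA(x), µB(x)}
Intersection (min): µA∧B(x) = min{µA(x), µB(x)}
Union (bounded sum): µA∨B(x) = min{1, µA(x)+µB(x)}
Intersection (product): µA∧B(x) = µA(x) • µB(x)
[Figure: plots of µA(x) and µB(x) over x with the resulting union and intersection for each operator pair]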
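Applied pointwise to membership degrees, the two operator pairs can be written as follows. The function names are chosen here for illustration.

```python
def union_max(mu_a, mu_b):
    """Union with the max-operator."""
    return max(mu_a, mu_b)

def intersection_min(mu_a, mu_b):
    """Intersection with the min-operator."""
    return min(mu_a, mu_b)

def union_bounded_sum(mu_a, mu_b):
    """Union as the sum of both degrees, capped at 1."""
    return min(1.0, mu_a + mu_b)

def intersection_product(mu_a, mu_b):
    """Intersection as the product of both degrees."""
    return mu_a * mu_b
```

With µA(x) = 0.5 and µB(x) = 0.25, the first pair gives 0.5 and 0.25, the second pair 0.75 and 0.125.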
11. Fuzzy Relations
classical relation
R : X × Y defined by µR(x,y) = { 1 if (x,y) ∈ R
                               { 0 if (x,y) ∉ R
fuzzy relation
R : X × Y defined by µR(x,y) ∈ [0,1]
µR(x,y) describes to which degree x and y are related.
It can also be interpreted as the truth value of the proposition x R y.
13. Fuzzy Sets & Linguistic Variables
A linguistic variable combines several fuzzy sets.
linguistic variable : temperature
linguistic terms (fuzzy sets) : { cold, warm, hot }
[Figure: membership functions µcold, µwarm, µhot over temperature x [°C]; axis ticks at 0, 20, 60]
14. Fuzzy Rules
• causal dependencies can be expressed in the form of if-then rules
• general form:
  if <antecedent> then <consequent>
• example:
  if temperature is cold and oil is cheap
  then heating is high
  linguistic variables: temperature, oil, heating
  linguistic values/terms (fuzzy sets): cold, cheap, high
15. Fuzzy Rule Base
Heating as a function of temperature and oil price:

Oil price     Temperature
              cold      warm      hot
cheap         high      high      medium
normal        high      medium    low
expensive     medium    low       low

if temperature is cold and oil price is cheap then heating is high
if temperature is hot and oil price is normal then heating is low
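The rule base is a lookup from a pair of input terms to an output term, which can be sketched as a dictionary. The names `RULE_BASE` and `heating_term` are hypothetical; the entries follow the heating table on this slide.

```python
# (temperature term, oil price term) -> heating term
RULE_BASE = {
    ("cold", "cheap"): "high",
    ("warm", "cheap"): "high",
    ("hot", "cheap"): "medium",
    ("cold", "normal"): "high",
    ("warm", "normal"): "medium",
    ("hot", "normal"): "low",
    ("cold", "expensive"): "medium",
    ("warm", "expensive"): "low",
    ("hot", "expensive"): "low",
}

def heating_term(temperature, oil_price):
    """Return the consequent term of the rule matching both antecedent terms."""
    return RULE_BASE[(temperature, oil_price)]
```

Each dictionary entry corresponds to one if-then rule, so the nine entries are the nine rules of the table.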
16. Fuzzy Knowledge Base
Fuzzy knowledge base:
Fuzzy Data-Base:
• Definition of linguistic input and output variables
• Definition of fuzzy membership functions
[Figure: membership functions µcold, µwarm, µhot over temperature x [°C]; axis ticks at 0, 20, 60]
Fuzzy Rule-Base:
if temperature is cold and oil price is cheap
then heating is high
….
17. Fuzzification
1. Fuzzification: determine the degree of membership for each term of an input variable:
temperature : t = 15 °C → µcold(t) = 0.5
oil price : p = $13/barrel → µcheap(p) = 0.3
[Figure: membership functions µcold over t and µcheap over p, evaluated at 15 °C and $13/barrel]
If temperature is cold ... and oil is cheap ...
18. Fuzzy Combination
2. Combine the terms into one degree of fulfillment for the entire antecedent by fuzzy AND: min-operator
if temperature is cold ... and oil is cheap ...
µcold(t)=0.5 µcheap(p)=0.3
µante = min{µcold(t), µcheap(p)} = min{0.5,0.3} = 0.3
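Fuzzification followed by the min-combination can be reproduced numerically. The slides do not give the exact shapes of µcold and µcheap, so the falling ramps below are assumptions chosen only so that they return the stated values 0.5 and 0.3.

```python
def falling_ramp(x, full, zero):
    """Membership 1 up to `full`, falling linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)

# Hypothetical term definitions (breakpoints are NOT from the slides)
mu_cold = falling_ramp(15, full=10, zero=20)   # t = 15 C
mu_cheap = falling_ramp(13, full=6, zero=16)   # p = $13/barrel

# Fuzzy AND with the min-operator gives the degree of fulfillment
mu_ante = min(mu_cold, mu_cheap)
```

The weaker of the two conditions, "oil is cheap", determines the degree of fulfillment of the whole antecedent.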
19. Fuzzy Inference
3. Inference step: apply the degree of fulfillment of the antecedent to the consequent of the rule
... then heating is high (µante = 0.3)
min-inference: µcons.(h) = min{µante , µhigh(h)}
prod-inference: µcons.(h) = µante • µhigh(h)
[Figure: µhigh(h) and the resulting µconsequent(h) over h for min-inference and for prod-inference]
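The two inference variants differ in how they shape the consequent: min-inference clips the output term at µante, prod-inference scales it. A minimal sketch, assuming a hypothetical rising ramp for µhigh on a 0 to 100 heating scale:

```python
def min_inference(mu_ante, mu_term):
    """Clip the consequent membership function at the degree of fulfillment."""
    return lambda h: min(mu_ante, mu_term(h))

def prod_inference(mu_ante, mu_term):
    """Scale the consequent membership function by the degree of fulfillment."""
    return lambda h: mu_ante * mu_term(h)

def mu_high(h):
    """Hypothetical term 'high': rises linearly from 0 at h=50 to 1 at h=100."""
    return min(1.0, max(0.0, (h - 50) / 50))

clipped = min_inference(0.3, mu_high)
scaled = prod_inference(0.3, mu_high)
```

At h = 75, where µhigh = 0.5, the clipped consequent is 0.3 and the scaled one is 0.15; both reach at most 0.3.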
20. Fuzzy Aggregation
4. Aggregation: aggregate the consequents of all rules using the max-operator for union
... then heating is high
... then heating is medium
... then heating is low
[Figure: aggregated output membership function over h]
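Aggregation takes the pointwise maximum over the consequents of all rules. In the sketch below the triangular output terms and the degrees of fulfillment 0.6 and 0.0 are made up for illustration; only 0.3 for "high" comes from the running example.

```python
def tri(a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    def mu(h):
        if h <= a or h >= c:
            return 0.0
        return (h - a) / (b - a) if h <= b else (c - h) / (c - b)
    return mu

def clip(level, mu):
    """min-inference: clip a term at the rule's degree of fulfillment."""
    return lambda h: min(level, mu(h))

def aggregate(consequents):
    """Union of all rule consequents with the max-operator."""
    return lambda h: max(mu(h) for mu in consequents)

# Hypothetical output terms on a 0..100 heating scale
low, medium, high = tri(0, 25, 50), tri(25, 50, 75), tri(50, 75, 100)
output = aggregate([clip(0.3, high), clip(0.6, medium), clip(0.0, low)])
```

A rule with degree of fulfillment 0 contributes nothing, so `output` is shaped only by the "medium" and "high" consequents.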
21. Defuzzification
5. Determine a crisp value from the output membership function, for example using the “Center of Gravity” (COG) method:
[Figure: µconsequent(h) over h with the COG marked at h = 73]
Center of singletons defuzzification:
h = (Σi mi • Ai • ci) / (Σi mi • Ai)
mi = degree of membership fuzzy set i
Ai = area of fuzzy set i
ci = center of gravity of fuzzy set i
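Both defuzzification methods can be sketched briefly. The center of gravity is approximated here by sampling the output membership function on a grid, which is one possible numerical scheme and not necessarily the one used in the lecture; the center of singletons follows the weighted-average formula with mi, Ai and ci.

```python
def center_of_gravity(mu, lo, hi, steps=1000):
    """Approximate the COG of mu over [lo, hi] with midpoint sampling."""
    width = (hi - lo) / steps
    num = den = 0.0
    for k in range(steps):
        h = lo + (k + 0.5) * width
        num += h * mu(h)
        den += mu(h)
    return num / den if den > 0 else None   # None: no rule fired

def center_of_singletons(terms):
    """terms: list of (m_i, A_i, c_i) = (membership, area, center) per fuzzy set."""
    num = sum(m * area * c for m, area, c in terms)
    den = sum(m * area for m, area, _ in terms)
    return num / den
```

As a usage example with made-up numbers, two output sets of equal area 25 with centers 75 and 50 and degrees 0.3 and 0.6 give `center_of_singletons([(0.3, 25.0, 75.0), (0.6, 25.0, 50.0)])`, roughly 58.3. Note that `center_of_singletons` raises a division error if all degrees are zero.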
22. Schema of a Fuzzy Decision
Fuzzification: measured temperature t → µcold = 0.7, µwarm = 0.2, µhot = 0.0
Inference (rule-base):
if temp is cold then valve is open (0.7)
if temp is warm then valve is half (0.2)
if temp is hot then valve is closed (0.0)
Defuzzification: µopen, µhalf, µclosed → crisp output v for valve-setting
24. Machine vs. Robot Learning
Machine Learning                     Robot Learning
• Learning in a vacuum               • Embedded learning
• Statistically well-behaved data    • Data distribution not homogeneous
• Mostly off-line                    • Mostly on-line
• Informative feed-back              • Qualitative and sparse feed-back
• Computational time not an issue    • Time is crucial
• Hardware does not matter           • Hardware is a priority
• Convergence proof                  • Empirical proof
25. Methods of Robot Learning
• Dynamic Programming / Reinforcement Learning:
  The desired behavior is expressed as an optimization criterion r to be optimized over a temporal horizon, resulting in a cost function (long-term accumulated reward)
  J(xt) = Στ≥t r(xτ,uτ)
• Problem: curse of dimensionality, large state spaces, large amount of exploration
• Idea: modularize control policy
26. Learning Task
• Learn a task-specific control policy π that maps the continuous-valued state vector s to a continuous-valued control action u.
u = π(x,α,t)
[Block diagram: desired behavior → learning system → parameters α → control policy π(x,α,t) → control action u → robot & environment → state s]
27. Learning Control with Sub-Policies
• Learn or design sub-policies and subsequently build the complete policy out of the sub-policies
[Block diagram: desired behavior → learning system → sub-policies π1, π2, π3, π4 → control action u → robot & environment → state s]
28. Indirect Learning of Control Policies
• Decompose task into planning and execution stage
• Planning generates a desired kinematic trajectory
• Execution transforms plan into appropriate motor command
• Learn inverse kinematic model for the execution module
[Block diagram: desired behavior → trajectory planning → feedforward controller and feedback controller, combined at summation points (Σ) into the control action u → robot & environment; the learning system acts on the feedforward controller of the control policy]
29. Learning Inverse Models
• Learn inverse kinematic model for feedforward control
• Kinematic function: x = f(u)
• Inverse model: u = f⁻¹(x)
• Dynamic model: dx/dt = f(x,u)
• Inverse dynamic model: u = g(xdesired, x)
30. Evolutionary Robotics in a Nutshell
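[Figure: evolutionary loop. A population of bit-string genotypes (e.g. 0110, 0100, 1001, 1101, 0011) is decoded into parameters α of the controller u=f(s,α) and evaluated in the environment, yielding e.g. fitness(0110); the steps evaluation, selection, recombination and mutation produce the next population]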
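The loop of evaluation, selection, recombination and mutation can be sketched on bit strings. The operator choices below (truncation selection, one-point crossover, bit-flip mutation, elitism) and all parameter defaults are assumptions for illustration, not the lecture's specific algorithm. On a real robot, `fitness` would run the behavior and return the measured score.

```python
import random

def evolve(fitness, n_bits=4, pop_size=10, generations=20, p_mut=0.1, seed=0):
    """Evolve bit-string genotypes and return the best one found."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluation and selection: keep the better half as parents
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = [parents[0][:]]               # elitism: copy the best unchanged
        while len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randint(1, n_bits - 1)     # recombination: one-point crossover
            child = mom[:cut] + dad[cut:]
            # Mutation: flip each bit with probability p_mut
            child = [1 - b if rng.random() < p_mut else b for b in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

# Toy fitness standing in for a robot evaluation: count the 1-bits
best = evolve(sum)
```

Because the best individual is always copied into the next generation, the best fitness in the population never decreases from one generation to the next.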
31. Evolutionary Behavior Design
[Block diagram: the evolutionary algorithm passes a genotype (behavior parameters) to the robotic behavior and receives a fitness from the evaluation scheme; the robotic behavior sends the control action a to the environment and receives the observed state s; the evaluation scheme receives the observed reward r]
32. Evolving in Simulation vs. Reality
Simulation                                             Reality
• Requires model of the sensors and environment        • Real world is the model
• Brittleness of adapted behaviors                     • Robust behaviors
• Identical test cases for all candidate controllers   • Difficult to initialize for a new controller under evaluation
• Automated fast fitness evaluation                    • Time-consuming, manual fitness evaluation
34. Robot & Sensors
• 6 binary sensors (4 antennae + 2 bumpers)
• 1 rotation sensor
35. External vs. Internal Fitness
External fitness
• cannot be measured by the robot itself (e.g. location in world coordinates)
• external observer perspective
• useful in simulations
Internal fitness
• directly accessible to the robot by means of sensors (e.g. sensor readings, battery level)
• useful when learning on the real robot
• fitness function might be more difficult to design
36. Functional vs. Behavioral Fitness
Functional:
• Measures directly the way in which the system functions, observes the causes of a behavior
• Example: learn to generate a desired oscillatory pattern of leg motion
Behavioral:
• Measures the resulting behavior, observes the effects of the behavior
• Example: measure the absolute distance traveled by the robot using the rotation sensor
37. Explicit vs. Implicit Fitness
• Explicit:
  • Large number of constraints
  • Actively steers the evolutionary system towards desired behaviors
  • Problem: weighting and aggregating multiple constraints
• Implicit:
  • Small number of constraints
  • Allows evolution of emergent, novel behaviors
  • Problem: for complex behaviors (e.g. find cylinders, pick up cylinders and drop them outside the arena) finding an initial behavior is like searching for a needle in a haystack
38. Behavior Representation
• The robot is controlled by the duration and direction of left and right motor commands.
• Sensory states:
  • s1,…,s6 (2^6 = 64 possible states reduced to 9 different states)
• Control action:
  • direction of left, right motor
  • duration of left, right motor action
• Mapping:
  • For each of the nine different sensory states, the direction and duration of left and right motor commands are encoded by one byte.
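The slides do not specify the bit layout of the byte. One plausible layout, assumed here purely for illustration, uses four bits per motor: one direction bit and three duration bits in 10 ms steps, which would cover the 0 to 70 ms durations listed in the sensor-state tables.

```python
def decode_motor_byte(byte):
    """Decode one genotype byte into (left, right) motor commands.

    ASSUMED layout (not from the slides): upper nibble = left motor,
    lower nibble = right motor; within a nibble the top bit selects the
    direction and the low three bits give the duration in units of 10 ms.
    """
    def decode_nibble(n):
        direction = "backward" if n & 0b1000 else "forward"
        duration_ms = (n & 0b0111) * 10
        return direction, duration_ms

    return decode_nibble(byte >> 4), decode_nibble(byte & 0x0F)

# One byte per sensory state: nine bytes describe the whole behavior
left, right = decode_motor_byte(0b0100_1111)
```

With this layout, `0b0100_1111` decodes to a left motor running forward for 40 ms and a right motor running backward for 70 ms. The sketch ignores additional motor modes such as the "Float" entry in the tables.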
39. Sensor States to Motor Actions
Sensor state          Left motor action    Right motor action
S1: no contact        0 [ms]               0 [ms]
S2: front bumper      50 [ms]              50 [ms]
S3: left bumper       40 [ms]              70 [ms]
40. Sensor States to Motor Actions
Sensor state                  Left motor action    Right motor action
S4: right bumper              30 [ms]              30 [ms]
S5: left antenna outward      60 [ms]              60 [ms]
    (if black vertical axle is pressed this state is equivalent to S3)
S6: left antenna inward       30 [ms]              Float 20 [ms]
41. Sensor States to Motor Actions
Sensor state                          Left motor action    Right motor action
S7: right antenna inward              60 [ms]              70 [ms]
S8: right antenna outward             70 [ms]              40 [ms]
    (if black vertical axle is pressed this state is equivalent to S4)
S9: left & right antenna outward      20 [ms]              10 [ms]
42. Communication between RCX and PC
[Figure: the host computer is connected by a serial link to the IR communication tower, which communicates with the IR port of the RCX in the environment]
43. Behavior Evaluation
• The parameters of the robotic behavior are downloaded to the LEGO robot.
• The robot performs the behavior for one minute.
• The number of rotations of the tracking wheel, equivalent to the distance traveled, is returned as the fitness.
• Based on the fitness, the evolutionary algorithm selects good behaviors and generates new candidate behaviors by means of recombination and mutation.
• Population size: 10 individuals, 20 generations; one run of the evolutionary algorithm takes about 3-4 hours.
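The stated run time can be checked with a short calculation, assuming every individual is evaluated once per generation for one minute:

```python
population_size = 10          # individuals per generation
generations = 20
minutes_per_evaluation = 1    # the robot performs each behavior for one minute

# Pure evaluation time, without download or repositioning overhead
pure_run_minutes = population_size * generations * minutes_per_evaluation
pure_run_hours = pure_run_minutes / 60
```

This gives 200 minutes, about 3.3 hours, of pure evaluation time, which is consistent with the quoted 3-4 hours once some per-evaluation overhead is added (the overhead itself is not quantified on the slide).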
45. Evolution of a Wall-Following Behavior
• 2 light sensors
• 2 bumpers
• 1 rotation sensor
46. Sensor Characteristic
• Light sensor readings S1, S2 as a function of the distance to the obstacle
47. Behavior Representation and Fitness
• Neural network: ω=f(S1, S2, wij, θi )
• Turn rate ω → motor commands
  [Figure: motor timing diagram; one motor trace is labeled "forward", the other "backward" for ω ∆T followed by "forward" for (1−ω) ∆T]
• Genotype encodes:
  • 7 ANN parameters {wij , θi } : 8 bit/parameter
  • Motor command for collision states left and right bumper
• Fitness: absolute distance traveled (#rotations)
50. Distance Maximization
• Fitness function contains an additional penalty term for low proximity to obstacles (Si < Smin)
[Figure panels: without proximity penalty; with proximity penalty]