2D1431 Machine Learning

Fuzzy Logic & Learning in Robotics

Outline

• Fuzzy Logic
• Learning Control
• Evolutionary Robotics

Types of Uncertainty

• Stochastic uncertainty
  - example: rolling a die

• Linguistic uncertainty
  - examples: low price, tall people, young age

• Informational uncertainty
  - examples: creditworthiness, honesty

Classical Set
young = { x ∈ P | age(x) ≤ 20 }

characteristic function:
µyoung(x) = 1 if age(x) ≤ 20
µyoung(x) = 0 if age(x) > 20

[Figure: µyoung(x) for A = “young” plotted over x [years]; a step from 1 to 0 at x = 20.]

Fuzzy Set
Classical Logic: Element x belongs to set A or it does not:
µ(x) ∈ {0,1}

Fuzzy Logic: Element x belongs to set A with a certain degree of membership:
µ(x) ∈ [0,1]

[Figure: µA(x) for A = “young” over x [years], once as a classical set and once as a fuzzy set.]

Fuzzy Set
Definition:
Fuzzy Set A = {(x, µA(x)) : x ∈ X, µA(x) ∈ [0,1]}
• a universe of discourse X : 0 ≤ x ≤ 100
• a membership function µA : X → [0,1]

[Figure: µA(x) for A = “young” over x [years]; at x = 23 the degree of membership is µ = 0.8.]

Types of Membership Functions
• Trapezoid: <a,b,c,d>
• Gaussian: N(m,s)
• Triangular: <a,b,b,d>
• Singleton: (a,1) and (b,0.5)

[Figure: the four membership functions µ(x) plotted over x, with the parameters a, b, c, d, m and s marked.]
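
The four membership function types can be written down directly from their parameters. The following is a minimal sketch, not taken from the slides: the function names are illustrative, the Gaussian is assumed to be scaled so that its peak equals 1, and the singleton set is assumed to be stored as a dictionary of positions and degrees.

```python
import math

def trapezoid(x, a, b, c, d):
    """Trapezoid <a,b,c,d>: rises on [a,b], equals 1 on [b,c], falls on [c,d]."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0

def triangular(x, a, b, d):
    """Triangular <a,b,b,d>: a trapezoid whose plateau collapses to the point b."""
    return trapezoid(x, a, b, b, d)

def gaussian(x, m, s):
    """Gaussian N(m,s), scaled (assumption) so that the membership at x = m is 1."""
    return math.exp(-0.5 * ((x - m) / s) ** 2)

def singleton(x, points):
    """Singleton set stored as {position: degree}, e.g. {a: 1.0, b: 0.5}."""
    return points.get(x, 0.0)

# Halfway up the rising edge of a trapezoid <10,20,30,40>
half = trapezoid(15, 10, 20, 30, 40)
```

Writing the triangle as a degenerate trapezoid mirrors the notation <a,b,b,d> used on the slide.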

The Extension Principle
Assume a fuzzy set A and a function f:
What does the fuzzy set f(A) look like?

For arbitrary functions f:
µf(A)(y) = max{µA(x) | y = f(x)}

[Figure: µA(x) is mapped through y = f(x) onto µf(A)(y); where several x map to the same y, the max of their memberships is taken.]
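
For a fuzzy set over a finite universe, the extension principle reduces to a loop that keeps the largest membership among all x mapping to the same y. The sketch below assumes the fuzzy set is stored as a dictionary {x: µA(x)}; the name `extend` and the example values are illustrative only.

```python
def extend(f, fuzzy_set):
    """Image f(A) of a discrete fuzzy set A given as {x: membership}."""
    image = {}
    for x, mu in fuzzy_set.items():
        y = f(x)
        # Several x may map to the same y: keep the maximum membership.
        image[y] = max(image.get(y, 0.0), mu)
    return image

# Illustrative fuzzy set "around zero" and the non-injective function f(x) = x^2
A = {-2: 0.3, -1: 0.8, 0: 1.0, 1: 0.6, 2: 0.1}
squared = extend(lambda x: x * x, A)
# squared == {4: 0.3, 1: 0.8, 0: 1.0}
```

Here x = -1 and x = 1 both map to y = 1, so µf(A)(1) = max(0.8, 0.6) = 0.8.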

Operators on Fuzzy Sets
Union: µA∨B(x) = max{µA(x), µB(x)}
Intersection: µA∧B(x) = min{µA(x), µB(x)}

Alternative operators:
Union: µA∨B(x) = min{1, µA(x) + µB(x)}
Intersection: µA∧B(x) = µA(x) · µB(x)

[Figure: µA(x) and µB(x) over x with the resulting union and intersection for each pair of operators.]

Complement

Negation: µ¬A(x)= 1 - µA(x)

The classical laws do not always hold:

µ¬A∨A(x) ≡ 1
µ¬A∧A(x) ≡ 0

Example : µA(x) = 0.6
µ¬A(x) = 1 - µA(x) = 0.4
µ¬A∨A(x) = max(0.6,0.4) = 0.6 ≠ 1
µ¬A∧A(x) = min(0.6,0.4) = 0.4 ≠ 0
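
The union, intersection and complement operators act pointwise on membership degrees, so they can be checked numerically. This is a small sketch with illustrative function names; it reproduces the example with µA(x) = 0.6 and also evaluates the bounded-sum and product variants for comparison.

```python
def union_max(a, b):
    return max(a, b)

def intersection_min(a, b):
    return min(a, b)

def union_bounded_sum(a, b):
    return min(1.0, a + b)

def intersection_product(a, b):
    return a * b

def complement(a):
    return 1.0 - a

mu_a = 0.6
mu_not_a = complement(mu_a)                       # about 0.4

# max/min operators: neither classical law holds for 0 < mu_a < 1
a_or_not_a = union_max(mu_a, mu_not_a)            # 0.6, not 1
a_and_not_a = intersection_min(mu_a, mu_not_a)    # about 0.4, not 0

# bounded sum: mu_a + (1 - mu_a) = 1, so this union does reach 1
a_or_not_a_bs = union_bounded_sum(mu_a, mu_not_a)
# product: about 0.24, still not 0
a_and_not_a_prod = intersection_product(mu_a, mu_not_a)
```

Which of the classical laws survive therefore depends on the operator pair chosen, which is one reason to pick the operators deliberately.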

Fuzzy Relations
classical relation
R on X × Y defined by
µR(x,y) = 1 if (x,y) ∈ R
µR(x,y) = 0 if (x,y) ∉ R
fuzzy relation
R on X × Y defined by µR(x,y) ∈ [0,1]

µR(x,y) describes to which degree x and y are related
It can also be interpreted as the truth value of the
proposition x R y

Fuzzy Relations
Example:
X = { rainy, cloudy, sunny }

Y = { swimming, bicycling, camping, reading }

X/Y      swimming  bicycling  camping  reading
rainy    0.0       0.2        0.0      1.0
cloudy   0.0       0.8        0.3      0.3
sunny    1.0       0.2        0.7      0.0
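
A fuzzy relation over two finite sets is just a table of membership degrees. As a minimal sketch (the names `ACTIVITIES`, `R` and `mu_R` are illustrative), the weather/activity relation can be stored row by row and queried for the truth value of the proposition x R y:

```python
ACTIVITIES = ["swimming", "bicycling", "camping", "reading"]

# One row per element of X, one column per element of Y (values from the table).
R = {
    "rainy":  [0.0, 0.2, 0.0, 1.0],
    "cloudy": [0.0, 0.8, 0.3, 0.3],
    "sunny":  [1.0, 0.2, 0.7, 0.0],
}

def mu_R(weather, activity):
    """Degree to which `weather` and `activity` are related."""
    return R[weather][ACTIVITIES.index(activity)]

def best_activity(weather):
    """Activity with the highest degree of relation for the given weather."""
    row = R[weather]
    return ACTIVITIES[row.index(max(row))]
```

For example, `mu_R("cloudy", "bicycling")` returns 0.8, and `best_activity("rainy")` returns "reading".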

Fuzzy Sets & Linguistic Variables
A linguistic variable combines several fuzzy sets.

linguistic variable : temperature
linguistic terms (fuzzy sets) : { cold, warm, hot }

[Figure: membership functions µcold, µwarm, µhot over x [C], with axis ticks at 0, 20 and 60.]

Fuzzy Rules

• causal dependencies can be expressed in the
  form of if-then rules
• general form:
  if <antecedent> then <consequence>
• example:
  if temperature is cold and oil is cheap
  then heating is high

Here temperature, oil and heating are linguistic variables; cold, cheap and high are linguistic values/terms (fuzzy sets).

Fuzzy Rule Base
Heating as a function of temperature (columns) and oil price (rows):

Oil price   cold    warm    hot
cheap       high    high    medium
normal      high    medium  low
expensive   medium  low     low

if temperature is cold and oil price is cheap then heating is high

if temperature is hot and oil price is normal then heating is low

Fuzzy Knowledge Base
Fuzzy Data-Base:
• Definition of linguistic input and output variables
• Definition of fuzzy membership functions
[Figure: membership functions µcold, µwarm, µhot over x [C], with axis ticks at 0, 20 and 60.]
Fuzzy Rule-Base:
if temperature is cold and oil price is cheap
then heating is high
….

Fuzzification
1. Fuzzification
Determine the degree of membership for each term of an
input variable:

temperature: t = 15 C, µcold(t) = 0.5
oil price: p = $13/barrel, µcheap(p) = 0.3

if temperature is cold ... and oil is cheap ...

Fuzzy Combination
2. Combine the terms into one degree of fulfillment for the entire
antecedent by fuzzy AND: min-operator

temperature: t = 15 C, µcold(t) = 0.5
oil price: p = $13/barrel, µcheap(p) = 0.3

if temperature is cold ... and oil is cheap ...

µante = min{µcold(t), µcheap(p)} = min{0.5, 0.3} = 0.3

Fuzzy Inference
3. Inference step: Apply the degree of membership of the
antecedent to the consequent of the rule

... then heating is high, with µante = 0.3

min-inference: µcons.(h) = min{µante, µhigh(h)}
prod-inference: µcons.(h) = µante · µhigh(h)

[Figure: µhigh(h) and the resulting µconsequent(h) over h, once for min-inference and once for prod-inference.]
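
Steps 1 to 3 for the heating rule can be traced in a few lines. The degrees 0.5 and 0.3 come from the fuzzification example; the shape of µhigh is not given on the slides, so the ramp used here (0 at h = 50, 1 at h = 100 on a 0..100 heating scale) is purely an assumption for illustration.

```python
def mu_high(h):
    """Hypothetical 'high' heating term: ramps from 0 at h=50 to 1 at h=100."""
    return min(1.0, max(0.0, (h - 50.0) / 50.0))

# Step 1: degrees of membership from the fuzzification example
mu_cold_t = 0.5    # temperature t = 15 C
mu_cheap_p = 0.3   # oil price p = $13/barrel

# Step 2: fuzzy AND with the min-operator
mu_ante = min(mu_cold_t, mu_cheap_p)

# Step 3: apply the antecedent degree to the consequent
def min_inference(h):
    """Clips the consequent term at the level mu_ante."""
    return min(mu_ante, mu_high(h))

def prod_inference(h):
    """Scales the consequent term by mu_ante."""
    return mu_ante * mu_high(h)
```

Min-inference cuts the output term off at 0.3, while prod-inference keeps its shape and shrinks it; both reach at most 0.3.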

Fuzzy Aggregation
4. Aggregation: Aggregate all the rule consequents using
the max-operator for union

... then heating is high
... then heating is medium
... then heating is low

[Figure: aggregated output membership function over h.]

Defuzzification
5. Determine a crisp value from the output membership function,
for example using the “Center of Gravity” (COG) method:

[Figure: µconsequent(h) over h with the COG marked at h = 73.]

Center of singletons defuzzification:
h = (Σi mi · Ai · ci) / (Σi mi · Ai)
mi = degree of membership of fuzzy set i
Ai = area of fuzzy set i
ci = center of gravity of fuzzy set i
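
Aggregation and defuzzification can be sketched numerically. The code below is an illustration under assumptions, not the method used to produce the figure: the center of gravity is approximated by midpoint sampling, and the two clipped consequents are hypothetical shapes on a 0..100 heating scale.

```python
def aggregate(consequents):
    """Union of rule consequents: pointwise max of their membership functions."""
    return lambda h: max(mu(h) for mu in consequents)

def center_of_gravity(mu, lo, hi, steps=1000):
    """Approximates the COG of mu on [lo, hi] by midpoint sampling."""
    width = (hi - lo) / steps
    num = den = 0.0
    for k in range(steps):
        h = lo + (k + 0.5) * width
        m = mu(h)
        num += h * m
        den += m
    return num / den if den > 0 else None

def center_of_singletons(terms):
    """terms: list of (m_i, A_i, c_i); returns sum(m*A*c) / sum(m*A)."""
    num = sum(m * a * c for m, a, c in terms)
    den = sum(m * a for m, a, c in terms)
    return num / den

# Hypothetical clipped consequents of two rules (shapes are assumptions)
def clipped_medium(h):
    return min(0.5, max(0.0, 1.0 - abs(h - 50.0) / 25.0))

def clipped_high(h):
    return min(0.3, max(0.0, (h - 50.0) / 50.0))

crisp = center_of_gravity(aggregate([clipped_medium, clipped_high]), 0.0, 100.0)
```

The center of singletons variant avoids the integration altogether, since each output term is summarized by its area and center in advance.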

Schema of a Fuzzy Decision
Fuzzification → Inference → Defuzzification

Fuzzification: the measured temperature t yields µcold = 0.7, µwarm = 0.2, µhot = 0.0
Inference (rule-base):
  if temp is cold then valve is open (0.7)
  if temp is warm then valve is half (0.2)
  if temp is hot then valve is close (0.0)
Defuzzification: the output terms µopen, µhalf, µclose yield the crisp output v for the valve-setting
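
The whole decision schema fits into one small function. This is a sketch under stated assumptions: the temperature membership functions and the singleton positions of the valve terms (100, 50 and 0 percent opening) are hypothetical, and defuzzification uses the center of singletons with equal areas.

```python
def clamp01(v):
    return max(0.0, min(1.0, v))

# Hypothetical membership functions of the temperature terms (degrees C)
def mu_cold(t):
    return clamp01((20.0 - t) / 10.0)

def mu_warm(t):
    return min(clamp01((t - 10.0) / 10.0), clamp01((60.0 - t) / 10.0))

def mu_hot(t):
    return clamp01((t - 50.0) / 10.0)

# Hypothetical singleton positions of the valve terms, in percent opening
VALVE = {"open": 100.0, "half": 50.0, "close": 0.0}

def valve_setting(t):
    """Crisp valve setting for a measured temperature t."""
    # Fuzzification
    cold, warm, hot = mu_cold(t), mu_warm(t), mu_hot(t)
    # Inference: cold -> open, warm -> half, hot -> close
    fired = {"open": cold, "half": warm, "close": hot}
    # Defuzzification: weighted average of the singleton positions
    total = sum(fired.values())
    return sum(fired[term] * VALVE[term] for term in fired) / total

v = valve_setting(15.0)   # cold = 0.5, warm = 0.5, hot = 0.0
```

With these assumed terms the degrees always sum to at least 1, so the division is safe; a different set of membership functions would need a guard for the case that no rule fires.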

Machine vs. Robot Learning

Machine Learning:
• Learning in a vacuum
• Statistically well-behaved data
• Mostly off-line
• Informative feed-back
• Computational time not an issue
• Hardware does not matter
• Convergence proof

Robot Learning:
• Embedded learning
• Data distribution not homogeneous
• Mostly on-line
• Qualitative and sparse feed-back
• Time is crucial
• Hardware is a priority
• Empirical proof

Methods of Robot Learning
• Dynamic Programming / Reinforcement Learning:
  The desired behavior is expressed as an optimization
  criterion r to be optimized over a temporal horizon,
  resulting in a cost function (long-term accumulated
  reward)
  J(xt) = Σt r(xt,ut)

• Problem: curse of dimensionality, large state spaces,
  large amount of exploration
• Idea: modularize control policy
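
The accumulated reward J is obtained by rolling a policy out over the horizon and summing r(xt,ut). The sketch below assumes a finite horizon and a deterministic system; the toy dynamics, policy and reward are hypothetical and only serve to make the sum concrete.

```python
def accumulated_reward(x0, policy, step, reward, horizon):
    """J(x0) = sum over t of r(x_t, u_t) along the trajectory generated by policy."""
    x, total = x0, 0.0
    for _ in range(horizon):
        u = policy(x)
        total += reward(x, u)
        x = step(x, u)
    return total

# Hypothetical 1-D example: drive the state towards zero
J = accumulated_reward(
    x0=4.0,
    policy=lambda x: -0.5 * x,       # control action u
    step=lambda x, u: x + u,         # next state
    reward=lambda x, u: -(x * x),    # penalize distance from zero
    horizon=3,
)
# States visited: 4, 2, 1  ->  J = -(16 + 4 + 1) = -21
```

Evaluating J exhaustively for every state and action is what becomes infeasible in large state spaces, which motivates the modular policies that follow.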

Learning Task
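• Learn a task-specific control policy π that maps the
  continuous-valued state vector s to a continuous-valued
  control action u.
  u = π(x,α,t)

[Figure: the learning system adjusts the parameters α of the control policy π(x,α,t) according to the desired behavior; the policy sends the control action u to the robot & environment, which returns the state s.]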

Learning Control with Sub-Policies
• Learn or design sub-policies and subsequently build the
  complete policy out of the sub-policies

[Figure: the learning system and the desired behavior act on the sub-policies π1 to π4; their control action u drives the robot & environment, which returns the state s.]

Indirect Learning of Control Policies
• Decompose the task into a planning and an execution stage
• Planning generates a desired kinematic trajectory
• Execution transforms the plan into appropriate motor commands
• Learn an inverse kinematic model for the execution module

[Figure: control policy consisting of trajectory planning (from the desired behavior), a feedforward controller adapted by the learning system and a feedback controller; their outputs are summed (Σ) into the control action u for the robot & environment.]

Learning Inverse Models
• Learn an inverse kinematic model for feed-forward control
• Kinematic function: x = f(u)
• Inverse model: u = f⁻¹(x)
• Dynamic model: dx/dt = f(x,u)
• Inverse dynamic model: u = g(xdesired, x)

Evolutionary Robotics in a Nutshell
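[Figure: evolutionary loop. A population of bit-string genotypes (0110, 0100, 1001, 1101, 0011) is decoded into parameters α of the controller u=f(s,α), which is evaluated in the environment to obtain a fitness, e.g. fitness(0110). Evaluation is followed by selection, recombination and mutation, which produce the next population.]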

Evolutionary Behavior Design
[Figure: the evolutionary algorithm passes a genotype (behavior parameters) to the robotic behavior, which sends control actions a to the environment and receives the observed state s; the evaluation scheme turns the observed reward r into a fitness that is returned to the evolutionary algorithm.]

Evolving in Simulation vs. Reality

Simulation:
• Requires model of the sensors and environment
• Brittleness of adapted behaviors
• Identical test cases for all candidate controllers
• Automated, fast fitness evaluation

Reality:
• Real world is the model
• Robust behaviors
• Difficult to initialize for a new controller under evaluation
• Time-consuming, manual fitness evaluation

Environment
Real-time online evolution in a 200 × 100 cm maze with
about 10-15 minutes per generation

Robot & Sensors
• 6 binary sensors (4 antennae + 2 bumpers)
• 1 rotation sensor

External vs. Internal Fitness
External fitness
• cannot be measured by the robot itself (e.g. location in
  world coordinates)
• external observer perspective
• useful in simulations
Internal fitness
• directly accessible to the robot by means of sensors (e.g.
  sensor readings, battery level)
• useful when learning on the real robot
• fitness function might be more difficult to design

Functional vs. Behavioral Fitness
Functional:
• measures directly the way in which the system functions,
  observes the causes of a behavior
• Example: learn to generate a desired oscillatory pattern
  of leg motion
Behavioral:
• measures the resulting behavior, observes the effects of
  the behavior
• Example: measure the absolute distance traveled by the
  robot using the rotation sensor

Explicit vs. Implicit Fitness
• Explicit:
  - Large number of constraints
  - Actively steers the evolutionary system towards
    desired behaviors
  - Problem: weighting and aggregating multiple
    constraints
• Implicit:
  - Small number of constraints
  - Allows evolution of emergent, novel behaviors
  - Problem: for complex behaviors (e.g. find cylinders,
    pick up cylinders and drop them outside the arena)
    finding an initial behavior is like searching for a
    needle in a haystack

Behavior Representation
• The robot is controlled by the duration and direction of the left and
  right motor commands.
• Sensory states:
  - s1,…,s6 (2^6 = 64 possible states reduced to 9 different
    states)
• Control action:
  - direction of left, right motor
  - duration of left, right motor action
• Mapping:
  - For each of the nine different sensory states, the
    direction and duration of left and right motor
    commands are encoded by one byte.
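
The slides state that one byte per sensory state encodes direction and duration of both motors, but not how the bits are laid out. The sketch below therefore uses a hypothetical layout: the upper four bits describe the left motor, the lower four the right motor, and within each group one bit gives the direction and three bits the duration in steps of 10 ms.

```python
def decode_action(byte):
    """Decodes one byte into ((left_dir, left_ms), (right_dir, right_ms)).

    Assumed layout (not from the slides): high nibble = left motor,
    low nibble = right motor; bit 3 = direction, bits 0..2 = duration / 10 ms.
    """
    def motor(nibble):
        direction = "backward" if nibble & 0b1000 else "forward"
        duration_ms = (nibble & 0b0111) * 10
        return direction, duration_ms
    return motor(byte >> 4), motor(byte & 0x0F)

def decode_genotype(genotype):
    """Maps a 9-byte genotype to the motor actions for sensory states S1..S9."""
    if len(genotype) != 9:
        raise ValueError("expected one byte per sensory state (9 bytes)")
    return {f"S{i + 1}": decode_action(b) for i, b in enumerate(genotype)}

# Left motor forward for 40 ms, right motor backward for 70 ms
example = decode_action(0b01001111)
```

Under this assumed layout a genotype is 72 bits long, and durations range from 0 to 70 ms.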

Sensor States to Motor Actions
Sensor state                        Left motor action   Right motor action
S1: no contact                      0 [ms]              0 [ms]
S2: front bumper                    50 [ms]             50 [ms]
S3: left bumper                     40 [ms]             70 [ms]
S4: right bumper                    30 [ms]             30 [ms]
S5: left antenna outward (1)        60 [ms]             60 [ms]
S6: left antenna inward             30 [ms]             Float 20 [ms]
S7: right antenna inward            60 [ms]             70 [ms]
S8: right antenna outward (2)       70 [ms]             40 [ms]
S9: left & right antenna outward    20 [ms]             10 [ms]

(1) if the black vertical axle is pressed this state is equivalent to S3
(2) if the black vertical axle is pressed this state is equivalent to S4

Communication between RCX and PC
[Figure: the host computer is connected by a serial link to the IR communication tower, which communicates with the IR port of the RCX in the environment.]

Behavior Evaluation
• The parameters of the robotic behavior are
  downloaded to the LEGO robot.
• The robot performs the behavior for one minute.
• The number of rotations of the tracking wheel,
  equivalent to the distance traveled, is returned as the
  fitness.
• Based on the fitness, the evolutionary algorithm
  selects good behaviors and generates new candidate
  behaviors by means of recombination and mutation.
• Population size 10 individuals, 20 generations; one
  run of the evolutionary algorithm takes about 3-4
  hours.
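
The evaluation loop described above can be sketched as a small evolutionary algorithm with 10 individuals and 20 generations. Everything beyond those two numbers is an assumption: truncation selection, one-point recombination, bit-flip mutation with rate 0.02, 9-byte genotypes, and in particular `simulated_fitness`, which merely stands in for the one-minute trial on the real robot.

```python
import random

GENOME_BYTES = 9          # one byte per sensory state (assumption)
POP_SIZE = 10
GENERATIONS = 20

def simulated_fitness(genotype):
    """Placeholder for the robot trial; NOT the real distance measurement."""
    return sum(bin(b).count("1") for b in genotype)

def recombine(p1, p2, rng):
    """One-point crossover between two parent genotypes."""
    cut = rng.randrange(1, GENOME_BYTES)
    return p1[:cut] + p2[cut:]

def mutate(genotype, rng, rate=0.02):
    """Flips each bit independently with probability `rate`."""
    out = []
    for b in genotype:
        for bit in range(8):
            if rng.random() < rate:
                b ^= 1 << bit
        out.append(b)
    return bytes(out)

def evolve(fitness, seed=0):
    rng = random.Random(seed)
    population = [bytes(rng.randrange(256) for _ in range(GENOME_BYTES))
                  for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        # Selection: keep the better half, refill with mutated offspring.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:POP_SIZE // 2]
        children = [mutate(recombine(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(POP_SIZE - len(parents))]
        population = parents + children
    return max(population, key=fitness)

best = evolve(simulated_fitness)
```

At one minute per trial, 10 individuals over 20 generations amount to 200 minutes of pure evaluation time, which is in line with the stated 3-4 hours per run. On the real robot, fitness values of surviving parents would be cached rather than measured again.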

Evolved Behavior

n ......Moviesp90913g2.mov

Evolution of a Wall-Following Behavior
• 2 light sensors
• 2 bumpers
• 1 rotation sensor

Sensor Characteristic
• Light sensor readings S1, S2 as a function of the distance to the
  obstacle

Behavior Representation and Fitness
• Neural network: ω = f(S1, S2, wij, θi)
• Turn rate ω → motor commands
  [Figure: a control interval ∆T split into phases of duration ω ∆T and (1−ω) ∆T; motor labels: forward, backward, forward.]
• Genotype encodes:
  - 7 ANN parameters {wij, θi}: 8 bit/parameter
  - Motor command for collision states left and right bumper
• Fitness: absolute distance traveled (#rotation)

Network Architectures
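• Feed-forward network (purely reactive behaviors)
• Recurrent network (dynamic behaviors)

[Figure: both networks map the sensor inputs S1, S2 through hidden units H with weights wij to the turn rate ω; the recurrent network additionally carries a state X(t) over to X(t+1).]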

Evolved Behavior

......MoviesPB251814.MOV

Distance Maximization
• Fitness function contains an additional penalty term for low
  proximity to obstacles Si < Smin

[Figure: two panels, without proximity penalty and with proximity penalty.]
