Towards a Theoretical Framework for LCS
Alwyn Barry introduces the theoretical framework for LCS that Jan Drugowitsch is currently working on.

Presentation Transcript

    • Towards a Theoretical Framework for LCS. Jan Drugowitsch and Dr Alwyn Barry.
    • Overview
      Big Question: Why use LCS?
      1. Aims and Approach
      2. Function Approximation
      3. Dynamic Programming
      4. Classifier Replacement
      5. Future Work
    • Research Aims
      – To establish a formal model for accuracy-based LCS
      – To use established models, methods and notation
      – To transfer method knowledge (analysis / improvement)
      – To create links to other ML approaches and disciplines
    • Approach
      – Reduce LCS to components:
          Function Approximation
          Dynamic Programming
          Classifier Replacement
      – Model the components
      – Model their interactions
      – Verify the models by empirical studies
      – Extend models from related domains
    • Function Approximation: Architecture Outline
      – Set of $K$ classifiers, matching states $S_k \subseteq S$
      – Every classifier $k$ approximates the value function $V$ over $S_k$ independently
      – Assumptions:
          The function $V : S \to \mathbb{R}$ is time-invariant
          Time-invariant set of classifiers $\{S_1, \ldots, S_K\}$
          Fixed sampling distribution $\Pi(S)$ given by $\pi : S \to [0, 1]$
          Each classifier minimises the MSE $\sum_{i \in S_k} \pi(i)\,\bigl(V(i) - \tilde V_k(i)\bigr)^2$
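
A minimal sketch (not the authors' code) of the per-classifier objective above: the squared error of classifier k, weighted by the sampling distribution π and restricted to the states it matches. Names and the toy numbers are illustrative.

```python
import numpy as np

def classifier_mse(V, V_k_hat, pi, matched):
    """MSE of one classifier over the states it matches.

    V        : value function over all states, shape (N,)
    V_k_hat  : classifier k's approximation, shape (N,)
    pi       : sampling distribution over states, shape (N,)
    matched  : boolean mask, True where i is in S_k
    """
    err = (V - V_k_hat) ** 2
    return np.sum(pi[matched] * err[matched])

# Hypothetical toy example: 4 states, classifier matches the first two.
V = np.array([1.0, 2.0, 3.0, 4.0])
V_k_hat = np.array([1.5, 1.5, 0.0, 0.0])   # an averaging classifier's estimate
pi = np.full(4, 0.25)
print(classifier_mse(V, V_k_hat, pi, np.array([True, True, False, False])))
```
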
    • Function Approximation: Linear Approximation Architecture
      – Feature vector $\phi : S \to \mathbb{R}^L$
      – Approximation parameter vector $\omega_k \in \mathbb{R}^L$
      – Approximation $\tilde V_k(i) = \phi(i)^\top \omega_k$
      – Averaging classifiers: $\phi(i) = (1)$ for all $i \in S$
      – Linear estimators: $\phi(i) = (1, i)^\top$ for all $i \in S$
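
A minimal sketch of the two feature maps named on the slide above and the resulting prediction $\phi(i)^\top \omega_k$. The parameter values are made up for illustration.

```python
import numpy as np

def phi_averaging(i):
    return np.array([1.0])            # averaging classifier: constant feature

def phi_linear(i):
    return np.array([1.0, float(i)])  # linear estimator: intercept plus state index

def predict(phi, omega_k, i):
    return phi(i) @ omega_k           # V_k(i) = phi(i)^T omega_k

omega_avg = np.array([2.5])           # a single mean estimate
omega_lin = np.array([0.5, 1.0])      # intercept and slope
print(predict(phi_averaging, omega_avg, 3))  # -> 2.5
print(predict(phi_linear, omega_lin, 3))     # -> 3.5
```
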
    • Function Approximation: Approximation to the MSE
      – Approximation to the MSE for classifier $k$ at time $t$:
        $\varepsilon_{k,t} = \frac{1}{c_{k,t}} \sum_{m=0}^{t} I_{S_k}(i_m)\,\bigl(V(i_m) - \omega_{k,t}^\top \phi(i_m)\bigr)^2$
      – Matching function: $I_{S_k} : S \to \{0, 1\}$
      – Match count: $c_{k,t} = \sum_{m=0}^{t} I_{S_k}(i_m)$
      – Since the sequence of states $i_m$ depends on $\pi$, the MSE is automatically weighted by this distribution.
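
A minimal sketch of this sample-based error estimate: only matched states contribute, and the accumulated squared error is normalised by the match count. The incremental class structure is an illustrative choice, not the authors' code.

```python
import numpy as np

class ClassifierErrorTracker:
    def __init__(self, omega_k, phi, matches):
        self.omega_k = omega_k     # parameter vector of classifier k
        self.phi = phi             # feature map phi(i)
        self.matches = matches     # predicate: does classifier k match state i?
        self.count = 0             # match count c_{k,t}
        self.sq_err_sum = 0.0

    def observe(self, i, V_i):
        if self.matches(i):
            self.count += 1
            self.sq_err_sum += (V_i - self.phi(i) @ self.omega_k) ** 2

    @property
    def error(self):
        return self.sq_err_sum / self.count if self.count else 0.0

# Hypothetical usage: an averaging classifier matching states 0..4.
tracker = ClassifierErrorTracker(np.array([2.0]), lambda i: np.array([1.0]),
                                 lambda i: i < 5)
for i, v in [(1, 2.5), (3, 1.5), (7, 9.0)]:   # state 7 is not matched
    tracker.observe(i, v)
print(tracker.error)   # mean squared error over the two matched samples
```
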
    • Function Approximation: Mixing Classifiers
      – A classifier $k$ covers only $S_k \subseteq S$
      – Need to mix the estimates to recover $\tilde V(i)$
      – Requires a mixing parameter (which will be the 'accuracy' ... see later!):
        $\tilde V(i) = \sum_{k=1}^{K} \psi_k(i)\, \omega_k^\top \phi(i)$, where $\psi_k : S \to [0, 1]$ and $\sum_{k=1}^{K} \psi_k(i) = 1$
      – $\psi_k(i) = 0$ where $i \notin S_k$: classifiers that do not match do not contribute
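
A minimal sketch of mixing classifier estimates as defined above. Here the mixing weights are simply uniform over the matching classifiers; the accuracy-based choice appears on a later slide. The data structures are illustrative, not from the presentation.

```python
import numpy as np

def mixed_prediction(i, classifiers):
    """classifiers: list of (phi, omega_k, matches) triples."""
    matching = [(phi, omega) for phi, omega, matches in classifiers if matches(i)]
    if not matching:
        return 0.0
    psi = 1.0 / len(matching)               # uniform mixing: weights sum to 1
    return sum(psi * (phi(i) @ omega) for phi, omega in matching)

# Two hypothetical averaging classifiers with overlapping match sets.
clfs = [
    (lambda i: np.array([1.0]), np.array([2.0]), lambda i: i < 5),
    (lambda i: np.array([1.0]), np.array([4.0]), lambda i: i >= 3),
]
print(mixed_prediction(4, clfs))   # both match: (2.0 + 4.0) / 2 = 3.0
print(mixed_prediction(1, clfs))   # only the first matches: 2.0
```
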
    • Function Approximation: Matrix Notation
      – $V = (V(1), \ldots, V(N))^\top$
      – $\Phi$ is the feature matrix with rows $\phi(1)^\top, \ldots, \phi(N)^\top$
      – $D = \mathrm{diag}(\pi(1), \ldots, \pi(N))$
      – $I_{S_k} = \mathrm{diag}(I_{S_k}(1), \ldots, I_{S_k}(N))$ and $D_k = I_{S_k} D$
      – $\Psi_k = \mathrm{diag}(\psi_k(1), \ldots, \psi_k(N))$
      – $\tilde V_k = \Phi \omega_k$
      – $\tilde V = \sum_{k=1}^{K} \Psi_k \tilde V_k = \sum_{k=1}^{K} \Psi_k \Phi \omega_k$
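
A minimal numpy sketch of this matrix notation for a tiny hypothetical problem with N = 3 states, L = 1 feature and K = 2 averaging classifiers; all numbers are made up for illustration.

```python
import numpy as np

N, K = 3, 2
Phi = np.ones((N, 1))                       # averaging classifiers: phi(i) = (1)
pi = np.array([0.5, 0.25, 0.25])
D = np.diag(pi)                             # sampling distribution
I_S = [np.diag([1.0, 1.0, 0.0]),            # classifier 1 matches states 1 and 2
       np.diag([0.0, 1.0, 1.0])]            # classifier 2 matches states 2 and 3
D_k = [I_S[k] @ D for k in range(K)]        # D_k = I_{S_k} D
omega = [np.array([2.0]), np.array([4.0])]  # parameter vectors omega_k
Psi = [np.diag([1.0, 0.5, 0.0]),            # mixing weights, summing to 1 per state
       np.diag([0.0, 0.5, 1.0])]

V_tilde = sum(Psi[k] @ Phi @ omega[k] for k in range(K))
print(V_tilde)                              # mixed value estimate, here [2., 3., 4.]
```
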
    • Function Approximation: Achievements
      – Proof that the MSE of a classifier $k$ at time $t$ is the normalised sample variance of the value function over the states matched up to time $t$
      – Proof that the expression developed for the sample-based approximated MSE is a suitable approximation of the actual MSE, for finite state spaces
      – Proof of optimality of the approximated MSE from the vector-space viewpoint (relates classifier error to the Principle of Orthogonality)
    • Function Approximation: Achievements (cont'd)
      – Elaboration of algorithms using the model:
          Steepest Gradient Descent and Least Mean Square (LMS): show that sensitivity to step size and input range holds in LCS
          Normalised LMS
          Kalman Filter, and its inverse covariance form: equivalence to Recursive Least Squares, and derivation of a computationally cheap implementation
      – For each, derivation of:
          Stability criteria
          Time constant bounds
          For LMS, identification of the excess mean squared estimation error
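
A minimal sketch of an LMS-style parameter update for a single classifier, one of the algorithms listed above. The step size, feature map and target function are illustrative, not taken from the slides.

```python
import numpy as np

def lms_update(omega_k, phi_i, target, step_size=0.1):
    """One Least Mean Square step: move omega_k along the negative error gradient."""
    error = target - phi_i @ omega_k
    return omega_k + step_size * error * phi_i

# Hypothetical run: a linear estimator phi(i) = (1, i) tracking V(i) = 2 + 0.5 i.
omega = np.zeros(2)
rng = np.random.default_rng(0)
for _ in range(2000):
    i = rng.integers(0, 10)
    phi_i = np.array([1.0, float(i)])
    omega = lms_update(omega, phi_i, 2.0 + 0.5 * i, step_size=0.01)
print(omega)   # approaches [2.0, 0.5]
```
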
    • Function Approximation: Achievements (cont'd)
      – Investigation: [empirical comparison of the algorithms; figures not reproduced in the transcript]
    • Function Approximation: How to Set the Mixing Weights $\psi_k(i)$?
      – Accuracy-based mixing (YCS-like!)
      – Divorces mixing for function approximation from fitness:
        $\psi_k(i) = \frac{I_{S_k}(i)\, \varepsilon_k^{-\nu}}{\sum_{p=1}^{K} I_{S_p}(i)\, \varepsilon_p^{-\nu}}$
      – Investigating the power factor $\nu \in \mathbb{R}^+$
      – Demonstrate, using the model, that we obtain the MLE of the mixed classifiers when $\nu = 1$
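
A minimal sketch of the accuracy-based mixing weights defined above: each matching classifier is weighted by its inverse error raised to the power ν, then the weights are normalised per state. The error values below are made up.

```python
import numpy as np

def mixing_weights(errors, matching, nu=1.0):
    """errors: per-classifier error estimates epsilon_k;
    matching: boolean mask saying which classifiers match the current state."""
    raw = np.where(matching, np.asarray(errors, dtype=float) ** (-nu), 0.0)
    return raw / raw.sum()

# Hypothetical state matched by classifiers 0 and 2; classifier 2 is more accurate.
errors = [0.4, 0.1, 0.2]
matching = np.array([True, False, True])
print(mixing_weights(errors, matching, nu=1.0))   # -> [0.333..., 0., 0.666...]
```
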
    • Function Approximation: Mixing, Empirical Results
      – Higher $\nu$ gives more importance to low-error classifiers
      – Found $\nu = 1$ to be the best compromise
      – [Plots: classifiers covering 0..π/2, π/2..π, 0..π and classifiers covering 0..0.5, 0.5..1, 0..1; figures not reproduced in the transcript]
    • Dynamic Programming
      – Optimal value function:
        $V^*(i) = \max_\mu \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t r(i_t, a_t, i_{t+1}) \,\middle|\, i_0 = i,\ a_t = \mu(i_t)\right]$
      – Bellman's equation:
        $V^*(i) = \max_{a \in A} \mathbb{E}\bigl[r(i, a, j) + \gamma V^*(j) \mid j = p(i, a)\bigr]$
      – Applying the DP operator $T$ to a value estimate:
        $(TV)(i) = \max_{a \in A} \mathbb{E}\bigl[r(i, a, j) + \gamma V(j) \mid j = p(i, a)\bigr]$
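
A minimal sketch of the DP operator T applied repeatedly (value iteration) on a small hypothetical MDP with known transition probabilities P[a, i, j] and rewards R[a, i, j]; the MDP itself is made up for illustration.

```python
import numpy as np

def dp_operator(V, P, R, gamma=0.9):
    """(TV)(i) = max_a E[ r(i,a,j) + gamma * V(j) ]."""
    # Q[a, i] = sum_j P[a, i, j] * (R[a, i, j] + gamma * V[j])
    Q = np.einsum('aij,aij->ai', P, R + gamma * V[None, None, :])
    return Q.max(axis=0)

# Tiny 2-state, 2-action example with reward 1 on every transition.
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.ones_like(P)
V = np.zeros(2)
for _ in range(200):                 # value iteration: repeatedly apply T
    V = dp_operator(V, P, R)
print(V)                             # converges to the fixed point V* = TV*, here [10., 10.]
```
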
    • Dynamic Programming (cont'd)
      – $T$ is a contraction w.r.t. the maximum norm, so it converges to the unique fixed point $V^* = TV^*$
      – With function approximation there is an approximation step after every update: $\tilde V_{t+1} = f(T\tilde V_t)$
      – This might not converge for more complex approximation architectures
      – Is LCS function approximation a non-expansion w.r.t. the maximum norm?
    • Dynamic Programming: LCS Value Iteration
      – Assume a constant population of averaging classifiers, $\phi(i) = (1)$ for all $i \in S$
      – Each classifier approximates the update $\tilde V_{t+1} = T\tilde V_t$
      – Based on minimising the MSE:
        $\sum_{i \in S_k} \bigl(\tilde V_{t+1}(i) - \tilde V_{k,t+1}(i)\bigr)^2 = \|\tilde V_{t+1} - \tilde V_{k,t+1}\|^2_{I_{S_k}} = \|T\tilde V_t - \tilde V_{k,t+1}\|^2_{I_{S_k}}$
    • Dynamic Programming: LCS Value Iteration (cont'd)
      – The minimum is the orthogonal projection onto the approximation:
        $\tilde V_{k,t+1} = \Pi_{I_{S_k}} \tilde V_{t+1} = \Pi_{I_{S_k}} T\tilde V_t$
      – We also need to determine the new mixing weight by evaluating the new approximation error:
        $\varepsilon_{k,t+1} = \frac{1}{\mathrm{Tr}(I_{S_k})}\,\|\tilde V_{t+1} - \tilde V_{k,t+1}\|^2_{I_{S_k}} = \frac{1}{\mathrm{Tr}(I_{S_k})}\,\|(I - \Pi_{I_{S_k}})\,T\tilde V_t\|^2_{I_{S_k}}$
      – Therefore the only time dependency is on $\tilde V_t$, and thus:
        $\tilde V_{t+1} = \sum_{k=1}^{K} \Psi_{k,t+1}\, \Pi_{I_{S_k}} T\tilde V_t$
      A sketch of one such iteration step follows below.
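
A minimal sketch (not the authors' code) of one LCS value iteration step as derived on the two slides above: apply T, project onto each averaging classifier (the mean over its matched states), estimate each classifier's error, and remix with the accuracy-based weights (ν = 1). The three-state toy chain is purely illustrative.

```python
import numpy as np

def lcs_value_iteration_step(V_tilde, masks, T):
    TV = T(V_tilde)
    V_k = np.array([TV[m].mean() for m in masks])             # projections of T V onto each classifier
    eps = np.array([((TV[m] - v) ** 2).mean() for m, v in zip(masks, V_k)])
    eps = np.maximum(eps, 1e-12)                               # avoid division by zero
    # Accuracy-based mixing: psi_k(i) proportional to I_Sk(i) / eps_k, normalised per state.
    raw = np.array([m / e for m, e in zip(masks, eps)])        # shape (K, N)
    psi = raw / raw.sum(axis=0)
    return (psi * V_k[:, None]).sum(axis=0)                    # new mixed estimate

def T(V, gamma=0.9):                                           # toy 3-state chain, reward 1 in state 1
    nxt = np.array([1, 2, 2])
    return np.array([0.0, 1.0, 0.0]) + gamma * V[nxt]

masks = [np.array([True, True, False]), np.array([False, True, True])]
V = np.zeros(3)
for _ in range(100):
    V = lcs_value_iteration_step(V, masks, T)
print(V)
```
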
    • Dynamic Programming: LCS Asynchronous Value Iteration
      – Only update $\tilde V_k$ for the matched state
      – Minimise: $\sum_{m=0}^{t} I_{S_k}(i_m)\bigl((T\tilde V_m)(i_m) - \omega_{k,t}^\top \phi(i_m)\bigr)^2$
      – Thus the approximation cost is weighted by the state distribution:
        $\sum_{i \in S_k} \pi(i)\bigl((T\tilde V)(i) - \omega_k^\top \phi(i)\bigr)^2 = \|T\tilde V - \Phi\omega_k\|^2_{D_k}$
      – Therefore $\tilde V_{k,t+1} = \Pi_{D_k} T\tilde V_t$ and $\tilde V_{t+1} = \sum_{k=1}^{K} \Psi_{k,t+1}\, \Pi_{D_k} T\tilde V_t$
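
A minimal sketch of the asynchronous flavour described above: states are sampled according to π, and only the classifiers that match the sampled state update their estimate, so the cost is implicitly weighted by π. Averaging classifiers, uniform mixing for the target, and an incremental-mean step are simplifying assumptions made here, not details from the slides.

```python
import numpy as np

def mixed_V(omega, masks):
    """Uniform mixing of averaging classifiers into a value vector (illustrative)."""
    num = sum(m * w for m, w in zip(masks, omega))
    den = sum(m.astype(float) for m in masks)
    return num / den

def T(V, gamma=0.9):                        # toy 3-state chain, reward 1 in state 1
    nxt = np.array([1, 2, 2])
    return np.array([0.0, 1.0, 0.0]) + gamma * V[nxt]

masks = [np.array([True, True, False]), np.array([False, True, True])]
omega = np.zeros(2)                         # scalar estimate per averaging classifier
counts = np.zeros(2)
pi = np.array([0.4, 0.3, 0.3])              # sampling distribution over states
rng = np.random.default_rng(1)

for _ in range(5000):
    i = rng.choice(3, p=pi)                 # sample a state according to pi
    target = T(mixed_V(omega, masks))[i]    # (T V)(i) for the sampled state only
    for k, m in enumerate(masks):           # only matching classifiers update
        if m[i]:
            counts[k] += 1
            omega[k] += (target - omega[k]) / counts[k]   # incremental mean of the targets

print(omega)
```
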
    • Dynamic Programming: Also Derived
      – LCS Q-Learning:
          LMS implementation
          Kalman filter implementation
          Demonstrate that the weight update uses the normalised form of local gradient descent
      – Model-based Policy Iteration: $\tilde V_{t+1} = \sum_{k=1}^{K} \Psi_k\, \Pi_{D_k} T_\mu \tilde V_t$
      – Step-wise Policy Iteration
      – TD(λ) is not possible in LCS due to the re-evaluation of the mixing weights
    • Dynamic Programming: Non-Expansion Results
      – Is LCS function approximation a non-expansion w.r.t. the maximum norm? We derive that:
          Constant mixing: yes
          Arbitrary mixing: not necessarily
          Accuracy-based mixing: non-expansion (dependent upon a conjecture)
      – Is LCS function approximation a non-expansion w.r.t. the weighted norm?
          We know $T_\mu$ is a contraction
          We show $\Pi_{D_k}$ is a non-expansion
          We demonstrate:
            Single classifier: convergence to a fixed point
            Disjoint classifiers: convergence to a fixed point
            Arbitrary mixing: the spectral radius can be ≥ 1 even for some fixed weightings
    • Classifier Replacement
      – How can we define the optimal population formally?
          What do we want to reach?
          How does the global fitness of one classifier differ from its local fitness (i.e. accuracy)?
          What is the quality of a population?
      – Methodology:
          Examine methods to define the optimal population
          Relate to generalised hill-climbing
          Map to other search criteria
    • Future Work
      – Formalising and analysing classifier replacement
      – Interaction of replacement with function approximation
      – Interaction of replacement with DP
      – Handling stochastic environments
      – Error bounds and rates of convergence
      – …
      Big Answer: LCS is an integrative framework … but we need the framework definition!

      May 2006, © A. M. Barry and J. Drugowitsch, 2006