Towards a Theoretical Framework for LCS

    Towards a Theoretical Framework for LCS - Presentation Transcript

    1. Towards a Theoretical Framework for LCS
       Jan Drugowitsch, Dr Alwyn Barry
    2. Overview
       Big Question: Why use LCS?
       1. Aims and Approach
       2. Function Approximation
       3. Dynamic Programming
       4. Classifier Replacement
       5. Future Work
       May 2006 © A.M.Barry and J. Drugowitsch, 2006
    3. Research Aims
       - To establish a formal model for accuracy-based LCS
       - To use established models, methods and notation
       - To transfer method knowledge (analysis / improvement)
       - To create links to other ML approaches and disciplines
    4. Approach
       - Reduce LCS to components:
         - Function Approximation
         - Dynamic Programming
         - Classifier Replacement
       - Model the components
       - Model their interactions
       - Verify models by empirical studies
       - Extend models from related domains
    5. Function Approximation
       Architecture outline:
       - Set of $K$ classifiers, matching states $S_k \subseteq S$
       - Every classifier $k$ approximates the value function $V$ over $S_k$ independently
       Assumptions:
       - Function $V : S \to \mathbb{R}$ is time-invariant
       - Time-invariant set of classifiers $\{S_1, \dots, S_K\}$
       - Fixed sampling distribution $\Pi(S)$ given by $\pi : S \to [0, 1]$
       - Minimising MSE: $\sum_{i \in S_k} \pi(i) \left( V(i) - \tilde{V}_k(i) \right)^2$
    6. Function Approximation
       Linear approximation architecture:
       - Feature vector $\phi : S \to \mathbb{R}^L$
       - Approximation parameter vector $\omega_k \in \mathbb{R}^L$
       - Approximation $\tilde{V}_k(i) = \phi(i)^T \omega_k$
       Averaging classifiers: $\phi(i) = (1), \ \forall i \in S$
       Linear estimators: $\phi(i) = (1, i)', \ \forall i \in S$
    7. Function Approximation
       Approximation to MSE for $k$ at time $t$:
       $\varepsilon_{k,t} = \frac{1}{c_{k,t} - 1} \sum_{m=0}^{t} I_{S_k}(i_m) \left( V(i_m) - \omega_{k,t}' \phi(i_m) \right)^2$
       - Matching: $I_{S_k} : S \to \{0, 1\}$
       - Match count: $c_{k,t} = \sum_{m=0}^{t} I_{S_k}(i_m)$
       Since the sequence of states $i_m$ is dependent on $\pi$, the MSE is automatically weighted by this distribution.
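The sample-based error on slide 7 can be illustrated for the simplest case of an averaging classifier ($\phi(i) = 1$). The following is a minimal sketch, not the authors' implementation: the function names, the dictionary representation of $V$ and the use of a Python set for $S_k$ are assumptions made for illustration only.

```python
def averaging_weight(states, values, matches):
    """Least-squares weight of an averaging classifier (phi(i) = 1).

    For a constant feature, the weight minimising the squared error is
    the mean of V over the matched samples.
    states  : observed state sequence i_0 .. i_t
    values  : dict mapping each state to V(state)   (hypothetical layout)
    matches : set of states S_k matched by classifier k
    """
    matched = [values[i] for i in states if i in matches]
    return sum(matched) / len(matched)


def sample_mse(states, values, matches, weight):
    """Sample-based error: squared residuals over matched states / (c - 1)."""
    matched = [values[i] for i in states if i in matches]
    c = len(matched)  # match count c_{k,t}
    if c < 2:
        raise ValueError("need at least two matched samples")
    return sum((v - weight) ** 2 for v in matched) / (c - 1)


# Usage: V(i) = i on states 0..3, classifier matching states {1, 2}.
values = {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0}
states = [0, 1, 2, 3, 1, 2]
w = averaging_weight(states, values, {1, 2})
err = sample_mse(states, values, {1, 2}, w)
```

Because the states arrive according to the sampling distribution, frequently visited states contribute more terms to the sum, which is the implicit weighting by $\pi$ noted on the slide.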
    8. Function Approximation
       Mixing classifiers:
       - A classifier $k$ covers $S_k \subseteq S$
       - Need to mix estimates to recover $\tilde{V}(i)$
       - Requires mixing parameter (will be 'accuracy' ... see later!)
       $\tilde{V}(i) = \sum_{k=1}^{K} \psi_k(i) \, \omega_k' \phi(i)$, where $\psi_k : S \to [0, 1]$ and $\sum_{k=1}^{K} \psi_k(i) = 1$
       - $\psi_k(i) = 0$ where $i \notin S_k$ ... classifiers that do not match do not contribute
    9. Function Approximation
       Matrix notation:
       $V = (V(1), \dots, V(N))'$
       $\Phi = \begin{pmatrix} \phi(1)' \\ \vdots \\ \phi(N)' \end{pmatrix}$, $\tilde{V}_k = \Phi \omega_k$
       $\Psi_k = \operatorname{diag}(\psi_k(1), \dots, \psi_k(N))$
       $D = \operatorname{diag}(\pi(1), \dots, \pi(N))$
       $I_{S_k} = \operatorname{diag}(I_{S_k}(1), \dots, I_{S_k}(N))$
       $D_k = I_{S_k} D$
       $\tilde{V} = \sum_{k=1}^{K} \Psi_k \tilde{V}_k = \sum_{k=1}^{K} \Psi_k \Phi \omega_k$
    10. Function Approximation: Achievements
        - Proof that the MSE of a classifier $k$ at time $t$ is the normalised sample variance of the value functions over the states matched until time $t$
        - Proof that the expression developed for the sample-based approximated MSE is a suitable approximation for the actual MSE, for finite state spaces
        - Proof of optimality of the approximated MSE from the vector-space viewpoint (relates classifier error to the Principle of Orthogonality)
    11. Function Approximation: Achievements (cont'd)
        - Elaboration of algorithms using the model:
          - Steepest Gradient Descent
          - Least Mean Square: show that its sensitivity to step size and input range holds in LCS
          - Normalised LMS
          - Kalman Filter
            - ... and its inverse covariance form
            - Equivalence to Recursive Least Squares
            - Derivation of a computationally cheap implementation
        - For each, derivation of:
          - Stability criteria
          - For LMS, identification of the excess mean squared estimation error
          - Time constant bounds
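The step-size and input-range sensitivity of LMS mentioned on slide 11 can be seen in a single update. The sketch below uses the textbook LMS and normalised LMS rules for a linear estimator with $\phi(i) = (1, i)'$; the function names and the example numbers are assumptions for illustration and are not taken from the slides.

```python
def predict(w, x):
    """Linear prediction w'x for weight vector w and feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def lms_update(w, x, target, step):
    """Plain LMS: move along the local gradient by a fixed step size."""
    err = target - predict(w, x)
    return [wi + step * err * xi for wi, xi in zip(w, x)]


def nlms_update(w, x, target, step):
    """Normalised LMS: the same direction, scaled by 1 / ||x||^2."""
    err = target - predict(w, x)
    norm_sq = sum(xi * xi for xi in x)
    return [wi + step * err * xi / norm_sq for wi, xi in zip(w, x)]


# Usage: one sample with a large input, phi(10) = (1, 10), target V = 32.
x, target = [1.0, 10.0], 32.0
w0 = [0.0, 0.0]

w_lms = lms_update(w0, x, target, step=0.1)
w_nlms = nlms_update(w0, x, target, step=1.0)

err_before = abs(target - predict(w0, x))
err_lms = abs(target - predict(w_lms, x))    # grows: step * ||x||^2 exceeds 2
err_nlms = abs(target - predict(w_nlms, x))  # sample is fitted (up to rounding)
```

For plain LMS the error on the same sample is multiplied by $1 - \text{step} \cdot \|x\|^2$, so a step size that is safe for small inputs can overshoot for large ones; the normalised form removes that dependence on the input magnitude.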
    12. Function Approximation: Achievements (cont'd)
        - Investigation: (slide content not captured in this transcript)
    13. Function Approximation
        How to set the mixing weights $\psi_k(i)$?
        - Accuracy-based mixing (YCS-like!)
        - Divorce mixing for function approximation from fitness
        $\psi_k(i) = \frac{I_{S_k}(i) \, \varepsilon_k^{-\nu}}{\sum_{p=1}^{K} I_{S_p}(i) \, \varepsilon_p^{-\nu}}$
        - Investigating power factor $\nu \in \mathbb{R}^+$
        - Demonstrate, using the model, that we obtain the MLE of the mixed classifiers when $\nu = 1$
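The accuracy-based mixing rule on slide 13 translates directly into code. This is a minimal sketch under stated assumptions: classifiers are represented as hypothetical `(match_set, error)` pairs, all errors are strictly positive, and every queried state is matched by at least one classifier.

```python
def mixing_weights(state, classifiers, nu=1.0):
    """Accuracy-based mixing weights psi_k(state).

    classifiers : list of (match_set, error) pairs, with error > 0
    nu          : power factor; larger values favour low-error classifiers
    """
    raw = [err ** -nu if state in match else 0.0 for match, err in classifiers]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no classifier matches this state")
    return [r / total for r in raw]


def mixed_prediction(state, classifiers, predictions, nu=1.0):
    """Mixed estimate: sum over k of psi_k(state) * prediction_k."""
    weights = mixing_weights(state, classifiers, nu)
    return sum(w * p for w, p in zip(weights, predictions))


# Usage: two specific classifiers and one general classifier over states 0..2.
classifiers = [({0, 1}, 0.5), ({1, 2}, 0.25), ({0, 1, 2}, 1.0)]
psi_nu1 = mixing_weights(1, classifiers, nu=1.0)  # all three match state 1
psi_nu2 = mixing_weights(1, classifiers, nu=2.0)  # low-error classifier gains weight
psi_0 = mixing_weights(0, classifiers, nu=1.0)    # second classifier gets weight 0
```

Non-matching classifiers receive weight zero through the indicator $I_{S_k}(i)$, and the normalisation ensures the weights sum to one for every matched state.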
    14. Function Approximation: Mixing - Empirical Results
        - Higher $\nu$ gives more importance to low error
        - Found $\nu = 1$ to be the best compromise
        - [Figure panels, plots not reproduced: "Classifiers: 0..π/2, π/2..π, 0..π" and "Classifiers: 0..0.5, 0.5..1, 0..1"]
    15. Dynamic Programming
        Dynamic Programming:
        $V^*(i) = \max_{\mu} E\left( \sum_{t=0}^{\infty} \gamma^t r(i_t, a_t, i_{t+1}) \mid i_0 = i, \ a_t = \mu(i_t) \right)$
        Bellman's Equation:
        $V^*(i) = \max_{a \in A} E\left( r(i, a, j) + \gamma V^*(j) \mid j = p(i, a) \right)$
        Applying the DP operator $T$ to a value estimate:
        $(TV)(i) = \max_{a \in A} E\left( r(i, a, j) + \gamma V(j) \mid j = p(i, a) \right)$
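To make the DP operator $T$ on slide 15 concrete, here is a small value-iteration sketch. It assumes deterministic transitions $j = p(i, a)$, so the expectation disappears; the four-state chain, its reward function and all names are hypothetical examples, not taken from the presentation.

```python
def dp_operator(V, n_states, actions, step, reward, gamma):
    """(TV)(i) = max_a [ r(i, a, j) + gamma * V(j) ] with j = step(i, a)."""
    return [
        max(reward(i, a, step(i, a)) + gamma * V[step(i, a)] for a in actions)
        for i in range(n_states)
    ]


def value_iteration(n_states, actions, step, reward, gamma, tol=1e-10):
    """Apply T repeatedly until the largest change falls below tol."""
    V = [0.0] * n_states
    while True:
        V_new = dp_operator(V, n_states, actions, step, reward, gamma)
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new


# Hypothetical example: chain of states 0..3, move left or right,
# reward 1 whenever the transition ends in state 3.
def chain_step(i, a):
    return min(max(i + a, 0), 3)


def chain_reward(i, a, j):
    return 1.0 if j == 3 else 0.0


V_star = value_iteration(4, (-1, 1), chain_step, chain_reward, gamma=0.5)
```

With $\gamma = 0.5$ the fixed point satisfies $V^*(3) = 1 + 0.5 \, V^*(3)$, which gives $V^*(3) = 2$, and the remaining values follow by backing up along the chain.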
    16. Dynamic Programming
        $T$ is a contraction w.r.t. the maximum norm
        - Converges to the unique fixed point $V^* = TV^*$
        With function approximation:
        - Approximation after every update: $\tilde{V}_{t+1} = f(T\tilde{V}_t)$
        - Might not converge for more complex approximation architectures
        Is LCS function approximation a non-expansion w.r.t. the maximum norm?
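The contraction property on slide 16 states that $\| TU - TV \|_\infty \le \gamma \, \| U - V \|_\infty$ for any two value functions. The sketch below checks this numerically on one pair of vectors; it is only a spot check, not a proof, and the deterministic four-state chain and function names are assumptions for illustration.

```python
def dp_operator(V, n_states, actions, step, reward, gamma):
    """(TV)(i) = max_a [ r(i, a, j) + gamma * V(j) ] with j = step(i, a)."""
    return [
        max(reward(i, a, step(i, a)) + gamma * V[step(i, a)] for a in actions)
        for i in range(n_states)
    ]


def max_norm_distance(U, V):
    """Maximum-norm distance between two value vectors."""
    return max(abs(u - v) for u, v in zip(U, V))


# Hypothetical chain: states 0..3, reward 1 when a move ends in state 3.
def chain_step(i, a):
    return min(max(i + a, 0), 3)


def chain_reward(i, a, j):
    return 1.0 if j == 3 else 0.0


gamma = 0.5
U = [0.0, 0.0, 0.0, 0.0]
V = [4.0, -1.0, 3.0, 2.0]
TU = dp_operator(U, 4, (-1, 1), chain_step, chain_reward, gamma)
TV = dp_operator(V, 4, (-1, 1), chain_step, chain_reward, gamma)

before = max_norm_distance(U, V)
after = max_norm_distance(TU, TV)  # never more than gamma * before
```

If the approximation step $f$ applied after $T$ is a non-expansion in the same norm, the composition $f \circ T$ remains a contraction, which is why the slide asks whether LCS function approximation has that property.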
    17. Dynamic Programming: LCS Value Iteration
        - Assume:
          - constant population
          - averaging classifiers, $\phi(i) = (1), \ \forall i \in S$
        - Classifier is approximating: $V_{t+1} = T\tilde{V}_t$
        - Based on minimising MSE:
          $\sum_{i \in S_k} \left( V_{t+1}(i) - \tilde{V}_{k,t+1}(i) \right)^2 = \left\| V_{t+1} - \tilde{V}_{k,t+1} \right\|^2_{I_{S_k}} = \left\| T\tilde{V}_t - \tilde{V}_{k,t+1} \right\|^2_{I_{S_k}}$
    18. Dynamic Programming: LCS Value Iteration (cont'd)
        - Minimum is the orthogonal projection onto the approximation:
          $\tilde{V}_{k,t+1} = \Pi_{I_{S_k}} V_{t+1} = \Pi_{I_{S_k}} T\tilde{V}_t$
        - Need also to determine the new mixing weight by evaluating the new approximation error:
          $\varepsilon_{k,t+1} = \frac{1}{\operatorname{Tr}(I_{S_k})} \left\| V_{t+1} - \tilde{V}_{k,t+1} \right\|^2_{I_{S_k}} = \frac{1}{\operatorname{Tr}(I_{S_k})} \left\| (I - \Pi_{I_{S_k}}) T\tilde{V}_t \right\|^2_{I_{S_k}}$
        - $\therefore$ the only time dependency is on $\tilde{V}_t$
        - Thus: $\tilde{V}_{t+1} = \sum_{k=1}^{K} \Psi_{k,t+1} \Pi_{I_{S_k}} T\tilde{V}_t$
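One step of the LCS value iteration on slides 17 and 18 can be sketched as: apply $T$, project onto each averaging classifier (the mean over its matched states), evaluate each classifier's error, then mix. The code below is an illustration under several assumptions that are not in the slides: deterministic transitions, accuracy-based mixing with power $\nu$, a small error floor `err_floor` to avoid division by zero, every state matched by at least one classifier, and a hypothetical four-state chain.

```python
def dp_operator(V, n_states, actions, step, reward, gamma):
    """(TV)(i) = max_a [ r(i, a, j) + gamma * V(j) ] with j = step(i, a)."""
    return [
        max(reward(i, a, step(i, a)) + gamma * V[step(i, a)] for a in actions)
        for i in range(n_states)
    ]


def lcs_value_iteration_step(V_mixed, match_sets, n_states, actions, step,
                             reward, gamma, nu=1.0, err_floor=1e-12):
    """One synchronous update: mix the per-classifier projections of T V."""
    target = dp_operator(V_mixed, n_states, actions, step, reward, gamma)
    preds, errs = [], []
    for S_k in match_sets:
        # Projection for an averaging classifier: mean of the target over S_k.
        mean = sum(target[i] for i in S_k) / len(S_k)
        # Error: squared residual norm over S_k divided by Tr(I_{S_k}) = |S_k|.
        err = sum((target[i] - mean) ** 2 for i in S_k) / len(S_k)
        preds.append(mean)
        errs.append(max(err, err_floor))  # floor is an added safeguard
    new_V = []
    for i in range(n_states):
        raw = [e ** -nu if i in S_k else 0.0 for S_k, e in zip(match_sets, errs)]
        new_V.append(sum(w * p for w, p in zip(raw, preds)) / sum(raw))
    return new_V


def chain_step(i, a):
    return min(max(i + a, 0), 3)


def chain_reward(i, a, j):
    return 1.0 if j == 3 else 0.0


def run(match_sets, iterations=60):
    """Iterate the LCS update from a zero value estimate."""
    V = [0.0] * 4
    for _ in range(iterations):
        V = lcs_value_iteration_step(V, match_sets, 4, (-1, 1),
                                     chain_step, chain_reward, gamma=0.5)
    return V


V_exact = run([{0}, {1}, {2}, {3}])    # one classifier per state
V_general = run([{0, 1, 2, 3}])        # a single fully general classifier
V_overlap = run([{0, 1}, {2, 3}, {0, 1, 2, 3}])
```

With one classifier per state the update reduces to ordinary value iteration, while a single fully general classifier can only represent a constant value function; overlapping populations fall between these two extremes.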
    19. Dynamic Programming: LCS Asynchronous Value Iteration
        - Only update $V_k$ for the matched state
        - Minimise: $\sum_{m=0}^{t} I_{S_k}(i_m) \left( (T\tilde{V}_m)(i_m) - \omega_{k,t}' \phi(i_m) \right)^2$
        - Thus, approximation costs are weighted by the state distribution:
          $\sum_{i \in S_k} \pi(i) \left( (T\tilde{V})(i) - \omega_k' \phi(i) \right)^2 = \left\| T\tilde{V} - \Phi \omega_k \right\|^2_{D_k}$
        - $\therefore \ \tilde{V}_{k,t+1} = \Pi_{D_k} T\tilde{V}_t$
        - $\tilde{V}_{t+1} = \sum_{k=1}^{K} \Psi_{k,t+1} \Pi_{D_k} T\tilde{V}_t$
    20. Dynamic Programming
        Also derived:
        - LCS Q-Learning
          - LMS implementation
          - Kalman Filter implementation
          - Demonstrate that the weight update uses the normalised form of local gradient descent
        - Model-based Policy Iteration: $\tilde{V}_{t+1} = \sum_{k=1}^{K} \Psi_k \Pi_{D_k} T_\mu \tilde{V}_t$
        - Step-wise Policy Iteration
        - TD($\lambda$) is not possible in LCS due to re-evaluation of mixing weights
    21. Dynamic Programming
        Is LCS function approximation a non-expansion w.r.t. the maximum norm?
        - We derive that:
          - Constant mixing: yes
          - Arbitrary mixing: not necessarily
          - Accuracy-based mixing: non-expansion (dependent upon a conjecture)
        Is LCS function approximation a non-expansion w.r.t. the weighted norm?
        - We know $T_\mu$ is a contraction
        - We show $\Pi_{D_k}$ is a non-expansion
        - We demonstrate:
          - Single classifier: convergence to a fixed point
          - Disjoint classifiers: convergence to a fixed point
          - Arbitrary mixing: spectral radius can be $\ge 1$ even for some fixed weightings
    22. Classifier Replacement
        How can we define the optimal population formally?
        - What do we want to reach?
        - How does the global fitness of one classifier differ from its local fitness (i.e. accuracy)?
        - What is the quality of a population?
        Methodology:
        - Examine methods to define the optimal population
        - Relate to generalised hill-climbing
        - Map to other search criteria
    23. Future Work
        - Formalising and analysing classifier replacement
        - Interaction of replacement with function approximation
        - Interaction of replacement with DP
        - Handling stochastic environments
        - Error bounds and rates of convergence
        - …
        Big Answer: LCS is an integrative framework, … But we need the framework definition!
