Approximate Dynamic Programming using
 Fluid and Diffusion Approximations
                                    with Applications to Power Management


Speaker: Dayu Huang

Wei Chen, Dayu Huang, Ankur A. Kulkarni (1), Jayakrishnan Unnikrishnan, Quanyan Zhu,
Prashant Mehta, Sean Meyn, and Adam Wierman (2)

Coordinated Science Laboratory, UIUC
(1) Dept. of IESE, UIUC
(2) Dept. of CS, California Inst. of Tech.
National Science Foundation (ECS-0523620 and CCF-0830511), ITMANET DARPA RK 2006-07284, and Microsoft Research

[Figures: left panel plotted against n, from 0 to 10 × 10^4; right panel J plotted against x, from 0 to 20]

Introduction
MDP model: driven by a control and an i.i.d. sequence
Cost
Minimize average cost
Generator
Average Cost Optimality Equation (ACOE)
Relative value function
Solve ACOE and find
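
The equations on this slide did not survive extraction. As a hedged sketch only, the objects named above are commonly written as follows for a scalar queueing model. The dynamics, the symbols X, U, A, c, h^*, \eta^*, and the generator notation \mathcal{D}_u are assumptions for illustration, not recovered from the slide.

```latex
\begin{align*}
X(t+1) &= X(t) - U(t) + A(t+1), \qquad A \ \text{i.i.d.}, \\
\eta &= \limsup_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1}
        \mathsf{E}\bigl[ c(X(t), U(t)) \bigr], \\
\mathcal{D}_u h\,(x) &= \mathsf{E}\bigl[ h(X(t+1)) - h(X(t)) \mid X(t) = x,\ U(t) = u \bigr], \\
\min_{u} \bigl\{ c(x,u) + \mathcal{D}_u h^*(x) \bigr\} &= \eta^* .
\end{align*}
```

In this form h^* plays the role of the relative value function and \eta^* is the optimal average cost; a minimizing u in the last line defines an optimal policy.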
TD Learning
 The “curse of dimensionality”:
    Complexity of solving ACOE grows exponentially with
    the dimension of the state space.

 Approximate the relative value function within a finite-dimensional function class

 Criterion: minimize the mean-square error,
    solved by stochastic approximation algorithms

Problem: How to select the basis functions?
    This choice is key to the success of TD learning
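
As an illustration of fitting a relative value function within a two-dimensional function class, here is a minimal sketch. It uses LSTD(0), a batch least-squares variant of TD(0), swapped in for numerical robustness; it is not necessarily the algorithm used in the talk. The queue dynamics, the cost c(x, u) = x + u^2/2, the exponential arrivals, the policy, and the choice of basis (a fluid value function plus a linear term) are all assumptions.

```python
import numpy as np

ALPHA = 1.0  # assumed mean arrival rate


def fluid_value(x, alpha=ALPHA):
    """Fluid value function derived for the assumed cost c(x, u) = x + u**2 / 2."""
    return alpha * x + ((2.0 * x + alpha**2) ** 1.5 - alpha**3) / 3.0


def basis(x):
    """Two assumed basis functions: the fluid value function and a linear term."""
    return np.array([fluid_value(x), x])


def policy(x, alpha=ALPHA):
    """Hypothetical fixed policy: fluid feedback law, capped at the backlog."""
    return min(x, alpha + np.sqrt(alpha**2 + 2.0 * x))


def lstd0(n_steps=20_000, seed=0):
    """Fit h(x) ~ theta @ basis(x) from one simulated trajectory."""
    rng = np.random.default_rng(seed)
    x = 0.0
    feats, costs = [], []
    for _ in range(n_steps + 1):
        u = policy(x)
        feats.append(basis(x))
        costs.append(x + 0.5 * u * u)          # delay cost plus energy cost
        x = x - u + rng.exponential(ALPHA)     # assumed dynamics
    feats = np.array(feats)
    costs = np.array(costs)
    eta = costs[:-1].mean()                    # average-cost estimate
    # Fixed point of the TD(0) update: sum_t psi_t * d_t = 0, where
    # d_t = c_t - eta + theta @ psi_{t+1} - theta @ psi_t.
    a_mat = feats[:-1].T @ (feats[:-1] - feats[1:])
    b_vec = feats[:-1].T @ (costs[:-1] - eta)
    theta, *_ = np.linalg.lstsq(a_mat, b_vec, rcond=None)
    return theta, eta


if __name__ == "__main__":
    theta, eta = lstd0()
    print("coefficients:", theta, "average cost estimate:", eta)
```

The quality of the fit depends almost entirely on `basis`; replacing it with, say, low-order polynomials changes the approximation without touching the rest of the code.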
Approach Based on Fluid and Diffusion Models
(this talk: fluid model)

Value function of the fluid model: total cost for an associated deterministic model
      is a tight approximation to the relative value function

[Figure: fluid value function and relative value function, plotted for x from 0 to 20]

  can be used as a part of the basis
Related Work
Multiclass queueing network: Meyn 1997, Meyn 1997b
    optimal control: Chen and Meyn 1999
    simulation: Henderson et al. 2003
    network scheduling and routing: Veatch 2004; Moallemi, Kumar and Van Roy 2006
    Meyn 2007, Control Techniques for Complex Networks
Other approaches: Tsitsiklis and Van Roy 1997; Mannor, Menache and Shimkin 2005
Taylor series approximation: this work
Power Management via Speed Scaling
(Bansal, Kimbrel and Pruhs 2007; Wierman, Andrew and Tang 2009)

Single processor
    job arrivals
    processing rate determined by the current power

Control the processing speed to balance delay and energy costs

Processor design: polynomial cost (Kaxiras and Martonosi 2008; Wierman, Andrew and Tang 2009)
    This talk
We also consider
for wireless communication applications
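
To make the delay/energy trade-off concrete, here is a minimal simulation sketch. The dynamics X(t+1) = X(t) - U(t) + A(t+1), the quadratic cost c(x, u) = x + u^2/2, the exponential arrivals, and both policies are assumptions chosen for illustration; the slide's own formulas did not survive extraction.

```python
import numpy as np


def average_cost(policy, n_steps=100_000, mean_arrival=1.0, seed=0):
    """Estimate the long-run average of c(x, u) = x + u**2 / 2 by simulation.

    Assumed dynamics: X(t+1) = X(t) - U(t) + A(t+1), with 0 <= U(t) <= X(t)
    and i.i.d. exponential arrivals A (a hypothetical choice).
    """
    rng = np.random.default_rng(seed)
    arrivals = rng.exponential(mean_arrival, size=n_steps)
    x, total = 0.0, 0.0
    for a in arrivals:
        u = min(max(policy(x), 0.0), x)  # keep the speed feasible
        total += x + 0.5 * u * u         # delay cost plus energy cost
        x = x - u + a
    return total / n_steps


def clear_queue(x):
    """Hypothetical policy: serve the whole backlog every step."""
    return x


def capped_speed(x, cap=1.5):
    """Hypothetical policy: never run faster than `cap`."""
    return min(x, cap)


if __name__ == "__main__":
    print("clear queue :", average_cost(clear_queue))
    print("capped speed:", average_cost(capped_speed))
```

Running fast keeps the backlog (delay) small but pays a quadratic energy price; capping the speed does the opposite. The control problem is to find the policy that balances the two.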
Fluid Model
MDP
Fluid model:
Total Cost
Total Cost Optimality Equation (TCOE) for the fluid model:
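
The equations on this slide were lost in extraction. As a hedged reconstruction, assuming a scalar model with arrival rate \alpha and cost c (these symbols are assumptions, not recovered from the slide), the fluid model, its total cost, and the TCOE take the standard form below. The second display works the case of an assumed quadratic cost c(x,u) = x + u^2/2.

```latex
\begin{align*}
\tfrac{d}{dt} x(t) &= -u(t) + \alpha, \qquad x(0) = x, \\
J^*(x) &= \inf_{u} \int_0^{T_0} c\bigl(x(t), u(t)\bigr)\, dt,
          \qquad T_0 = \inf\{ t : x(t) = 0 \}, \\
0 &= \min_{u \ge 0} \Bigl\{ c(x,u) + (\alpha - u)\, \tfrac{d}{dx} J^*(x) \Bigr\}.
\end{align*}
% Worked case, assuming c(x,u) = x + u^2/2:
\begin{align*}
u^*(x) &= \tfrac{d}{dx} J^*(x), \\
0 &= x + \alpha\, \tfrac{d}{dx} J^*(x) - \tfrac12 \Bigl( \tfrac{d}{dx} J^*(x) \Bigr)^2
   \;\Longrightarrow\;
   \tfrac{d}{dx} J^*(x) = \alpha + \sqrt{\alpha^2 + 2x}, \\
J^*(x) &= \alpha x + \tfrac13 \Bigl[ (2x + \alpha^2)^{3/2} - \alpha^3 \Bigr].
\end{align*}
```

The minimizer in the TCOE is obtained by setting the derivative in u to zero; substituting it back gives a quadratic in dJ^*/dx whose positive root integrates, with J^*(0) = 0, to the closed form in the last line.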
Why Fluid Model?
MDP
First order Taylor series approximation
TCOE
ACOE
The fluid value function almost solves the ACOE
Simple but powerful idea!
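
A hedged sketch of the Taylor series argument, assuming the scalar dynamics X(t+1) = X(t) - U(t) + A(t+1) with \alpha = \mathsf{E}[A(1)] and a smooth function h; the notation \mathcal{D}_u for the MDP generator and \mathcal{D}^F_u for the fluid generator is an assumption, not taken from the slide.

```latex
\begin{align*}
\mathcal{D}_u h\,(x)
  &= \mathsf{E}\bigl[ h\bigl(x - u + A(1)\bigr) \bigr] - h(x) \\
  &= (\alpha - u)\, h'(x)
     + \tfrac12\, \mathsf{E}\bigl[ h''(\xi)\, (A(1) - u)^2 \bigr]
     \qquad \text{($\xi$ between $x$ and $x - u + A(1)$)} \\
  &\approx (\alpha - u)\, h'(x) \;=\; \mathcal{D}^F_u h\,(x).
\end{align*}
% Consequence for h = J^*, the fluid value function:
\begin{align*}
\text{TCOE:}\quad & \min_u \bigl\{ c(x,u) + \mathcal{D}^F_u J^*(x) \bigr\} = 0, \\
\text{ACOE:}\quad & \min_u \bigl\{ c(x,u) + \mathcal{D}_u h^*(x) \bigr\} = \eta^* .
\end{align*}
```

Replacing \mathcal{D}^F_u by \mathcal{D}_u in the TCOE changes the left-hand side only by the second-order remainder, which is the sense in which J^* nearly satisfies the ACOE.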
Approximation of the Cost Function

 Error Analysis
                                 constant?


 Surrogate cost




                  approximates

     Bounds on           ?
Structure Results on the Fluid Solution
Lower Bound




              Convexity of
Upper Bound
TD Learning Experiment

Basis functions:

[Figure, left panel: estimates of coefficients for the case of quadratic cost, horizontal axis from 0 to 10 × 10^4. Right panel: approximate relative value function, fluid value function, and relative value function, for x from 0 to 20.]

TD Learning with Policy Improvement

[Figure: average cost at each stage, horizontal axis from 0 to 25, vertical axis from 2 to 3]

  Nearly optimal after just a few iterations
  Need the value of the optimal policy
Conclusions

   The fluid value function can be used as a part of the basis for TD-learning.

   Motivated by analysis using Taylor series expansion:
     The fluid value function almost solves the ACOE. In particular,
     it solves the ACOE for a slightly different cost function; and
     the error term can be estimated.

   TD learning with policy improvement gives a near optimal policy
   in a few iterations, as shown by experiments.

   Application in power management for processors.
Value Iteration   (Chen and Meyn 1999)

[Figure: two curves plotted against n, for n from 0 to 20 and values from 0 to 250, one per initialization]
    Initialization: V0 ≡ 0
    Initialization: V0 =
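
A minimal sketch of how the initialization enters value iteration. Everything model-specific here is an assumption: the truncated integer queue, the three-point arrival law, the cost c(x, u) = x + u^2/2, and the use of a fluid value function as the second starting point (the symbol on the slide was lost). The code records the span of successive differences, a standard convergence measure for average-cost value iteration, so the two initializations can be compared.

```python
import numpy as np

PROBS = np.array([0.3, 0.4, 0.3])  # assumed law of A in {0, 1, 2}; mean 1
N_MAX = 20                         # assumed buffer truncation


def fluid_value(x, alpha=1.0):
    """Fluid value function for the assumed cost c(x, u) = x + u**2 / 2."""
    return alpha * x + ((2.0 * x + alpha**2) ** 1.5 - alpha**3) / 3.0


def bellman_update(v):
    """One step V -> min_u { c(x, u) + E[V(X(t+1))] } on the truncated model."""
    new_v = np.empty_like(v)
    greedy = np.zeros(len(v), dtype=int)
    arrivals = np.arange(len(PROBS))
    for x in range(len(v)):
        best = np.inf
        for u in range(x + 1):                       # feasible speeds 0..x
            nxt = np.minimum(x - u + arrivals, len(v) - 1)
            q = x + 0.5 * u * u + PROBS @ v[nxt]
            if q < best:
                best, greedy[x] = q, u
        new_v[x] = best
    return new_v, greedy


def value_iteration(v0, n_iter=50):
    """Relative value iteration; returns values, greedy policy, span history."""
    v = np.asarray(v0, dtype=float)
    greedy = np.zeros(len(v), dtype=int)
    spans = []
    for _ in range(n_iter):
        new_v, greedy = bellman_update(v)
        diff = new_v - v
        spans.append(diff.max() - diff.min())
        v = new_v - new_v[0]        # subtract a constant to keep values bounded
    return v, greedy, spans


if __name__ == "__main__":
    states = np.arange(N_MAX + 1)
    _, _, spans_zero = value_iteration(np.zeros(N_MAX + 1))
    _, _, spans_fluid = value_iteration(fluid_value(states))
    print("span after 10 steps, zero init :", spans_zero[9])
    print("span after 10 steps, fluid init:", spans_fluid[9])
```

The span sequence is non-increasing for any starting point, because the update is monotone and commutes with adding constants; what the initialization changes is how large the span is at the start and how quickly it shrinks.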
Policy

[Figure: stochastic optimal policy, myopic policy, and their difference, plotted for x from 0 to 20]

Approximate Dynamic Programming using Fluid and Diffusion Approximations with Applications to Power Management (CDC 2009, Shanghai)
