My presentation at CDC 2009.
Authors of the paper are:
Wei Chen, Dayu Huang, Ankur Kulkarni, Jayakrishnan Unnikrishnan, Quanyan Zhu, Prashant Mehta, Sean Meyn, and Adam Wierman
Approximate Dynamic Programming using Fluid and Diffusion Approximations with Applications to Power Management (CDC 2009, Shanghai)
1. Approximate Dynamic Programming using
Fluid and Diffusion Approximations
with Applications to Power Management
Speaker: Dayu Huang
Wei Chen, Dayu Huang, Ankur A. Kulkarni [1], Jayakrishnan Unnikrishnan, Quanyan Zhu,
Prashant Mehta, Sean Meyn, and Adam Wierman [2]
Coordinated Science Laboratory, UIUC
[1] Dept. of IESE, UIUC
[2] Dept. of CS, California Inst. of Tech.
Supported by the National Science Foundation (ECS-0523620 and CCF-0830511),
ITMANET DARPA RK 2006-07284, and Microsoft Research.
[Figure: two plots from the title slide: coefficient estimates vs. iteration n, and value functions vs. x]
4. Introduction
MDP model: state X(t), control U(t), i.i.d. disturbance W(t); cost c(x, u).
Objective: minimize the average cost eta = lim_{T -> infinity} (1/T) sum_{t=0}^{T-1} E[c(X(t), U(t))].
Generator: D_u h(x) := E[h(X(t+1)) - h(X(t)) | X(t) = x, U(t) = u].
Average Cost Optimality Equation (ACOE): min_u { c(x, u) + D_u h*(x) } = eta*,
where h* is the relative value function.
Goal: solve the ACOE and find the optimal policy.
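To make these objects concrete, here is a minimal numerical sketch that solves the ACOE by relative value iteration on an assumed toy instance of the kind of model the talk uses later (queue length as state, processing rate as control, Bernoulli arrivals, cost x + u^2; all parameters are illustrative, not the paper's):

```python
import numpy as np

# Assumed toy instance (not the paper's exact parameters): the state x is a
# queue length, the control u is a processing rate, arrivals are i.i.d.
# Bernoulli(0.5), and the cost is delay plus polynomial power, c(x,u) = x + u**2.
N = 20                    # queue truncation
ACTIONS = [0, 1, 2]       # available processing rates
P_ARR = 0.5               # P(one arrival); mean arrival rate 0.5 < max rate 2

def q_value(h, x, u):
    """One-step cost plus expected value of the next state."""
    s = max(x - u, 0)
    return x + u ** 2 + (1 - P_ARR) * h[s] + P_ARR * h[min(s + 1, N)]

# Relative value iteration: subtracting h(0) each sweep keeps the iterates
# bounded, and the subtracted constant converges to the optimal average cost.
h = np.zeros(N + 1)
for _ in range(2000):
    h_new = np.array([min(q_value(h, x, u) for u in ACTIONS) for x in range(N + 1)])
    eta = h_new[0]        # estimate of the optimal average cost eta*
    h = h_new - eta       # relative value function, normalized so h(0) = 0

policy = [min(ACTIONS, key=lambda u: q_value(h, x, u)) for x in range(N + 1)]
print("optimal average cost:", round(eta, 3))
print("optimal policy:", policy)
```

For this instance the result is a threshold-style rule: idle when the queue is empty, serve slowly at small backlogs, and serve fast at large ones. Exhaustive iteration like this is exactly what stops scaling in higher dimensions, which motivates the next slides.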
6. TD Learning
The "curse of dimensionality": the complexity of solving the ACOE grows
exponentially with the dimension of the state space.
Approach: approximate the relative value function within a finite-dimensional function class.
Criterion: minimize the mean-square error, solved by stochastic approximation algorithms.
Problem: how to select the basis functions? This choice is key to the success of TD learning.
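As a sketch of what such a stochastic approximation looks like, the following runs average-cost TD(0) with a two-element polynomial basis on an assumed toy queueing model under a fixed policy; the model, basis, and step sizes are all illustrative assumptions, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50                                  # queue truncation

def step(x):
    """One transition under a fixed policy: serve at rate 2 when busy."""
    u = 2 if x > 0 else 0
    a = rng.integers(0, 4)              # i.i.d. arrivals, uniform on {0,1,2,3}
    return x + u ** 2, min(max(x - u, 0) + a, N)   # (cost, next state)

def phi(x):
    return np.array([x / 10.0, (x / 10.0) ** 2])   # scaled linear/quadratic basis

theta = np.zeros(2)                     # weights of h_theta(x) = theta . phi(x)
eta = 0.0                               # running estimate of the average cost
x = 0
for _ in range(300_000):
    c, x_next = step(x)
    # TD error for the average-cost Poisson equation: d = c - eta + h(x') - h(x)
    d = c - eta + phi(x_next) @ theta - phi(x) @ theta
    theta += 1e-4 * d * phi(x)          # stochastic-approximation update
    eta += 1e-3 * (c - eta)
    x = x_next

print("estimated average cost:", round(eta, 2))
print("basis weights:", np.round(theta, 2))
```

The basis here is ad hoc, which is precisely the point of the next slides: the fluid value function gives a principled choice of basis.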
7. Approach Based on Fluid and Diffusion Models
This talk: the fluid model.
The value function of the fluid model (the total cost for an associated deterministic model)
is a tight approximation to the relative value function.
[Figure: fluid value function and relative value function plotted against x]
It can be used as a part of the basis.
9. Related Work
Multiclass queueing networks: Meyn 1997, Meyn 1997b
Optimal control: Chen and Meyn 1999
Simulation: Henderson et al. 2003
Network scheduling and routing: Veatch 2004; Moallemi, Kumar and Van Roy 2006
Meyn 2007, Control Techniques for Complex Networks
Other approaches: Tsitsiklis and Van Roy 1997; Mannor, Menache and Shimkin 2005
Taylor series approximation: this work
10. Power Management via Speed Scaling
Bansal, Kimbrel and Pruhs 2007; Wierman, Andrew and Tang 2009
Single processor: jobs arrive, and the processing rate is determined by the current power
(Kaxiras and Martonosi 2008).
Control the processing speed to balance delay and energy costs.
Processor design: polynomial power cost (Wierman, Andrew and Tang 2009).
This talk: we also consider a cost model for wireless communication applications.
11. Fluid Model MDP
Fluid model: the deterministic, mean-flow analogue of the MDP, with dynamics (d/dt) x(t) = f(x(t), u(t)).
Total cost: J*(x) = inf of the integral from 0 to infinity of c(x(t), u(t)) dt, with x(0) = x.
Total Cost Optimality Equation (TCOE) for the fluid model:
min_u { c(x, u) + grad J*(x) . f(x, u) } = 0.
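As a sketch, for an assumed instance of the speed-scaling fluid model (dynamics d/dt x = alpha - u with mean arrival rate alpha, cost x + u^2, and the steady-state power alpha^2 subtracted so the total cost is finite), the TCOE can be solved in closed form: the first-order condition gives u*(x) = J*'(x)/2, and substituting back yields J*'(x) = 2*alpha + 2*sqrt(x), hence u*(x) = alpha + sqrt(x) and J*(x) = 2*alpha*x + (4/3)*x^(3/2). The code below checks this closed form against a direct Euler integration of the fluid trajectory:

```python
import numpy as np

# Assumed fluid instance: d/dt x = alpha - u, cost x + u**2 - alpha**2
# (the equilibrium power alpha**2 is subtracted so the total cost converges).
alpha = 0.5

def J_star(x):
    """Closed-form solution of the TCOE for this instance."""
    return 2 * alpha * x + (4.0 / 3.0) * x ** 1.5

def fluid_cost(x0, dt=1e-4):
    """Integrate the fluid trajectory under u*(x) = alpha + sqrt(x)."""
    x, total = x0, 0.0
    while x > 1e-9:
        u = alpha + np.sqrt(x)
        total += (x + u ** 2 - alpha ** 2) * dt
        x = max(x + (alpha - u) * dt, 0.0)
    return total

for x0 in (1.0, 4.0, 9.0):
    print(x0, round(fluid_cost(x0), 3), round(J_star(x0), 3))
```

The square-root speed-scaling rule u*(x) = alpha + sqrt(x) is a property of this assumed quadratic-power instance, not a general claim from the paper.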
22. Approach Based on Fluid and Diffusion Models (recap of slide 7)
The fluid value function is a tight approximation to the relative value function
and can be used as a part of the basis.
[Figure: fluid value function and relative value function plotted against x]
23. TD Learning Experiment
Basis functions: the fluid value function is included in the basis.
[Figure: left, estimates of the coefficients vs. iteration n, for the case of quadratic cost;
right, the approximate relative value function compared with the fluid value function
and the relative value function]
24. TD Learning with Policy Improvement
[Figure: average cost at each stage of policy improvement]
Nearly optimal after just a few iterations.
Comparison requires the value of the optimal policy.
25. Conclusions
The fluid value function can be used as a part of the basis for TD learning.
This is motivated by an analysis using a Taylor series expansion:
the fluid value function almost solves the ACOE. In particular,
it solves the ACOE for a slightly different cost function, and
the error term can be estimated.
TD learning with policy improvement gives a near-optimal policy
in a few iterations, as shown by experiments.
Application: power management for processors.
26. Value Iteration (Chen and Meyn 1999)
[Figure: convergence of value iteration vs. iteration n for two initializations:
V0 = 0, and V0 given by the fluid value function]
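A sketch of this comparison on an assumed toy instance (truncated queue, Bernoulli(0.5) arrivals, rates {0, 1, 2}, cost x + u^2), initializing value iteration once from zero and once from a fluid-style value function J(x) = 2*alpha*x + (4/3)*x^(3/2) with alpha = 0.5; the functional form and all parameters are illustrative, not the paper's:

```python
import numpy as np

N, ACTIONS, p, alpha = 20, [0, 1, 2], 0.5, 0.5   # assumed toy instance

def bellman(V):
    """One sweep of the average-cost value iteration operator."""
    out = np.empty(N + 1)
    for x in range(N + 1):
        best = np.inf
        for u in ACTIONS:
            s = max(x - u, 0)
            q = x + u ** 2 + (1 - p) * V[s] + p * V[min(s + 1, N)]
            best = min(best, q)
        out[x] = best
    return out

def iterations_to_converge(V, tol=1e-4, max_iter=5000):
    """Iterate until the increment V_{n+1} - V_n is constant up to tol."""
    for n in range(1, max_iter + 1):
        V_new = bellman(V)
        diff = V_new - V
        if diff.max() - diff.min() < tol:   # span seminorm of the increment
            return n
        V = V_new
    return max_iter

xs = np.arange(N + 1, dtype=float)
n_zero = iterations_to_converge(np.zeros(N + 1))
n_fluid = iterations_to_converge(2 * alpha * xs + (4.0 / 3.0) * xs ** 1.5)
print("iterations from V0 = 0:    ", n_zero)
print("iterations from V0 = fluid:", n_fluid)
```

On this instance the fluid-style initialization reaches the convergence tolerance in no more iterations than the zero initialization, illustrating the message of Chen and Meyn 1999.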