Distributed solution of stochastic optimal control
problem on GPUs
Ajay K. Sampathirao^a, P. Sopasakis^a, A. Bemporad^a and P. Patrinos^b
^a IMT Institute for Advanced Studies Lucca, Italy
^b Dept. Electr. Eng. (ESAT), KU Leuven, Belgium
December 18, 2015
Applications
Microgrids [Hans et al. ’15]
Drinking water networks [Sampathirao et al. ’15]
HVAC [Long et al. ’13, Zhang et al. ’13, Parisio et al. ’13]
Financial systems [Patrinos et al. ’11, Bemporad et al., ’14]
Chemical process [Lucia et al. ’13]
Distillation column [Garrido and Steinbach, ’11]
Motivation
Common perception: stochastic optimisation is too computationally demanding to be used in control applications.
Spoiler alert!
Example:
920,000 decision variables
Interior-point solver runtime: 35 s
GPU APG solver runtime: < 3 s
Outline
1. Stochastic optimal control problem formulation
2. Accelerated proximal gradient algorithm
3. Parallelisable implementation
4. Simulations
I. Stochastic Optimal Control
System description
Discrete-time uncertain linear system:

x_{k+1} = A_{\xi_k} x_k + B_{\xi_k} u_k + w_{\xi_k},

where \xi_k is a random variable on a probability space (\Omega_k, \mathcal{F}_k, \mathrm{P}_k). At time k we observe x_k but not \xi_k.
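To make the system class concrete, here is a minimal NumPy sketch (not from the slides) that simulates x_{k+1} = A_{\xi_k} x_k + B_{\xi_k} u_k + w_{\xi_k} with \xi_k taking two values; all matrices, probabilities and the feedback gain are illustrative assumptions.

```python
# Minimal sketch: simulating the uncertain system with xi_k drawn from a
# finite set {0, 1}. All matrices and probabilities below are illustrative.
import numpy as np

rng = np.random.default_rng(0)

A = [np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([[1.0, 0.2], [0.0, 0.9]])]
B = [np.array([[0.0], [0.1]]),           np.array([[0.0], [0.2]])]
w = [np.array([0.0, 0.0]),               np.array([0.05, 0.0])]
prob = [0.7, 0.3]                         # P(xi_k = 0), P(xi_k = 1)

x = np.array([1.0, 0.0])                  # initial state p
for k in range(10):
    u = np.array([-0.5 * x[0]])           # any causal feedback; here a simple gain
    xi = rng.choice(2, p=prob)            # xi_k is revealed only after u_k is applied
    x = A[xi] @ x + B[xi] @ u + w[xi]
```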
Stochastic optimal control problem
Optimisation problem:

V(p) = \min_{\pi = \{u_k\}_{k=0}^{N-1}} \; \mathrm{E}\Big[ V_f(x_N, \xi_N) + \sum_{k=0}^{N-1} \ell_k(x_k, u_k, \xi_k) \Big],
s.t. \; x_0 = p,
\;\;\;\;\;\; x_{k+1} = A_{\xi_k} x_k + B_{\xi_k} u_k + w_{\xi_k},

where:
\mathrm{E}[\cdot]: conditional expectation wrt the product probability measure
Causal policy u_k = \psi_k(p, \boldsymbol{\xi}_{k-1}), with \boldsymbol{\xi}_k = (\xi_0, \xi_1, \ldots, \xi_k)
\ell_k and V_f can encode constraints
Stage cost
The stage cost is a function \ell_k : \mathbb{R}^n \times \mathbb{R}^m \times \Omega_k \to \bar{\mathbb{R}},

\ell_k(x_k, u_k, \xi_k) = \phi_k(x_k, u_k, \xi_k) + \bar{\phi}_k(F_k x_k + G_k u_k, \xi_k),

where \phi_k is real-valued, convex and smooth, e.g.,

\phi_k(x_k, u_k, \xi_k) = x_k' Q_{\xi_k} x_k + u_k' R_{\xi_k} u_k,

and \bar{\phi}_k is proper, convex, lsc and possibly non-smooth, e.g.,

\bar{\phi}_k(x_k, u_k, \xi_k) = \delta(F_k x_k + G_k u_k \mid Y_{\xi_k}).
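As an illustration of this smooth/non-smooth split, the following sketch evaluates one stage cost as a quadratic \phi plus the indicator of a set Y; the box-shaped Y, the weights and the constraint maps are illustrative assumptions only.

```python
# Sketch (illustrative): one stage cost split into a smooth quadratic part phi
# and a non-smooth indicator part phi_bar, for a box-shaped set Y.
import numpy as np

Q = np.diag([1.0, 2.0])                   # assumed weights
R = np.array([[0.1]])
F = np.eye(2)                             # constraint map F x + G u
G = np.zeros((2, 1))
y_min, y_max = np.array([-1.0, -1.0]), np.array([1.0, 1.0])   # box Y

def phi(x, u):
    """Smooth part: x'Qx + u'Ru."""
    return x @ Q @ x + u @ R @ u

def phi_bar(x, u):
    """Non-smooth part: indicator of Y at Fx + Gu (0 if inside, +inf otherwise)."""
    y = F @ x + G @ u
    return 0.0 if np.all((y >= y_min) & (y <= y_max)) else np.inf

x, u = np.array([0.5, -0.2]), np.array([0.3])
stage_cost = phi(x, u) + phi_bar(x, u)
```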
Terminal cost
The terminal cost is a function V_f : \mathbb{R}^n \times \Omega_N \to \bar{\mathbb{R}} which can be written as

V_f(x_N, \xi_N) = \phi_N(x_N, \xi_N) + \bar{\phi}_N(x_N, \xi_N),

where \phi_N is real-valued, convex and smooth, and \bar{\phi}_N is proper, convex, lsc and possibly non-smooth.
Total cost
The total cost function can be written as \mathrm{E}\big(f(x) + g(Hx)\big), where x = ((x_k)_k, (u_k)_k),

f(x) = \sum_{k=0}^{N-1} \phi_k(x_k, u_k, \xi_k) + \phi_N(x_N, \xi_N) + \delta(x \mid X(p)),

g(Hx) = \sum_{k=0}^{N-1} \bar{\phi}_k(F_k x_k + G_k u_k, \xi_k) + \bar{\phi}_N(F_N x_N, \xi_N),

and \phi_k and \phi_N are such that f is \sigma-strongly convex on its domain, that is, the affine space defined by the system dynamics, i.e.,

X(p) = \{ x : x^j_{k+1} = A^j_k x^i_k + B^j_k u^i_k + w^j_k, \; j \in \mathrm{child}(k, i) \}.
II. Proximal gradient algorithm
Proximal operator
We define the mapping \mathrm{prox}_{\gamma f} : \mathbb{R}^n \to \mathbb{R}^n of a closed, proper, convex, extended-real-valued function f : \mathbb{R}^n \to \bar{\mathbb{R}} as

\mathrm{prox}_{\gamma f}(v) = \arg\min_{x \in \mathbb{R}^n} \Big\{ f(x) + \tfrac{1}{2\gamma} \|x - v\|_2^2 \Big\},

for \gamma > 0.
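For concreteness, here is a small sketch of two proximal operators that follow directly from this definition: the prox of a box indicator (a Euclidean projection) and the prox of the \ell_1 norm (soft-thresholding). The function names are illustrative.

```python
# Sketch: proximal operators of two common functions, following the definition
# prox_{gamma f}(v) = argmin_x { f(x) + (1/(2*gamma)) ||x - v||^2 }.
import numpy as np

def prox_box(v, lo, hi, gamma=1.0):
    """prox of a box indicator is the projection onto the box (gamma plays no role)."""
    return np.clip(v, lo, hi)

def prox_l1(v, gamma):
    """prox of f(x) = ||x||_1 is soft-thresholding with threshold gamma."""
    return np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)

v = np.array([1.5, -0.2, 0.7])
print(prox_box(v, -1.0, 1.0))        # [ 1.  -0.2  0.7]
print(prox_l1(v, 0.5))               # [ 1.   0.   0.2]
```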
Proximal of the conjugate function
For a function f : \mathbb{R}^n \to \bar{\mathbb{R}} we define its conjugate function to be¹

f^*(y) = \sup_{x \in \mathbb{R}^n} \{ \langle y, x \rangle - f(x) \}.

If we can compute \mathrm{prox}_{\gamma f}, then we can also compute \mathrm{prox}_{\gamma f^*} using the Moreau decomposition formula

v = \mathrm{prox}_{\gamma f}(v) + \gamma \, \mathrm{prox}_{\gamma^{-1} f^*}(\gamma^{-1} v).

¹ R. T. Rockafellar, Convex Analysis. Princeton University Press, 1972.
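A quick numerical check of the Moreau decomposition, using f = \|\cdot\|_1, whose conjugate is the indicator of the unit \ell_\infty-ball; the helper names are illustrative.

```python
# Sketch: numerically checking the Moreau decomposition
# v = prox_{gamma f}(v) + gamma * prox_{(1/gamma) f*}(v / gamma)
# for f = ||.||_1, whose conjugate f* is the indicator of the unit inf-norm ball.
import numpy as np

def prox_l1(v, gamma):
    return np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)

def prox_linf_ball(v, gamma):
    # prox of an indicator is independent of gamma: projection onto {||y||_inf <= 1}
    return np.clip(v, -1.0, 1.0)

gamma = 0.7
v = np.array([2.0, -0.3, 0.5])
rhs = prox_l1(v, gamma) + gamma * prox_linf_ball(v / gamma, 1.0 / gamma)
print(np.allclose(v, rhs))            # True
```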
Optimisation problem
Consider the optimisation problem

P = \min_{z = Hx} f(x) + g(z),

where f : \mathbb{R}^n \to \bar{\mathbb{R}} is \sigma-strongly convex and g : \mathbb{R}^m \to \bar{\mathbb{R}} is closed, proper and convex. The Fenchel dual of this problem is

D = \min_{y} f^*(-H'y) + g^*(y),

where f^* has a Lipschitz-continuous gradient with constant 1/\sigma.
The basic algorithm
The proximal gradient algorithm applied to the dual optimisation problem is defined by the recursion on dual variables²:

y^0 = 0,
y^{\nu+1} = \mathrm{prox}_{\lambda g^*}\big( y^\nu + \lambda H \nabla f^*(-H'y^\nu) \big).

Using the conjugate subgradient theorem we can define

x^\nu := \nabla f^*(-H'y^\nu) = \arg\min_z \{ \langle z, H'y^\nu \rangle + f(z) \}.

² P. Combettes and J. Pesquet, “Proximal splitting methods in signal processing,” Fixed-Point Algorithms for Inverse Problems in Science and Engineering, 2011.
Dual APG algorithm
Nesterov’s accelerated proximal gradient algorithm (APG) converges at a rate of O(1/\nu^2) and is defined by the recursion:

v^\nu = y^\nu + \theta_\nu (\theta_{\nu-1}^{-1} - 1)(y^\nu - y^{\nu-1}),
x^\nu = \arg\min_z \{ \langle z, H'v^\nu \rangle + f(z) \},
z^\nu = \mathrm{prox}_{\lambda^{-1} g}(\lambda^{-1} v^\nu + H x^\nu),
y^{\nu+1} = v^\nu + \lambda (H x^\nu - z^\nu),
\theta_{\nu+1} = \tfrac{1}{2}\big( \sqrt{\theta_\nu^4 + 4\theta_\nu^2} - \theta_\nu^2 \big).
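The following is a minimal sketch of the dual APG recursion above applied to a toy problem, min_x ½‖x − b‖² s.t. Hx in a box (so f is 1-strongly convex and g is a box indicator). It is illustrative only, not the authors' GPU implementation; the step size λ = σ/‖H‖² and the initialisation θ_0 = θ_{−1} = 1 are standard choices assumed here.

```python
# Sketch (illustrative): dual APG from the slide, applied to
#   min_x 0.5*||x - b||^2   s.t.   H x in [lo, hi],
# i.e. f(x) = 0.5*||x - b||^2 (sigma = 1) and g = indicator of a box.
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 5
H = rng.standard_normal((m, n))
b = rng.standard_normal(n)
lo, hi = -0.2 * np.ones(m), 0.2 * np.ones(m)

lam = 1.0 / np.linalg.norm(H, 2) ** 2     # lambda <= sigma / ||H||^2, with sigma = 1
y = y_prev = np.zeros(m)
theta = theta_prev = 1.0                  # theta_0 = theta_{-1} = 1 (standard choice)

for _ in range(500):
    v = y + theta * (1.0 / theta_prev - 1.0) * (y - y_prev)
    x = b - H.T @ v                       # argmin_z { <z, H'v> + f(z) }
    z = np.clip(v / lam + H @ x, lo, hi)  # prox of the box indicator (projection)
    y_prev, y = y, v + lam * (H @ x - z)
    theta_prev, theta = theta, 0.5 * (np.sqrt(theta**4 + 4 * theta**2) - theta**2)

print(np.max(np.abs(np.clip(H @ x, lo, hi) - H @ x)))   # primal feasibility gap
```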
Characteristics of the algorithm
Dual iterates converge at a rate of O(1/\nu^2)
An ergodic (averaged) primal iterate converges at a rate of O(1/\nu^2)³
Preconditioning is of crucial importance
Terminate the algorithm when the iterate (x^\nu, z^\nu) satisfies

f(x^\nu) + g(z^\nu) - P \le \epsilon_V,
\|H x^\nu - z^\nu\|_\infty \le \epsilon_g.

³ P. Patrinos and A. Bemporad, “An accelerated dual gradient-projection algorithm for embedded linear model predictive control,” IEEE Trans. Autom. Control, vol. 59, no. 1, pp. 18–33, 2014.
III. APG for Stochastic Optimal Control Problems
Scenario tree formulation
The uncertainty is represented on a scenario tree: stage k has \mu(k) nodes, node i at stage k carries probability p^i_k, and \mathrm{child}(k, i) denotes the set of its children at stage k+1.
Splitting for proximal formulation
We have

\mathrm{E} f(x) = \sum_{k=0}^{N-1} \sum_{i=1}^{\mu(k)} p^i_k \, \phi(x^i_k, u^i_k, i) + \sum_{i=1}^{\mu(N)} p^i_N \, \phi_N(x^i_N, i) + \delta(x \mid X(p)),

\mathrm{E} g(Hx) = \sum_{k=0}^{N-1} \sum_{i=1}^{\mu(k)} p^i_k \, \bar{\phi}(F^i_k x^i_k + G^i_k u^i_k, i) + \sum_{i=1}^{\mu(N)} p^i_N \, \bar{\phi}_N(F^i_N x^i_N, i),

where

X(p) = \{ x : x^j_{k+1} = A^j_k x^i_k + B^j_k u^i_k + w^j_k, \; j \in \mathrm{child}(k, i) \}.
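A possible container for the scenario-tree data and the evaluation of the smooth part of E f (the indicator of the dynamics X(p) is omitted); the layout below is an illustrative assumption, not the authors' data structure.

```python
# Sketch (illustrative): a minimal scenario-tree container and evaluation of
#   sum_k sum_i p_k^i * phi(x_k^i, u_k^i) + sum_i p_N^i * phi_N(x_N^i).
import numpy as np

class ScenarioTree:
    def __init__(self, N, mu, prob, children):
        self.N = N                  # horizon
        self.mu = mu                # mu[k] = number of nodes at stage k
        self.prob = prob            # prob[k][i] = p_k^i
        self.children = children    # children[k][i] = child node indices at stage k+1

def expected_smooth_cost(tree, x, u, Q, R, QN):
    """x[k][i], u[k][i]: state/input at node i of stage k (u only for k < N)."""
    total = 0.0
    for k in range(tree.N):
        for i in range(tree.mu[k]):
            total += tree.prob[k][i] * (x[k][i] @ Q @ x[k][i] + u[k][i] @ R @ u[k][i])
    for i in range(tree.mu[tree.N]):
        total += tree.prob[tree.N][i] * (x[tree.N][i] @ QN @ x[tree.N][i])
    return total
```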
Computation of the dual gradient
Using dynamic programming, we solve the problem

x^\nu = \arg\min_z \{ \langle z, H'y^\nu \rangle + \mathrm{E} f(z) \},

where

\mathrm{E} f(x) = \sum_{k=0}^{N-1} \sum_{i=1}^{\mu(k)} p^i_k \, \phi(x^i_k, u^i_k, i) + \sum_{i=1}^{\mu(N)} p^i_N \, \phi_N(x^i_N, i) + \delta(x \mid X(p)).
Computation of the dual gradient
Factor step:
Performed once
Parallelisable
For time-invariant problems, can be performed once offline

Algorithm 1 Solve step
q^i_N ← y^i_N, ∀ i ∈ N_[1, μ(N)]          % backward substitution
for k = N−1, …, 0 do
  for i = 1, …, μ(k) do {in parallel}
    u^i_k ← Φ^i_k y^i_k + Σ_{j ∈ child(k,i)} Θ^j_k q^j_{k+1} + σ^i_k
    q^i_k ← D^i_k y^i_k + Σ_{j ∈ child(k,i)} Λ^j_k q^j_{k+1} + c^i_k
  end for
end for
x^1_0 ← p                                  % forward substitution
for k = 0, …, N−1 do
  for i = 1, …, μ(k) do {in parallel}
    u^i_k ← K^i_k x^i_k + u^i_k
    for j ∈ child(k, i) do {in parallel}
      x^j_{k+1} ← A^j_k x^i_k + B^j_k u^i_k + w^j_k
    end for
  end for
end for
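Below is an illustrative NumPy transcription of the solve step (Algorithm 1), assuming the factor step has already produced the matrices Φ, Θ, D, Λ, K and vectors σ, c for every node, and reusing the ScenarioTree container sketched earlier; the dict-of-node layout is a placeholder, and on the GPU each inner loop over the nodes of a stage becomes a batch of matrix-vector products.

```python
# Sketch (illustrative, not the authors' CUDA code): the solve step of Algorithm 1.
# Factor-step data Phi, Theta, D, Lam, K, sig, c and system data A, B, w are
# assumed given as dicts keyed by node, e.g. Phi[(k, i)], Theta[(k, j)].

def solve_step(tree, y, p, Phi, Theta, D, Lam, K, sig, c, A, B, w):
    N = tree.N
    q, u, x = {}, {}, {}
    # backward substitution
    for i in range(tree.mu[N]):
        q[(N, i)] = y[(N, i)]
    for k in range(N - 1, -1, -1):
        for i in range(tree.mu[k]):                      # parallel across nodes of stage k
            u[(k, i)] = Phi[(k, i)] @ y[(k, i)] + sig[(k, i)]
            q[(k, i)] = D[(k, i)] @ y[(k, i)] + c[(k, i)]
            for j in tree.children[k][i]:
                u[(k, i)] = u[(k, i)] + Theta[(k, j)] @ q[(k + 1, j)]
                q[(k, i)] = q[(k, i)] + Lam[(k, j)] @ q[(k + 1, j)]
    # forward substitution
    x[(0, 0)] = p
    for k in range(N):
        for i in range(tree.mu[k]):                      # parallel across nodes of stage k
            u[(k, i)] = K[(k, i)] @ x[(k, i)] + u[(k, i)]
            for j in tree.children[k][i]:
                x[(k + 1, j)] = A[(k, j)] @ x[(k, i)] + B[(k, j)] @ u[(k, i)] + w[(k, j)]
    return x, u
```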
Computation of the dual gradient
Dynamic programming approach
Parallelisable across all nodes of a stage
The solve step involves only matrix-vector products
IV. Simulations
Simulation Results
Linear spring-mass system
GPU CUDA-C implementation (NVIDIA Tesla 2075)
Average and maximum runtime for a random sample of 100 initial
points
Compared against the interior-point solver of Gurobi
Number of scenarios
[Figure: average runtime vs number of scenarios]
Number of scenarios
[Figure: maximum runtime (sec, log scale) vs log2(number of scenarios) from 7 to 13, for APG with tolerances 0.005, 0.01, 0.05 and the Gurobi interior-point solver]
Number of scenarios
In numbers:
8192 scenarios
6.39 · 10^5 primal variables
2.0 · 10^6 dual variables
Using \epsilon_g = \epsilon_V = 0.01 we are 40× faster (average)
Prediction horizon
[Figure: average runtime (sec, log scale) vs prediction horizon from 10 to 60, for APG with tolerances 0.005, 0.01, 0.05 and the Gurobi interior-point solver]
Prediction horizon
[Figure: maximum runtime (sec, log scale) vs prediction horizon from 10 to 60, for APG with tolerances 0.005, 0.01, 0.05 and the Gurobi interior-point solver]
Prediction horizon
In numbers:
N = 60 and 500 scenarios
0.92 · 10^6 primal variables
2.0 · 10^6 dual variables
Using \epsilon_g = \epsilon_V = 0.01 we are 23× faster (average)
Stochastic MPC of drinking water networks
Recent results (to be submitted):
About 2 million primal variables
593 scenarios, N = 24
Gurobi requires 1329 s on average
GPU APG runtime is about 58 s
Thank you for your attention.
This work was financially supported by the EU FP7 research project EFFINET “Efficient Integrated Real-time
monitoring and Control of Drinking Water Networks,” grant agreement no. 318556.
