Distributed solution of stochastic optimal control
problem on GPUs
Ajay K. Sampathirao^a, P. Sopasakis^a, A. Bemporad^a and P. Patrinos^b
^a IMT Institute for Advanced Studies Lucca, Italy
^b Dept. Electr. Eng. (ESAT), KU Leuven, Belgium
December 18, 2015
Applications
Microgrids [Hans et al. ’15]
Drinking water networks [Sampathirao et al. ’15]
HVAC [Long et al. ’13, Zhang et al. ’13, Parisio et al. ’13]
Financial systems [Patrinos et al. ’11, Bemporad et al., ’14]
Chemical process [Lucia et al. ’13]
Distillation column [Garrido and Steinbach, ’11]
Motivation
Common perception: stochastic optimisation is too computationally demanding to be used in control applications.
Spoiler alert!
Example:
920,000 decision variables
Interior-point solver runtime: 35 s
GPU APG solver runtime: < 3 s
Outline
1. Stochastic optimal control problem formulation
2. Accelerated proximal gradient algorithm
3. Parallelisable implementation
4. Simulations
I. Stochastic Optimal Control
System description
Discrete-time uncertain linear system:

x_{k+1} = A_{\xi_k} x_k + B_{\xi_k} u_k + w_{\xi_k},

where \xi_k is a random variable on a probability space (\Omega_k, \mathcal{F}_k, \mathrm{P}_k). At time k we observe x_k but not \xi_k.
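To make the system class concrete, here is a minimal NumPy sketch (not from the slides) that simulates x_{k+1} = A_{\xi_k} x_k + B_{\xi_k} u_k + w_{\xi_k} with \xi_k taking two values; all matrices, probabilities and the feedback gain are illustrative assumptions.

```python
# Minimal sketch: simulating the uncertain system with xi_k drawn from a
# finite set {0, 1}. All matrices and probabilities below are illustrative.
import numpy as np

rng = np.random.default_rng(0)

A = [np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([[1.0, 0.2], [0.0, 0.9]])]
B = [np.array([[0.0], [0.1]]),           np.array([[0.0], [0.2]])]
w = [np.array([0.0, 0.0]),               np.array([0.05, 0.0])]
prob = [0.7, 0.3]                         # P(xi_k = 0), P(xi_k = 1)

x = np.array([1.0, 0.0])                  # initial state p
for k in range(10):
    u = np.array([-0.5 * x[0]])           # any causal feedback; here a simple gain
    xi = rng.choice(2, p=prob)            # xi_k is revealed only after u_k is applied
    x = A[xi] @ x + B[xi] @ u + w[xi]
```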
Stochastic optimal control problem
Optimisation problem:

V(p) = \min_{\pi = \{u_k\}_{k=0}^{N-1}} \; \mathrm{E}\Big[ V_f(x_N, \xi_N) + \sum_{k=0}^{N-1} \ell_k(x_k, u_k, \xi_k) \Big],
s.t. \; x_0 = p,
\;\;\;\;\;\; x_{k+1} = A_{\xi_k} x_k + B_{\xi_k} u_k + w_{\xi_k},

where:
\mathrm{E}[\cdot]: conditional expectation wrt the product probability measure
Causal policy u_k = \psi_k(p, \boldsymbol{\xi}_{k-1}), with \boldsymbol{\xi}_k = (\xi_0, \xi_1, \ldots, \xi_k)
\ell_k and V_f can encode constraints
Stage cost
The stage cost is a function \ell_k : \mathbb{R}^n \times \mathbb{R}^m \times \Omega_k \to \bar{\mathbb{R}},

\ell_k(x_k, u_k, \xi_k) = \phi_k(x_k, u_k, \xi_k) + \bar{\phi}_k(F_k x_k + G_k u_k, \xi_k),

where \phi_k is real-valued, convex and smooth, e.g.,

\phi_k(x_k, u_k, \xi_k) = x_k' Q_{\xi_k} x_k + u_k' R_{\xi_k} u_k,

and \bar{\phi}_k is proper, convex, lsc and possibly non-smooth, e.g.,

\bar{\phi}_k(x_k, u_k, \xi_k) = \delta(F_k x_k + G_k u_k \mid Y_{\xi_k}).
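As an illustration of this smooth/non-smooth split, the following sketch evaluates one stage cost as a quadratic \phi plus the indicator of a set Y; the box-shaped Y, the weights and the constraint maps are illustrative assumptions only.

```python
# Sketch (illustrative): one stage cost split into a smooth quadratic part phi
# and a non-smooth indicator part phi_bar, for a box-shaped set Y.
import numpy as np

Q = np.diag([1.0, 2.0])                   # assumed weights
R = np.array([[0.1]])
F = np.eye(2)                             # constraint map F x + G u
G = np.zeros((2, 1))
y_min, y_max = np.array([-1.0, -1.0]), np.array([1.0, 1.0])   # box Y

def phi(x, u):
    """Smooth part: x'Qx + u'Ru."""
    return x @ Q @ x + u @ R @ u

def phi_bar(x, u):
    """Non-smooth part: indicator of Y at Fx + Gu (0 if inside, +inf otherwise)."""
    y = F @ x + G @ u
    return 0.0 if np.all((y >= y_min) & (y <= y_max)) else np.inf

x, u = np.array([0.5, -0.2]), np.array([0.3])
stage_cost = phi(x, u) + phi_bar(x, u)
```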
Terminal cost
The terminal cost is a function V_f : \mathbb{R}^n \times \Omega_N \to \bar{\mathbb{R}} which can be written as

V_f(x_N, \xi_N) = \phi_N(x_N, \xi_N) + \bar{\phi}_N(x_N, \xi_N),

where \phi_N is real-valued, convex and smooth, and \bar{\phi}_N is proper, convex, lsc and possibly non-smooth.
Total cost
The total cost function can be written as \mathrm{E}\big(f(x) + g(Hx)\big), where x = ((x_k)_k, (u_k)_k),

f(x) = \sum_{k=0}^{N-1} \phi_k(x_k, u_k, \xi_k) + \phi_N(x_N, \xi_N) + \delta(x \mid X(p)),

g(Hx) = \sum_{k=0}^{N-1} \bar{\phi}_k(F_k x_k + G_k u_k, \xi_k) + \bar{\phi}_N(F_N x_N, \xi_N),

and \phi_k and \phi_N are such that f is \sigma-strongly convex on its domain, that is, the affine space defined by the system dynamics, i.e.,

X(p) = \{ x : x^j_{k+1} = A^j_k x^i_k + B^j_k u^i_k + w^j_k, \; j \in \mathrm{child}(k, i) \}.
II. Proximal gradient algorithm
Proximal operator
We define the mapping \mathrm{prox}_{\gamma f} : \mathbb{R}^n \to \mathbb{R}^n of a closed, proper, convex, extended-real-valued function f : \mathbb{R}^n \to \bar{\mathbb{R}} as

\mathrm{prox}_{\gamma f}(v) = \arg\min_{x \in \mathbb{R}^n} \Big\{ f(x) + \tfrac{1}{2\gamma} \|x - v\|_2^2 \Big\},

for \gamma > 0.
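For concreteness, here is a small sketch of two proximal operators that follow directly from this definition: the prox of a box indicator (a Euclidean projection) and the prox of the \ell_1 norm (soft-thresholding). The function names are illustrative.

```python
# Sketch: proximal operators of two common functions, following the definition
# prox_{gamma f}(v) = argmin_x { f(x) + (1/(2*gamma)) ||x - v||^2 }.
import numpy as np

def prox_box(v, lo, hi, gamma=1.0):
    """prox of a box indicator is the projection onto the box (gamma plays no role)."""
    return np.clip(v, lo, hi)

def prox_l1(v, gamma):
    """prox of f(x) = ||x||_1 is soft-thresholding with threshold gamma."""
    return np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)

v = np.array([1.5, -0.2, 0.7])
print(prox_box(v, -1.0, 1.0))        # [ 1.  -0.2  0.7]
print(prox_l1(v, 0.5))               # [ 1.   0.   0.2]
```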
Proximal of the conjugate function
For a function f : \mathbb{R}^n \to \bar{\mathbb{R}} we define its conjugate function to be¹

f^*(y) = \sup_{x \in \mathbb{R}^n} \{ \langle y, x \rangle - f(x) \}.

If we can compute \mathrm{prox}_{\gamma f}, then we can also compute \mathrm{prox}_{\gamma f^*} using the Moreau decomposition formula

v = \mathrm{prox}_{\gamma f}(v) + \gamma \, \mathrm{prox}_{\gamma^{-1} f^*}(\gamma^{-1} v).

¹ R. T. Rockafellar, Convex Analysis. Princeton University Press, 1972.
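A quick numerical check of the Moreau decomposition, using f = \|\cdot\|_1, whose conjugate is the indicator of the unit \ell_\infty-ball; the helper names are illustrative.

```python
# Sketch: numerically checking the Moreau decomposition
# v = prox_{gamma f}(v) + gamma * prox_{(1/gamma) f*}(v / gamma)
# for f = ||.||_1, whose conjugate f* is the indicator of the unit inf-norm ball.
import numpy as np

def prox_l1(v, gamma):
    return np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)

def prox_linf_ball(v, gamma):
    # prox of an indicator is independent of gamma: projection onto {||y||_inf <= 1}
    return np.clip(v, -1.0, 1.0)

gamma = 0.7
v = np.array([2.0, -0.3, 0.5])
rhs = prox_l1(v, gamma) + gamma * prox_linf_ball(v / gamma, 1.0 / gamma)
print(np.allclose(v, rhs))            # True
```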
Optimisation problem
Consider the optimisation problem

P = \min_{z = Hx} f(x) + g(z),

where f : \mathbb{R}^n \to \bar{\mathbb{R}} is \sigma-strongly convex and g : \mathbb{R}^m \to \bar{\mathbb{R}} is closed, proper and convex. The Fenchel dual of this problem is

D = \min_{y} f^*(-H'y) + g^*(y),

where f^* has a Lipschitz-continuous gradient with constant 1/\sigma.
The basic algorithm
The proximal gradient algorithm applied to the dual optimisation problem is defined by the recursion on dual variables²:

y^0 = 0,
y^{\nu+1} = \mathrm{prox}_{\lambda g^*}\big( y^\nu + \lambda H \nabla f^*(-H'y^\nu) \big).

Using the conjugate subgradient theorem we can define

x^\nu := \nabla f^*(-H'y^\nu) = \arg\min_z \{ \langle z, H'y^\nu \rangle + f(z) \}.

² P. Combettes and J. Pesquet, “Proximal splitting methods in signal processing,” Fixed-Point Algorithms for Inverse Problems in Science and Engineering, 2011.
Dual APG algorithm
Nesterov’s accelerated proximal gradient algorithm (APG) converges at a rate of O(1/\nu^2) and is defined by the recursion:

v^\nu = y^\nu + \theta_\nu (\theta_{\nu-1}^{-1} - 1)(y^\nu - y^{\nu-1}),
x^\nu = \arg\min_z \{ \langle z, H'v^\nu \rangle + f(z) \},
z^\nu = \mathrm{prox}_{\lambda^{-1} g}(\lambda^{-1} v^\nu + H x^\nu),
y^{\nu+1} = v^\nu + \lambda (H x^\nu - z^\nu),
\theta_{\nu+1} = \tfrac{1}{2}\big( \sqrt{\theta_\nu^4 + 4\theta_\nu^2} - \theta_\nu^2 \big).
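The following is a minimal sketch of the dual APG recursion above applied to a toy problem, min_x ½‖x − b‖² s.t. Hx in a box (so f is 1-strongly convex and g is a box indicator). It is illustrative only, not the authors' GPU implementation; the step size λ = σ/‖H‖² and the initialisation θ_0 = θ_{−1} = 1 are standard choices assumed here.

```python
# Sketch (illustrative): dual APG from the slide, applied to
#   min_x 0.5*||x - b||^2   s.t.   H x in [lo, hi],
# i.e. f(x) = 0.5*||x - b||^2 (sigma = 1) and g = indicator of a box.
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 5
H = rng.standard_normal((m, n))
b = rng.standard_normal(n)
lo, hi = -0.2 * np.ones(m), 0.2 * np.ones(m)

lam = 1.0 / np.linalg.norm(H, 2) ** 2     # lambda <= sigma / ||H||^2, with sigma = 1
y = y_prev = np.zeros(m)
theta = theta_prev = 1.0                  # theta_0 = theta_{-1} = 1 (standard choice)

for _ in range(500):
    v = y + theta * (1.0 / theta_prev - 1.0) * (y - y_prev)
    x = b - H.T @ v                       # argmin_z { <z, H'v> + f(z) }
    z = np.clip(v / lam + H @ x, lo, hi)  # prox of the box indicator (projection)
    y_prev, y = y, v + lam * (H @ x - z)
    theta_prev, theta = theta, 0.5 * (np.sqrt(theta**4 + 4 * theta**2) - theta**2)

print(np.max(np.abs(np.clip(H @ x, lo, hi) - H @ x)))   # primal feasibility gap
```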
Characteristics of the algorithm
Dual iterates converge at a rate of O(1/\nu^2)
An ergodic (averaged) primal iterate converges at a rate of O(1/\nu^2)³
Preconditioning is of crucial importance
Terminate the algorithm when the iterate (x^\nu, z^\nu) satisfies

f(x^\nu) + g(z^\nu) - P \le \epsilon_V,
\|H x^\nu - z^\nu\|_\infty \le \epsilon_g.

³ P. Patrinos and A. Bemporad, “An accelerated dual gradient-projection algorithm for embedded linear model predictive control,” IEEE Trans. Autom. Control, vol. 59, no. 1, pp. 18–33, 2014.
III. APG for Stochastic Optimal Control Problems
Scenario tree formulation
The uncertainty is represented on a scenario tree: stage k has \mu(k) nodes, node i at stage k carries probability p^i_k, and \mathrm{child}(k, i) denotes the set of its children at stage k+1.
Splitting for proximal formulation
We have

\mathrm{E} f(x) = \sum_{k=0}^{N-1} \sum_{i=1}^{\mu(k)} p^i_k \, \phi(x^i_k, u^i_k, i) + \sum_{i=1}^{\mu(N)} p^i_N \, \phi_N(x^i_N, i) + \delta(x \mid X(p)),

\mathrm{E} g(Hx) = \sum_{k=0}^{N-1} \sum_{i=1}^{\mu(k)} p^i_k \, \bar{\phi}(F^i_k x^i_k + G^i_k u^i_k, i) + \sum_{i=1}^{\mu(N)} p^i_N \, \bar{\phi}_N(F^i_N x^i_N, i),

where

X(p) = \{ x : x^j_{k+1} = A^j_k x^i_k + B^j_k u^i_k + w^j_k, \; j \in \mathrm{child}(k, i) \}.
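A possible container for the scenario-tree data and the evaluation of the smooth part of E f (the indicator of the dynamics X(p) is omitted); the layout below is an illustrative assumption, not the authors' data structure.

```python
# Sketch (illustrative): a minimal scenario-tree container and evaluation of
#   sum_k sum_i p_k^i * phi(x_k^i, u_k^i) + sum_i p_N^i * phi_N(x_N^i).
import numpy as np

class ScenarioTree:
    def __init__(self, N, mu, prob, children):
        self.N = N                  # horizon
        self.mu = mu                # mu[k] = number of nodes at stage k
        self.prob = prob            # prob[k][i] = p_k^i
        self.children = children    # children[k][i] = child node indices at stage k+1

def expected_smooth_cost(tree, x, u, Q, R, QN):
    """x[k][i], u[k][i]: state/input at node i of stage k (u only for k < N)."""
    total = 0.0
    for k in range(tree.N):
        for i in range(tree.mu[k]):
            total += tree.prob[k][i] * (x[k][i] @ Q @ x[k][i] + u[k][i] @ R @ u[k][i])
    for i in range(tree.mu[tree.N]):
        total += tree.prob[tree.N][i] * (x[tree.N][i] @ QN @ x[tree.N][i])
    return total
```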
Computation of the dual gradient
Using dynamic programming, we solve the problem

x^\nu = \arg\min_z \{ \langle z, H'y^\nu \rangle + \mathrm{E} f(z) \},

where

\mathrm{E} f(x) = \sum_{k=0}^{N-1} \sum_{i=1}^{\mu(k)} p^i_k \, \phi(x^i_k, u^i_k, i) + \sum_{i=1}^{\mu(N)} p^i_N \, \phi_N(x^i_N, i) + \delta(x \mid X(p)).
Computation of the dual gradient
Factor step:
Performed once
Parallelisable
For time-invariant problems, can be performed once offline

Algorithm 1 Solve step
q^i_N ← y^i_N, ∀ i ∈ N_[1, μ(N)]          % backward substitution
for k = N−1, …, 0 do
  for i = 1, …, μ(k) do {in parallel}
    u^i_k ← Φ^i_k y^i_k + Σ_{j ∈ child(k,i)} Θ^j_k q^j_{k+1} + σ^i_k
    q^i_k ← D^i_k y^i_k + Σ_{j ∈ child(k,i)} Λ^j_k q^j_{k+1} + c^i_k
  end for
end for
x^1_0 ← p                                  % forward substitution
for k = 0, …, N−1 do
  for i = 1, …, μ(k) do {in parallel}
    u^i_k ← K^i_k x^i_k + u^i_k
    for j ∈ child(k, i) do {in parallel}
      x^j_{k+1} ← A^j_k x^i_k + B^j_k u^i_k + w^j_k
    end for
  end for
end for
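Below is an illustrative NumPy transcription of the solve step (Algorithm 1), assuming the factor step has already produced the matrices Φ, Θ, D, Λ, K and vectors σ, c for every node, and reusing the ScenarioTree container sketched earlier; the dict-of-node layout is a placeholder, and on the GPU each inner loop over the nodes of a stage becomes a batch of matrix-vector products.

```python
# Sketch (illustrative, not the authors' CUDA code): the solve step of Algorithm 1.
# Factor-step data Phi, Theta, D, Lam, K, sig, c and system data A, B, w are
# assumed given as dicts keyed by node, e.g. Phi[(k, i)], Theta[(k, j)].

def solve_step(tree, y, p, Phi, Theta, D, Lam, K, sig, c, A, B, w):
    N = tree.N
    q, u, x = {}, {}, {}
    # backward substitution
    for i in range(tree.mu[N]):
        q[(N, i)] = y[(N, i)]
    for k in range(N - 1, -1, -1):
        for i in range(tree.mu[k]):                      # parallel across nodes of stage k
            u[(k, i)] = Phi[(k, i)] @ y[(k, i)] + sig[(k, i)]
            q[(k, i)] = D[(k, i)] @ y[(k, i)] + c[(k, i)]
            for j in tree.children[k][i]:
                u[(k, i)] = u[(k, i)] + Theta[(k, j)] @ q[(k + 1, j)]
                q[(k, i)] = q[(k, i)] + Lam[(k, j)] @ q[(k + 1, j)]
    # forward substitution
    x[(0, 0)] = p
    for k in range(N):
        for i in range(tree.mu[k]):                      # parallel across nodes of stage k
            u[(k, i)] = K[(k, i)] @ x[(k, i)] + u[(k, i)]
            for j in tree.children[k][i]:
                x[(k + 1, j)] = A[(k, j)] @ x[(k, i)] + B[(k, j)] @ u[(k, i)] + w[(k, j)]
    return x, u
```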
Computation of the dual gradient
Dynamic programming approach
Parallelisable across all nodes of a stage
The solve step involves only matrix-vector products
IV. Simulations
Simulation Results
Linear spring-mass system
GPU CUDA-C implementation (NVIDIA Tesla 2075)
Average and maximum runtime for a random sample of 100 initial
points
Compared against the interior-point solver of Gurobi
Number of scenarios
[Figure: average runtime vs number of scenarios]
Number of scenarios
[Figure: maximum runtime (sec, log scale) vs log2(number of scenarios) from 7 to 13, for APG with tolerances 0.005, 0.01, 0.05 and the Gurobi interior-point solver]
Number of scenarios
In numbers:
8192 scenarios
6.39 · 10^5 primal variables
2.0 · 10^6 dual variables
Using \epsilon_g = \epsilon_V = 0.01 we are 40× faster (average)
Prediction horizon
[Figure: average runtime (sec, log scale) vs prediction horizon from 10 to 60, for APG with tolerances 0.005, 0.01, 0.05 and the Gurobi interior-point solver]
Prediction horizon
[Figure: maximum runtime (sec, log scale) vs prediction horizon from 10 to 60, for APG with tolerances 0.005, 0.01, 0.05 and the Gurobi interior-point solver]
Prediction horizon
In numbers:
N = 60 and 500 scenarios
0.92 · 10^6 primal variables
2.0 · 10^6 dual variables
Using \epsilon_g = \epsilon_V = 0.01 we are 23× faster (average)
Stochastic MPC of drinking water networks
Recent results (to be submitted):
About 2 million primal variables
593 scenarios, N = 24
Gurobi requires 1329 s on average
GPU APG runtime is about 58 s
Thank you for your attention.
This work was financially supported by the EU FP7 research project EFFINET “Efficient Integrated Real-time
monitoring and Control of Drinking Water Networks,” grant agreement no. 318556.
