Alternating Direction Method of Multipliers


                               Prof S. Boyd
                         EE364b, Stanford University




source:
Distributed Optimization and Statistical Learning via the Alternating
Direction Method of Multipliers (Boyd, Parikh, Chu, Peleato, Eckstein)


                                                                         1
Goals




robust methods for
    arbitrary-scale optimization
      – machine learning/statistics with huge data-sets
      – dynamic optimization on large-scale network
    decentralized optimization
      – devices/processors/agents coordinate to solve large problem, by passing
        relatively small messages




                                                                                  2
Outline


Dual decomposition

Method of multipliers

Alternating direction method of multipliers

Common patterns

Examples

Consensus and exchange

Conclusions


Dual decomposition                            3
Dual problem



    convex equality constrained optimization problem

                               minimize     f (x)
                               subject to   Ax = b

    Lagrangian: L(x, y) = f (x) + y T (Ax − b)

    dual function: g(y) = inf x L(x, y)

    dual problem:    maximize g(y)

    recover x⋆ = argmin_x L(x, y⋆)



Dual decomposition                                     4
Dual ascent




    gradient method for dual problem: y^{k+1} = y^k + α^k ∇g(y^k)

      ∇g(y^k) = A x̃ − b, where x̃ = argmin_x L(x, y^k)

    dual ascent method is
              x^{k+1}  :=  argmin_x L(x, y^k)           // x-minimization
              y^{k+1}  :=  y^k + α^k (Ax^{k+1} − b)     // dual update

    works, with lots of strong assumptions




Dual decomposition                                                     5
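a minimal numpy sketch of the dual ascent iteration above, assuming a quadratic
objective f(x) = (1/2) x^T P x + q^T x so the x-minimization is a linear solve;
the problem data and fixed step size below are illustrative only

import numpy as np

# illustrative data: f(x) = (1/2) x'Px + q'x, constraint Ax = b
np.random.seed(0)
n, m = 20, 5
P = np.eye(n) + np.diag(np.random.rand(n))   # positive definite
q = np.random.randn(n)
A = np.random.randn(m, n) / np.sqrt(n)
b = np.random.randn(m)

y = np.zeros(m)          # dual variable
alpha = 0.1              # fixed step size (one of the "strong assumptions")
for k in range(1000):
    # x-minimization: grad_x L(x, y) = Px + q + A'y = 0
    x = np.linalg.solve(P, -(q + A.T @ y))
    # dual update: grad g(y) = Ax - b
    y = y + alpha * (A @ x - b)

print("primal residual:", np.linalg.norm(A @ x - b))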
Dual decomposition



    suppose f is separable:

               f (x) = f1 (x1 ) + · · · + fN (xN ),    x = (x1 , . . . , xN )

    then L is separable in x: L(x, y) = L1 (x1 , y) + · · · + LN (xN , y) − y T b,

                            Li (xi , y) = fi (xi ) + y T Ai xi

    x-minimization in dual ascent splits into N separate minimizations

                           x_i^{k+1}  :=  argmin_{x_i} L_i(x_i, y^k)

    which can be carried out in parallel


Dual decomposition                                                                   6
Dual decomposition



    dual decomposition (Everett, Dantzig, Wolfe, Benders 1960–65)

                  x_i^{k+1}  :=  argmin_{x_i} L_i(x_i, y^k),     i = 1, . . . , N
                  y^{k+1}    :=  y^k + α^k ( Σ_{i=1}^N A_i x_i^{k+1} − b )

    scatter y^k; update x_i in parallel; gather A_i x_i^{k+1}

    solve a large problem
       – by iteratively solving subproblems (in parallel)
       – dual variable update provides coordination

    works, with lots of assumptions; often slow


Dual decomposition                                                             7
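the same sketch with a separable objective, so the per-block x_i-updates could
run in parallel; two quadratic blocks, with sizes and data again illustrative only

import numpy as np

# f(x) = f1(x1) + f2(x2), fi(xi) = (1/2) xi'Pi xi; constraint A1 x1 + A2 x2 = b
np.random.seed(1)
n1, n2, m = 10, 10, 5
P1, P2 = np.eye(n1), 2.0 * np.eye(n2)
A1 = np.random.randn(m, n1) / np.sqrt(n1)
A2 = np.random.randn(m, n2) / np.sqrt(n2)
b = np.random.randn(m)

y, alpha = np.zeros(m), 0.1
for k in range(1000):
    # x_i-updates: argmin_{xi} (1/2) xi'Pi xi + y'Ai xi   (parallelizable)
    x1 = np.linalg.solve(P1, -A1.T @ y)
    x2 = np.linalg.solve(P2, -A2.T @ y)
    # gather Ai xi, then dual update
    y = y + alpha * (A1 @ x1 + A2 @ x2 - b)

print("primal residual:", np.linalg.norm(A1 @ x1 + A2 @ x2 - b))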
Outline


Dual decomposition

Method of multipliers

Alternating direction method of multipliers

Common patterns

Examples

Consensus and exchange

Conclusions


Method of multipliers                         8
Method of multipliers



    a method to robustify dual ascent

    use augmented Lagrangian (Hestenes, Powell 1969), ρ > 0

                 L_ρ(x, y) = f(x) + y^T (Ax − b) + (ρ/2) ||Ax − b||_2^2


    method of multipliers (Hestenes, Powell; analysis in Bertsekas 1982)

                          x^{k+1}  :=  argmin_x L_ρ(x, y^k)
                          y^{k+1}  :=  y^k + ρ (Ax^{k+1} − b)


    (note specific dual update step length ρ)



Method of multipliers                                                      9
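the corresponding sketch for the method of multipliers, again assuming a
quadratic f so the augmented-Lagrangian minimization is a linear solve; data
are illustrative, and the dual step length is exactly ρ

import numpy as np

np.random.seed(0)
n, m, rho = 20, 5, 1.0
P = np.eye(n) + np.diag(np.random.rand(n))
q = np.random.randn(n)
A = np.random.randn(m, n) / np.sqrt(n)
b = np.random.randn(m)

y = np.zeros(m)
for k in range(50):
    # x-update: minimize f(x) + y'(Ax - b) + (rho/2)||Ax - b||^2
    #   =>  (P + rho A'A) x = -q - A'y + rho A'b
    x = np.linalg.solve(P + rho * A.T @ A, -(q + A.T @ (y - rho * b)))
    # dual update with step length rho
    y = y + rho * (A @ x - b)

print("primal residual:", np.linalg.norm(A @ x - b))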
Method of multipliers dual update step


    optimality conditions (for differentiable f ):

                        Ax⋆ − b = 0,          ∇f(x⋆) + A^T y⋆ = 0

    (primal and dual feasibility)

    since x^{k+1} minimizes L_ρ(x, y^k),

                   0  =  ∇_x L_ρ(x^{k+1}, y^k)
                      =  ∇f(x^{k+1}) + A^T ( y^k + ρ(Ax^{k+1} − b) )
                      =  ∇f(x^{k+1}) + A^T y^{k+1}

    dual update y^{k+1} = y^k + ρ(Ax^{k+1} − b) makes (x^{k+1}, y^{k+1}) dual feasible

    primal feasibility achieved in the limit: Ax^{k+1} − b → 0


Method of multipliers                                                       10
Method of multipliers




(compared to dual decomposition)
    good news: converges under much more relaxed conditions
    (f can be nondifferentiable, take on value +∞, . . . )

    bad news: quadratic penalty destroys splitting of the x-update, so can’t
    do decomposition




Method of multipliers                                                      11
Outline


Dual decomposition

Method of multipliers

Alternating direction method of multipliers

Common patterns

Examples

Consensus and exchange

Conclusions


Alternating direction method of multipliers      12
Alternating direction method of multipliers




    a method
       – with good robustness of method of multipliers
       – which can support decomposition

    “robust dual decomposition” or “decomposable method of multipliers”

    proposed by Gabay, Mercier, Glowinski, Marrocco in 1976




Alternating direction method of multipliers                               13
Alternating direction method of multipliers

    ADMM problem form (with f , g convex)

                                minimize f (x) + g(z)
                                subject to Ax + Bz = c

       – two sets of variables, with separable objective

    L_ρ(x, z, y) = f(x) + g(z) + y^T (Ax + Bz − c) + (ρ/2) ||Ax + Bz − c||_2^2


    ADMM:
           x^{k+1}  :=  argmin_x L_ρ(x, z^k, y^k)            // x-minimization
           z^{k+1}  :=  argmin_z L_ρ(x^{k+1}, z, y^k)        // z-minimization
           y^{k+1}  :=  y^k + ρ (Ax^{k+1} + Bz^{k+1} − c)    // dual update

Alternating direction method of multipliers                                        14
Alternating direction method of multipliers




    if we minimized over x and z jointly, reduces to method of multipliers

    instead, we do one pass of a Gauss-Seidel method

    we get splitting since we minimize over x with z fixed, and vice versa




Alternating direction method of multipliers                                  15
ADMM and optimality conditions


    optimality conditions (for differentiable case):
       – primal feasibility: Ax + Bz − c = 0
       – dual feasibility: ∇f(x) + A^T y = 0,      ∇g(z) + B^T y = 0

    since z^{k+1} minimizes L_ρ(x^{k+1}, z, y^k) we have

              0  =  ∇g(z^{k+1}) + B^T y^k + ρ B^T (Ax^{k+1} + Bz^{k+1} − c)
                 =  ∇g(z^{k+1}) + B^T y^{k+1}

    so with ADMM dual variable update, (x^{k+1}, z^{k+1}, y^{k+1}) satisfies second
    dual feasibility condition

    primal and first dual feasibility are achieved as k → ∞


Alternating direction method of multipliers                                 16
ADMM with scaled dual variables


    combine linear and quadratic terms in augmented Lagrangian

    L_ρ(x, z, y)  =  f(x) + g(z) + y^T (Ax + Bz − c) + (ρ/2) ||Ax + Bz − c||_2^2
                  =  f(x) + g(z) + (ρ/2) ||Ax + Bz − c + u||_2^2 + const.

    with u^k = (1/ρ) y^k

    ADMM (scaled dual form):

           x^{k+1}  :=  argmin_x ( f(x) + (ρ/2) ||Ax + Bz^k − c + u^k||_2^2 )
           z^{k+1}  :=  argmin_z ( g(z) + (ρ/2) ||Ax^{k+1} + Bz − c + u^k||_2^2 )
           u^{k+1}  :=  u^k + (Ax^{k+1} + Bz^{k+1} − c)


Alternating direction method of multipliers                                 17
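the scaled-form iteration maps directly onto a short generic loop; a sketch in
which the two argmin oracles are supplied by the caller (the function name and
interface are just one possible arrangement)

import numpy as np

def admm_scaled(x_update, z_update, A, B, c, z0, iters=100):
    """Generic ADMM loop, scaled dual form (sketch).

    x_update(v) returns argmin_x f(x) + (rho/2)||A x + v||^2
    z_update(v) returns argmin_z g(z) + (rho/2)||B z + v||^2
    (rho is fixed inside the two oracles)
    """
    z = z0
    u = np.zeros(c.shape)
    for k in range(iters):
        x = x_update(B @ z - c + u)        # x-minimization
        z = z_update(A @ x - c + u)        # z-minimization
        u = u + (A @ x + B @ z - c)        # scaled dual update
    return x, z, u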
Convergence




    assume (very little!)
       – f , g convex, closed, proper
       – L0 has a saddle point

    then ADMM converges:
       – iterates approach feasibility: Ax^k + Bz^k − c → 0
       – objective approaches optimal value: f(x^k) + g(z^k) → p⋆




Alternating direction method of multipliers                          18
Related algorithms


    operator splitting methods
    (Douglas, Peaceman, Rachford, Lions, Mercier, . . . 1950s, 1979)

    proximal point algorithm (Rockafellar 1976)

    Dykstra’s alternating projections algorithm (1983)

    Spingarn’s method of partial inverses (1985)

    Rockafellar-Wets progressive hedging (1991)

    proximal methods (Rockafellar, many others, 1976–present)

    Bregman iterative methods (2008–present)

    most of these are special cases of the proximal point algorithm

Alternating direction method of multipliers                            19
Outline


Dual decomposition

Method of multipliers

Alternating direction method of multipliers

Common patterns

Examples

Consensus and exchange

Conclusions


Common patterns                               20
Common patterns




    x-update step requires minimizing f(x) + (ρ/2) ||Ax − v||_2^2
    (with v = −Bz^k + c − u^k, which is constant during the x-update)
    similar for z-update
    several special cases come up often
    can simplify update by exploiting structure in these cases




Common patterns                                                    21
Decomposition




    suppose f is block-separable,

             f (x) = f1 (x1 ) + · · · + fN (xN ),   x = (x1 , . . . , xN )

    A is conformably block separable: AT A is block diagonal
    then x-update splits into N parallel updates of xi




Common patterns                                                              22
Proximal operator



    consider x-update when A = I

              x⁺ = argmin_x ( f(x) + (ρ/2) ||x − v||_2^2 )  =  prox_{f,ρ}(v)

    some special cases:

     f = I_C (indicator fct. of set C)        x⁺ := Π_C(v)          (projection onto C)
     f = λ ||·||_1  (ℓ_1 norm)                x_i⁺ := S_{λ/ρ}(v_i)  (soft thresholding)

    (S_a(v) = (v − a)_+ − (−v − a)_+)




Common patterns                                                                     23
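the two special cases in the table are one-liners; a sketch, with a box used as
one simple example of a set C with an easy projection

import numpy as np

def soft_threshold(v, a):
    # S_a(v) = (v - a)_+ - (-v - a)_+, elementwise
    return np.maximum(v - a, 0.0) - np.maximum(-v - a, 0.0)

def project_box(v, lo, hi):
    # projection onto C = {x : lo <= x <= hi}
    return np.clip(v, lo, hi)

# e.g. with A = I and f = lambda*||.||_1, the x-update is soft_threshold(v, lam/rho)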
Quadratic objective



    f (x) = (1/2)xT P x + q T x + r

    x+ := (P + ρAT A)−1 (ρAT v − q)

    use matrix inversion lemma when computationally advantageous

          (P + ρAT A)−1 = P −1 − ρP −1 AT (I + ρAP −1 AT )−1 AP −1

    (direct method) cache factorization of P + ρAT A (or I + ρAP −1 AT )

    (iterative method) warm start, early stopping, reducing tolerances




Common patterns                                                            24
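a sketch of the direct method for the quadratic x-update: factor P + ρA^T A once
(Cholesky) and reuse it in every iteration; shapes and data are illustrative

import numpy as np
from numpy.linalg import cholesky, solve

np.random.seed(2)
n, m, rho = 50, 30, 1.0
P = np.eye(n)
q = np.random.randn(n)
A = np.random.randn(m, n)
L = cholesky(P + rho * A.T @ A)     # factor once, cache across iterations

def x_update(v):
    # x+ := (P + rho A'A)^{-1} (rho A'v - q), via two triangular solves
    rhs = rho * A.T @ v - q
    return solve(L.T, solve(L, rhs))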
Smooth objective




    f smooth

    can use standard methods for smooth minimization
      – gradient, Newton, or quasi-Newton
      – preconditioned CG, limited-memory BFGS (scale to very large problems)

    can exploit
      – warm start
      – early stopping, with tolerances decreasing as ADMM proceeds




Common patterns                                                             25
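for smooth f the x-update can be handed to a standard solver; a sketch using
scipy's L-BFGS-B with a warm start and a tolerance the caller can loosen early
and tighten as ADMM proceeds (the helper name and interface are just one
possible arrangement)

import numpy as np
from scipy.optimize import minimize

def smooth_x_update(f_and_grad, A, v, rho, x_warm, gtol):
    # f_and_grad(x) returns (f(x), grad f(x)); minimizes f(x) + (rho/2)||Ax - v||^2
    def obj(x):
        fx, gx = f_and_grad(x)
        r = A @ x - v
        return fx + 0.5 * rho * (r @ r), gx + rho * (A.T @ r)

    res = minimize(obj, x_warm, jac=True, method="L-BFGS-B",
                   options={"gtol": gtol, "maxiter": 50})
    return res.x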
Outline


Dual decomposition

Method of multipliers

Alternating direction method of multipliers

Common patterns

Examples

Consensus and exchange

Conclusions


Examples                                      26
Constrained convex optimization


    consider ADMM for generic problem

                                     minimize f (x)
                                     subject to x ∈ C

    ADMM form: take g to be indicator of C

                               minimize      f (x) + g(z)
                               subject to    x−z =0

    algorithm:

                 x^{k+1}  :=  argmin_x ( f(x) + (ρ/2) ||x − z^k + u^k||_2^2 )
                 z^{k+1}  :=  Π_C(x^{k+1} + u^k)
                 u^{k+1}  :=  u^k + x^{k+1} − z^{k+1}

Examples                                                              27
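a concrete instance of the algorithm above, taking f(x) = (1/2)||Fx − g||^2 and
C the nonnegative orthant (so the z-update is an elementwise clip); data sizes
are illustrative

import numpy as np

np.random.seed(3)
m, n, rho = 40, 20, 1.0
F = np.random.randn(m, n)
g = np.random.randn(m)
L = np.linalg.cholesky(F.T @ F + rho * np.eye(n))   # cached factorization

x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
for k in range(200):
    rhs = F.T @ g + rho * (z - u)
    x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))  # x-update
    z = np.maximum(x + u, 0.0)                         # z-update: projection onto C
    u = u + x - z                                      # scaled dual update

print("min entry of z:", z.min())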
Lasso


    lasso problem:
                     minimize    (1/2) ||Ax − b||_2^2 + λ ||x||_1

    ADMM form:
                     minimize    (1/2) ||Ax − b||_2^2 + λ ||z||_1
                     subject to  x − z = 0

    ADMM:

                 x^{k+1}  :=  (A^T A + ρI)^{−1} (A^T b + ρ z^k − y^k)
                 z^{k+1}  :=  S_{λ/ρ}(x^{k+1} + y^k/ρ)
                 y^{k+1}  :=  y^k + ρ (x^{k+1} − z^{k+1})


Examples                                                           28
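the three lasso updates above translate line for line into numpy; a sketch on
made-up data (in practice one would cache a factorization of A^T A + ρI, as in
the timings on the next slide)

import numpy as np

np.random.seed(4)
m, n, rho = 150, 500, 1.0
A = np.random.randn(m, n)
b = np.random.randn(m)
lam = 0.1 * np.linalg.norm(A.T @ b, np.inf)

AtA_rhoI = A.T @ A + rho * np.eye(n)
Atb = A.T @ b
x, z, y = np.zeros(n), np.zeros(n), np.zeros(n)
for k in range(100):
    x = np.linalg.solve(AtA_rhoI, Atb + rho * z - y)
    v = x + y / rho
    z = np.maximum(v - lam / rho, 0.0) - np.maximum(-v - lam / rho, 0.0)  # S_{lam/rho}
    y = y + rho * (x - z)

print("nonzeros in z:", int(np.count_nonzero(z)))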
Lasso example



    example with dense A ∈ R^{1500×5000}
    (1500 measurements; 5000 regressors)


    computation times

                factorization (same as ridge regression)   1.3s
                subsequent ADMM iterations                 0.03s
                lasso solve (about 50 ADMM iterations)     2.9s
                full regularization path (30 λ’s)          4.4s

    not bad for a very short Matlab script


Examples                                                           29
Sparse inverse covariance selection




    S: empirical covariance of samples from N(0, Σ), with Σ^{−1} sparse
    (i.e., Gaussian Markov random field)

    estimate Σ^{−1} via ℓ_1 regularized maximum likelihood

                   minimize    Tr(SX) − log det X + λ ||X||_1


    methods: COVSEL (Banerjee et al 2008), graphical lasso (FHT 2008)




Examples                                                                30
Sparse inverse covariance selection via ADMM




    ADMM form:
                  minimize    Tr(SX) − log det X + λ ||Z||_1
                  subject to  X − Z = 0

    ADMM:

     X^{k+1}  :=  argmin_X ( Tr(SX) − log det X + (ρ/2) ||X − Z^k + U^k||_F^2 )
     Z^{k+1}  :=  S_{λ/ρ}(X^{k+1} + U^k)
     U^{k+1}  :=  U^k + (X^{k+1} − Z^{k+1})




Examples                                                                31
Analytical solution for X-update




    compute eigendecomposition ρ(Z^k − U^k) − S = QΛQ^T

    form diagonal matrix X̃ with

                           X̃_ii = ( λ_i + sqrt(λ_i^2 + 4ρ) ) / (2ρ)

    let X^{k+1} := Q X̃ Q^T

    cost of X-update is an eigendecomposition




Examples                                                  32
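the X-update above as a short function; a sketch assuming S, Z, U are symmetric
numpy arrays

import numpy as np

def covsel_x_update(S, Z, U, rho):
    # eigendecomposition of rho (Z^k - U^k) - S = Q Lambda Q'
    lam, Q = np.linalg.eigh(rho * (Z - U) - S)
    # Xtilde_ii = (lambda_i + sqrt(lambda_i^2 + 4 rho)) / (2 rho)
    x_diag = (lam + np.sqrt(lam ** 2 + 4.0 * rho)) / (2.0 * rho)
    return (Q * x_diag) @ Q.T     # X^{k+1} = Q Xtilde Q'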
Sparse inverse covariance selection example




    Σ^{−1} is 1000 × 1000 with 10^4 nonzeros
      – graphical lasso (Fortran): 20 seconds – 3 minutes
      – ADMM (Matlab): 3 – 10 minutes
      – (depends on choice of λ)

    very rough experiment, but with no special tuning, ADMM is in ballpark
    of recent specialized methods

    (for comparison, COVSEL takes 25+ min when Σ^{−1} is a 400 × 400
    tridiagonal matrix)




Examples                                                                33
Outline


Dual decomposition

Method of multipliers

Alternating direction method of multipliers

Common patterns

Examples

Consensus and exchange

Conclusions


Consensus and exchange                        34
Consensus optimization


    want to solve problem with N objective terms

                               minimize    Σ_{i=1}^N f_i(x)

      – e.g., f_i is the loss function for ith block of training data

    ADMM form:
                              minimize    Σ_{i=1}^N f_i(x_i)
                              subject to  x_i − z = 0

      –   xi are local variables
      –   z is the global variable
      –   xi − z = 0 are consistency or consensus constraints
      –   can add regularization using a g(z) term


Consensus and exchange                                                 35
Consensus optimization via ADMM



    L_ρ(x, z, y) = Σ_{i=1}^N ( f_i(x_i) + y_i^T (x_i − z) + (ρ/2) ||x_i − z||_2^2 )

    ADMM:

        x_i^{k+1}  :=  argmin_{x_i} ( f_i(x_i) + y_i^{kT} (x_i − z^k) + (ρ/2) ||x_i − z^k||_2^2 )

        z^{k+1}    :=  (1/N) Σ_{i=1}^N ( x_i^{k+1} + (1/ρ) y_i^k )

        y_i^{k+1}  :=  y_i^k + ρ (x_i^{k+1} − z^{k+1})


    with regularization, averaging in z update is followed by prox_{g,ρ}



Consensus and exchange                                                          36
Consensus optimization via ADMM


    using Σ_{i=1}^N y_i^k = 0, algorithm simplifies to

        x_i^{k+1}  :=  argmin_{x_i} ( f_i(x_i) + y_i^{kT} (x_i − x̄^k) + (ρ/2) ||x_i − x̄^k||_2^2 )

        y_i^{k+1}  :=  y_i^k + ρ (x_i^{k+1} − x̄^{k+1})

    where x̄^k = (1/N) Σ_{i=1}^N x_i^k


    in each iteration
      –   gather x_i^k and average to get x̄^k
      –   scatter the average x̄^k to processors
      –   update y_i^k locally (in each processor, in parallel)
      –   update x_i locally


Consensus and exchange                                                         37
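a serial sketch of the simplified consensus iteration, with quadratic local
terms f_i(x) = (1/2)||A_i x − b_i||^2 standing in for the per-processor
objectives (the inner loop over i is what would run in parallel); data are
illustrative

import numpy as np

np.random.seed(5)
N, n, rho = 10, 20, 1.0
A = [np.random.randn(30, n) for _ in range(N)]
b = [np.random.randn(30) for _ in range(N)]

x = np.zeros((N, n))
y = np.zeros((N, n))
xbar = x.mean(axis=0)
for k in range(100):
    for i in range(N):
        # x_i-update: (A_i'A_i + rho I) x_i = A_i'b_i - y_i + rho xbar
        rhs = A[i].T @ b[i] - y[i] + rho * xbar
        x[i] = np.linalg.solve(A[i].T @ A[i] + rho * np.eye(n), rhs)
    xbar = x.mean(axis=0)              # gather and average
    y = y + rho * (x - xbar)           # local dual updates

print("consensus spread:", np.linalg.norm(x - xbar))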
Statistical interpretation




    f_i is negative log-likelihood for parameter x given ith data block

    x_i^{k+1} is MAP estimate under prior N(x̄^k + (1/ρ) y_i^k, ρI)


    prior mean is previous iteration’s consensus shifted by ‘price’ of processor
    i disagreeing with previous consensus

    processors only need to support a Gaussian MAP method
      – type or number of data in each block not relevant
      – consensus protocol yields global maximum-likelihood estimate




Consensus and exchange                                                        38
Consensus classification


    data (examples) (a_i, b_i), i = 1, . . . , N , a_i ∈ R^n, b_i ∈ {−1, +1}

    linear classifier sign(a^T w + v), with weight w, offset v

    margin for ith example is b_i (a_i^T w + v); want margin to be positive

    loss for ith example is l(b_i (a_i^T w + v))
      – l is loss function (hinge, logistic, probit, exponential, . . . )

    choose w, v to minimize (1/N) Σ_{i=1}^N l(b_i (a_i^T w + v)) + r(w)
      – r(w) is regularization term (ℓ_2, ℓ_1, . . . )

    split data and use ADMM consensus to solve


Consensus and exchange                                                       39
Consensus SVM example




    hinge loss l(u) = (1 − u)_+ with ℓ_2 regularization

    baby problem with n = 2, N = 400 to illustrate

    examples split into 20 groups, in worst possible way:
    each group contains only positive or negative examples




Consensus and exchange                                       40
Iteration 1

    [figure: consensus SVM example after 1 iteration]

Consensus and exchange                                  41
Iteration 5

    [figure: consensus SVM example after 5 iterations]

Consensus and exchange                                  42
Iteration 40

    [figure: consensus SVM example after 40 iterations]

Consensus and exchange                                   43
Distributed lasso example



    example with dense A ∈ R^{400000×8000} (roughly 30 GB of data)
      – distributed solver written in C using MPI and GSL
      – no optimization or tuned libraries (like ATLAS, MKL)
      – split into 80 subsystems across 10 (8-core) machines on Amazon EC2


    computation times
                loading data                               30s
                factorization                              5m
                subsequent ADMM iterations                 0.5–2s
                lasso solve (about 15 ADMM iterations)     5–6m



Consensus and exchange                                                       44
Exchange problem


                         minimize    Σ_{i=1}^N f_i(x_i)
                         subject to  Σ_{i=1}^N x_i = 0

    another canonical problem, like consensus
    in fact, it's the dual of consensus
    can interpret as N agents exchanging n goods to minimize a total cost
    (x_i)_j ≥ 0 means agent i receives (x_i)_j of good j from exchange
    (x_i)_j < 0 means agent i contributes |(x_i)_j| of good j to exchange
    constraint Σ_{i=1}^N x_i = 0 is equilibrium or market clearing constraint
    optimal dual variable y⋆ is a set of valid prices for the goods
    suggests real or virtual cash payment (y⋆)^T x_i by agent i


Consensus and exchange                                                      45
Exchange ADMM


    solve as a generic constrained convex problem with constraint set

                     C = { x ∈ R^{nN} | x_1 + x_2 + · · · + x_N = 0 }

    scaled form:

           x_i^{k+1}  :=  argmin_{x_i} ( f_i(x_i) + (ρ/2) ||x_i − x_i^k + x̄^k + u^k||_2^2 )

           u^{k+1}    :=  u^k + x̄^{k+1}

    unscaled form:

        x_i^{k+1}  :=  argmin_{x_i} ( f_i(x_i) + y^{kT} x_i + (ρ/2) ||x_i − (x_i^k − x̄^k)||_2^2 )

        y^{k+1}    :=  y^k + ρ x̄^{k+1}

Consensus and exchange                                                        46
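a sketch of the scaled exchange iteration with quadratic agent costs
f_i(x) = (1/2)||x − d_i||^2, so each local argmin is available in closed form;
the numbers of agents and goods are illustrative

import numpy as np

np.random.seed(6)
N, n, rho = 5, 8, 1.0
d = np.random.randn(N, n)     # each agent would like x_i = d_i

x = np.zeros((N, n))
u = np.zeros(n)
for k in range(200):
    xbar = x.mean(axis=0)
    for i in range(N):
        # argmin (1/2)||xi - d_i||^2 + (rho/2)||xi - x_i^k + xbar^k + u^k||^2
        x[i] = (d[i] + rho * (x[i] - xbar - u)) / (1.0 + rho)
    u = u + x.mean(axis=0)    # u^{k+1} := u^k + xbar^{k+1}

print("market clearing residual:", np.linalg.norm(x.mean(axis=0)))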
Interpretation as tâtonnement process




    tâtonnement process: iteratively update prices to clear the market

    work towards equilibrium by increasing/decreasing prices of goods based
    on excess demand/supply

    dual decomposition is the simplest tâtonnement algorithm

    ADMM adds proximal regularization
      – incorporates agents' prior commitment to help clear the market
      – convergence is far more robust than with dual decomposition




Consensus and exchange                                                    47
Distributed dynamic energy management



    N devices exchange power in time periods t = 1, . . . , T
    x_i ∈ R^T is power flow profile for device i
    f_i(x_i) is cost of profile x_i (and encodes constraints)
    x_1 + · · · + x_N = 0 is energy balance (in each time period)
    dynamic energy management problem is exchange problem
    exchange ADMM gives distributed method for dynamic energy
    management
    each device optimizes its own profile, with quadratic regularization for
    coordination
    residual (energy imbalance) is driven to zero



Consensus and exchange                                                        48
Generators


    [figure]

    3 example generators
    left: generator costs/limits; right: ramp constraints
    can add cost for power changes

Consensus and exchange                                                    49
Fixed loads

    [figure]

    2 example fixed loads
    cost is +∞ for not supplying load; zero otherwise

Consensus and exchange                                  50
Shiftable load

    [figure]

    total energy consumed over an interval must exceed given minimum level
    limits on energy consumed in each period
    cost is +∞ for violating constraints; zero otherwise
Consensus and exchange                                                  51
Battery energy storage system
    [figure]
    energy store with maximum capacity, charge/discharge limits
    black: battery charge, red: charge/discharge profile
    cost is +∞ for violating constraints; zero otherwise
Consensus and exchange                                            52
Electric vehicle charging system


    [figure]

    black: desired charge profile; blue: charge profile
    shortfall cost for not meeting desired charge

Consensus and exchange                                   53
HVAC
    [figure]

    thermal load (e.g., room, refrigerator) with temperature limits
    magenta: ambient temperature; blue: load temperature
    red: cooling energy profile
    cost is +∞ for violating constraints; zero otherwise
Consensus and exchange                                                54
External tie

    [figure]

    buy/sell energy from/to external grid at price p_ext(t) ± γ(t)
    solid: p_ext(t); dashed: p_ext(t) ± γ(t)

Consensus and exchange                                               55
Smart grid example



10 devices (already described above)
    3 generators
    2 fixed loads
    1 shiftable load
    1 EV charging systems
    1 battery
    1 HVAC system
    1 external tie




Consensus and exchange                        56
Convergence

iteration: k = 1, 3, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 (one slide per value of k)

    [figure, repeated for each k above:
     left: solid: optimal generator profile, dashed: profile at kth iteration;
     right: residual vector x̄^k]

Consensus and exchange                                                       57
Outline


Dual decomposition

Method of multipliers

Alternating direction method of multipliers

Common patterns

Examples

Consensus and exchange

Conclusions


Conclusions                                   58
Summary and conclusions




ADMM
    is the same as, or closely related to, many methods with other names
    has been around since the 1970s
    gives simple single-processor algorithms that can be competitive with
    state-of-the-art
    can be used to coordinate many processors, each solving a substantial
    problem, to solve a very large problem




Conclusions                                                                 59

31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
 
Sulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their usesSulphonamides, mechanisms and their uses
Sulphonamides, mechanisms and their uses
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.ppt
 
ARTERIAL BLOOD GAS ANALYSIS........pptx
ARTERIAL BLOOD  GAS ANALYSIS........pptxARTERIAL BLOOD  GAS ANALYSIS........pptx
ARTERIAL BLOOD GAS ANALYSIS........pptx
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 Database
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 - I-LEARN SMART WORLD - CẢ NĂM - CÓ FILE NGHE (BẢN...
 

Alternating direction method of multipliers (slide transcript)

  • 9. Method of multipliers: a method to robustify dual ascent; use the augmented Lagrangian (Hestenes, Powell 1969), ρ > 0:
       L_ρ(x, y) = f(x) + y^T (Ax − b) + (ρ/2)||Ax − b||_2^2
     method of multipliers (Hestenes, Powell; analysis in Bertsekas 1982):
       x^{k+1} := argmin_x L_ρ(x, y^k)
       y^{k+1} := y^k + ρ(Ax^{k+1} − b)
     (note the specific dual update step length ρ)
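To make the iteration concrete, here is a minimal sketch (not from the slides) of the method of multipliers for an equality-constrained quadratic objective, where the x-minimization has a closed form; the problem data, sizes, and the choice ρ = 1 below are made up for illustration.

import numpy as np

# hypothetical problem: minimize (1/2) x^T P x + q^T x  subject to  Ax = b
rng = np.random.default_rng(0)
n, m = 10, 4
M = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)            # positive definite P
q = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

rho = 1.0
y = np.zeros(m)
for k in range(100):
    # x-minimization of the augmented Lagrangian (closed form for quadratic f)
    x = np.linalg.solve(P + rho * A.T @ A, -q - A.T @ y + rho * A.T @ b)
    # dual update with step length rho
    y = y + rho * (A @ x - b)

print("primal residual:", np.linalg.norm(A @ x - b))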
  • 10. Method of multipliers, dual update step: optimality conditions (for differentiable f): Ax* − b = 0, ∇f(x*) + A^T y* = 0 (primal and dual feasibility); since x^{k+1} minimizes L_ρ(x, y^k),
       0 = ∇_x L_ρ(x^{k+1}, y^k) = ∇f(x^{k+1}) + A^T (y^k + ρ(Ax^{k+1} − b)) = ∇f(x^{k+1}) + A^T y^{k+1}
     so the dual update y^{k+1} = y^k + ρ(Ax^{k+1} − b) makes (x^{k+1}, y^{k+1}) dual feasible; primal feasibility is achieved in the limit: Ax^{k+1} − b → 0
  • 11. Method of multipliers (compared to dual decomposition):
     – good news: converges under much more relaxed conditions (f can be nondifferentiable, take on the value +∞, . . . )
     – bad news: the quadratic penalty destroys splitting of the x-update, so we can’t do decomposition
  • 12. Outline Dual decomposition Method of multipliers Alternating direction method of multipliers Common patterns Examples Consensus and exchange Conclusions Alternating direction method of multipliers 12
  • 13. Alternating direction method of multipliers: a method
     – with the good robustness of the method of multipliers
     – which can support decomposition
     “robust dual decomposition” or “decomposable method of multipliers”; proposed by Gabay, Mercier, Glowinski, Marrocco in 1976
  • 14. Alternating direction method of multipliers: ADMM problem form (with f, g convex)
       minimize f(x) + g(z) subject to Ax + Bz = c
     – two sets of variables, with separable objective
       L_ρ(x, z, y) = f(x) + g(z) + y^T (Ax + Bz − c) + (ρ/2)||Ax + Bz − c||_2^2
     ADMM:
       x^{k+1} := argmin_x L_ρ(x, z^k, y^k)              // x-minimization
       z^{k+1} := argmin_z L_ρ(x^{k+1}, z, y^k)          // z-minimization
       y^{k+1} := y^k + ρ(Ax^{k+1} + Bz^{k+1} − c)       // dual update
  • 15. Alternating direction method of multipliers: if we minimized over x and z jointly, this reduces to the method of multipliers; instead, we do one pass of a Gauss-Seidel method; we get splitting since we minimize over x with z fixed, and vice versa
  • 16. ADMM and optimality conditions: optimality conditions (for the differentiable case):
     – primal feasibility: Ax + Bz − c = 0
     – dual feasibility: ∇f(x) + A^T y = 0, ∇g(z) + B^T y = 0
     since z^{k+1} minimizes L_ρ(x^{k+1}, z, y^k) we have
       0 = ∇g(z^{k+1}) + B^T y^k + ρB^T (Ax^{k+1} + Bz^{k+1} − c) = ∇g(z^{k+1}) + B^T y^{k+1}
     so with the ADMM dual variable update, (x^{k+1}, z^{k+1}, y^{k+1}) satisfies the second dual feasibility condition; primal and first dual feasibility are achieved as k → ∞
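Since exact feasibility only holds in the limit, implementations typically monitor how far the current iterates are from satisfying these conditions. A small helper along those lines; the residual definitions follow the Boyd et al. paper and are an assumption here, not something stated on this slide.

import numpy as np

def admm_residuals(A, B, c, x, z, z_old, rho):
    # primal and dual residual norms for Ax + Bz = c (definitions as in the
    # Boyd et al. paper; assumed here, not shown on this slide)
    r = A @ x + B @ z - c                 # primal residual
    s = rho * A.T @ (B @ (z - z_old))     # dual residual
    return np.linalg.norm(r), np.linalg.norm(s)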
  • 17. ADMM with scaled dual variables: combine linear and quadratic terms in the augmented Lagrangian
       L_ρ(x, z, y) = f(x) + g(z) + y^T (Ax + Bz − c) + (ρ/2)||Ax + Bz − c||_2^2
                    = f(x) + g(z) + (ρ/2)||Ax + Bz − c + u||_2^2 + const.
     with u^k = (1/ρ)y^k; ADMM (scaled dual form):
       x^{k+1} := argmin_x f(x) + (ρ/2)||Ax + Bz^k − c + u^k||_2^2
       z^{k+1} := argmin_z g(z) + (ρ/2)||Ax^{k+1} + Bz − c + u^k||_2^2
       u^{k+1} := u^k + (Ax^{k+1} + Bz^{k+1} − c)
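A minimal skeleton of the scaled-form iteration, assuming the caller supplies the two subproblem solvers as functions; the names and interface are illustrative, not a fixed API. The lasso example later in the transcript is one concrete instantiation of this pattern.

import numpy as np

def admm_scaled(x_update, z_update, A, B, c, z0, rho=1.0, iters=100):
    # x_update(v, rho): returns argmin_x f(x) + (rho/2)||A x + v||_2^2
    # z_update(w, rho): returns argmin_z g(z) + (rho/2)||B z + w||_2^2
    # (both subproblem solvers are assumed to be provided by the caller)
    z = z0
    u = np.zeros(c.shape)
    for _ in range(iters):
        x = x_update(B @ z - c + u, rho)       # x-minimization
        z = z_update(A @ x - c + u, rho)       # z-minimization
        u = u + A @ x + B @ z - c              # scaled dual update
    return x, z, u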
  • 18. Convergence: assume (very little!)
     – f, g convex, closed, proper
     – L_0 has a saddle point
     then ADMM converges:
     – iterates approach feasibility: Ax^k + Bz^k − c → 0
     – objective approaches the optimal value: f(x^k) + g(z^k) → p*
  • 19. Related algorithms:
     – operator splitting methods (Douglas, Peaceman, Rachford, Lions, Mercier, . . . 1950s, 1979)
     – proximal point algorithm (Rockafellar 1976)
     – Dykstra’s alternating projections algorithm (1983)
     – Spingarn’s method of partial inverses (1985)
     – Rockafellar-Wets progressive hedging (1991)
     – proximal methods (Rockafellar, many others, 1976–present)
     – Bregman iterative methods (2008–present)
     most of these are special cases of the proximal point algorithm
  • 20. Outline Dual decomposition Method of multipliers Alternating direction method of multipliers Common patterns Examples Consensus and exchange Conclusions Common patterns 20
  • 21. Common patterns: the x-update step requires minimizing f(x) + (ρ/2)||Ax − v||_2^2 (with v = −Bz^k + c − u^k, which is constant during the x-update); similar for the z-update; several special cases come up often; can simplify the update by exploiting structure in these cases
  • 22. Decomposition: suppose f is block-separable, f(x) = f_1(x_1) + · · · + f_N(x_N), x = (x_1, . . . , x_N); A is conformably block separable: A^T A is block diagonal; then the x-update splits into N parallel updates of x_i
  • 23. Proximal operator: consider the x-update when A = I
       x^+ = argmin_x ( f(x) + (ρ/2)||x − v||_2^2 ) = prox_{f,ρ}(v)
     some special cases:
       f = I_C (indicator function of set C):   x^+ := Π_C(v) (projection onto C)
       f = λ||·||_1 (ℓ1 norm):   x^+_i := S_{λ/ρ}(v_i) (soft thresholding), where S_a(v) = (v − a)_+ − (−v − a)_+
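The two special cases above are one-liners in code; a small sketch follows, with a box as one made-up choice of the set C.

import numpy as np

def soft_threshold(v, a):
    # elementwise soft thresholding S_a(v) = (v - a)_+ - (-v - a)_+
    return np.maximum(v - a, 0.0) - np.maximum(-v - a, 0.0)

def project_box(v, lo, hi):
    # projection onto the box C = {x : lo <= x <= hi} (one possible choice of C)
    return np.clip(v, lo, hi)

v = np.array([-2.0, -0.3, 0.1, 1.5])
print(soft_threshold(v, 0.5))      # [-1.5, 0.0, 0.0, 1.0]
print(project_box(v, -1.0, 1.0))   # [-1.0, -0.3, 0.1, 1.0]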
  • 24. Quadratic objective: f(x) = (1/2)x^T P x + q^T x + r
       x^+ := (P + ρA^T A)^{-1} (ρA^T v − q)
     use the matrix inversion lemma when computationally advantageous:
       (P + ρA^T A)^{-1} = P^{-1} − ρP^{-1} A^T (I + ρA P^{-1} A^T)^{-1} A P^{-1}
     (direct method) cache the factorization of P + ρA^T A (or I + ρA P^{-1} A^T)
     (iterative method) warm start, early stopping, reducing tolerances
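A sketch of the direct method with a cached factorization, using made-up problem data; the point is that the factor of P + rho*A^T A is computed once and reused in every x-update.

import numpy as np

rng = np.random.default_rng(1)
n, m = 50, 20
M = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)            # positive definite P
q = rng.standard_normal(n)
A = rng.standard_normal((m, n))
rho = 1.0

# direct method: cache the Cholesky factor of P + rho*A^T A across ADMM iterations
L = np.linalg.cholesky(P + rho * A.T @ A)

def x_update(v):
    # x+ := (P + rho*A^T A)^{-1} (rho*A^T v - q), reusing the cached factor
    rhs = rho * A.T @ v - q
    return np.linalg.solve(L.T, np.linalg.solve(L, rhs))

v = rng.standard_normal(m)
print(x_update(v).shape)   # (50,)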
  • 25. Smooth objective: if f is smooth, can use standard methods for smooth minimization
     – gradient, Newton, or quasi-Newton
     – preconditioned CG, limited-memory BFGS (scale to very large problems)
     can exploit
     – warm start
     – early stopping, with tolerances decreasing as ADMM proceeds
  • 26. Outline Dual decomposition Method of multipliers Alternating direction method of multipliers Common patterns Examples Consensus and exchange Conclusions Examples 26
  • 27. Constrained convex optimization: consider ADMM for the generic problem
       minimize f(x) subject to x ∈ C
     ADMM form: take g to be the indicator of C
       minimize f(x) + g(z) subject to x − z = 0
     algorithm:
       x^{k+1} := argmin_x f(x) + (ρ/2)||x − z^k + u^k||_2^2
       z^{k+1} := Π_C(x^{k+1} + u^k)
       u^{k+1} := u^k + x^{k+1} − z^{k+1}
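A toy instance of this pattern (data made up): f(x) = (1/2)||x − a||_2^2 with C the nonnegative orthant, so the x-update has a closed form and the z-update is a projection (clipping).

import numpy as np

a = np.array([1.0, -2.0, 0.5, -0.1])   # made-up data
rho = 1.0
z = np.zeros(a.size)
u = np.zeros(a.size)

for _ in range(50):
    # x-update: argmin_x (1/2)||x - a||^2 + (rho/2)||x - z + u||^2  (closed form)
    x = (a + rho * (z - u)) / (1.0 + rho)
    # z-update: projection onto C = {z : z >= 0}
    z = np.maximum(x + u, 0.0)
    # scaled dual update
    u = u + x - z

print(z)   # approaches max(a, 0) = [1.0, 0.0, 0.5, 0.0]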
  • 28. Lasso: lasso problem
       minimize (1/2)||Ax − b||_2^2 + λ||x||_1
     ADMM form:
       minimize (1/2)||Ax − b||_2^2 + λ||z||_1 subject to x − z = 0
     ADMM:
       x^{k+1} := (A^T A + ρI)^{-1} (A^T b + ρz^k − y^k)
       z^{k+1} := S_{λ/ρ}(x^{k+1} + y^k/ρ)
       y^{k+1} := y^k + ρ(x^{k+1} − z^{k+1})
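A minimal sketch of these lasso updates with a cached factorization of A^T A + ρI; the problem data below is random and made up for illustration.

import numpy as np

def soft_threshold(v, a):
    return np.maximum(v - a, 0.0) - np.maximum(-v - a, 0.0)

def lasso_admm(A, b, lam, rho=1.0, iters=200):
    m, n = A.shape
    z = np.zeros(n)
    y = np.zeros(n)
    # cache the Cholesky factor of A^T A + rho*I (same cost as a ridge regression)
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
    Atb = A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * z - y))
        z = soft_threshold(x + y / rho, lam / rho)
        y = y + rho * (x - z)
    return z

rng = np.random.default_rng(2)
A = rng.standard_normal((100, 30))
b = rng.standard_normal(100)
print(np.count_nonzero(lasso_admm(A, b, lam=5.0)))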
  • 29. Lasso example: example with dense A ∈ R^{1500×5000} (1500 measurements; 5000 regressors); computation times:
       factorization (same as ridge regression)     1.3s
       subsequent ADMM iterations                   0.03s
       lasso solve (about 50 ADMM iterations)       2.9s
       full regularization path (30 λ’s)            4.4s
     not bad for a very short Matlab script
  • 30. Sparse inverse covariance selection: S is the empirical covariance of samples from N(0, Σ), with Σ^{-1} sparse (i.e., a Gaussian Markov random field); estimate Σ^{-1} via ℓ1-regularized maximum likelihood
       minimize Tr(SX) − log det X + λ||X||_1
     methods: COVSEL (Banerjee et al 2008), graphical lasso (FHT 2008)
  • 31. Sparse inverse covariance selection via ADMM: ADMM form
       minimize Tr(SX) − log det X + λ||Z||_1 subject to X − Z = 0
     ADMM:
       X^{k+1} := argmin_X Tr(SX) − log det X + (ρ/2)||X − Z^k + U^k||_F^2
       Z^{k+1} := S_{λ/ρ}(X^{k+1} + U^k)
       U^{k+1} := U^k + (X^{k+1} − Z^{k+1})
  • 32. Analytical solution for the X-update: compute the eigendecomposition ρ(Z^k − U^k) − S = QΛQ^T; form the diagonal matrix X̃ with
       X̃_ii = ( λ_i + sqrt(λ_i^2 + 4ρ) ) / (2ρ)
     let X^{k+1} := Q X̃ Q^T; the cost of the X-update is an eigendecomposition
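A sketch combining the ADMM loop of the previous slide with this eigendecomposition-based X-update; the sample data is made up, and ρ, λ, and the iteration count are arbitrary choices.

import numpy as np

def cov_select_admm(S, lam, rho=1.0, iters=100):
    n = S.shape[0]
    Z = np.zeros((n, n))
    U = np.zeros((n, n))
    for _ in range(iters):
        # X-update: eigendecomposition of rho*(Z - U) - S
        w, Q = np.linalg.eigh(rho * (Z - U) - S)
        xtilde = (w + np.sqrt(w**2 + 4.0 * rho)) / (2.0 * rho)
        X = Q @ np.diag(xtilde) @ Q.T
        # Z-update: elementwise soft thresholding of X + U
        V = X + U
        Z = np.maximum(V - lam / rho, 0.0) - np.maximum(-V - lam / rho, 0.0)
        # dual update
        U = U + X - Z
    return X, Z

# small made-up example
rng = np.random.default_rng(3)
Y = rng.standard_normal((200, 5))
S = np.cov(Y, rowvar=False)
X, Z = cov_select_admm(S, lam=0.1)
print(np.round(Z, 2))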
  • 33. Sparse inverse covariance selection example: Σ^{-1} is 1000 × 1000 with 10^4 nonzeros
     – graphical lasso (Fortran): 20 seconds – 3 minutes
     – ADMM (Matlab): 3 – 10 minutes
     – (depends on choice of λ)
     very rough experiment, but with no special tuning, ADMM is in the ballpark of recent specialized methods (for comparison, COVSEL takes 25+ min when Σ^{-1} is a 400 × 400 tridiagonal matrix)
  • 34. Outline Dual decomposition Method of multipliers Alternating direction method of multipliers Common patterns Examples Consensus and exchange Conclusions Consensus and exchange 34
  • 35. Consensus optimization: want to solve a problem with N objective terms
       minimize Σ_{i=1}^N f_i(x)
     – e.g., f_i is the loss function for the ith block of training data
     ADMM form:
       minimize Σ_{i=1}^N f_i(x_i) subject to x_i − z = 0
     – x_i are local variables
     – z is the global variable
     – x_i − z = 0 are consistency or consensus constraints
     – can add regularization using a g(z) term
  • 36. Consensus optimization via ADMM:
       L_ρ(x, z, y) = Σ_{i=1}^N ( f_i(x_i) + y_i^T (x_i − z) + (ρ/2)||x_i − z||_2^2 )
     ADMM:
       x_i^{k+1} := argmin_{x_i} ( f_i(x_i) + y_i^{kT} (x_i − z^k) + (ρ/2)||x_i − z^k||_2^2 )
       z^{k+1}   := (1/N) Σ_{i=1}^N ( x_i^{k+1} + (1/ρ)y_i^k )
       y_i^{k+1} := y_i^k + ρ(x_i^{k+1} − z^{k+1})
     with regularization, averaging in the z update is followed by prox_{g,ρ}
  • 37. Consensus optimization via ADMM: using Σ_{i=1}^N y_i^k = 0, the algorithm simplifies to
       x_i^{k+1} := argmin_{x_i} ( f_i(x_i) + y_i^{kT} (x_i − x̄^k) + (ρ/2)||x_i − x̄^k||_2^2 )
       y_i^{k+1} := y_i^k + ρ(x_i^{k+1} − x̄^{k+1})
     where x̄^k = (1/N) Σ_{i=1}^N x_i^k; in each iteration
     – gather x_i^k and average to get x̄^k
     – scatter the average x̄^k to processors
     – update y_i^k locally (in each processor, in parallel)
     – update x_i locally
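A sketch of this simplified consensus iteration for the special case f_i(x) = (1/2)||A_i x − b_i||_2^2 (consensus least squares), with made-up data; the local x-updates then have closed form and could run on separate processors.

import numpy as np

rng = np.random.default_rng(4)
N, n, m_i = 5, 8, 20
blocks = [(rng.standard_normal((m_i, n)), rng.standard_normal(m_i)) for _ in range(N)]

rho = 1.0
X = np.zeros((N, n))      # local variables x_i
Y = np.zeros((N, n))      # local dual variables y_i
xbar = np.zeros(n)

for k in range(100):
    # local x-updates (one per processor, in parallel in a real deployment)
    for i, (Ai, bi) in enumerate(blocks):
        X[i] = np.linalg.solve(Ai.T @ Ai + rho * np.eye(n),
                               Ai.T @ bi - Y[i] + rho * xbar)
    xbar = X.mean(axis=0)            # gather and average
    Y = Y + rho * (X - xbar)         # local dual updates

# compare with the centralized least-squares solution
A = np.vstack([Ai for Ai, _ in blocks])
b = np.hstack([bi for _, bi in blocks])
print(np.linalg.norm(xbar - np.linalg.lstsq(A, b, rcond=None)[0]))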
  • 38. Statistical interpretation: f_i is the negative log-likelihood for parameter x given the ith data block; x_i^{k+1} is the MAP estimate under prior N(x̄^k + (1/ρ)y_i^k, ρI); the prior mean is the previous iteration’s consensus shifted by the ‘price’ of processor i disagreeing with the previous consensus; processors only need to support a Gaussian MAP method
     – type or number of data in each block not relevant
     – consensus protocol yields the global maximum-likelihood estimate
  • 39. Consensus classification: data (examples) (a_i, b_i), i = 1, . . . , N, a_i ∈ R^n, b_i ∈ {−1, +1}; linear classifier sign(a^T w + v), with weight w, offset v; margin for the ith example is b_i(a_i^T w + v); want the margin to be positive; loss for the ith example is l(b_i(a_i^T w + v))
     – l is the loss function (hinge, logistic, probit, exponential, . . . )
     choose w, v to minimize (1/N) Σ_{i=1}^N l(b_i(a_i^T w + v)) + r(w)
     – r(w) is the regularization term (ℓ2, ℓ1, . . . )
     split the data and use ADMM consensus to solve
  • 40. Consensus SVM example: hinge loss l(u) = (1 − u)_+ with ℓ2 regularization; baby problem with n = 2, N = 400 to illustrate; examples split into 20 groups, in the worst possible way: each group contains only positive or negative examples
  • 41. Iteration 1 [figure: consensus SVM classifier after 1 iteration]
  • 42. Iteration 5 [figure: consensus SVM classifier after 5 iterations]
  • 43. Iteration 40 [figure: consensus SVM classifier after 40 iterations]
  • 44. Distributed lasso example: example with dense A ∈ R^{400000×8000} (roughly 30 GB of data)
     – distributed solver written in C using MPI and GSL
     – no optimization or tuned libraries (like ATLAS, MKL)
     – split into 80 subsystems across 10 (8-core) machines on Amazon EC2
     computation times:
       loading data                               30s
       factorization                              5m
       subsequent ADMM iterations                 0.5–2s
       lasso solve (about 15 ADMM iterations)     5–6m
  • 45. Exchange problem:
       minimize Σ_{i=1}^N f_i(x_i) subject to Σ_{i=1}^N x_i = 0
     another canonical problem, like consensus; in fact, it’s the dual of consensus; can interpret as N agents exchanging n goods to minimize a total cost; (x_i)_j ≥ 0 means agent i receives (x_i)_j of good j from the exchange; (x_i)_j < 0 means agent i contributes |(x_i)_j| of good j to the exchange; the constraint Σ_{i=1}^N x_i = 0 is the equilibrium or market clearing constraint; the optimal dual variable y* is a set of valid prices for the goods; suggests real or virtual cash payment (y*)^T x_i by agent i
  • 46. Exchange ADMM: solve as a generic constrained convex problem with constraint set
       C = {x ∈ R^{nN} | x_1 + x_2 + · · · + x_N = 0}
     scaled form:
       x_i^{k+1} := argmin_{x_i} ( f_i(x_i) + (ρ/2)||x_i − x_i^k + x̄^k + u^k||_2^2 )
       u^{k+1}   := u^k + x̄^{k+1}
     unscaled form:
       x_i^{k+1} := argmin_{x_i} ( f_i(x_i) + y^{kT} x_i + (ρ/2)||x_i − (x_i^k − x̄^k)||_2^2 )
       y^{k+1}   := y^k + ρ x̄^{k+1}
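A sketch of the scaled-form exchange iteration for the special case of quadratic agent costs f_i(x_i) = (1/2)||x_i − c_i||_2^2, with made-up targets c_i; the optimum is then x_i = c_i − mean(c), which gives an easy check.

import numpy as np

rng = np.random.default_rng(5)
N, n = 6, 3
C = rng.standard_normal((N, n))    # row i is agent i's target c_i

rho = 1.0
X = np.zeros((N, n))
u = np.zeros(n)                    # scaled dual variable, shared by all agents

for k in range(200):
    xbar = X.mean(axis=0)
    # local updates (in parallel): argmin f_i(x) + (rho/2)||x - x_i^k + xbar^k + u^k||^2
    X = (C + rho * (X - xbar - u)) / (1.0 + rho)
    # dual update on the average (market clearing residual)
    u = u + X.mean(axis=0)

print(np.linalg.norm(X - (C - C.mean(axis=0))))   # distance to optimum, small
print(np.linalg.norm(X.mean(axis=0)))             # equilibrium residual, small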
  • 47. Interpretation as tâtonnement process: a tâtonnement process iteratively updates prices to clear the market, working towards equilibrium by increasing/decreasing the prices of goods based on excess demand/supply; dual decomposition is the simplest tâtonnement algorithm; ADMM adds proximal regularization
     – incorporates agents’ prior commitment to help clear the market
     – convergence far more robust than dual decomposition
  • 48. Distributed dynamic energy management: N devices exchange power in time periods t = 1, . . . , T; x_i ∈ R^T is the power flow profile for device i; f_i(x_i) is the cost of profile x_i (and encodes constraints); x_1 + · · · + x_N = 0 is energy balance (in each time period); the dynamic energy management problem is an exchange problem; exchange ADMM gives a distributed method for dynamic energy management; each device optimizes its own profile, with quadratic regularization for coordination; the residual (energy imbalance) is driven to zero
  • 49. Generators [figure] 3 example generators; left: generator costs/limits; right: ramp constraints; can add cost for power changes
  • 50. Fixed loads [figure] 2 example fixed loads; cost is +∞ for not supplying load, zero otherwise
  • 51. Shiftable load [figure] total energy consumed over an interval must exceed a given minimum level; limits on energy consumed in each period; cost is +∞ for violating constraints, zero otherwise
  • 52. Battery energy storage system [figure] energy store with maximum capacity, charge/discharge limits; black: battery charge, red: charge/discharge profile; cost is +∞ for violating constraints, zero otherwise
  • 53. Electric vehicle charging system [figure] black: desired charge profile; blue: charge profile; shortfall cost for not meeting desired charge
  • 54. HVAC [figure] thermal load (e.g., room, refrigerator) with temperature limits; magenta: ambient temperature; blue: load temperature; red: cooling energy profile; cost is +∞ for violating constraints, zero otherwise
  • 55. External tie [figure] buy/sell energy from/to the external grid at price p_ext(t) ± γ(t); solid: p_ext(t); dashed: p_ext(t) ± γ(t)
  • 56. Smart grid example: 10 devices (already described above): 3 generators, 2 fixed loads, 1 shiftable load, 1 EV charging system, 1 battery, 1 HVAC system, 1 external tie
  • 57–68. Convergence (iterations k = 1, 3, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50) [figures] left: solid: optimal generator profiles, dashed: profiles at the kth iteration; right: residual vector x̄^k
  • 69. Outline Dual decomposition Method of multipliers Alternating direction method of multipliers Common patterns Examples Consensus and exchange Conclusions Conclusions 58
  • 70. Summary and conclusions: ADMM
     – is the same as, or closely related to, many methods with other names
     – has been around since the 1970s
     – gives simple single-processor algorithms that can be competitive with state-of-the-art
     – can be used to coordinate many processors, each solving a substantial problem, to solve a very large problem