Stochastic Decomposition


        Suvrajeet Sen
 Lecture at the Winter School
          April 2013




The Plan – Part I (70 minutes)
• Review of Basics: 2-stage Stochastic Linear Programs
  (SLP)
• Deterministic Algorithms for 2-stage SLP
   – Subgradients of the Expected Recourse Function
   – Subgradient Methods
   – Deterministic Decomposition (Kelley/Benders/L-shaped)
• Stochastic Algorithms for 2-stage SLP
   – Stochastic Quasi-gradient Methods
   – Sample Average Approximation
   – Stochastic Decomposition (SD) Algorithm
• Computational Results
Review of Basics: 2-stage
Stochastic Linear Programs



The commonly stated 2-stage SLP
 (will be stated again, as needed)




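A standard way to write the program this slide refers to, with notation (costs c and g, first-stage set X, recourse function h) assumed here for concreteness:

$$\min_{x \in X} \; c^\top x + \mathbb{E}\big[h(x, \tilde{\omega})\big], \qquad X = \{x : Ax \le b,\; x \ge 0\}.$$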
The Recourse Function and its Expectation




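In the same assumed notation, the recourse function and its expectation can be written as

$$h(x,\omega) = \min_{y \ge 0} \{\, g^\top y : W y \ge r(\omega) - T(\omega)\,x \,\}, \qquad \mathbb{E}[h(x,\tilde{\omega})] = \sum_{\omega \in \Omega} p(\omega)\, h(x,\omega),$$

together with the LP dual form that later slides rely on for cut generation:

$$h(x,\omega) = \max \{\, \pi^\top \big(r(\omega) - T(\omega)\,x\big) : W^\top \pi \le g \,\}.$$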
Wait a minute!




One Approach: Use Sampling




Deterministic Algorithms for
2-stage SLP

Subgradient Method, Kelley/Benders/L-shaped Method



Subgradient Method
          (Shor/Polyak/Nesterov/Nemirovski…)
• At iteration k, let x^k be given. (An interchange of expectation and
  subdifferentiation is required here.)
• Let g^k be a subgradient of the objective at x^k.
• Then x^{k+1} = Π_X(x^k − α_k g^k), where Π_X denotes the
  projection operator onto the set X of decisions, and α_k > 0 is the step size.
• As mentioned earlier, this subgradient is very difficult to compute!
• No concern about loss of convexity due to sampling
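A minimal Python sketch of this iteration on a toy objective with a cheap subgradient oracle; the objective, the box set X, and the 1/k step rule are illustrative assumptions, not part of the lecture:

```python
import numpy as np

def project_box(x, lo, hi):
    """Stand-in for the projection operator Pi_X, with X a box."""
    return np.clip(x, lo, hi)

def subgradient_method(subgrad, x0, lo, hi, iters=500):
    """Iterate x_{k+1} = Pi_X(x_k - alpha_k * g_k) with alpha_k = 1/k."""
    x = np.array(x0, dtype=float)
    for k in range(1, iters + 1):
        g = subgrad(x)                 # any g_k in the subdifferential at x_k
        x = project_box(x - g / k, lo, hi)
    return x

# Toy convex objective f(x) = ||x - a||_1, whose subgradient is sign(x - a).
a = np.array([0.3, -0.7])
print(subgradient_method(lambda x: np.sign(x - a), [0.0, 0.0], -1.0, 1.0))
```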
Strengths and Weaknesses of Subgradient Methods

• Strengths
   – Easy to Program, no master problem, and easily parallelizable
   – Recently there have been improvements in step size rules
• Weaknesses
   – Difficult to establish lower bounds (and hence optimality guarantees, in general)
   – Traditional step-size rules (e.g., constant/k) need a lot of fine-tuning
   – Convergence
       • Method makes good progress early on, but like other steepest-descent type
         methods, there is zig-zagging behavior
   – Need ways to stop the algorithm
       • Difficult because upper and lower bounds on objective values are difficult to
         obtain



Kelley’s Cutting Plane/Benders’/L-shaped
        Decomposition for 2-stage SLP
• Let ω̃ be a random variable defined on a
  probability space (Ω, 𝒜, P)
• Then a stochastic program is given by

     min_{x ∈ X} c^⊤x + E[h(x, ω̃)]
KBL Decomposition (J. Benders/Van Slyke/Wets)

• At iteration k, let x^k and the cuts from earlier iterations be given. Recall
  the dual form
     h(x, ω) = max { π^⊤( r(ω) − T(ω)x ) : W^⊤π ≤ g }

• Then define the cut generated at x^k, with π^k(ω) an optimal dual
  solution for each outcome ω:
     ℓ_k(x) = E[ π^k(ω̃)^⊤ ( r(ω̃) − T(ω̃)x ) ]

• Let f_k(x) = c^⊤x + max{ ℓ_t(x) : t = 1, …, k }
• Then x^{k+1} ∈ argmin_{x ∈ X} f_k(x)
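To make the "expensive operation" concrete, here is a sketch of the cut computation for a tiny made-up instance with three scenarios, using scipy's LP solver to recover the second-stage duals (all instance data are assumptions for illustration):

```python
import numpy as np
from scipy.optimize import linprog

# Tiny illustrative instance: h(x, d) = min { y : y >= d - x, y >= 0 },
# i.e., g = [1], W = [[1]], T = [[1]], r(omega) = [d];  E[h](x) = E[max(d - x, 0)].
scenarios = [(0.25, 1.0), (0.5, 2.0), (0.25, 3.0)]   # (probability, demand d)
g, W, T = np.array([1.0]), np.array([[1.0]]), np.array([[1.0]])

def kbl_cut(x):
    """Solve one second-stage LP per scenario; aggregate the duals into a cut
       E[h](x') >= alpha + beta @ x' that is tight at the current x."""
    alpha, beta = 0.0, np.zeros_like(x)
    for p, d in scenarios:
        r = np.array([d])
        res = linprog(g, A_ub=-W, b_ub=-(r - T @ x),
                      bounds=[(0, None)], method="highs")
        pi = -res.ineqlin.marginals      # dual vertex: pi >= 0, W^T pi <= g
        alpha += p * (pi @ r)
        beta += -p * (T.T @ pi)
    return alpha, beta

a, b = kbl_cut(np.array([1.5]))
print(a, b)   # supporting hyperplane of E[h] at x = 1.5
```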
Comparing Subgradient Method and KBL Decomposition

• Both evaluate subgradients of the expected recourse function

• This is an expensive operation (it requires solving as many second-stage
LPs as there are scenarios)
• Step size in KBL is implicit (user need not worry)
• Master program grows without bound and looks unstable in
the early rounds
• Stopping rule is automatic (Upper Bound – Lower Bound ≤ ε)
• KBL’s use of master can be a bottleneck for parallelization


KBL Graphical Illustration

[Figure: the expected recourse function together with its outer approximations f_{k−1} and f_k built from cuts]
Regularization of the Master Problem
           (Ruszczynski/Kiwiel/Lemarechal …)

• Addresses the following issue:
– The master program grows without bound and looks unstable in
  the early rounds
• Include an incumbent x̄^k and a proximity measure
from the incumbent, using σ > 0 as a weight:
     min_x f_k(x) + (σ/2) ‖x − x̄^k‖²
• Particularly useful for Stochastic Decomposition.
Stochastic Algorithms for
2-stage SLP

    Stochastic Quasi-Gradient, SAA, and SD



Some “concrete” instances
                          Table 1: SP Test Instances

  Problem Name | Domain     | # of 1st-stage variables | # of 2nd-stage variables | # of random variables | Universe of scenarios | Comment
  LandS        | Generation | 4                        | 12                       | 3                     |                       | Made-up
  20TERM       | Logistics  | 63                       | 764                      | 40                    |                       | Semi-real
  SSN          | Telecom    | 89                       | 706                      | 86                    |                       | Semi-real
  STORM        | Logistics  | 121                      | 1259                     | 117                   |                       | Semi-real

Notice the size of the scenario universe: using a deterministic algorithm would be a “non-starter”.
Numerous Large-scale Applications
• Wind-integration in Economic Dispatch
  – How should conventional resources be
    dispatched in the presence of intermittent
    resources?
• Supply chain planning
  – Inventory Transhipment between Regional
    Distribution Centers, Local Warehouses, Outlets
• “Everyone” wants to “solve” Stochastic Unit
  Commitment
So what do we mean by “solve”?
I. At the very least
  – An algorithm which, under specified assumptions, will
     • Provide a first-stage decision with known metrics of optimality,
       i.e., report a statistically quantified error
     • Be reasonably fast on easily accessible machines
II. There are other things that people want
Stochastic Quasi-gradient Method (SQG) (Ermoliev/
       Gaivoronski/Lan/Nemirovski/Uryasev …)
• At iteration k, let x^k be given. Sample an outcome ω^k
• Replace the subgradient used in subgradient optimization with an
unbiased estimate G^k

• Then, x^{k+1} = Π_X(x^k − α_k G^k)
   with E[G^k | x^k] a subgradient of the objective at x^k
   and step sizes satisfying Σ_k α_k = ∞, Σ_k α_k² < ∞
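A minimal SQG sketch in Python on a newsvendor-style recourse; the instance (exponential demand, order cost c, shortage cost q) and the 1/k step rule are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
c, q, x_max = 1.0, 4.0, 10.0        # illustrative order cost, shortage cost, bound

def sqg_newsvendor(iters=5000):
    """x_{k+1} = Pi_X(x_k - alpha_k * G_k), where G_k is an unbiased estimate of
       a subgradient of f(x) = c*x + E[q*max(D - x, 0)] built from ONE outcome."""
    x = 0.0
    for k in range(1, iters + 1):
        d = rng.exponential(scale=2.0)           # sample omega^k (a demand)
        G = c - q * (1.0 if d > x else 0.0)      # stochastic quasi-gradient
        x = min(max(x - (1.0 / k) * G, 0.0), x_max)   # project onto X = [0, x_max]
    return x

print(sqg_newsvendor())   # analytic optimum: P(D > x) = c/q  =>  x* = 2*ln(4) ~ 2.77
```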
Comments on SQG

• Because this is a sampling-based algorithm, you must replicate so that you
can estimate “variability” (of what?)
• If you replicate M times, you will get M first-stage decisions. Which of
these should you use?
 – Could evaluate each of the M first-stage decisions, and then choose the one with
   smallest objective estimate
 – For realistic models, this can be computationally time-consuming
• Could simply choose the mean of replications
 – Less expensive, but may be unreliable
• But most importantly, it does not provide lower bounds



Sample Average Approximation (SAA, Shapiro, Kleywegt,
           Homem-de-Mello, Linderoth, Wright ….)

• Choose a sample size N; Solve a sampled problem; Repeat M times
• Since you replicate M times, you will get M decisions. Which of these
should you use?
 – Could evaluate each of the M decisions, and then choose the one with
   smallest estimate
     • For realistic models, this can be computationally time-consuming
 – Could simply choose the mean of replications
     • Less expensive, but may be unreliable
• Very widely cited, sometimes misused (because some non-expert users
appear to choose M=1)!


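A sketch of SAA with M replications on the same style of instance (again an illustrative assumption, not one of the lecture's test problems); each replication solves its sampled problem exactly via an empirical quantile, and the M decisions are then compared:

```python
import numpy as np

rng = np.random.default_rng(1)
c, q = 1.0, 4.0
N, M = 500, 10                      # sample size per replication, # of replications

# For min_x c*x + (q/N)*sum(max(d_i - x, 0)), the sampled problem is solved
# exactly by the (1 - c/q) empirical quantile of the drawn demands.
decisions = [np.quantile(rng.exponential(2.0, size=N), 1.0 - c / q)
             for _ in range(M)]

# Pick a decision: evaluate each candidate on a common, larger evaluation sample
# and keep the one with the smallest objective estimate (the slide's first option).
d_eval = rng.exponential(2.0, size=100_000)
obj = [c * x + q * np.mean(np.maximum(d_eval - x, 0.0)) for x in decisions]
best = decisions[int(np.argmin(obj))]
print(best, np.mean(decisions))     # best-of-M vs. mean-of-replications
```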
Stochastic Decomposition (SD)

• Allow arbitrarily many outcomes (scenarios)
  including continuous random variables
• Perhaps interface with a simulator …




Some Central Questions for SP

• Instead of choosing a sample size at the start, can we
  decide how much to sample, on-the-fly?
   – The analogous question for nonlinear programming would
     be: instead of choosing the number of iterations at the
     start, can we decide the number of iterations, on-the-fly? –
     Yes!
   – So can we do this with sample size selection? Perhaps!
• If we are to determine a sample size on-the-fly, what is
  the smallest number by which to increment the sample
  size, and yet guarantee asymptotic convergence?

Under some assumptions …
• Fixed Recourse Matrix
• Current computer implementations assume
  – Relatively complete recourse (under revision)
  – Second-stage cost is deterministic (under revision)




Approximating the recourse
             function in SD
• At the start of iteration k, sample one more
  outcome … say ω^k, independently of ω^1, …, ω^{k−1}
• Given x^k, let π^k solve the second-stage dual LP for the
  outcome ω^k
• Define V_k = V_{k−1} ∪ {π^k} and calculate, for each
  previously sampled outcome, the best dual vertex in V_k
• Notice the mapping of outcomes to finitely many dual
  vertices.
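A standard way to write these steps, following the usual SD notation (the symbols mirror the dual form given earlier and are assumptions on my part):

$$\pi^k \in \arg\max_{\pi} \{\, \pi^\top \big(r(\omega^k) - T(\omega^k)\,x^k\big) : W^\top \pi \le g \,\}, \qquad V_k = V_{k-1} \cup \{\pi^k\},$$
$$\pi_t^k \in \arg\max_{\pi \in V_k} \; \pi^\top \big(r(\omega^t) - T(\omega^t)\,x^k\big), \qquad t = 1, \dots, k.$$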
Approximation used in SD
• The estimated “cut” in SD is given below (see the formula
  following this slide)
• To calculate this “cut” requires one LP corresponding
  to the most recent outcome and the “argmax”
  operations at the bottom of the previous slide
• In addition, all previous cuts need to be updated
  … to make old cuts consistent with the changing
  sample size over iterations.
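In the notation above, the estimated cut is the sample-mean lower bound built from the argmax vertices; since each π_t^k is dual feasible, it satisfies

$$\frac{1}{k} \sum_{t=1}^{k} h(x, \omega^t) \;\ge\; \frac{1}{k} \sum_{t=1}^{k} (\pi_t^k)^\top \big( r(\omega^t) - T(\omega^t)\,x \big),$$

with equality in each term at x = x^k by construction of the argmax.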
Update Previous Cuts
• Updating previously generated subgradients
  – Why?
  – Because … early cuts (based on small sample sizes) can
    cut away the optimum forever!

[Figure: the expected recourse function with an early cut that can cause trouble]
How do we get around this?
• Suppose we know a lower bound (e.g., zero) on the
  recourse function; then we can blend such a lower
  bound into the older cuts so that these older cuts
  “fade” away.

[Figure: the expected recourse function and the early cut from the previous slide]
Updating Previous Cuts.
• If we assume that all recourse function lower bounds
  are uniformly zero,
   – Then, for t < k, the “cut” from iteration t takes the form shown
     below:




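With a uniform lower bound of zero, the update amounts to shrinking each stored cut toward that bound: at iteration k, the cut generated at iteration t < k (with stored coefficients α_t, β_t) is used as

$$x \;\mapsto\; \frac{t}{k}\,\big( \alpha_t + \beta_t^\top x \big),$$

which is exactly the coefficient rescaling by t/k that appears in the algorithmic summary later.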
Alternative Sampled Approximations

[Figure comparing approximations of the expected recourse function: the sampled expected recourse function (SAA); a linearization of the sampled ERF (SA or SQG); a lower bound on the linearization of the sample mean (the SD cut); and updated cuts from previous iterations of SD]
SUMMARY OF APPROXIMATIONS

[Figure: schematic comparison of the SAA, SD, and SA approximations]
Incumbent Solutions and Proximal Master Program
    (QP) … also called Regularized Master Program

• An incumbent is the best solution “estimated”
  through iteration k and will be denoted x̄^k
• Given an incumbent, we set up a proximal master
  program as follows (see the formulation below).
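A standard form of such a proximal master program, assuming f_k denotes the current cut-based approximation of the expected recourse function and x̄^k the incumbent:

$$x^{k+1} \in \arg\min_{x \in X} \; c^\top x + f_k(x) + \frac{\sigma}{2}\, \big\| x - \bar{x}^k \big\|^2.$$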
Benefits of the Proximal Master
• Can show that a finite master problem,
  consisting of n₁ + 3 optimality cuts, is enough!
  (Here n₁ is the number of first-stage variables)
• Convergence can be proven to a unique limit
  which is optimal (with probability 1).
• Stopping rules based on QP duality are
  reasonably efficient.

Algorithmic Summary
0. Initialize with the same candidate and incumbent x. Set k = 1.
1. Use sampling to obtain an outcome ω^k.
2. Derive an SD cut at the candidate and the incumbent solutions.
           This calls for
           – solution of 2 subproblems using ω^k
           – adding any new dual vertex to a list V_k
           – for each prior outcome {ω^t}_{t=1}^{k}, choosing the best
             subproblem dual vertex seen thus far
3. Update cut t by multiplying its coefficients by (t/k), for t = 1, …, k.
4. Solve the updated QP master.
5. Ascertain whether the new candidate becomes the new incumbent.
6. If stopping rules are not met, increment k and repeat from 1.
(The stopping rule is based on bootstrapping primal and dual QPs.)
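The loop above, sketched in Python on a one-dimensional newsvendor where the second-stage dual vertices can be enumerated by hand. The instance, plus two simplifications (the cut is formed at the candidate only, and the QP master is solved with a bounded scalar minimizer), are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
c, q, x_max, sigma = 1.0, 4.0, 10.0, 1.0   # illustrative costs and proximal weight

def sd_newsvendor(iters=400):
    omegas = []                 # sampled demands omega^1, ..., omega^k
    V = {0.0}                   # dual vertex list V_k for max{pi*(d-x) : 0 <= pi <= q}
    cuts = []                   # (t, alpha_t, beta_t): the cut built at iteration t
    cand = inc = 5.0            # step 0: candidate and incumbent start equal

    def f_approx(x, k):         # max of (t/k)-rescaled cuts; 0 is a valid lower bound
        return max([0.0] + [(t / k) * (a + b * x) for (t, a, b) in cuts])

    for k in range(1, iters + 1):
        omegas.append(rng.exponential(2.0))        # step 1: sample omega^k
        V.add(q if omegas[-1] > cand else 0.0)     # step 2: new dual vertex
        pis = [max(V, key=lambda p: p * (d - cand)) for d in omegas]  # argmax over V_k
        alpha = float(np.mean([p * d for p, d in zip(pis, omegas)]))
        beta = -float(np.mean(pis))
        cuts.append((k, alpha, beta))              # cut: mean of pi_t*(d_t - x)
        # steps 3-4: old cuts are rescaled by t/k inside f_approx; solve the
        # proximal master  min c*x + f_k(x) + (sigma/2)*(x - incumbent)^2
        cand = minimize_scalar(
            lambda x: c * x + f_approx(x, k) + 0.5 * sigma * (x - inc) ** 2,
            bounds=(0.0, x_max), method="bounded").x
        if c * cand + f_approx(cand, k) < c * inc + f_approx(inc, k):
            inc = cand                             # step 5: incumbent test
    return inc

print(sd_newsvendor())   # minimizer of c*x + q*E[max(D - x, 0)] is 2*ln(4) ~ 2.77
```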
Comparisons including 2-stage SD

  Feature \ Method               | SQG Algorithm | SAA          | Stochastic Decomposition
  Subgradient or Estimation      | Estimation    | Estimation   | Estimation
  Step Length Choice Required    | Yes           | Depends      | Not Needed
  Stopping Rules                 | Unknown       | Well Studied | Resolved
  Parallel Computations          | Good          | Depends      | Not known
  Continuous Random Variables    | Yes           | Yes          | Yes
  First-stage Integer Variables  | No            | Yes          | Yes
  Second-stage Integer Variables | No            | Yes          | No

Of course, for small instances we can always try deterministic equivalents!
The Plan – Part II
• Resume (from Computational Results)
• How were these obtained?
  – SD Solution Quality
     • In-sample optimality tests
     • Out-of-sample, Demo (Yifan Liu)
• An Example: Wind Energy with Sub-hourly
  Dispatch
• Summary
Recall this question: What do we mean by
                     “solve”?
I. At the very least
  – An algorithm which, under specified assumptions, will
     • Provide a first-stage decision with known metrics of optimality,
       i.e., report a statistically quantified error
     • Be reasonably fast on easily accessible machines
II. There are other things that people want
What else do people want?
There are other things that people want
• Evidence
   – Experiment with some realistic instances
      • Please note that CEP1 and PGP2 are not realistic. They are for
        debugging!
   – Some numerical controls
      • E.g. Can we reduce “bias”/non-optimality?
• Output for decision support
   – Dual estimates of first-stage
   – Histograms
      • Recourse function
      • Dual prices of second-stage

Some “concrete” instances
                          Table 1: SP Test Instances

  Problem Name | Domain     | # of 1st-stage variables | # of 2nd-stage variables | # of random variables | Universe of scenarios | Comment
  LandS        | Generation | 4                        | 12                       | 3                     |                       | Made-up
  20TERM       | Logistics  | 63                       | 764                      | 40                    |                       | Semi-real
  SSN          | Telecom    | 89                       | 706                      | 86                    |                       | Semi-real
  STORM        | Logistics  | 121                      | 1259                     | 117                   |                       | Semi-real

Notice the size of the scenario universe: using a deterministic algorithm would be a “non-starter”.
Computational Results - SAA
           Table 2: Statistical Quantification with SAA

  SAA estimates using a computational grid:

  Instance Name | Bound  | Average Value | 95% CI
  LandS         | OBJ-UB | 225.624       |
  LandS         | OBJ-LB | 225.62        |
  20TERM        | OBJ-UB | 254311.55     |
  20TERM        | OBJ-LB | 254298.57     |
  SSN           | OBJ-UB | 9.913         |
  SSN           | OBJ-LB | 9.84          |
  STORM         | OBJ-UB | 15498739.41   |
  STORM         | OBJ-LB | 15498657.8    |

Source: Linderoth, Shapiro and Wright, Annals of Operations Research, Vol. 142, pp. 215–241 (2006).
Computational configuration: grid computing using 100’s of PCs, but only 100 PCs at any given time.
Comment by SS: In 2005, an average PC was a Pentium IV with clock speed 2–2.4 GHz. Each instance of SSN took 30–45 mins. of wall-clock time. Replications: 7–10.
One Example: SSN with Latin
      Hypercube Sampling

  Sample Size | Lower Bound      | Upper Bound
  50          | 10.10 (+/- 0.81) | 11.38 (+/- 0.023)
  100         | 8.90 (+/- 0.36)  | 10.542 (+/- 0.021)
  500         | 9.87 (+/- 0.22)  | 10.069 (+/- 0.026)
  1000        | 9.83 (+/- 0.29)  | 9.996 (+/- 0.025)
  5000        | 9.84 (+/- 0.10)  | 9.913 (+/- 0.022)
PC Configuration for SD
•   MacBook Air
•   Processor: Intel Core i5
•   Clock Speed: 1.8 GHz
•   4 GB of 1600 MHz DDR3 memory
•   Replications: 30 for each instance



Computational Results - SD
                Table 3: Statistical Quantification with SAA and SD

  Instance Name | Bound  | SAA Avg. Value (Grid) | SD Avg. Value (Laptop) | % Difference in Avg. Values
  LandS         | OBJ-UB | 225.624               | 225.54                 | 0.037%
  LandS         | OBJ-LB | 225.62                | 225.24                 | 0.168%
  20TERM        | OBJ-UB | 254311.55             | 254476.87              | 0.065%
  20TERM        | OBJ-LB | 254298.57             | 253905.44              | 0.154%
  SSN           | OBJ-UB | 9.913                 | 9.91                   | 0.03%
  SSN           | OBJ-LB | 9.84                  | 9.76                   | 0.813%
  STORM         | OBJ-UB | 15498739.41           | 15498624.37            | 0.0007%
  STORM         | OBJ-LB | 15498657.8            | 15496619.98            | 0.013%
SD Solution quality and time
• Solutions are of comparable quality
• Processors are somewhat similar
• Solution times
   – The comparable time for SSN is 50 mins for 30 replications on one
     processor
   – Compare: (30–45) mins × (7–10) replications × 100 procs = (21,000–45,000)
     processor-mins
   – Note: this is only the time for a sample size of 5000. (But remember, there
     were other sample sizes: 50, 100, 500, …, which we didn’t count.)
• Are we beating Moore’s Law? Yes: doubling computational speed
  every 9 months!
How were these obtained?
• In-sample stopping rules: Lower Bounds
  – Check Stability in Set of Dual Vertices
  – Bootstrapped duality gap estimate
     • The latter tests whether the current primal and dual
       solutions from the Proximal Master Program are also
       reasonably “good” solutions for Primal and Dual
       Problems of a Resampled Proximal Master over
       Multiple Replications
• Out-of-sample: Upper Bounds

Stability in Set of Dual Vertices




[Figure: Pi-ratio versus iteration number for LandS, 20TERM, SSN, and STORM (vertical scale 0.0–1.0; iteration ranges roughly 0–300, 0–800, 0–2500, and 0–2000, respectively)]
Primal and Dual Regularized Values




LB and UB of SSN objective function estimates

[Figure: distribution functions Fn(x) of SSN objective estimates for x between roughly 9.5 and 11.0, under tight, nominal, and loose tolerances, illustrating bias/non-optimality reduction as the tolerance is tightened]
SSN: Sonet-Switched Network
                Solution and evaluation time for 20 replications (Tight)

Evaluation values range from 9.944203 to 10.279154 (a 3.3% difference).

  Replication No. | Solution Time (s) | Evaluation Time (s)
  0               | 313.881051        | 588.627184
  1               | 203.690471        | 651.042227
  2               | 465.949416        | 547.517996
  3               | 313.606587        | 589.429828
  4               | 355.160629        | 599.331808
  5               | 334.764385        | 616.674899
  6               | 529.661100        | 604.565630
  7               | 327.888471        | 545.132039
  8               | 169.432233        | 655.688530
  9               | 301.293535        | 604.098964
  10              | 697.304541        | 567.361433
  11              | 315.097318        | 532.595026
  12              | 247.439006        | 555.664374
  13              | 560.934417        | 577.258900
  14              | 342.787909        | 506.836774
  15              | 247.356803        | 570.577690
  16              | 184.941379        | 517.999214
  17              | 339.951593        | 589.265958
  18              | 339.381007        | 579.033263
  19              | 411.092300        | 602.349102
  Total           | 7001.614151       | 11601.050839
Solutions
• Mean of Replications: average all solutions
• For each seed s, let x̂_s ∈ argmin_x f_s(x),
  where f_s denotes the final approximation for seed s
• Compromise solution: a single decision obtained by minimizing the
  average of the final approximations across the replications
Upper Bounds
• For each replication, the objective function
  evaluations can be very time consuming
• We report Objective of both Mean and
  Compromise Solutions




Main Take Away

                      In SP,
     Numerical Optimization Meets Statistics
So When You Design Algorithms, Don’t Forget What
          You Need to Deliver: Statistics
  Most Numerical Optimization Methods were Not
             Designed for This Goal.
• Does the speed-up with SD beat Moore’s Law? Yes,
  doubling computational speed every 9 months!
