The document discusses stochastic decomposition (SD) algorithms for solving two-stage stochastic linear programs (SLPs) with recourse. It begins with an overview of deterministic algorithms like subgradient methods and Kelley's cutting plane/Benders' decomposition method. It then covers stochastic algorithms like stochastic quasi-gradient methods, sample average approximation, and the stochastic decomposition algorithm. Key aspects of the SD algorithm are that it approximates the recourse function by solving linear programs based on sampled scenarios and updates cuts from previous iterations to maintain convergence as the sample size increases over iterations.
1. Stochastic Decomposition
Suvrajeet Sen
Lecture at the Winter School
April 2013
Epstein Department of ISE
2. The Plan – Part I (70 minutes)
• Review of Basics: 2-stage Stochastic Linear Programs (SLP)
• Deterministic Algorithms for 2-stage SLP
– Subgradients of the Expected Recourse Function
– Subgradient Methods
– Deterministic Decomposition (Kelley/Benders/L-shaped)
• Stochastic Algorithms for 2-stage SLP
– Stochastic Quasi-gradient Methods
– Sample Average Approximation
– Stochastic Decomposition (SD) Algorithm
• Computational Results
3. Review of Basics: 2-stage Stochastic Linear Programs
4. The commonly stated 2-stage SLP (will be stated again, as needed)
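The formula on this slide did not survive extraction. For reference, the standard form of the model used in such lectures is the following; the notation (c, A, b, g, C, D, ξ) is assumed here rather than taken from the slides:
\[
\min_{x \in X} \; c^\top x + \mathbb{E}\big[h(x, \tilde{\omega})\big], \qquad
X = \{x \ge 0 : A x \le b\},
\]
\[
h(x, \omega) = \min_{y \ge 0} \;\{\, g^\top y \;:\; D y = \xi(\omega) - C(\omega)\, x \,\}.
\]
The first stage chooses x before the uncertainty is revealed; the second stage chooses the recourse y after observing the outcome ω.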
9. Subgradient Method (Shor/Polyak/Nesterov/Nemirovski …)
(Note: the interchange of expectation and subdifferentiation is required here.)
• At iteration k, let the current iterate be given
• Compute a subgradient of the objective at the current iterate
• Take a step along the negative subgradient, scaled by the step size, and project the result back onto the set of feasible first-stage decisions
• As mentioned earlier, a subgradient of the expected recourse function is very difficult to compute!
• No concern about loss of convexity due to sampling
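The update formulas on this slide were lost in extraction. A standard statement of the projected subgradient step, in the notation assumed above (not taken from the slides), is:
\[
g^k \in \partial f(x^k), \qquad f(x) = c^\top x + \mathbb{E}\big[h(x,\tilde{\omega})\big], \qquad
x^{k+1} = \Pi_X\!\big(x^k - \alpha_k\, g^k\big),
\]
where \(\Pi_X\) denotes projection onto the set of feasible first-stage decisions and \(\alpha_k > 0\) is the step size.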
10. Strengths and Weaknesses of Subgradient Methods
• Strengths
– Easy to program, no master problem, and easily parallelizable
– Recently there have been improvements in step-size rules
• Weaknesses
– Difficult to establish lower bounds (optimality, in general)
– Traditional step-size rules (e.g., constant/k) need a lot of fine-tuning
– Convergence
• The method makes good progress early on, but like other steepest-descent-type methods, there is zig-zagging behavior
– Need ways to stop the algorithm
• Difficult because upper and lower bounds on objective values are difficult to obtain
11. Kelley’s Cutting Plane/Benders’/L-shaped Decomposition for 2-stage SLP
• Let a random variable be defined on a probability space
• Then a stochastic program is given by the 2-stage SLP stated above
12. KBL Decomposition (J. Benders/Van Slyke/Wets)
• At iteration k, let the current iterate and the previously generated cuts be given; recall the second-stage LP
• Solve the second-stage LP (or its dual) for every scenario at the current iterate
• Use the resulting dual solutions to define a new optimality cut, a lower-bounding linear approximation of the expected recourse function
• Re-solve the master program, i.e., minimize the first-stage cost plus the current piecewise-linear approximation, to obtain the next iterate
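The cut formulas on this slide were lost in extraction. In the notation assumed earlier (not taken from the slides), the KBL optimality cut at iterate \(x^k\) can be written as follows: for each outcome \(\omega\), let \(\pi^k_\omega\) solve the second-stage dual
\[
h(x^k,\omega) = \max_{\pi} \{\, \pi^\top(\xi(\omega) - C(\omega)\, x^k) \;:\; \pi^\top D \le g^\top \,\},
\]
and form
\[
\mathbb{E}\big[h(x,\tilde\omega)\big] \;\ge\; \sum_{\omega} p_\omega\, (\pi^k_\omega)^\top \big(\xi(\omega) - C(\omega)\, x\big) \;=\; \alpha_k - \beta_k^\top x ,
\]
which is appended to the master program \(\min_{x \in X}\, c^\top x + \eta\) subject to \(\eta \ge \alpha_t - \beta_t^\top x,\; t = 1,\dots,k\).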
13. Comparing Subgradient Method and KBL Decomposition
• Both evaluate subgradients
• Expensive operation (requires solving as many second-stage LPs as there are scenarios)
• Step size in KBL is implicit (user need not worry)
• Master program grows without bound and looks unstable in the early rounds
• Stopping rule is automatic (Upper Bound – Lower Bound ≤ ε)
• KBL’s use of the master can be a bottleneck for parallelization
15. Regularization of the Master Problem (Ruszczynski/Kiwiel/Lemarechal …)
• Addresses the following issue:
– Master program grows without bound and looks unstable in the early rounds
• Include an incumbent and a proximity measure from the incumbent, using σ > 0 as a weight
• Particularly useful for Stochastic Decomposition
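The regularized objective on this slide did not survive extraction. A standard form of the regularized (proximal) master, with the incumbent denoted \(\bar{x}^k\) (notation assumed here), is:
\[
\min_{x \in X} \; c^\top x + \eta + \frac{\sigma}{2}\,\|x - \bar{x}^k\|^2
\quad \text{subject to the optimality cuts generated so far.}
\]
The proximal term keeps successive iterates near the incumbent and stabilizes the early rounds.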
17. Some “concrete” instances

Table 1: SP Test Instances

Problem Name | Domain     | # of 1st-stage variables | # of 2nd-stage variables | # of random variables | Universe of scenarios | Comment
LandS        | Generation | 4                        | 12                       | 3                     |                       | Made-up
20TERM       | Logistics  | 63                       | 764                      | 40                    |                       | Semi-real
SSN          | Telecom    | 89                       | 706                      | 86                    |                       | Semi-real
STORM        | Logistics  | 121                      | 1259                     | 117                   |                       | Semi-real

Notice the size of scenarios. Using a deterministic algorithm would be a “non-starter”.
18. Numerous Large-scale Applications
• Wind-integration in Economic Dispatch
– How should conventional resources be dispatched in the presence of intermittent resources?
• Supply chain planning
– Inventory transhipment between Regional Distribution Centers, Local Warehouses, Outlets
• “Everyone” wants to “solve” Stochastic Unit Commitment
19. So what do we mean by “solve”?
I. At the very least
– An algorithm which, under specified assumptions, will
• provide a first-stage decision with known metrics of optimality, i.e., report a statistically quantified error
• be reasonably fast on easily accessible machines
II. There are other things that people want
20. Stochastic Quasi-gradient Method (SQG) (Ermoliev/Gaivoronski/Lan/Nemirovski/Uryasev …)
• At iteration k, let the current iterate be given; sample an outcome
• Replace the subgradient used in subgradient optimization with its unbiased estimate, computed from the sampled outcome alone
• Then take the projected step along the negative of this estimate
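The update on this slide was lost in extraction. A standard statement of the SQG step, in the notation assumed earlier, is:
\[
g^k \in \partial_x\, h(x^k, \omega^k), \qquad
x^{k+1} = \Pi_X\!\big(x^k - \alpha_k\,(c + g^k)\big),
\]
with step sizes typically required to satisfy \(\alpha_k > 0\), \(\sum_k \alpha_k = \infty\), and \(\sum_k \alpha_k^2 < \infty\). Since \(\omega^k\) is sampled independently, \(c + g^k\) is an unbiased estimate of a subgradient of \(c^\top x + \mathbb{E}[h(x,\tilde\omega)]\) at \(x^k\).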
21. Comments on SQG
• Because this is a sampling-based algorithm, you must replicate so that you can estimate “variability” (of what?)
• If you replicate M times, you will get M first-stage decisions. Which of these should you use?
– Could evaluate each of the M first-stage decisions, and then choose the one with the smallest objective estimate
– For realistic models, this can be computationally time-consuming
• Could simply choose the mean of the replications
– Less expensive, but may be unreliable
• But most importantly, it does not provide lower bounds
22. Sample Average Approximation (SAA; Shapiro, Kleywegt, Homem-de-Mello, Linderoth, Wright …)
• Choose a sample size N; solve a sampled problem; repeat M times
• Since you replicate M times, you will get M decisions. Which of these should you use?
– Could evaluate each of the M decisions, and then choose the one with the smallest estimate
• For realistic models, this can be computationally time-consuming
– Could simply choose the mean of the replications
• Less expensive, but may be unreliable
• Very widely cited, sometimes misused (because some non-expert users appear to choose M = 1)!
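For concreteness, the sampled problem solved in each SAA replication replaces the expectation by a sample average over N i.i.d. outcomes \(\omega^1,\dots,\omega^N\) (notation assumed as above):
\[
\min_{x \in X} \; c^\top x + \frac{1}{N}\sum_{i=1}^{N} h(x, \omega^i).
\]
Each replication draws a fresh sample, so the M optimal values provide statistical lower-bound estimates, while evaluating candidate decisions on an independent sample provides upper-bound estimates.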
23. Stochastic Decomposition (SD)
• Allow arbitrarily many outcomes (scenarios), including continuous random variables
• Perhaps interface with a simulator …
24. Some Central Questions for SP
• Instead of choosing a sample size at the start, can we decide how much to sample on-the-fly?
– The analogous question for nonlinear programming would be: instead of choosing the number of iterations at the start, can we decide the number of iterations on-the-fly? Yes!
– So can we do this with sample-size selection? Perhaps!
• If we are to determine a sample size on-the-fly, what is the smallest number by which to increment the sample size and yet guarantee asymptotic convergence?
25. Under some assumptions …
• Fixed recourse matrix
• Current computer implementations assume
– Relatively complete recourse (under revision)
– Second-stage cost is deterministic (under revision)
26. Approximating the recourse function in SD
• At the start of iteration k, sample one more outcome, say ω^k, independently of the outcomes sampled so far
• Given the current candidate, solve the second-stage LP (dual) for this newest outcome only
• For every previously sampled outcome, approximate the recourse value using the best dual vertex found so far
• Notice the mapping of outcomes to finitely many dual vertices
27. Approximation used in SD
• The estimated “cut” in SD is given by the sample average of these dual-vertex evaluations
• To calculate this “cut” requires one LP corresponding to the most recent outcome and the “argmax” operations at the bottom of the previous slide
• In addition, all previous cuts need to be updated … to make old cuts consistent with the changing sample size over iterations
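The cut expression on this slide was lost in extraction. A reconstruction in the notation assumed earlier (with fixed recourse, every vertex in the collected set \(V_k\) is dual feasible for every outcome):
\[
\pi^k_t \;\in\; \arg\max_{\pi \in V_k}\; \pi^\top\big(\xi(\omega^t) - C(\omega^t)\, x^k\big), \qquad t = 1,\dots,k,
\]
\[
\frac{1}{k}\sum_{t=1}^{k} (\pi^k_t)^\top\big(\xi(\omega^t) - C(\omega^t)\, x\big)
\;\le\;
\frac{1}{k}\sum_{t=1}^{k} h(x, \omega^t) \quad \text{for all } x,
\]
so the affine function on the left is a valid lower-bounding cut for the current sampled mean of the recourse function. Only the newest outcome ω^k requires an LP solve, since its dual vertex is added to \(V_k\) at this iteration.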
28. Update Previous Cuts
• Updating previously generated subgradients
– Why?
– Because … early cuts (based on small sample sizes) can cut away the optimum forever!
(Figure: the expected recourse function, showing how an early cut can cause trouble.)
29. How do we get around this?
• Suppose we assume that we know a lower bound (e.g., zero) on the recourse function; then we can fold such a lower bound into the older cuts so that these older cuts “fade” away.
(Figure: the expected recourse function, showing how an early cut can cause trouble.)
30. Updating Previous Cuts
• If we assume that all recourse function lower bounds are uniformly zero,
– then for t < k, the “cut” from iteration t has the following (rescaled) form
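The exact expression on this slide was lost in extraction. Consistent with step 3 of the algorithmic summary on slide 35 ("multiply coefficients by t/k"), it is presumably
\[
\frac{t}{k}\,\big(\alpha_t - \beta_t^\top x\big) \;+\; \frac{k - t}{k}\cdot 0
\;=\; \frac{t}{k}\,\big(\alpha_t - \beta_t^\top x\big),
\]
i.e., the cut generated at iteration t (an average over t outcomes) is combined with the zero lower bound for the k − t outcomes sampled since then, so old cuts fade as k grows.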
31. Alternative Sampled Approximations
(Figure comparing approximations of the expected recourse function: the sampled expected recourse function (SAA), updated cuts from previous iterations of SD, the linearization of the sampled ERF (SA or SQG), and the lower bound on the linearization of the sample mean, i.e., the SD cut.)
33. Incumbent Solutions and Proximal Master Program (QP) … also called Regularized Master Program
• An incumbent is the best solution “estimated” through iteration k
• Given an incumbent, we set up a proximal master program: minimize the first-stage cost plus the current cut approximation plus a quadratic proximal term about the incumbent (as in the regularized master of slide 15)
34. Benefits of the Proximal Master
• Can show that a finite master problem, consisting of n1 + 3 optimality cuts, is enough! (Here n1 is the number of first-stage variables)
• Convergence can be proven to a unique limit which is optimal (with probability 1)
• Stopping rules based on QP duality are reasonably efficient
35. Algorithmic Summary
0. Initialize with the same candidate and incumbent x; set k = 1.
1. Use sampling to obtain an outcome ω^k.
2. Derive an SD cut at the candidate and at the incumbent solutions. This calls for
– solution of 2 subproblems using ω^k
– adding any new dual vertex to a list V_k
– for each prior outcome ω^t, t = 1, …, k, choosing the best subproblem dual vertex seen thus far
3. Update cut t by multiplying its coefficients by (t/k), t = 1, …, k.
4. Solve the updated QP master.
5. Ascertain whether the new candidate becomes the new incumbent.
6. If the stopping rules are not met, increment k and repeat from step 1.
(The stopping rule is based on bootstrapping primal and dual QPs. Steps 1-3 are sketched in code below.)
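To make the bookkeeping in steps 1-3 concrete, here is a minimal Python sketch of the cut formation and cut rescaling. It is a structural illustration under stated assumptions, not the lecture's code: the two second-stage LP solves are replaced by placeholders, C is treated as fixed, and all names (sample_outcome, solve_second_stage_dual, V, cuts) are illustrative.

```python
# Structural sketch of SD steps 1-3 (slide 35); placeholder LP solves.
import numpy as np

rng = np.random.default_rng(0)
n1, m2 = 3, 4                      # toy sizes: first-stage vars, second-stage rows
C = rng.normal(size=(m2, n1))      # technology matrix (fixed here for simplicity)

def sample_outcome():
    """Step 1: placeholder for sampling xi(omega^k)."""
    return rng.normal(size=m2)

def solve_second_stage_dual(x, xi):
    """Step 2: placeholder for the second-stage dual LP; returns a dual vertex."""
    return rng.uniform(0.0, 1.0, size=m2)

V = []          # dual vertices collected so far
outcomes = []   # xi(omega^1), ..., xi(omega^k)
cuts = []       # (t, alpha_t, beta_t): cut created at iteration t, stored unscaled

def sd_cut(x_candidate):
    """Steps 1-2: sample, solve one LP, take the argmax for older outcomes."""
    outcomes.append(sample_outcome())
    k = len(outcomes)
    V.append(solve_second_stage_dual(x_candidate, outcomes[-1]))
    alpha, beta = 0.0, np.zeros(n1)
    for xi in outcomes:
        # "argmax" step: best dual vertex seen thus far for this outcome
        pi = max(V, key=lambda p: p @ (xi - C @ x_candidate))
        alpha += (pi @ xi) / k
        beta += (C.T @ pi) / k
    cuts.append((k, alpha, beta))   # cut reads: eta >= alpha - beta @ x
    return alpha, beta

def scaled_cuts(k):
    """Step 3: at iteration k, the cut from iteration t is used with its
    coefficients multiplied by t/k, so older cuts fade toward the zero bound."""
    return [(t / k * a, t / k * b) for (t, a, b) in cuts]

for _ in range(5):                  # five iterations at a fixed candidate
    sd_cut(np.zeros(n1))
print(len(cuts), "cuts;", len(scaled_cuts(len(outcomes))), "after rescaling")
```

The point of this structure is that each iteration solves only a constant number of second-stage LPs, while the argmax over stored dual vertices keeps all k sampled outcomes represented in the cut; the QP master and incumbent test (steps 4-5) are omitted from the sketch.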
36. Comparisons including 2-stage SD

Feature \ Method               | SQG Algorithm | SAA          | Stochastic Decomposition
Subgradient or Estimation      | Estimation    | Estimation   | Estimation
Step Length Choice Required    | Yes           | Depends      | Not Needed
Stopping Rules                 | Unknown       | Well Studied | Resolved
Parallel Computations          | Good          | Depends      | Not known
Continuous Random Variables    | Yes           | Yes          | Yes
First-stage Integer Variables  | No            | Yes          | Yes
Second-stage Integer Variables | No            | Yes          | No

Of course, for small instances, we can always try deterministic equivalents!
37. The Plan – Part II
• Resume (from Computational Results)
• How were these obtained?
– SD Solution Quality
• In-sample optimality tests
• Out-of-sample, Demo (Yifan Liu)
• An Example: Wind Energy with Sub-hourly Dispatch
• Summary
38. Recall this question: What do we mean by “solve”?
I. At the very least
– An algorithm which, under specified assumptions, will
• provide a first-stage decision with known metrics of optimality, i.e., report a statistically quantified error
• be reasonably fast on easily accessible machines
II. There are other things that people want
39. What else do people want?
There are other things that people want:
• Evidence
– Experiment with some realistic instances
• Please note that CEP1 and PGP2 are not realistic. They are for debugging!
– Some numerical controls
• E.g., can we reduce “bias”/non-optimality?
• Output for decision support
– Dual estimates of the first stage
– Histograms
• Recourse function
• Dual prices of the second stage
40. Some “concrete” instances

Table 1: SP Test Instances

Problem Name | Domain     | # of 1st-stage variables | # of 2nd-stage variables | # of random variables | Universe of scenarios | Comment
LandS        | Generation | 4                        | 12                       | 3                     |                       | Made-up
20TERM       | Logistics  | 63                       | 764                      | 40                    |                       | Semi-real
SSN          | Telecom    | 89                       | 706                      | 86                    |                       | Semi-real
STORM        | Logistics  | 121                      | 1259                     | 117                   |                       | Semi-real

Notice the size of scenarios. Using a deterministic algorithm would be a “non-starter”.
41. Computational Results – SAA

Table 2: Statistical Quantification with SAA (SAA estimates using a computational grid; average values of the upper (OBJ-UB) and lower (OBJ-LB) bounds)

Instance Name | OBJ-UB (avg.) | OBJ-LB (avg.)
LandS         | 225.624       | 225.62
20TERM        | 254311.55     | 254298.57
SSN           | 9.913         | 9.84
STORM         | 15498739.41   | 15498657.8

Source: Linderoth, Shapiro and Wright, Annals of Operations Research, Vol. 142, pp. 215–241 (2006).
Computational configuration: grid computing using 100’s of PCs, but only 100 PCs at any given time.
Comment by SS: In 2005, an average PC was a Pentium IV with a clock speed of 2-2.4 GHz. Each instance of SSN took 30-45 mins. of wall-clock time. Replications: 7-10.
42. One Example: SSN with Latin Hypercube Sampling

Sample Size | Lower Bound      | Upper Bound
50          | 10.10 (+/- 0.81) | 11.38 (+/- 0.023)
100         | 8.90 (+/- 0.36)  | 10.542 (+/- 0.021)
500         | 9.87 (+/- 0.22)  | 10.069 (+/- 0.026)
1000        | 9.83 (+/- 0.29)  | 9.996 (+/- 0.025)
5000        | 9.84 (+/- 0.10)  | 9.913 (+/- 0.022)
43. PC Configuration for SD
• MacBook Air
• Processor: Intel Core i5
• Clock Speed: 1.8 GHz
• 4 GB of 1600 MHz DDR3 Memory
• Replications: 30 for each instance
44. Computational Results – SD

Table 3: Statistical Quantification with SAA and SD (average values; SAA estimates using a computational grid, SD estimates using a laptop)

Instance Name | Bound  | SAA (grid)   | SD (laptop)  | % Difference in Avg. Values
LandS         | OBJ-UB | 225.624      | 225.54       | 0.037%
LandS         | OBJ-LB | 225.62       | 225.24       | 0.168%
20TERM        | OBJ-UB | 254311.55    | 254476.87    | 0.065%
20TERM        | OBJ-LB | 254298.57    | 253905.44    | 0.154%
SSN           | OBJ-UB | 9.913        | 9.91         | 0.03%
SSN           | OBJ-LB | 9.84         | 9.76         | 0.813%
STORM         | OBJ-UB | 15498739.41  | 15498624.37  | 0.0007%
STORM         | OBJ-LB | 15498657.8   | 15496619.98  | 0.013%
45. SD Solution quality and time
• Solutions are of comparable quality
• Processors are somewhat similar
• Solution times
– The comparable time for SSN is 50 mins for 30 replications on one processor
– Compare: (30-45) mins x (7-10) replications x 100 procs = (21,000-45,000) processor-mins
– Note: this is only the time for the sample size of 5000. (But remember, there were other sample sizes: 50, 100, 500, …, which we didn’t count.)
• Are we beating Moore’s Law? Yes: doubling computational speed every 9 months!
46. How were these obtained?
• In-sample stopping rules: Lower Bounds
– Check stability in the set of dual vertices
– Bootstrapped duality-gap estimate
• The latter tests whether the current primal and dual solutions from the Proximal Master Program are also reasonably “good” solutions for the primal and dual problems of a resampled Proximal Master over multiple replications
• Out-of-sample: Upper Bounds
48. (Figure: Pi-Ratio versus Iteration Number for LandS, 20TERM, SSN, and STORM; four panels with Pi-Ratio on a 0.0-1.0 scale.)
49. Primal and Dual Regularized Values
50. LB and UB of SSN objective function estimates
(Figure: empirical distributions Fn(x) of the SSN objective estimates, roughly between 9.5 and 11.0, under tight, nominal, and loose stopping tolerances; tightening the tolerance reduces the bias/non-optimality.)
51. SSN: SONET-Switched Network
Solution and evaluation time for 20 replications (Tight)

Replication No. | Solution Time (s) | Evaluation Time (s)
0     | 313.881051  | 588.627184
1     | 203.690471  | 651.042227
2     | 465.949416  | 547.517996
3     | 313.606587  | 589.429828
4     | 355.160629  | 599.331808
5     | 334.764385  | 616.674899
6     | 529.661100  | 604.565630
7     | 327.888471  | 545.132039
8     | 169.432233  | 655.688530
9     | 301.293535  | 604.098964
10    | 697.304541  | 567.361433
11    | 315.097318  | 532.595026
12    | 247.439006  | 555.664374
13    | 560.934417  | 577.258900
14    | 342.787909  | 506.836774
15    | 247.356803  | 570.577690
16    | 184.941379  | 517.999214
17    | 339.951593  | 589.265958
18    | 339.381007  | 579.033263
19    | 411.092300  | 602.349102
Total | 7001.614151 | 11601.050839

Evaluation values range from 9.944203 to 10.279154 (a 3.3% difference).
52. Solutions
• Mean of Replications: average all solutions
• For each seed s, let the replication solution minimize fs, where fs denotes the final approximation for seed s
• Compromise solution: a single decision obtained by aggregating the final approximations across all replications
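The compromise-solution formula on this slide was lost in extraction. One plausible reconstruction, an assumption based on the surrounding description rather than the slides: with \(x_s \in \arg\min_{x \in X} f_s(x)\) for each seed s and \(\bar{x}\) the mean of the replication solutions,
\[
x_c \;\in\; \arg\min_{x \in X} \;\frac{1}{M}\sum_{s=1}^{M} f_s(x) \;\Big(+\; \tfrac{\rho}{2}\,\|x - \bar{x}\|^2 \text{ with a proximal weight } \rho > 0\Big),
\]
so that a candidate decision is judged by all M sampled approximations jointly rather than by any single replication.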
53. Upper Bounds
• For each replication, the objective function evaluations can be very time consuming
• We report the objective of both the Mean and the Compromise Solutions
54. Main Take Away
In SP, Numerical Optimization Meets Statistics.
So when you design algorithms, don’t forget what you need to deliver: statistics.
Most numerical optimization methods were not designed for this goal.
• Does the speed-up with SD beat Moore’s Law? Yes: doubling computational speed every 9 months!