Stochastic Optimization is
(almost) as easy as Deterministic
Optimization
Chaitanya Swamy
Joint work with David Shmoys
done while at Cornell University
Stochastic Optimization
• Way of modeling uncertainty.
• Exact data is unavailable or expensive – data is
uncertain, specified by a probability distribution.
Want to make the best decisions given this
uncertainty in the data.
• Applications in logistics, transportation models,
financial instruments, network design, production
planning, …
• Dates back to the 1950s and the work of Dantzig.
An Example
Assemble-to-order system (figure: components → assembly → products).
Demand is a random variable.
Want to decide inventory levels
now anticipating future demand.
Then get to know actual demand.
Can adjust inventory levels after
observing demand.
Excess demand ⇒ can buy more
stock, paying more at the last minute.
Low demand ⇒ can stock inventory,
paying a holding cost.
Two-Stage Recourse Model
Given: Probability distribution over inputs.
Stage I : Make some advance decisions – plan ahead
or hedge against uncertainty.
Observe the actual input scenario.
Stage II : Take recourse. Can augment earlier
solution paying a recourse cost.
Choose stage I decisions to minimize
(stage I cost) + (expected stage II recourse cost).
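As a concrete (and entirely hypothetical) illustration of the objective being minimized, the following Python sketch evaluates stage I cost plus expected stage II recourse cost over an explicit list of scenarios; all numbers are made up.

```python
# A minimal sketch (hypothetical data) of the two-stage objective:
# stage I cost plus expected stage II recourse cost over listed scenarios.

def two_stage_objective(stage1_cost, scenarios):
    """scenarios: list of (probability, recourse_cost) pairs."""
    expected_recourse = sum(p * c for p, c in scenarios)
    return stage1_cost + expected_recourse

# Example: pay 10 up front; recourse costs 4 or 20 with equal probability.
print(two_stage_objective(10.0, [(0.5, 4.0), (0.5, 20.0)]))   # 22.0
```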
2-Stage Stochastic Facility Location
Distribution over clients gives
the set of clients to serve.
(Figure: facilities and the client set D; stage I facilities highlighted.)
Stage I: Open some facilities in
advance; pay cost fi for facility i.
Stage I cost = ∑(i opened) fi .
How is the probability distribution on clients specified?
• A short (polynomial) list of possible scenarios
• Independent probabilities that each client exists
• A black box that can be sampled.
Actual scenario A = {clients to serve} materializes.
Stage II: Can open more facilities to serve the clients in A; pay
cost fi^A to open facility i. Assign clients in A to facilities.
Stage II cost = ∑(i opened in scenario A) fi^A + (cost of serving clients in A).
Want to decide which facilities to open in stage I.
Goal: Minimize Total Cost =
(stage I cost) + EA∼D [stage II cost for A].
Two extremes:
1. Ultra-conservative: plan for “everything” and buy only
in stage I.
2. Defer all decisions to stage II.
Both strategies may be sub-optimal.
Want to prove a worst-case guarantee.
Give an algorithm that “works well” on any instance,
and for any probability distribution.
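A tiny numerical sketch (hypothetical costs and probabilities, one dedicated facility per client, clients appearing independently, stage II cost assumed 10× stage I) of why both extremes can lose: the rarely-appearing client is best deferred, the likely one is best opened in stage I.

```python
# Toy illustration (made-up numbers): per client, "plan ahead" costs f up front,
# "defer" costs p * 10 * f in expectation (stage II is assumed 10x more expensive).
f = 1.0
clients = {"rare": 0.05, "likely": 0.9}            # client -> appearance probability
plan_ahead = {c: f for c in clients}               # ultra-conservative: buy everything now
defer      = {c: p * 10 * f for c, p in clients.items()}   # wait-and-see
hedge      = {c: min(plan_ahead[c], defer[c]) for c in clients}
print(sum(plan_ahead.values()), sum(defer.values()), sum(hedge.values()))
# 2.0 (all in stage I), 9.5 (all deferred), 1.5 (hedge: defer "rare", pre-open "likely")
```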
Approximation Algorithm
Hard to solve the problem exactly – even special
cases are #P-hard.
Settle for approximate solutions.
Give polytime algorithm that always finds near-
optimal solutions.
A is a ρ-approximation algorithm if,
• A runs in polynomial time;
• A(I) ≤ ρ·OPT(I) on all instances I.
ρ is called the approximation ratio of A.
Previous Models Considered
• Dye, Stougie & Tomasgard:
– approx. algorithm for a resource provisioning problem.
– polynomial-scenario model.
• Ravi & Sinha (RS04):
– other problems in the polynomial-scenario model.
• Immorlica, Karger, Minkoff & Mirrokni:
– polynomial-scenario model and independent-activation model.
– proportional costs: (stage II cost) = λ·(stage I cost),
e.g., fi^A = λ·fi for each facility i, in each scenario A.
• Gupta, Pal, Ravi & Sinha (GPRS04):
– black box model: arbitrary probability distributions.
– Also need proportional costs.
Our Results
• Give the first approximation algorithms for
2-stage stochastic integer optimization problems
– black-box model
– no assumptions on costs.
For some problems improve upon previous results
obtained in more restricted models.
• Give a fully polynomial randomized approximation scheme for a large
class of 2-stage stochastic linear programs (contrast with
Nesterov & Vial ‘00, Kleywegt, Shapiro & Homem-de-Mello ‘01, Dyer,
Kannan & Stougie ‘02).
• Give a way to “reduce” stochastic optimization problems
to their deterministic versions.
Stochastic Set Cover (SSC)
Universe U = {e1, …, en }, subsets S1, S2, …, Sm ⊆ U; set S has
weight wS.
Deterministic problem: Pick a minimum weight collection of
sets that covers each element.
Stochastic version: Set of elements to be covered is given by
a probability distribution.
– choose some sets initially, paying wS for set S
– subset A ⊆ U to be covered is revealed
– can pick additional sets, paying wS^A for set S.
Minimize (w-cost of sets picked in stage I) +
EA⊆U [w^A-cost of new sets picked for scenario A].
An Integer Program
For simplicity, consider wS^A = WS for every scenario A.
pA : probability of scenario A ⊆ U.
xS : indicates if set S is picked in stage I.
yA,S : indicates if set S is picked in scenario A.
Minimize ∑S wSxS + ∑A⊆U pA ∑S WSyA,S
subject to,
∑S:e∈S xS + ∑S:e∈S yA,S ≥ 1 for each A ⊆ U, e∈A
xS, yA,S ∈ {0,1} for each S, A.
A Linear Program
Relax the integrality constraints: xS, yA,S ≥ 0 for each S, A.
Exponential number of variables and exponential number
of constraints.
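For intuition, here is a small sketch (toy data, polynomial-scenario case) of this extensive-form LP relaxation written out explicitly and solved with scipy; in the black-box model the same LP has exponentially many variables and constraints, which is why the talk works with the compact formulation later instead.

```python
# A small sketch (hypothetical toy instance) of the extensive-form LP relaxation
# for stochastic set cover in the polynomial-scenario model, using scipy.
import numpy as np
from scipy.optimize import linprog

sets = [{0}, {1}, {0, 1}]          # S1, S2, S3
w = np.array([1.0, 1.0, 1.5])      # stage I weights w_S
W = np.array([3.0, 3.0, 4.5])      # stage II weights W_S
scenarios = [({0}, 0.6), ({0, 1}, 0.4)]   # (A, p_A)

m, k = len(sets), len(scenarios)
# Variable order: x_S (m of them), then y_{A,S} for each scenario in turn.
c = np.concatenate([w] + [p * W for _, p in scenarios])

A_ub, b_ub = [], []
for a, (A, _) in enumerate(scenarios):
    for e in A:                     # coverage constraint for element e in scenario A
        row = np.zeros(m + k * m)
        for s, S in enumerate(sets):
            if e in S:
                row[s] = -1.0                    # -x_S
                row[m + a * m + s] = -1.0        # -y_{A,S}
        A_ub.append(row)
        b_ub.append(-1.0)           # sum >= 1  <=>  -sum <= -1

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=(0, None))
print(res.fun, res.x[:m])           # LP value and fractional stage I decisions x
```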
A Rounding Theorem
Assume LP can be solved in polynomial time.
Suppose for the deterministic problem, we have a
ρ-approximation algorithm wrt. the LP relaxation, i.e.,
A such that A(I) ≤ ρ·(optimal LP solution for I)
for every instance I.
e.g., the greedy algorithm for set cover is a
log n-approximation algorithm wrt. the LP relaxation.
Theorem: Can use such a ρ-approx. algorithm to get a
2ρ-approximation algorithm for stochastic set cover.
Rounding the LP
Assume LP can be solved in polynomial time.
Suppose we have a ρ-approximation algorithm wrt. the LP
relaxation for the deterministic problem.
Let (x,y) : optimal LP solution with cost OPT.
∑S:e∈S xS + ∑S:e∈S yA,S ≥ 1 for each A ⊆ U, e∈A
⇒ for every element e, either
∑S:e∈S xS ≥ ½ OR in each scenario A : e∈A, ∑S:e∈S yA,S ≥ ½.
Let E = {e : ∑S:e∈S xS ≥ ½}.
So (2x) is a fractional set cover for the set E ⇒ can round it to get
an integer set cover 𝒮 for E of cost ∑S∈𝒮 wS ≤ ρ·(∑S 2wSxS).
𝒮 is the first-stage decision.
Rounding (contd.)
(Figure: bipartite diagram of sets and elements; sets in 𝒮 and elements in E highlighted.)
Consider any scenario A. Elements in A ∩ E are covered.
For every e ∈ A\E, it must be that ∑S:e∈S yA,S ≥ ½.
So (2yA) is a fractional set cover for A\E ⇒ can round it to
get a set cover of W-cost ≤ ρ·(∑S 2WSyA,S).
Using this to augment 𝒮 in scenario A, expected cost
≤ ∑S∈𝒮 wS + 2ρ·∑A⊆U pA (∑S WSyA,S) ≤ 2ρ·OPT.
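A sketch of the rounding step on a toy instance. The weighted greedy algorithm plays the role of the deterministic ρ-approximation (for set cover it is an ln n-approximation wrt. the LP relaxation); the fractional stage I solution x passed in is assumed given, e.g. from the LP above.

```python
# Rounding sketch: elements with stage I fractional coverage >= 1/2 form E;
# cover E in stage I and A\E in each scenario, using weighted greedy set cover
# as a stand-in for the deterministic rho-approximation algorithm.
sets = [{0}, {1}, {0, 1}]
w, W = [1.0, 1.0, 1.5], [3.0, 3.0, 4.5]          # stage I / stage II weights
scenarios = [({0}, 0.6), ({0, 1}, 0.4)]
universe = {0, 1}

def greedy_cover(elements, weights):
    uncovered, picked = set(elements), []
    while uncovered:
        # pick the set minimizing weight per newly covered element
        i = min((i for i in range(len(sets)) if sets[i] & uncovered),
                key=lambda i: weights[i] / len(sets[i] & uncovered))
        picked.append(i)
        uncovered -= sets[i]
    return picked

def round_two_stage(x):
    E = {e for e in universe
         if sum(x[i] for i, S in enumerate(sets) if e in S) >= 0.5}
    stage1 = greedy_cover(E, w)                              # first-stage sets
    recourse = {tuple(sorted(A)): greedy_cover(A - E, W)     # per-scenario augmentation
                for A, _ in scenarios}
    return stage1, recourse

print(round_two_stage([0.5, 0.0, 0.5]))
```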
A ρ-approx. algorithm for the deterministic problem gives a
2ρ-approximation guarantee for the stochastic problem.
In the polynomial-scenario model, gives simple polytime
approximation algorithms for covering problems.
• 2log n-approximation for SSC.
• 4-approximation for stochastic vertex cover.
• 4-approximation for stochastic multicut on trees.
Ravi & Sinha gave a log n-approximation algorithm for
SSC and a 2-approximation algorithm for stochastic vertex
cover in the polynomial-scenario model.
A Compact Formulation
pA : probability of scenario A ⊆ U.
xS : indicates if set S is picked in stage I.
Minimize h(x) = ∑S wSxS + f(x) s.t. xS ≥ 0 for each S
(SSC-P)
where, f(x) = ∑A⊆U pAfA(x)
and fA(x) = min. ∑S WSyA,S
s.t. ∑S:e∈S yA,S ≥ 1 – ∑S:e∈S xS for each e∈A
yA,S ≥ 0 for each S.
Equivalent to the earlier LP.
Each fA(x) is convex, so f(x) and h(x) are convex functions.
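A sketch (same style of toy data as above; polynomially many scenarios so each fA can be solved directly) of evaluating h(x): the linear term w·x plus the probability-weighted optimal values of the per-scenario recourse LPs.

```python
# Sketch (toy instance) of evaluating h(x) = w.x + sum_A p_A f_A(x),
# solving each scenario's recourse LP with scipy.
import numpy as np
from scipy.optimize import linprog

sets = [{0}, {1}, {0, 1}]
w, W = np.array([1.0, 1.0, 1.5]), np.array([3.0, 3.0, 4.5])
scenarios = [({0}, 0.6), ({0, 1}, 0.4)]

def f_A(x, A):
    """Recourse LP: min sum_S W_S y_S  s.t.  sum_{S: e in S} y_S >= 1 - sum_{S: e in S} x_S."""
    A_ub = [[-1.0 if e in S else 0.0 for S in sets] for e in A]
    b_ub = [-(1.0 - sum(x[i] for i, S in enumerate(sets) if e in S)) for e in A]
    return linprog(W, A_ub=A_ub, b_ub=b_ub, bounds=(0, None)).fun

def h(x):
    return float(np.dot(w, x)) + sum(p * f_A(x, A) for A, p in scenarios)

print(h([0.5, 0.0, 0.5]))
```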
The General Strategy
1. Get a (1+ε)-optimal solution (x) to the convex
program using the ellipsoid method.
2. Convert the fractional solution (x) to an integer solution
– decouple stage I and the stage II scenarios
– use the ρ-approx. algorithm for the deterministic problem
to solve the subproblems.
Obtain a c·ρ-approximation algorithm for the
stochastic integer problem.
The Ellipsoid Method
Min c·x subject to x∈P.
Ellipsoid ≈ squashed sphere.
Start with a ball containing the polytope P.
yi = center of current ellipsoid.
If yi is infeasible, use a violated inequality
to chop off the infeasible half-ellipsoid.
New ellipsoid = min. volume ellipsoid
containing the “unchopped” half-ellipsoid.
If yi ∈ P, use the objective function cut
c·x ≤ c·yi to chop off part of the polytope and a half-ellipsoid.
(Figure: successive ellipsoids shrinking around P; points x1, x2, …, xk in P and optimum x*.)
x1, x2, …, xk: points lying in P. c·xk is close to the optimal value.
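For reference, a sketch of a single central-cut ellipsoid update, using the textbook formulas (not anything specific to this talk): given the current ellipsoid {x : (x−y)ᵀE⁻¹(x−y) ≤ 1} and a cut a·(x−y) ≤ 0, it returns the minimum-volume ellipsoid containing the kept half.

```python
# Sketch of one central-cut ellipsoid update (standard formulas):
# ellipsoid {x : (x-y)^T E^{-1} (x-y) <= 1}, cut a.(x - y) <= 0.
import numpy as np

def ellipsoid_step(y, E, a):
    n = len(y)
    Ea = E @ a
    b = Ea / np.sqrt(a @ Ea)                    # scaled direction of the cut
    y_new = y - b / (n + 1)                     # new center
    E_new = (n**2 / (n**2 - 1.0)) * (E - (2.0 / (n + 1)) * np.outer(b, b))
    return y_new, E_new

# Example: shrink the unit ball in R^2 with the cut x_1 <= y_1.
y, E = np.zeros(2), np.eye(2)
print(ellipsoid_step(y, E, np.array([1.0, 0.0])))
```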
Ellipsoid for Convex Optimization
Min h(x) subject to x∈P.
Start with a ball containing the polytope P.
yi = center of current ellipsoid.
If yi is infeasible, use a violated inequality.
If yi ∈ P – how to make progress?
Add the inequality h(x) ≤ h(yi)? Separation
becomes difficult.
d ∈ ℝm is a subgradient of h(.) at u, if for every v, h(v)–h(u) ≥ d·(v–u).
Let d = subgradient at yi;
use the subgradient cut d·(x–yi) ≤ 0.
Generate the new min. volume ellipsoid.
x1, x2, …, xk: points in P. Can show, mini=1…k h(xi) ≤ OPT + ε.
But a subgradient is difficult to compute.
d' ∈ ℝm is an ε-subgradient of h(.) at u, if ∀v∈P, h(v)–h(u) ≥ d'·(v–u) – ε·h(u).
Let d' = ε-subgradient at yi;
use the ε-subgradient cut d'·(x–yi) ≤ 0.
x1, x2, …, xk: points in P. Can show, mini=1…k h(xi) ≤ OPT/(1–ε) + ε.
(Figure: ellipsoid iterations with the cut h(x) ≤ h(yi) at the center yi, subgradient direction d, points x1, x2, …, xk and optimum x*.)
Subgradients and -subgradients
Vector d is a subgradient of h(.) at u,
if for every v, h(v) – h(u) ≥ d·(v–u).
Vector d' is an ε-subgradient of h(.) at u,
if for every v∈P, h(v) – h(u) ≥ d'·(v–u) – ε·h(u).
P = { x : 0 ≤ xS ≤ 1 for each set S }.
h(x) = ∑S wSxS + ∑A⊆U pA fA(x) = w·x + ∑A⊆U pA fA(x)
Lemma: Let d be a subgradient at u, and d' be a vector
such that dS – εwS ≤ d'S ≤ dS for each set S. Then,
d' is an ε-subgradient at point u.
Getting a “nice” subgradient
h(x) = w·x + ∑A⊆U pA fA(x)
fA(x) = min { ∑S WSyA,S : ∑S:e∈S yA,S ≥ 1 – ∑S:e∈S xS for each e∈A, yA,S ≥ 0 for each S }
      = max { ∑e∈A (1 – ∑S:e∈S xS) zA,e : ∑e∈S∩A zA,e ≤ WS for each S, zA,e ≥ 0 for each e∈A }
(by LP duality; equivalently, extend the dual by setting zA,e = 0 for each e∉A).
Consider a point u ∈ ℝm. Let zA be an optimal dual solution for scenario A at u. So
fA(u) = ∑e (1 – ∑S:e∈S uS) zA,e.
For any other point v, zA is a feasible dual solution for scenario A. So
fA(v) ≥ ∑e (1 – ∑S:e∈S vS) zA,e.
Get that h(v) – h(u) ≥ ∑S (wS – ∑A⊆U pA ∑e∈S zA,e)(vS – uS) = d·(v–u), where
dS = wS – ∑A⊆U pA ∑e∈S zA,e. So d is a subgradient of h(.) at point u.
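A sketch of this exact subgradient computation on a toy instance (polynomial-scenario case, so every scenario's dual can be solved): each component is dS = wS − ∑A pA ∑e∈S zA,e, with zA an optimal dual solution obtained here from scipy's LP solver.

```python
# Sketch (toy instance): d_S = w_S - sum_A p_A sum_{e in S} z_{A,e},
# with z_A an optimal dual solution of scenario A's recourse LP.
import numpy as np
from scipy.optimize import linprog

sets = [{0}, {1}, {0, 1}]
w, W = np.array([1.0, 1.0, 1.5]), np.array([3.0, 3.0, 4.5])
scenarios = [({0}, 0.6), ({0, 1}, 0.4)]
universe = [0, 1]

def dual_z(x, A):
    """max sum_{e in A} (1 - sum_{S: e in S} x_S) z_e  s.t.  sum_{e in S} z_e <= W_S, z >= 0
       (z_e fixed to 0 for e not in A); solved as a minimization for linprog."""
    elems = sorted(A)
    c = [-(1.0 - sum(x[i] for i, S in enumerate(sets) if e in S)) for e in elems]
    A_ub = [[1.0 if e in S else 0.0 for e in elems] for S in sets]
    res = linprog(c, A_ub=A_ub, b_ub=W, bounds=(0, None))
    z = dict.fromkeys(universe, 0.0)
    z.update(zip(elems, res.x))
    return z

def subgradient(x):
    zs = [(p, dual_z(x, A)) for A, p in scenarios]
    return np.array([w[i] - sum(p * sum(z[e] for e in S) for p, z in zs)
                     for i, S in enumerate(sets)])

print(subgradient([0.5, 0.0, 0.5]))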
Computing an ε-Subgradient
Given a point u ∈ ℝm, let zA be an optimal dual solution for scenario A at u.
The vector d with dS = wS – ∑A⊆U pA ∑e∈S zA,e
= ∑A⊆U pA(wS – ∑e∈S zA,e) is a subgradient at u.
Goal: Get d' such that dS – εwS ≤ d'S ≤ dS for each S.
For each S, –WS ≤ dS ≤ wS. Let λ = maxS WS /wS.
• Sample once from the black box to get a random scenario A.
Compute XS = wS – ∑e∈S zA,e for each S.
E[XS] = dS and Var[XS] ≤ WS².
• Sample O((λ²/ε²)·log(m/δ)) times to compute d' such that
Pr[∀S, dS – εwS ≤ d'S ≤ dS] ≥ 1–δ
⇒ d' is an ε-subgradient at u with probability ≥ 1–δ.
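A sampling sketch for the black-box model, continuing the previous block (it reuses sets, w, scenarios and dual_z from there); an explicit distribution stands in for the black box, and the average of the per-sample vectors XS estimates dS. With N = O((λ²/ε²)·log(m/δ)) samples this is an ε-subgradient with probability at least 1−δ.

```python
# Sampling sketch for an eps-subgradient (continues the previous sketch:
# sets, w, scenarios and dual_z are as defined there).
import random
import numpy as np

def sample_scenario():
    """Black-box stand-in: draw a scenario from the explicit distribution."""
    r, acc = random.random(), 0.0
    for A, p in scenarios:
        acc += p
        if r <= acc:
            return A
    return scenarios[-1][0]

def approx_subgradient(x, num_samples):
    total = np.zeros(len(sets))
    for _ in range(num_samples):
        z = dual_z(x, sample_scenario())
        total += [w[i] - sum(z[e] for e in S) for i, S in enumerate(sets)]   # X_S
    return total / num_samples      # average of the X vectors estimates d

print(approx_subgradient([0.5, 0.0, 0.5], num_samples=200))
```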
Putting it all together
Min h(x) subject to x∈P. Can compute ε-subgradients.
Run the ellipsoid algorithm.
Given yi = center of current ellipsoid:
if yi is infeasible, use a violated
inequality as a cut;
if yi ∈ P, use an ε-subgradient cut.
Continue with the smaller ellipsoid.
(Figure: ellipsoid iterations generating points x1, x2, …, xk in P near the optimum x*.)
Generate points x1, x2, …, xk in P. Return x = argmini=1…k h(xi).
Get that h(x) ≤ OPT/(1–ε) + ε.
One last hurdle
Cannot evaluate h(.) – how to compute x = argmini=1…k h(xi)?
Will find a point x in the convex hull of x1,…, xk such that h(x)
is close to mini=1…k h(xi).
Take points x1 and x2.
Do bisection search using ε-subgradients
to find a point x on the x1–x2 line segment with
value close to min(h(x1), h(x2)):
the ε-subgradient d' at a point y on the segment tells us which half to discard
(points x with d'·(x–y) ≥ 0 are no better than y, up to ε·h(y)).
Stop when the search interval is small enough.
Set x = either end point of the remaining segment.
Iterate using x and x3,…, xk, updating x
along the way.
Can show that h(x) is close to mini=1…k h(xi).
(Figure: bisection along the x1–x2 segment, then iterating with x3, …, xk.)
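A sketch of the bisection idea: compare points using only (ε-)subgradient signs along the segment, never h itself. Here the exact gradient of a simple quadratic stands in for the sampled ε-subgradient of the stochastic program; the stopping tolerance is arbitrary.

```python
# Bisection sketch: locate a near-minimizer of a convex h on the segment [x1, x2]
# using only (eps-)subgradients, never h itself.
import numpy as np

def subgrad(x):                      # gradient of h(x) = |x - (0.3, 0.7)|^2 (stand-in)
    return 2.0 * (x - np.array([0.3, 0.7]))

def segment_search(x1, x2, tol=1e-6):
    lo, hi = np.asarray(x1, float), np.asarray(x2, float)
    direction = hi - lo
    while np.linalg.norm(hi - lo) > tol:
        mid = (lo + hi) / 2
        if subgrad(mid) @ direction >= 0:   # h is non-decreasing past mid toward x2
            hi = mid                        # so the segment minimizer lies in [lo, mid]
        else:
            lo = mid
    return lo                               # either endpoint of the tiny interval works

print(segment_search([0.0, 0.0], [1.0, 1.0]))   # ~ (0.5, 0.5), the segment minimizer
```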
Finally,
Get a solution x with h(x) close to OPT.
Sample initially to detect if OPT = Ω(1/ε) – this
allows one to get a (1+ε)·OPT guarantee.
Theorem: (SSC-P) can be solved to within a factor
of (1+ε) in polynomial time, with high probability.
Gives a (2log n+ε)-approximation algorithm for
the stochastic set cover problem.
A Solvable Class of Stochastic LPs
Minimize h(x) = w·x + ∑A pAfA(x)
s.t. x ∈ ℝm, x ≥ 0, x ∈ P
where fA(x) = min. wA·yA + cA·rA
s.t. BA rA ≥ jA
DA rA + TA yA ≥ lA – TA x
yA ∈ ℝm, rA ∈ ℝn, yA ≥ 0, rA ≥ 0.
Theorem: Can get a (1+ε)-optimal solution for this class of
stochastic programs in polynomial time.
Sample Average Approximation
Sample Average Approximation (SAA) method:
– Sample initially N times from scenario distribution
– Solve 2-stage problem estimating pA with frequency of occurrence
of scenario A
How large should N be?
Kleywegt, Shapiro & Homem-de-Mello (KSH01):
– bound N by variance of a certain quantity – need not be
polynomially bounded even for our class of programs.
Shmoys & Swamy (recent work):
– show using ε-subgradients that for our class, N can be polynomially
bounded.
Nemirovski & Shapiro:
– show that for SSC with non-scenario-dependent costs, KSH01 gives a
polynomial bound on N for a (preprocessing + SAA) algorithm.
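A minimal SAA sketch, continuing the toy setup (sample_scenario from the earlier block stands in for the black box): estimate each pA by its empirical frequency, then hand the resulting polynomial list of scenarios to the extensive-form LP sketched earlier.

```python
# SAA sketch (continues the toy instance; sample_scenario as above stands in for
# the black box): estimate p_A by empirical frequencies over N samples.
from collections import Counter

def saa_distribution(num_samples):
    counts = Counter(frozenset(sample_scenario()) for _ in range(num_samples))
    return [(set(A), n / num_samples) for A, n in counts.items()]

empirical = saa_distribution(1000)
print(empirical)     # feed this scenario list into the extensive-form LP above
```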
Summary of Results
• Give the first FPRAS to solve a class of stochastic
linear programs.
• Obtain the first approximation algorithms for
2-stage discrete stochastic problems – no
assumptions on costs or probability distribution.
– 2log n-approx. for set cover
– 4-approx. for vertex cover (VC) and multicut on trees.
GPRS04 gave an 8-approx. for VC under
proportional costs in the black box model.
Results (contd.)
– 3.23-approx. for uncapacitated facility location (FL). Get
constant guarantees for several variants such as FL with
penalties, or soft capacities, or services.
Improve upon results of GPRS04, RS04 obtained in
restricted models.
– (1+ε)-approx. for multicommodity flow.
Immorlica et al. gave an optimal algorithm for the single-
commodity case in the polynomial-“scenario” (demand) setting.
• Give a general technique to lift deterministic
guarantees to stochastic setting.
Open Questions
• Practical impact: Can one use ε-subgradients
in other deterministic optimization methods,
e.g., cutting plane methods? Interior-point
algorithms?
• Multi-stage stochastic optimization.
• Characterize which deterministic problems
are “easy to solve” in stochastic setting, and
which problems are “hard”.
Thank You.