Admissibility of solution estimators to stochastic
optimization problems
Amitabh Basu
Joint Work with Tu Nguyen and Ao Sun
Foundations of Deep Learning, Opening Workshop,
SAMSI, Durham, August 2019
A general stochastic optimization problem

F : X × R^m → R
ξ is a random variable taking values in R^m

min_{x∈X} E_ξ[ F(x, ξ) ]

Example:
Supervised Machine Learning: One sees samples (z, y) ∈ R^n × R of labeled data from some (joint) distribution, and one aims to find a function f ∈ F in a hypothesis class F that minimizes the expected loss E_{(z,y)}[ ℓ(f(z), y) ], where ℓ : R × R → R_+ is some loss function. Then X = F, m = n + 1, ξ = (z, y), and F(f, (z, y)) = ℓ(f(z), y).

Example:
(News) Vendor Problem: A (news) vendor buys some units of a product (newspapers) from a supplier at a cost of c > 0 dollars/unit; at most u units are available. Demand for the product is stochastic. The product is sold at price p > c dollars/unit. At the end of the day, the vendor can return unsold product to the supplier at r < c dollars/unit. Find the number of units to buy to maximize the expected profit (equivalently, minimize the expected loss). Here m = 1, X = [0, u], and

F(x, ξ) = cx − p min{x, ξ} − r max{x − ξ, 0}.
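As a quick numeric illustration (not from the talk), here is a minimal Python sketch that evaluates the newsvendor loss F(x, ξ) and estimates the expected loss E_ξ[ F(x, ξ) ] by Monte Carlo; the cost parameters c, p, r, u and the Poisson demand model are made-up choices.

import numpy as np

def newsvendor_loss(x, xi, c=1.0, p=1.5, r=0.25):
    """F(x, xi) = c*x - p*min(x, xi) - r*max(x - xi, 0)."""
    return c * x - p * np.minimum(x, xi) - r * np.maximum(x - xi, 0.0)

rng = np.random.default_rng(0)
u = 50                                      # at most u units can be bought
demand = rng.poisson(lam=20, size=100_000)  # hypothetical demand distribution

# Estimate E[F(x, xi)] on a grid of order quantities and minimize.
xs = np.arange(0, u + 1)
expected_loss = [newsvendor_loss(x, demand).mean() for x in xs]
print("approx. optimal order quantity:", xs[int(np.argmin(expected_loss))])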
Solving the problem

F : X × R^m → R
ξ is a random variable taking values in R^m

min_{x∈X} E_ξ[ F(x, ξ) ]

Solve the problem only given access to n i.i.d. samples of ξ.

Natural idea: Given samples ξ_1, . . . , ξ_n ∈ R^m, solve the deterministic problem

min_{x∈X} (1/n) Σ_{i=1}^n F(x, ξ_i)

Stochastic optimizers call this sample average approximation (SAA); machine learners call this empirical risk minimization.
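SAA simply replaces the expectation by the empirical average over the observed samples. A minimal sketch continuing the hypothetical newsvendor example above, now using only n observed demands:

import numpy as np

def saa_objective(x, samples, c=1.0, p=1.5, r=0.25):
    """(1/n) * sum_i F(x, xi_i), the sample average approximation."""
    return np.mean(c * x - p * np.minimum(x, samples)
                   - r * np.maximum(x - samples, 0.0))

rng = np.random.default_rng(1)
samples = rng.poisson(lam=20, size=30)  # n = 30 i.i.d. draws of the demand xi

u = 50
xs = np.arange(0, u + 1)
x_saa = xs[int(np.argmin([saa_objective(x, samples) for x in xs]))]
print("SAA / empirical-risk-minimizing order quantity:", x_saa)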
Concrete Problem

F(x, ξ) = ξᵀx
X ⊆ R^d is a compact set (e.g., a polytope, or the integer points in a polytope). So m = d.
ξ ∼ N(µ, Σ).

min_{x∈X} E_ξ[ F(x, ξ) ] = min_{x∈X} E_ξ[ ξᵀx ] = min_{x∈X} µᵀx

Solve the problem only given access to n i.i.d. samples of ξ. Important: µ is unknown.

Sample Average Approximation (SAA):

min_{x∈X} (1/n) Σ_{i=1}^n F(x, ξ_i) = min_{x∈X} ξ̄ᵀx,  where ξ̄ := (1/n) Σ_{i=1}^n ξ_i.
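For a linear objective the reduction above is exact: averaging the samples first and then optimizing is the same problem as SAA. A small sketch with a hypothetical finite feasible set X (think: the vertices of a polytope, or integer points):

import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -0.5, 0.2])  # the true mean; unknown in practice
samples = rng.multivariate_normal(mu, np.eye(3), size=50)  # xi_1, ..., xi_n

# Hypothetical compact feasible set: a finite list of points in R^d.
X = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], float)

xi_bar = samples.mean(axis=0)     # sample mean
x_saa = X[np.argmin(X @ xi_bar)]  # SAA: minimize xi_bar^T x over X
print("xi_bar:", xi_bar, " SAA solution:", x_saa)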
A quick tour of Statistical Decision Theory

Set of states of nature, modeled by a set Θ.

Set of possible actions to take, modeled by A.

In a particular state of nature θ ∈ Θ, the performance of any action a ∈ A is evaluated by a loss function L(θ, a). Goal: choose an action to minimize the loss.

(Partial/incomplete) information about θ is obtained through a random variable y taking values in a sample space χ. The distribution of y depends on the particular state of nature θ and is denoted by P_θ.

Decision Rule: Takes y ∈ χ as input and reports an action a ∈ A. Denoted δ : χ → A.
Our problem cast as a statistical decision problem

X ⊆ R^d is a compact set. ξ ∼ N(µ, I).

min_{x∈X} E_ξ[ F(x, ξ) ] = min_{x∈X} E_ξ[ ξᵀx ] = min_{x∈X} µᵀx

States of Nature: Θ = R^d = {all possible µ ∈ R^d}.

Set of Actions: X ⊆ R^d.

Loss function:

L(µ̄, x̄) = E_{ξ∼N(µ̄,I)}[ F(x̄, ξ) ] − min_{x∈X} E_{ξ∼N(µ̄,I)}[ F(x, ξ) ]
         = µ̄ᵀx̄ − µ̄ᵀx(µ̄),

where x(µ̄) ∈ arg min_{x∈X} µ̄ᵀx.

Sample Space: χ = R^d × R^d × · · · × R^d (n times).

Decision Rule: δ : χ → X.

SAA: δ_SAA(ξ_1, . . . , ξ_n) ∈ arg min{ ξ̄ᵀx : x ∈ X }
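In this framing the loss is the optimality gap (regret) of the reported point under the true mean. A sketch, reusing the hypothetical finite X from before, of the loss L(µ̄, x̄) and of SAA as a decision rule δ : χ → X:

import numpy as np

X = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], float)

def loss(mu_bar, x_bar):
    """L(mu_bar, x_bar) = mu_bar^T x_bar - min_{x in X} mu_bar^T x >= 0."""
    return float(mu_bar @ x_bar - np.min(X @ mu_bar))

def delta_saa(samples):
    """SAA decision rule: optimize the sample-mean objective over X."""
    return X[np.argmin(X @ samples.mean(axis=0))]

rng = np.random.default_rng(3)
mu = np.array([1.0, -0.5, 0.2])
x_hat = delta_saa(rng.multivariate_normal(mu, np.eye(3), size=50))
print("reported action:", x_hat, " loss:", loss(mu, x_hat))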
Decision Rules

[Figure: schematic illustrations of decision rules on a feasible region, with sample points labeled 1–4; graphics not recoverable from the text extraction.]
How does one decide between decision rules?

States of nature Θ, actions A, loss function L : Θ × A → R, sample space χ with distributions {P_θ : θ ∈ Θ}.

Given a decision rule δ : χ → A, define the risk function of this decision rule as:

R_δ(θ) := E_{y∼P_θ}[ L(θ, δ(y)) ]

We say that a decision rule δ′ dominates a decision rule δ if R_{δ′}(θ) ≤ R_δ(θ) for all θ ∈ Θ, and R_{δ′}(θ*) < R_δ(θ*) for some θ* ∈ Θ.

If a decision rule δ is not dominated by any other decision rule, we say that δ is admissible. Otherwise, it is inadmissible.
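The risk averages the loss over the sampling distribution, for each fixed state of nature. As a sketch, two rules can be compared numerically by Monte Carlo at a given µ; here the hypothetical competitor shrinks the sample mean toward 0 before optimizing (the factor 0.9 is arbitrary). Note this compares risks at a single µ only, whereas dominance requires a comparison at every µ ∈ Θ.

import numpy as np

X = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], float)

def risk(rule, mu, n=20, reps=5_000, seed=4):
    """Monte Carlo estimate of R_delta(mu) = E[ L(mu, delta(xi_1..xi_n)) ]."""
    rng = np.random.default_rng(seed)
    opt = np.min(X @ mu)
    total = 0.0
    for _ in range(reps):
        samples = rng.multivariate_normal(mu, np.eye(len(mu)), size=n)
        total += mu @ rule(samples) - opt
    return total / reps

saa = lambda s: X[np.argmin(X @ s.mean(axis=0))]
shrunk = lambda s: X[np.argmin(X @ (0.9 * s.mean(axis=0)))]

mu = np.array([1.0, -0.5, 0.2])
print("risk(SAA)    ~", risk(saa, mu))
print("risk(shrunk) ~", risk(shrunk, mu))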
Is the Sample Average Approximation (SAA) rule admissible?
Admissibility in stochastic optimization

Stochastic optimization setup:
F : X × R^m → R, ξ is a R.V. in R^m

min_{x∈X} E_ξ[ F(x, ξ) ]

Want to solve with access to n i.i.d. samples of ξ.

Statistical decision theory view:
ξ ∼ N(µ, I); states of nature Θ = R^m = {all possible µ ∈ R^m}.
Set of actions A = X. Sample space χ = R^m × R^m × · · · × R^m (n times).

Loss function:

L(µ̄, x̄) = E_{ξ∼N(µ̄,I)}[ F(x̄, ξ) ] − min_{x∈X} E_{ξ∼N(µ̄,I)}[ F(x, ξ) ]

Given a decision rule δ : χ → X, the risk of δ is

R_δ(µ) := E_{ξ_1,...,ξ_n}[ L(µ, δ(ξ_1, . . . , ξ_n)) ]

Sample Average Approximation (SAA):

min_{x∈X} (1/n) Σ_{i=1}^n F(x, ξ_i)

Sample Average Approximation (SAA) can be inadmissible!!
Inadmissibility of SAA: Stein's Paradox

Sample Average Approximation (SAA) can be inadmissible!! Example:

F(x, ξ) = ‖x − ξ‖², X = R^d, ξ ∼ N(µ, I).

min_{x∈R^d} E_ξ[ F(x, ξ) ] = min_{x∈R^d} E_{ξ∼N(µ,I)}[ ‖x − ξ‖² ] = min_{x∈R^d} ( ‖x − µ‖² + V[ ξ ] )

Optimal solution: x(µ̄) = µ̄. Optimal value: V[ ξ ] = d. Hence the loss is

L(µ̄, x̄) = ‖x̄ − µ̄‖².

Sample Average Approximation (SAA):

min_{x∈R^d} (1/n) Σ_{i=1}^n ‖x − ξ_i‖²,  so  δ_SAA(ξ_1, . . . , ξ_n) = ξ̄ := (1/n) Σ_{i=1}^n ξ_i.

Admissibility of SAA here is exactly admissibility of the sample mean as an estimator of µ under squared-error loss, and by Stein's paradox the sample mean is inadmissible when d ≥ 3.

Generalized to arbitrary convex quadratic functions with an uncertain linear term in Davarnia and Cornuéjols 2018. Follow-up work from a Bayesian perspective in Davarnia, Kocuk and Cornuéjols 2018.
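Concretely, Stein's paradox can be seen in simulation: the James-Stein shrinkage estimator (the classical dominating rule, not part of the talk's own results) has strictly smaller risk than the sample mean δ_SAA = ξ̄ for every µ when d ≥ 3. A sketch at µ = 0, where the effect is largest:

import numpy as np

def risks(mu, n=10, reps=50_000, seed=5):
    """MC risk E||delta - mu||^2 of the sample mean vs. James-Stein."""
    rng = np.random.default_rng(seed)
    d = len(mu)
    # xi_bar ~ N(mu, I/n); James-Stein shrinks xi_bar toward the origin.
    xi_bar = mu + rng.standard_normal((reps, d)) / np.sqrt(n)
    norms2 = np.sum(xi_bar ** 2, axis=1)
    js = (1.0 - (d - 2) / (n * norms2))[:, None] * xi_bar
    return (np.mean(np.sum((xi_bar - mu) ** 2, axis=1)),
            np.mean(np.sum((js - mu) ** 2, axis=1)))

r_saa, r_js = risks(mu=np.zeros(5))          # d = 5 >= 3
print(f"risk of sample mean : {r_saa:.3f}")  # approx d/n = 0.5
print(f"risk of James-Stein : {r_js:.3f}")   # strictly smaller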
A class of problems with no Stein's paradox

THEOREM (Basu-Nguyen-Sun 2018)
Consider the problem of optimizing an uncertain linear objective ξ ∼ N(µ, I) over a fixed compact set X ⊆ R^d:

min_{x∈X} E_{ξ∼N(µ,I)}[ ξᵀx ]

The Sample Average Approximation (SAA) rule is admissible.
Main technical ideas/tools

Sufficient Statistic: Let P = {P_θ : θ ∈ Θ} be a family of distributions for a r.v. y in a sample space χ. A sufficient statistic for this family is a function T : χ → τ such that the conditional probability P(y | T = t) does not depend on θ.

FACT: Let χ = R^d × · · · × R^d (n times) and P = { N(µ, I) × · · · × N(µ, I) (n times) : µ ∈ R^d }, i.e., (ξ_1, . . . , ξ_n) ∈ χ are i.i.d. samples from the normal distribution N(µ, I). Then T(ξ_1, . . . , ξ_n) = ξ̄ := (1/n) Σ_{i=1}^n ξ_i is a sufficient statistic for P.

THEOREM (Rao-Blackwell, 1940s)
If the loss function is convex in the action, then for any decision rule δ there exists a rule δ′ that is a function only of a sufficient statistic and satisfies R_{δ′}(θ) ≤ R_δ(θ) for all θ ∈ Θ.
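A toy numeric illustration of Rao-Blackwellization in this Gaussian setting (parameters are mine): start from the wasteful rule δ(ξ_1, . . . , ξ_n) = ξ_1, which ignores all but one sample. Its conditional expectation given the sufficient statistic ξ̄ is ξ̄ itself (by exchangeability), and under the convex loss ‖x̄ − µ̄‖² the risk drops from about d to about d/n:

import numpy as np

rng = np.random.default_rng(6)
d, n, reps = 3, 10, 50_000
mu = np.array([1.0, -2.0, 0.5])

samples = mu + rng.standard_normal((reps, n, d))  # reps trials of n samples
delta_wasteful = samples[:, 0, :]   # delta = xi_1 (ignores the other samples)
delta_rb = samples.mean(axis=1)     # E[xi_1 | xi_bar] = xi_bar

risk = lambda z: np.mean(np.sum((z - mu) ** 2, axis=1))
print(f"risk of xi_1   : {risk(delta_wasteful):.3f}")  # approx d = 3
print(f"risk of xi_bar : {risk(delta_rb):.3f}")        # approx d/n = 0.3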
Main technical ideas/tools (continued)

For any decision rule δ, define the function

F(µ) = R_δ(µ) − R_{δ_SAA}(µ).

It suffices to show that there exists µ̂ ∈ R^d such that F(µ̂) > 0.

First observe: F(0) = 0.

Then compute the Hessian ∇²F(0) and show that it has a strictly positive eigenvalue.

Use the fact from probability theory that for any Lebesgue integrable function f : R^d → R^n, the map

µ ↦ E_{y∼N(µ,Σ)}[ f(y) ] := (2π)^{−d/2} det(Σ)^{−1/2} ∫_{R^d} f(y) exp( −(1/2)(y − µ)ᵀΣ⁻¹(y − µ) ) dy

has derivatives of all orders in µ, and these can be computed by differentiating under the integral sign.
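In one dimension, differentiating under the integral gives the identity d/dµ E_{y∼N(µ,σ²)}[ f(y) ] = E[ f(y)(y − µ) ]/σ² (a Stein-type identity). A quick Monte Carlo sanity check of this, with a made-up test function f, against a finite-difference approximation:

import numpy as np

rng = np.random.default_rng(7)
f = lambda y: np.sin(y) + y ** 2   # hypothetical integrable test function
mu, sigma, reps = 0.7, 1.0, 2_000_000

y = rng.normal(mu, sigma, size=reps)
deriv_under_integral = np.mean(f(y) * (y - mu)) / sigma ** 2

# Finite-difference check of mu -> E[f(y)], using common random numbers.
h, z = 1e-3, rng.standard_normal(reps)
deriv_fd = (np.mean(f(mu + h + sigma * z))
            - np.mean(f(mu - h + sigma * z))) / (2 * h)

print(f"via differentiation under the integral: {deriv_under_integral:.4f}")
print(f"via finite differences:                 {deriv_fd:.4f}")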
Open Questions

What about nonlinear objectives over compact feasible regions? For example, what if F(x, ξ) = xᵀQx + ξᵀx for some fixed PSD matrix Q, and X is a compact (convex) set?

What about piecewise linear objectives F(x, ξ)? Recall the News Vendor Problem.

Objectives coming from machine learning problems, such as neural network training with squared or logistic loss (admissibility of "empirical risk minimization"). Maybe this depends on the hypothesis class that is being learnt?

METATHEOREM (from Gérard Cornuéjols): Admissible if and only if the feasible region is bounded!?
THANK YOU !
Questions/Comments ?

More Related Content

What's hot

Lesson 23: Antiderivatives (slides)
Lesson 23: Antiderivatives (slides)Lesson 23: Antiderivatives (slides)
Lesson 23: Antiderivatives (slides)
Matthew Leingang
 
Lesson 17: The Method of Lagrange Multipliers
Lesson 17: The Method of Lagrange MultipliersLesson 17: The Method of Lagrange Multipliers
Lesson 17: The Method of Lagrange Multipliers
Matthew Leingang
 
Lesson 20: Derivatives and the Shapes of Curves (slides)
Lesson 20: Derivatives and the Shapes of Curves (slides)Lesson 20: Derivatives and the Shapes of Curves (slides)
Lesson 20: Derivatives and the Shapes of Curves (slides)
Matthew Leingang
 
Lesson 25: Evaluating Definite Integrals (slides)
Lesson 25: Evaluating Definite Integrals (slides)Lesson 25: Evaluating Definite Integrals (slides)
Lesson 25: Evaluating Definite Integrals (slides)
Matthew Leingang
 
Lesson 27: Evaluating Definite Integrals
Lesson 27: Evaluating Definite IntegralsLesson 27: Evaluating Definite Integrals
Lesson 27: Evaluating Definite Integrals
Matthew Leingang
 
Lesson 18: Maximum and Minimum Values (slides)
Lesson 18: Maximum and Minimum Values (slides)Lesson 18: Maximum and Minimum Values (slides)
Lesson 18: Maximum and Minimum Values (slides)
Matthew Leingang
 
Lesson 19: Maximum and Minimum Values
Lesson 19: Maximum and Minimum ValuesLesson 19: Maximum and Minimum Values
Lesson 19: Maximum and Minimum Values
Matthew Leingang
 
Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...
Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...
Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...
Katsuya Ito
 
Lesson 21: Curve Sketching (slides)
Lesson 21: Curve Sketching (slides)Lesson 21: Curve Sketching (slides)
Lesson 21: Curve Sketching (slides)
Matthew Leingang
 
Function in Mathematics
Function in MathematicsFunction in Mathematics
Function in Mathematics
ghhgj jhgh
 
Approximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forestsApproximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forests
Christian Robert
 
Slides mc gill-v3
Slides mc gill-v3Slides mc gill-v3
Slides mc gill-v3
Arthur Charpentier
 
Application of partial derivatives with two variables
Application of partial derivatives with two variablesApplication of partial derivatives with two variables
Application of partial derivatives with two variables
Sagar Patel
 
Lesson 17: Indeterminate forms and l'Hôpital's Rule (slides)
Lesson 17: Indeterminate forms and l'Hôpital's Rule (slides)Lesson 17: Indeterminate forms and l'Hôpital's Rule (slides)
Lesson 17: Indeterminate forms and l'Hôpital's Rule (slides)
Matthew Leingang
 
Bias-variance decomposition in Random Forests
Bias-variance decomposition in Random ForestsBias-variance decomposition in Random Forests
Bias-variance decomposition in Random Forests
Gilles Louppe
 
Side 2019 #7
Side 2019 #7Side 2019 #7
Side 2019 #7
Arthur Charpentier
 
Slides simplexe
Slides simplexeSlides simplexe
Slides simplexe
Arthur Charpentier
 
Application of derivatives
Application of derivatives Application of derivatives
Application of derivatives
Seyid Kadher
 
Mac2311 study guide-tcm6-49721
Mac2311 study guide-tcm6-49721Mac2311 study guide-tcm6-49721
Mac2311 study guide-tcm6-49721
Glicerio Gavilan
 

What's hot (19)

Lesson 23: Antiderivatives (slides)
Lesson 23: Antiderivatives (slides)Lesson 23: Antiderivatives (slides)
Lesson 23: Antiderivatives (slides)
 
Lesson 17: The Method of Lagrange Multipliers
Lesson 17: The Method of Lagrange MultipliersLesson 17: The Method of Lagrange Multipliers
Lesson 17: The Method of Lagrange Multipliers
 
Lesson 20: Derivatives and the Shapes of Curves (slides)
Lesson 20: Derivatives and the Shapes of Curves (slides)Lesson 20: Derivatives and the Shapes of Curves (slides)
Lesson 20: Derivatives and the Shapes of Curves (slides)
 
Lesson 25: Evaluating Definite Integrals (slides)
Lesson 25: Evaluating Definite Integrals (slides)Lesson 25: Evaluating Definite Integrals (slides)
Lesson 25: Evaluating Definite Integrals (slides)
 
Lesson 27: Evaluating Definite Integrals
Lesson 27: Evaluating Definite IntegralsLesson 27: Evaluating Definite Integrals
Lesson 27: Evaluating Definite Integrals
 
Lesson 18: Maximum and Minimum Values (slides)
Lesson 18: Maximum and Minimum Values (slides)Lesson 18: Maximum and Minimum Values (slides)
Lesson 18: Maximum and Minimum Values (slides)
 
Lesson 19: Maximum and Minimum Values
Lesson 19: Maximum and Minimum ValuesLesson 19: Maximum and Minimum Values
Lesson 19: Maximum and Minimum Values
 
Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...
Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...
Convex Analysis and Duality (based on "Functional Analysis and Optimization" ...
 
Lesson 21: Curve Sketching (slides)
Lesson 21: Curve Sketching (slides)Lesson 21: Curve Sketching (slides)
Lesson 21: Curve Sketching (slides)
 
Function in Mathematics
Function in MathematicsFunction in Mathematics
Function in Mathematics
 
Approximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forestsApproximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forests
 
Slides mc gill-v3
Slides mc gill-v3Slides mc gill-v3
Slides mc gill-v3
 
Application of partial derivatives with two variables
Application of partial derivatives with two variablesApplication of partial derivatives with two variables
Application of partial derivatives with two variables
 
Lesson 17: Indeterminate forms and l'Hôpital's Rule (slides)
Lesson 17: Indeterminate forms and l'Hôpital's Rule (slides)Lesson 17: Indeterminate forms and l'Hôpital's Rule (slides)
Lesson 17: Indeterminate forms and l'Hôpital's Rule (slides)
 
Bias-variance decomposition in Random Forests
Bias-variance decomposition in Random ForestsBias-variance decomposition in Random Forests
Bias-variance decomposition in Random Forests
 
Side 2019 #7
Side 2019 #7Side 2019 #7
Side 2019 #7
 
Slides simplexe
Slides simplexeSlides simplexe
Slides simplexe
 
Application of derivatives
Application of derivatives Application of derivatives
Application of derivatives
 
Mac2311 study guide-tcm6-49721
Mac2311 study guide-tcm6-49721Mac2311 study guide-tcm6-49721
Mac2311 study guide-tcm6-49721
 

Similar to Deep Learning Opening Workshop - Admissibility of Solution Estimators in Stochastic Optimization - Amitabh Basu, August 12, 2019

New Insights and Perspectives on the Natural Gradient Method
New Insights and Perspectives on the Natural Gradient MethodNew Insights and Perspectives on the Natural Gradient Method
New Insights and Perspectives on the Natural Gradient Method
Yoonho Lee
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...
Valentin De Bortoli
 
Derivative free optimization
Derivative free optimizationDerivative free optimization
Derivative free optimization
helalmohammad2
 
A Statistical Perspective on Retrieval-Based Models.pdf
A Statistical Perspective on Retrieval-Based Models.pdfA Statistical Perspective on Retrieval-Based Models.pdf
A Statistical Perspective on Retrieval-Based Models.pdf
Po-Chuan Chen
 
Classification and regression based on derivatives: a consistency result for ...
Classification and regression based on derivatives: a consistency result for ...Classification and regression based on derivatives: a consistency result for ...
Classification and regression based on derivatives: a consistency result for ...
tuxette
 
Kernels and Support Vector Machines
Kernels and Support Vector  MachinesKernels and Support Vector  Machines
Kernels and Support Vector Machines
Edgar Marca
 
Understanding Random Forests: From Theory to Practice
Understanding Random Forests: From Theory to PracticeUnderstanding Random Forests: From Theory to Practice
Understanding Random Forests: From Theory to Practice
Gilles Louppe
 
MCQMC 2016 Tutorial
MCQMC 2016 TutorialMCQMC 2016 Tutorial
MCQMC 2016 Tutorial
Fred J. Hickernell
 
The low-rank basis problem for a matrix subspace
The low-rank basis problem for a matrix subspaceThe low-rank basis problem for a matrix subspace
The low-rank basis problem for a matrix subspace
Tasuku Soma
 
Rademacher Averages: Theory and Practice
Rademacher Averages: Theory and PracticeRademacher Averages: Theory and Practice
Rademacher Averages: Theory and Practice
Two Sigma
 
Bachelor_Defense
Bachelor_DefenseBachelor_Defense
Bachelor_Defense
Teja Turk
 
Applicationofpartialderivativeswithtwovariables 140225070102-phpapp01 (1)
Applicationofpartialderivativeswithtwovariables 140225070102-phpapp01 (1)Applicationofpartialderivativeswithtwovariables 140225070102-phpapp01 (1)
Applicationofpartialderivativeswithtwovariables 140225070102-phpapp01 (1)
shreemadghodasra
 
Maximizing Submodular Function over the Integer Lattice
Maximizing Submodular Function over the Integer LatticeMaximizing Submodular Function over the Integer Lattice
Maximizing Submodular Function over the Integer Lattice
Tasuku Soma
 
smtlecture.7
smtlecture.7smtlecture.7
smtlecture.7
Roberto Bruttomesso
 
ma112011id535
ma112011id535ma112011id535
ma112011id535
matsushimalab
 
Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9
Daisuke Yoneoka
 
Higher-Order (F, α, β, ρ, d) –Convexity for Multiobjective Programming Problem
Higher-Order (F, α, β, ρ, d) –Convexity for Multiobjective Programming ProblemHigher-Order (F, α, β, ρ, d) –Convexity for Multiobjective Programming Problem
Higher-Order (F, α, β, ρ, d) –Convexity for Multiobjective Programming Problem
inventionjournals
 
Interpretable Sparse Sliced Inverse Regression for digitized functional data
Interpretable Sparse Sliced Inverse Regression for digitized functional dataInterpretable Sparse Sliced Inverse Regression for digitized functional data
Interpretable Sparse Sliced Inverse Regression for digitized functional data
tuxette
 
Indefinite Integral
Indefinite IntegralIndefinite Integral
Indefinite Integral
JelaiAujero
 
On the Family of Concept Forming Operators in Polyadic FCA
On the Family of Concept Forming Operators in Polyadic FCAOn the Family of Concept Forming Operators in Polyadic FCA
On the Family of Concept Forming Operators in Polyadic FCA
Dmitrii Ignatov
 

Similar to Deep Learning Opening Workshop - Admissibility of Solution Estimators in Stochastic Optimization - Amitabh Basu, August 12, 2019 (20)

New Insights and Perspectives on the Natural Gradient Method
New Insights and Perspectives on the Natural Gradient MethodNew Insights and Perspectives on the Natural Gradient Method
New Insights and Perspectives on the Natural Gradient Method
 
Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...Maximum likelihood estimation of regularisation parameters in inverse problem...
Maximum likelihood estimation of regularisation parameters in inverse problem...
 
Derivative free optimization
Derivative free optimizationDerivative free optimization
Derivative free optimization
 
A Statistical Perspective on Retrieval-Based Models.pdf
A Statistical Perspective on Retrieval-Based Models.pdfA Statistical Perspective on Retrieval-Based Models.pdf
A Statistical Perspective on Retrieval-Based Models.pdf
 
Classification and regression based on derivatives: a consistency result for ...
Classification and regression based on derivatives: a consistency result for ...Classification and regression based on derivatives: a consistency result for ...
Classification and regression based on derivatives: a consistency result for ...
 
Kernels and Support Vector Machines
Kernels and Support Vector  MachinesKernels and Support Vector  Machines
Kernels and Support Vector Machines
 
Understanding Random Forests: From Theory to Practice
Understanding Random Forests: From Theory to PracticeUnderstanding Random Forests: From Theory to Practice
Understanding Random Forests: From Theory to Practice
 
MCQMC 2016 Tutorial
MCQMC 2016 TutorialMCQMC 2016 Tutorial
MCQMC 2016 Tutorial
 
The low-rank basis problem for a matrix subspace
The low-rank basis problem for a matrix subspaceThe low-rank basis problem for a matrix subspace
The low-rank basis problem for a matrix subspace
 
Rademacher Averages: Theory and Practice
Rademacher Averages: Theory and PracticeRademacher Averages: Theory and Practice
Rademacher Averages: Theory and Practice
 
Bachelor_Defense
Bachelor_DefenseBachelor_Defense
Bachelor_Defense
 
Applicationofpartialderivativeswithtwovariables 140225070102-phpapp01 (1)
Applicationofpartialderivativeswithtwovariables 140225070102-phpapp01 (1)Applicationofpartialderivativeswithtwovariables 140225070102-phpapp01 (1)
Applicationofpartialderivativeswithtwovariables 140225070102-phpapp01 (1)
 
Maximizing Submodular Function over the Integer Lattice
Maximizing Submodular Function over the Integer LatticeMaximizing Submodular Function over the Integer Lattice
Maximizing Submodular Function over the Integer Lattice
 
smtlecture.7
smtlecture.7smtlecture.7
smtlecture.7
 
ma112011id535
ma112011id535ma112011id535
ma112011id535
 
Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9
 
Higher-Order (F, α, β, ρ, d) –Convexity for Multiobjective Programming Problem
Higher-Order (F, α, β, ρ, d) –Convexity for Multiobjective Programming ProblemHigher-Order (F, α, β, ρ, d) –Convexity for Multiobjective Programming Problem
Higher-Order (F, α, β, ρ, d) –Convexity for Multiobjective Programming Problem
 
Interpretable Sparse Sliced Inverse Regression for digitized functional data
Interpretable Sparse Sliced Inverse Regression for digitized functional dataInterpretable Sparse Sliced Inverse Regression for digitized functional data
Interpretable Sparse Sliced Inverse Regression for digitized functional data
 
Indefinite Integral
Indefinite IntegralIndefinite Integral
Indefinite Integral
 
On the Family of Concept Forming Operators in Polyadic FCA
On the Family of Concept Forming Operators in Polyadic FCAOn the Family of Concept Forming Operators in Polyadic FCA
On the Family of Concept Forming Operators in Polyadic FCA
 

More from The Statistical and Applied Mathematical Sciences Institute

Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
The Statistical and Applied Mathematical Sciences Institute
 
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
The Statistical and Applied Mathematical Sciences Institute
 
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
The Statistical and Applied Mathematical Sciences Institute
 
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
The Statistical and Applied Mathematical Sciences Institute
 
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
The Statistical and Applied Mathematical Sciences Institute
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
The Statistical and Applied Mathematical Sciences Institute
 

More from The Statistical and Applied Mathematical Sciences Institute (20)

Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
 
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
 
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
 
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
 
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
 
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
 
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
 
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
 
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
 
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
 
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
 
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
 
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
 
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
 
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
 
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
 
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
 
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
 
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
 

Recently uploaded

RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem studentsRHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
Himanshu Rai
 
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.pptLevel 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
Henry Hollis
 
Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...
PsychoTech Services
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
Nguyen Thanh Tu Collection
 
Leveraging Generative AI to Drive Nonprofit Innovation
Leveraging Generative AI to Drive Nonprofit InnovationLeveraging Generative AI to Drive Nonprofit Innovation
Leveraging Generative AI to Drive Nonprofit Innovation
TechSoup
 
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptxBIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
RidwanHassanYusuf
 
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdfمصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
سمير بسيوني
 
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
National Information Standards Organization (NISO)
 
Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47
MysoreMuleSoftMeetup
 
Walmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdfWalmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdf
TechSoup
 
A Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two HeartsA Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two Hearts
Steve Thomason
 
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptxBeyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
EduSkills OECD
 
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
Nguyen Thanh Tu Collection
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
Nicholas Montgomery
 
Temple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation resultsTemple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation results
Krassimira Luka
 
How to deliver Powerpoint Presentations.pptx
How to deliver Powerpoint  Presentations.pptxHow to deliver Powerpoint  Presentations.pptx
How to deliver Powerpoint Presentations.pptx
HajraNaeem15
 
How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17
Celine George
 
math operations ued in python and all used
math operations ued in python and all usedmath operations ued in python and all used
math operations ued in python and all used
ssuser13ffe4
 
Pharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brubPharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brub
danielkiash986
 
Lifelines of National Economy chapter for Class 10 STUDY MATERIAL PDF
Lifelines of National Economy chapter for Class 10 STUDY MATERIAL PDFLifelines of National Economy chapter for Class 10 STUDY MATERIAL PDF
Lifelines of National Economy chapter for Class 10 STUDY MATERIAL PDF
Vivekanand Anglo Vedic Academy
 

Recently uploaded (20)

RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem studentsRHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
 
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.pptLevel 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
 
Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
 
Leveraging Generative AI to Drive Nonprofit Innovation
Leveraging Generative AI to Drive Nonprofit InnovationLeveraging Generative AI to Drive Nonprofit Innovation
Leveraging Generative AI to Drive Nonprofit Innovation
 
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptxBIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
 
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdfمصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
 
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
 
Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47
 
Walmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdfWalmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdf
 
A Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two HeartsA Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two Hearts
 
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptxBeyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
 
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
 
Temple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation resultsTemple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation results
 
How to deliver Powerpoint Presentations.pptx
How to deliver Powerpoint  Presentations.pptxHow to deliver Powerpoint  Presentations.pptx
How to deliver Powerpoint Presentations.pptx
 
How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17
 
math operations ued in python and all used
math operations ued in python and all usedmath operations ued in python and all used
math operations ued in python and all used
 
Pharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brubPharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brub
 
Lifelines of National Economy chapter for Class 10 STUDY MATERIAL PDF
Lifelines of National Economy chapter for Class 10 STUDY MATERIAL PDFLifelines of National Economy chapter for Class 10 STUDY MATERIAL PDF
Lifelines of National Economy chapter for Class 10 STUDY MATERIAL PDF
 

Deep Learning Opening Workshop - Admissibility of Solution Estimators in Stochastic Optimization - Amitabh Basu, August 12, 2019

  • 1. Admissibility of solution estimators to stochastic optimization problems Amitabh Basu Joint Work with Tu Nguyen and Ao Sun Foundations of Deep Learning, Opening Workshop, SAMSI, Durham, August 2019 1 / 19
  • 2. A general stochastic optimization problem F : X × Rm → R ξ is a random variable taking values in Rm min x∈X Eξ[ F(x, ξ) ] 2 / 19
  • 3. A general stochastic optimization problem F : X × Rm → R ξ is a random variable taking values in Rm min x∈X Eξ[ F(x, ξ) ] Example: Supervised Machine Learning: One sees samples (z, y) ∈ Rn × R of labeled data from some (joint) distribution, and one aims to find a function f ∈ F in a hypothesis class F that minimizes the expected loss E(z,y)[ (f (z), y)], where : R × R → R+ is some loss function. Then X = F, m = n + 1, ξ = (z, y), and F(f , (z, y)) = (f (z), y). 2 / 19
  • 4. A general stochastic optimization problem. F : X × R^m → R; ξ is a random variable taking values in R^m. Solve min_{x ∈ X} E_ξ[ F(x, ξ) ]. Example ((News) Vendor Problem): the vendor buys some units of a product (newspapers) from a supplier at cost c > 0 dollars/unit; at most u units are available. Demand for the product is stochastic. The product is sold at price p > c dollars/unit. At the end of the day, the vendor can return unsold product to the supplier at r < c dollars/unit. Find the number of units to buy that maximizes the expected profit (equivalently, minimizes the expected loss).
  • 5. A general stochastic optimization problem. F : X × R^m → R; ξ is a random variable taking values in R^m. Solve min_{x ∈ X} E_ξ[ F(x, ξ) ]. Example ((News) Vendor Problem): the vendor buys some units of a product (newspapers) from a supplier at cost c > 0 dollars/unit; at most u units are available. Demand for the product is stochastic. The product is sold at price p > c dollars/unit. At the end of the day, the vendor can return unsold product to the supplier at r < c dollars/unit. Find the number of units to buy that maximizes the expected profit (equivalently, minimizes the expected loss). Here m = 1, X = [0, u], and F(x, ξ) = cx − p min{x, ξ} − r max{x − ξ, 0}.
  • 6. Solving the problem. F : X × R^m → R; ξ is a random variable taking values in R^m. Solve min_{x ∈ X} E_ξ[ F(x, ξ) ].
  • 7. Solving the problem. F : X × R^m → R; ξ is a random variable taking values in R^m. Solve min_{x ∈ X} E_ξ[ F(x, ξ) ], given access only to n i.i.d. samples of ξ.
  • 8. Solving the problem. F : X × R^m → R; ξ is a random variable taking values in R^m. Solve min_{x ∈ X} E_ξ[ F(x, ξ) ], given access only to n i.i.d. samples of ξ. Natural idea: given samples ξ^1, . . . , ξ^n ∈ R^m, solve the deterministic problem min_{x ∈ X} (1/n) Σ_{i=1}^n F(x, ξ^i).
  • 9. Solving the problem. F : X × R^m → R; ξ is a random variable taking values in R^m. Solve min_{x ∈ X} E_ξ[ F(x, ξ) ], given access only to n i.i.d. samples of ξ. Natural idea: given samples ξ^1, . . . , ξ^n ∈ R^m, solve the deterministic problem min_{x ∈ X} (1/n) Σ_{i=1}^n F(x, ξ^i). Stochastic optimizers call this sample average approximation (SAA); machine learners call this empirical risk minimization.
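
To make the SAA/ERM recipe concrete, here is a minimal numerical sketch (not from the talk; all parameter values and function names are illustrative) that applies SAA to the news vendor objective F(x, ξ) = cx − p min{x, ξ} − r max{x − ξ, 0} from the earlier slide, minimizing the sample average over a grid on X = [0, u]:

```python
import numpy as np

def newsvendor_saa(samples, c=1.0, p=3.0, r=0.5, u=100.0):
    """SAA for the news vendor problem: minimize the sample average of
    F(x, xi) = c*x - p*min(x, xi) - r*max(x - xi, 0) over x in [0, u]."""
    xs = np.linspace(0.0, u, 1001)            # grid of candidate order quantities
    x = xs[:, None]                           # shape (grid, 1)
    xi = np.asarray(samples)[None, :]         # shape (1, n)
    F = c * x - p * np.minimum(x, xi) - r * np.maximum(x - xi, 0.0)
    return xs[np.argmin(F.mean(axis=1))]      # minimizer of (1/n) sum_i F(x, xi^i)

rng = np.random.default_rng(0)
demand = rng.normal(60.0, 15.0, size=50).clip(min=0)   # 50 i.i.d. demand samples
print(newsvendor_saa(demand))
```

Since the objective is piecewise linear in x, a grid search suffices here; as n grows, the SAA solution should approach the (p − c)/(p − r) quantile of the demand distribution, which is the true optimizer of the expected loss.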
  • 10. Concrete Problem. F(x, ξ) = ξ^T x. X ⊆ R^d is a compact set (e.g., a polytope, or the integer points in a polytope), so m = d. ξ ∼ N(µ, Σ). Then min_{x ∈ X} E_ξ[ F(x, ξ) ] = min_{x ∈ X} E_ξ[ ξ^T x ] = min_{x ∈ X} µ^T x.
  • 11. Concrete Problem. F(x, ξ) = ξ^T x. X ⊆ R^d is a compact set (e.g., a polytope, or the integer points in a polytope), so m = d. ξ ∼ N(µ, Σ). Then min_{x ∈ X} E_ξ[ F(x, ξ) ] = min_{x ∈ X} E_ξ[ ξ^T x ] = min_{x ∈ X} µ^T x. Solve the problem given access only to n i.i.d. samples of ξ. Important: µ is unknown.
  • 12. Concrete Problem. F(x, ξ) = ξ^T x. X ⊆ R^d is a compact set (e.g., a polytope, or the integer points in a polytope), so m = d. ξ ∼ N(µ, Σ). Then min_{x ∈ X} E_ξ[ F(x, ξ) ] = min_{x ∈ X} E_ξ[ ξ^T x ] = min_{x ∈ X} µ^T x. Solve the problem given access only to n i.i.d. samples of ξ. Important: µ is unknown. Sample Average Approximation (SAA): min_{x ∈ X} (1/n) Σ_{i=1}^n F(x, ξ^i) = min_{x ∈ X} ξ̄^T x, where ξ̄ := (1/n) Σ_{i=1}^n ξ^i.
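
In this linear setting the SAA rule reduces to a closed form: average the samples and minimize ξ̄^T x over X. A small sketch of this (illustrative; it assumes X is a polytope given by an explicit vertex list, which is not how the talk represents X):

```python
import numpy as np

def saa_linear(samples, vertices):
    """SAA rule for min_{x in X} E[xi^T x]: a linear function attains its
    minimum over a polytope at a vertex, so evaluate xi_bar^T v at each."""
    xi_bar = np.mean(samples, axis=0)          # sample mean, shape (d,)
    return vertices[np.argmin(vertices @ xi_bar)]

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])                                   # unknown in practice
samples = rng.multivariate_normal(mu, np.eye(2), size=30)
square = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])  # X = unit square
print(saa_linear(samples, square))             # tends to [0, 1] since mu = (1, -2)
```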
  • 13. A quick tour of Statistical Decision Theory. Set of states of nature, modeled by a set Θ. Set of possible actions to take, modeled by A. In a particular state of nature θ ∈ Θ, the performance of any action a ∈ A is evaluated by a loss function L(θ, a). Goal: choose an action to minimize loss. (Partial/incomplete) information about θ is obtained through a random variable y taking values in a sample space χ. The distribution of y depends on the particular state of nature θ, and is denoted by P_θ.
  • 14. A quick tour of Statistical Decision Theory. Set of states of nature, modeled by a set Θ. Set of possible actions to take, modeled by A. In a particular state of nature θ ∈ Θ, the performance of any action a ∈ A is evaluated by a loss function L(θ, a). Goal: choose an action to minimize loss. (Partial/incomplete) information about θ is obtained through a random variable y taking values in a sample space χ. The distribution of y depends on the particular state of nature θ, and is denoted by P_θ. Decision rule: takes y ∈ χ as input and reports an action a ∈ A; denoted δ : χ → A.
  • 15. Our problem cast as a statistical decision problem. X ⊆ R^d is a compact set; ξ ∼ N(µ, I). min_{x ∈ X} E_ξ[ F(x, ξ) ] = min_{x ∈ X} E_ξ[ ξ^T x ] = min_{x ∈ X} µ^T x. States of nature: Θ = R^d = {all possible µ ∈ R^d}. Set of actions: X ⊆ R^d.
  • 16. Our problem cast as a statistical decision problem. X ⊆ R^d is a compact set; ξ ∼ N(µ, I). min_{x ∈ X} E_ξ[ F(x, ξ) ] = min_{x ∈ X} E_ξ[ ξ^T x ] = min_{x ∈ X} µ^T x. States of nature: Θ = R^d = {all possible µ ∈ R^d}. Set of actions: X ⊆ R^d. Loss function: ?
  • 17. Our problem cast as a statistical decision problem. X ⊆ R^d is a compact set; ξ ∼ N(µ, I). min_{x ∈ X} E_ξ[ F(x, ξ) ] = min_{x ∈ X} E_ξ[ ξ^T x ] = min_{x ∈ X} µ^T x. States of nature: Θ = R^d = {all possible µ ∈ R^d}. Set of actions: X ⊆ R^d. Loss function: L(µ̄, x̄) = E_{ξ∼N(µ̄,I)}[ F(x̄, ξ) ] − min_{x ∈ X} E_{ξ∼N(µ̄,I)}[ F(x, ξ) ].
  • 18. Our problem cast as a statistical decision problem. X ⊆ R^d is a compact set; ξ ∼ N(µ, I). min_{x ∈ X} E_ξ[ F(x, ξ) ] = min_{x ∈ X} E_ξ[ ξ^T x ] = min_{x ∈ X} µ^T x. States of nature: Θ = R^d = {all possible µ ∈ R^d}. Set of actions: X ⊆ R^d. Loss function: L(µ̄, x̄) = E_{ξ∼N(µ̄,I)}[ F(x̄, ξ) ] − min_{x ∈ X} E_{ξ∼N(µ̄,I)}[ F(x, ξ) ] = µ̄^T x̄ − µ̄^T x*(µ̄), where x*(µ̄) is an optimal solution when the mean is µ̄.
  • 19. Our problem cast as a statistical decision problem. X ⊆ R^d is a compact set; ξ ∼ N(µ, I). min_{x ∈ X} E_ξ[ F(x, ξ) ] = min_{x ∈ X} E_ξ[ ξ^T x ] = min_{x ∈ X} µ^T x. States of nature: Θ = R^d = {all possible µ ∈ R^d}. Set of actions: X ⊆ R^d. Loss function: L(µ̄, x̄) = E_{ξ∼N(µ̄,I)}[ F(x̄, ξ) ] − min_{x ∈ X} E_{ξ∼N(µ̄,I)}[ F(x, ξ) ] = µ̄^T x̄ − µ̄^T x*(µ̄). Sample space: χ = R^d × · · · × R^d (n times).
  • 20. Our problem cast as a statistical decision problem. X ⊆ R^d is a compact set; ξ ∼ N(µ, I). min_{x ∈ X} E_ξ[ F(x, ξ) ] = min_{x ∈ X} E_ξ[ ξ^T x ] = min_{x ∈ X} µ^T x. States of nature: Θ = R^d = {all possible µ ∈ R^d}. Set of actions: X ⊆ R^d. Loss function: L(µ̄, x̄) = E_{ξ∼N(µ̄,I)}[ F(x̄, ξ) ] − min_{x ∈ X} E_{ξ∼N(µ̄,I)}[ F(x, ξ) ] = µ̄^T x̄ − µ̄^T x*(µ̄). Sample space: χ = R^d × · · · × R^d (n times). Decision rule: δ : χ → X.
  • 21. Our problem cast as a statistical decision problem. X ⊆ R^d is a compact set; ξ ∼ N(µ, I). min_{x ∈ X} E_ξ[ F(x, ξ) ] = min_{x ∈ X} E_ξ[ ξ^T x ] = min_{x ∈ X} µ^T x. States of nature: Θ = R^d = {all possible µ ∈ R^d}. Set of actions: X ⊆ R^d. Loss function: L(µ̄, x̄) = E_{ξ∼N(µ̄,I)}[ F(x̄, ξ) ] − min_{x ∈ X} E_{ξ∼N(µ̄,I)}[ F(x, ξ) ] = µ̄^T x̄ − µ̄^T x*(µ̄). Sample space: χ = R^d × · · · × R^d (n times). Decision rule: δ : χ → X. SAA: δ_SAA(ξ^1, . . . , ξ^n) ∈ arg min{ ξ̄^T x : x ∈ X }.
  • 22. Decision Rules. [Figure-only content: a feasible region with candidate solutions labeled 1, 2, 3, 4, shown twice to contrast different decision rules; the images do not survive the text extraction, and slides 23-24 are missing entirely.]
  • 25. How does one decide between decision rules?
  • 26. How does one decide between decision rules? States of nature Θ, actions A, loss function L : Θ × A → R, sample space χ with distributions {P_θ : θ ∈ Θ}.
  • 27. How does one decide between decision rules? States of nature Θ, actions A, loss function L : Θ × A → R, sample space χ with distributions {P_θ : θ ∈ Θ}. Given a decision rule δ : χ → A, define the risk function of this decision rule as R_δ(θ) := E_{y∼P_θ}[ L(θ, δ(y)) ].
  • 28. How does one decide between decision rules? States of nature Θ, actions A, loss function L : Θ × A → R, sample space χ with distributions {P_θ : θ ∈ Θ}. Given a decision rule δ : χ → A, define the risk function of this decision rule as R_δ(θ) := E_{y∼P_θ}[ L(θ, δ(y)) ]. We say that a decision rule δ′ dominates a decision rule δ if R_{δ′}(θ) ≤ R_δ(θ) for all θ ∈ Θ, and R_{δ′}(θ*) < R_δ(θ*) for some θ* ∈ Θ.
  • 29. How does one decide between decision rules? States of nature Θ, actions A, loss function L : Θ × A → R, sample space χ with distributions {P_θ : θ ∈ Θ}. Given a decision rule δ : χ → A, define the risk function of this decision rule as R_δ(θ) := E_{y∼P_θ}[ L(θ, δ(y)) ]. We say that a decision rule δ′ dominates a decision rule δ if R_{δ′}(θ) ≤ R_δ(θ) for all θ ∈ Θ, and R_{δ′}(θ*) < R_δ(θ*) for some θ* ∈ Θ. If a decision rule δ is not dominated by any other decision rule, we say that δ is admissible; otherwise, it is inadmissible.
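
Risk functions rarely admit closed forms, but they can be estimated by simulation, which is often how candidate decision rules are compared in practice. The following sketch (illustrative, not part of the talk) estimates R_δ(θ) by averaging the loss over repeated simulated datasets y = (ξ^1, . . . , ξ^n) drawn i.i.d. from N(θ, I):

```python
import numpy as np

def estimate_risk(delta, loss, theta, n, reps=20000, seed=0):
    """Monte Carlo estimate of R_delta(theta) = E[L(theta, delta(y))]
    where y = (xi^1, ..., xi^n) are i.i.d. N(theta, I)."""
    rng = np.random.default_rng(seed)
    d = len(theta)
    total = 0.0
    for _ in range(reps):
        y = rng.normal(theta, 1.0, size=(n, d))   # one simulated dataset
        total += loss(theta, delta(y))
    return total / reps

sq_loss = lambda theta, x: float(np.sum((x - theta) ** 2))
sample_mean = lambda y: y.mean(axis=0)
print(estimate_risk(sample_mean, sq_loss, np.zeros(5), n=10))  # approx d/n = 0.5
```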
  • 30. Is the Sample Average Approximation (SAA) rule admissible?
  • 31. Admissibility in stochastic optimization. Stochastic optimization setup: F : X × R^m → R; ξ is a random variable in R^m. Solve min_{x ∈ X} E_ξ[ F(x, ξ) ] with access to n i.i.d. samples of ξ.
  • 32. Admissibility in stochastic optimization. Stochastic optimization setup: F : X × R^m → R; ξ is a random variable in R^m. Solve min_{x ∈ X} E_ξ[ F(x, ξ) ] with access to n i.i.d. samples of ξ. Statistical decision theory view: ξ ∼ N(µ, I); states of nature Θ = R^m = {all possible µ ∈ R^m}; set of actions A = X; sample space χ = R^m × · · · × R^m (n times). Loss function: L(µ̄, x̄) = E_{ξ∼N(µ̄,I)}[ F(x̄, ξ) ] − min_{x ∈ X} E_{ξ∼N(µ̄,I)}[ F(x, ξ) ]. Given a decision rule δ : χ → X, the risk of δ is R_δ(µ) := E_{ξ^1,...,ξ^n}[ L(µ, δ(ξ^1, . . . , ξ^n)) ].
  • 33. Admissibility in stochastic optimization. L(µ̄, x̄) = E_{ξ∼N(µ̄,I)}[ F(x̄, ξ) ] − min_{x ∈ X} E_{ξ∼N(µ̄,I)}[ F(x, ξ) ]. Given a decision rule δ : χ → X, the risk of δ is R_δ(µ) := E_{ξ^1,...,ξ^n}[ L(µ, δ(ξ^1, . . . , ξ^n)) ]. Sample Average Approximation (SAA): min_{x ∈ X} (1/n) Σ_{i=1}^n F(x, ξ^i).
  • 34. Admissibility in stochastic optimization. L(µ̄, x̄) = E_{ξ∼N(µ̄,I)}[ F(x̄, ξ) ] − min_{x ∈ X} E_{ξ∼N(µ̄,I)}[ F(x, ξ) ]. Given a decision rule δ : χ → X, the risk of δ is R_δ(µ) := E_{ξ^1,...,ξ^n}[ L(µ, δ(ξ^1, . . . , ξ^n)) ]. Sample Average Approximation (SAA): min_{x ∈ X} (1/n) Σ_{i=1}^n F(x, ξ^i). Sample Average Approximation (SAA) can be inadmissible!!
  • 35. Inadmissibility of SAA: Stein's Paradox. L(µ̄, x̄) = E_{ξ∼N(µ̄,I)}[ F(x̄, ξ) ] − min_{x ∈ X} E_{ξ∼N(µ̄,I)}[ F(x, ξ) ]. Sample Average Approximation (SAA) can be inadmissible!! Take F(x, ξ) = ‖x − ξ‖², X = R^d, ξ ∼ N(µ, I): min_{x ∈ R^d} E_ξ[ F(x, ξ) ] = min_{x ∈ R^d} E_ξ[ ‖x − ξ‖² ].
  • 36. Inadmissibility of SAA: Stein's Paradox. L(µ̄, x̄) = E_{ξ∼N(µ̄,I)}[ F(x̄, ξ) ] − min_{x ∈ X} E_{ξ∼N(µ̄,I)}[ F(x, ξ) ]. Sample Average Approximation (SAA) can be inadmissible!! min_{x ∈ R^d} E_{ξ∼N(µ,I)}[ ‖x − ξ‖² ] = min_{x ∈ R^d} ‖x − µ‖² + V[ξ], where V[ξ] := E‖ξ − µ‖² is the total variance. Optimal solution: x*(µ̄) = µ̄; optimal value: V[ξ] = d. Hence L(µ̄, x̄) = ‖x̄ − µ̄‖².
  • 37. Inadmissibility of SAA: Stein's Paradox. L(µ̄, x̄) = E_{ξ∼N(µ̄,I)}[ F(x̄, ξ) ] − min_{x ∈ X} E_{ξ∼N(µ̄,I)}[ F(x, ξ) ]. Sample Average Approximation (SAA) can be inadmissible!! min_{x ∈ R^d} E_{ξ∼N(µ,I)}[ ‖x − ξ‖² ] = min_{x ∈ R^d} ‖x − µ‖² + V[ξ]. Optimal solution: x*(µ̄) = µ̄; optimal value: V[ξ] = d. Hence L(µ̄, x̄) = ‖x̄ − µ̄‖². Sample Average Approximation (SAA): min_{x ∈ R^d} (1/n) Σ_{i=1}^n ‖x − ξ^i‖².
  • 38. Inadmissibility of SAA: Stein's Paradox. L(µ̄, x̄) = E_{ξ∼N(µ̄,I)}[ F(x̄, ξ) ] − min_{x ∈ X} E_{ξ∼N(µ̄,I)}[ F(x, ξ) ]. Sample Average Approximation (SAA) can be inadmissible!! min_{x ∈ R^d} E_{ξ∼N(µ,I)}[ ‖x − ξ‖² ] = min_{x ∈ R^d} ‖x − µ‖² + V[ξ]. Optimal solution: x*(µ̄) = µ̄; optimal value: V[ξ] = d. Hence L(µ̄, x̄) = ‖x̄ − µ̄‖². Sample Average Approximation (SAA): min_{x ∈ R^d} (1/n) Σ_{i=1}^n ‖x − ξ^i‖², so δ_SAA(ξ^1, . . . , ξ^n) = ξ̄ := (1/n) Σ_{i=1}^n ξ^i.
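
A quick numerical illustration of Stein's paradox (a standard exercise, not taken from the talk): for d ≥ 3, shrinking the sample mean toward the origin via the James-Stein formula gives strictly smaller risk than δ_SAA = ξ̄ under the loss L(µ̄, x̄) = ‖x̄ − µ̄‖², so SAA is dominated.

```python
import numpy as np

d, n, reps = 10, 5, 50000
rng = np.random.default_rng(42)
mu = np.ones(d)                                   # true (unknown) mean

risk_saa = risk_js = 0.0
for _ in range(reps):
    xi = rng.normal(mu, 1.0, size=(n, d))
    xbar = xi.mean(axis=0)                        # delta_SAA = sample mean
    # James-Stein shrinkage toward 0; xbar ~ N(mu, I/n), so its variance is 1/n
    factor = max(0.0, 1.0 - (d - 2) / (n * float(np.sum(xbar ** 2))))
    js = factor * xbar
    risk_saa += np.sum((xbar - mu) ** 2)
    risk_js += np.sum((js - mu) ** 2)

print("risk of SAA (sample mean):", risk_saa / reps)   # approx d/n = 2.0
print("risk of James-Stein:", risk_js / reps)          # strictly smaller
```

The positive-part variant max{0, ·} used above only improves on the plain James-Stein estimator; any fixed shrinkage target works, so the dominance holds at every µ.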
  • 39. Inadmissibility of SAA: Stein's Paradox. L(µ̄, x̄) = E_{ξ∼N(µ̄,I)}[ F(x̄, ξ) ] − min_{x ∈ X} E_{ξ∼N(µ̄,I)}[ F(x, ξ) ]. Sample Average Approximation (SAA) can be inadmissible!! For min_{x ∈ R^d} E_{ξ∼N(µ,I)}[ ‖x − ξ‖² ], this is Stein's paradox: when d ≥ 3, the sample mean (and hence the SAA rule) is dominated by shrinkage estimators such as the James-Stein estimator. Generalized to an arbitrary convex quadratic objective with an uncertain linear term by Davarnia and Cornuéjols 2018. Follow-up work from a Bayesian perspective in Davarnia, Kocuk and Cornuéjols 2018.
  • 40. A class of problems with no Stein's paradox. THEOREM (Basu-Nguyen-Sun 2018): consider the problem of optimizing an uncertain linear objective ξ ∼ N(µ, I) over a fixed compact set X ⊆ R^d: min_{x ∈ X} E_{ξ∼N(µ,I)}[ ξ^T x ]. The Sample Average Approximation (SAA) rule is admissible.
  • 43. Main technical ideas/tools. Sufficient statistic: let P = {P_θ : θ ∈ Θ} be a family of distributions for a random variable y in a sample space χ. A sufficient statistic for this family is a function T : χ → τ such that the conditional probability P(y | T = t) does not depend on θ.
  • 44. Main technical ideas/tools. Sufficient statistic: let P = {P_θ : θ ∈ Θ} be a family of distributions for a random variable y in a sample space χ. A sufficient statistic for this family is a function T : χ → τ such that the conditional probability P(y | T = t) does not depend on θ. FACT: let χ = R^d × · · · × R^d (n times) and P = {N(µ, I) × · · · × N(µ, I) (n times) : µ ∈ R^d}, i.e., (ξ^1, . . . , ξ^n) ∈ χ are i.i.d. samples from the normal distribution N(µ, I). Then T(ξ^1, . . . , ξ^n) = ξ̄ := (1/n) Σ_{i=1}^n ξ^i is a sufficient statistic for P.
  • 45. Main technical ideas/tools. Sufficient statistic: let P = {P_θ : θ ∈ Θ} be a family of distributions for a random variable y in a sample space χ. A sufficient statistic for this family is a function T : χ → τ such that the conditional probability P(y | T = t) does not depend on θ. THEOREM (Rao-Blackwell, 1940s): if the loss function is convex in the action, then for any decision rule δ there exists a rule δ′ that is a function only of a sufficient statistic and satisfies R_{δ′} ≤ R_δ.
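
A small sanity check of the Rao-Blackwell effect (illustrative, not from the talk): start from the wasteful rule δ(ξ^1, . . . , ξ^n) = ξ^1 for estimating µ under squared loss, which ignores all but the first sample. Conditioning on the sufficient statistic gives E[ξ^1 | ξ̄] = ξ̄, and the risk drops from d to d/n:

```python
import numpy as np

d, n, reps = 5, 10, 100000
rng = np.random.default_rng(7)
mu = rng.normal(size=d)                       # arbitrary true mean

risk_first = risk_rb = 0.0
for _ in range(reps):
    xi = rng.normal(mu, 1.0, size=(n, d))
    risk_first += np.sum((xi[0] - mu) ** 2)            # rule: use first sample only
    risk_rb += np.sum((xi.mean(axis=0) - mu) ** 2)     # E[xi^1 | xi_bar] = xi_bar

print(risk_first / reps)   # approx d = 5
print(risk_rb / reps)      # approx d/n = 0.5
```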
  • 46. Main technical ideas/tools. For any decision rule δ, define the function F(µ) = R_δ(µ) − R_{δ_SAA}(µ). It suffices to show that there exists µ̂ ∈ R^d such that F(µ̂) > 0.
  • 47. Main technical ideas/tools. For any decision rule δ, define the function F(µ) = R_δ(µ) − R_{δ_SAA}(µ). It suffices to show that there exists µ̂ ∈ R^d such that F(µ̂) > 0. First observe: F(0) = 0.
  • 48. Main technical ideas/tools. For any decision rule δ, define the function F(µ) = R_δ(µ) − R_{δ_SAA}(µ). It suffices to show that there exists µ̂ ∈ R^d such that F(µ̂) > 0. First observe: F(0) = 0. Then compute ∇²F(0) and show it has a strictly positive eigenvalue.
  • 49. Main technical ideas/tools. For any decision rule δ, define the function F(µ) = R_δ(µ) − R_{δ_SAA}(µ). It suffices to show that there exists µ̂ ∈ R^d such that F(µ̂) > 0. First observe: F(0) = 0. Then compute ∇²F(0) and show it has a strictly positive eigenvalue. Use the fact from probability theory that for any Lebesgue integrable function f : R^d → R^n, the map µ → E_{y∼N(µ,Σ)}[ f(y) ] := (2π)^{−d/2} det(Σ)^{−1/2} ∫_{R^d} f(y) exp( −(1/2)(y − µ)^T Σ^{−1} (y − µ) ) dy has derivatives of all orders, and these can be computed by differentiating under the integral sign.
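
For Σ = I, differentiating under the integral sign gives the identity ∇_µ E_{y∼N(µ,I)}[ f(y) ] = E_{y∼N(µ,I)}[ f(y)(y − µ) ], since ∇_µ of the Gaussian density is (y − µ) times the density. The sketch below (with an illustrative test function, not from the talk) checks this numerically against a finite-difference approximation:

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([0.5, -1.0])
f = lambda y: np.sin(y[..., 0]) + y[..., 1] ** 2   # illustrative test function

z = rng.normal(0.0, 1.0, size=(1_000_000, 2))      # common noise: y = mu + z
y = mu + z
grad_stein = (f(y)[:, None] * (y - mu)).mean(axis=0)   # E[f(y)(y - mu)]

eps = 1e-3                                         # central finite differences in mu
grad_fd = np.empty(2)
for i in range(2):
    e = np.zeros(2)
    e[i] = eps
    grad_fd[i] = (f(mu + e + z).mean() - f(mu - e + z).mean()) / (2 * eps)

print(grad_stein)   # both close to (cos(0.5) * exp(-0.5), 2 * mu[1])
print(grad_fd)
```

Reusing the same noise z in both finite-difference evaluations (common random numbers) keeps the variance of the difference small, so the two estimates agree closely even at moderate sample sizes.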
  • 50. Open Questions. What about nonlinear objectives over compact feasible regions? For example, what if F(x, ξ) = x^T Qx + ξ^T x for some fixed PSD matrix Q, and X is a compact (convex) set?
  • 51. Open Questions. What about nonlinear objectives over compact feasible regions? For example, what if F(x, ξ) = x^T Qx + ξ^T x for some fixed PSD matrix Q, and X is a compact (convex) set? What about piecewise linear objectives F(x, ξ)? Recall the News Vendor Problem.
  • 52. Open Questions. What about nonlinear objectives over compact feasible regions? For example, what if F(x, ξ) = x^T Qx + ξ^T x for some fixed PSD matrix Q, and X is a compact (convex) set? What about piecewise linear objectives F(x, ξ)? Recall the News Vendor Problem. Objectives coming from machine learning problems, such as neural network training with squared or logistic loss (admissibility of “empirical risk minimization”). Maybe this depends on the hypothesis class that is being learnt?
  • 53. Open Questions. What about nonlinear objectives over compact feasible regions? For example, what if F(x, ξ) = x^T Qx + ξ^T x for some fixed PSD matrix Q, and X is a compact (convex) set? What about piecewise linear objectives F(x, ξ)? Recall the News Vendor Problem. Objectives coming from machine learning problems, such as neural network training with squared or logistic loss (admissibility of “empirical risk minimization”). Maybe this depends on the hypothesis class that is being learnt? METATHEOREM (from Gérard Cornuéjols): admissible if and only if the feasible region is bounded!?