Admissibility of solution estimators to stochastic
optimization problems
Amitabh Basu
Joint Work with Tu Nguyen and Ao Sun
Foundations of Deep Learning, Opening Workshop,
SAMSI, Durham, August 2019
A general stochastic optimization problem

F : X × R^m → R, and ξ is a random variable taking values in R^m.

    min_{x ∈ X}  E_ξ[ F(x, ξ) ]

Example (Supervised Machine Learning): One sees samples (z, y) ∈ R^n × R of labeled data from some (joint) distribution, and one aims to find a function f ∈ F in a hypothesis class F that minimizes the expected loss E_{(z,y)}[ ℓ(f(z), y) ], where ℓ : R × R → R_+ is some loss function. Then X = F, m = n + 1, ξ = (z, y), and F(f, (z, y)) = ℓ(f(z), y).

Example ((News) Vendor Problem): A (news) vendor buys some units of a product (newspapers) from a supplier at a cost of c > 0 dollars/unit; at most u units are available. Demand for the product is stochastic. The product is sold at price p > c dollars/unit. At the end of the day, the vendor can return unsold product to the supplier at r < c dollars/unit. Find the number of units to buy that maximizes the expected profit (equivalently, minimizes the expected loss). Here m = 1, X = [0, u], and

    F(x, ξ) = c·x − p·min{x, ξ} − r·max{x − ξ, 0}.
Solving the problem

F : X × R^m → R, and ξ is a random variable taking values in R^m.

    min_{x ∈ X}  E_ξ[ F(x, ξ) ]

Solve the problem given only access to n i.i.d. samples of ξ.

Natural idea: Given samples ξ^1, . . . , ξ^n ∈ R^m, solve the deterministic problem

    min_{x ∈ X}  (1/n) Σ_{i=1}^n F(x, ξ^i)

Stochastic optimizers call this sample average approximation (SAA); machine learners call this empirical risk minimization.
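To make the SAA recipe concrete, here is a minimal numerical sketch that applies it to the news vendor example above. The parameter values c, p, r, u and the gamma demand distribution are illustrative assumptions (not from the talk), and a grid search is just one simple way to solve the resulting one-dimensional deterministic problem over X = [0, u]:

```python
import numpy as np

# News vendor loss: buy x units at cost c, sell min(x, demand) at price p,
# and return the max(x - demand, 0) unsold units at salvage price r < c.
def F(x, xi, c=1.0, p=2.0, r=0.5):
    return c * x - p * np.minimum(x, xi) - r * np.maximum(x - xi, 0.0)

rng = np.random.default_rng(0)
u = 100.0                                               # at most u units available
xi_samples = rng.gamma(shape=5.0, scale=10.0, size=50)  # n = 50 i.i.d. demand samples

# SAA / empirical risk minimization: minimize the sample average of F over X = [0, u].
grid = np.linspace(0.0, u, 1001)
sample_avg = np.array([F(x, xi_samples).mean() for x in grid])
x_saa = grid[sample_avg.argmin()]
print(f"SAA order quantity: {x_saa:.1f} units")
```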
Concrete Problem

F(x, ξ) = ξ^T x, where X ⊆ R^d is a compact set (e.g., a polytope, or the integer points in a polytope). So m = d. ξ ∼ N(µ, Σ).

    min_{x ∈ X}  E_ξ[ F(x, ξ) ] = min_{x ∈ X}  E_ξ[ ξ^T x ] = min_{x ∈ X}  µ^T x

Solve the problem given only access to n i.i.d. samples of ξ. Important: µ is unknown.

Sample Average Approximation (SAA):

    min_{x ∈ X}  (1/n) Σ_{i=1}^n F(x, ξ^i) = min_{x ∈ X}  ξ̄^T x,    where ξ̄ := (1/n) Σ_{i=1}^n ξ^i.
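A minimal sketch of the SAA rule for this concrete problem: estimate µ by the sample mean ξ̄ and solve a single linear program. The particular polytope X = {x ∈ [0,1]^d : Σ x_i ≤ 2} and all numerical values are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
d, n = 5, 20
mu = rng.normal(size=d)          # unknown in practice; used here only to simulate data
samples = rng.multivariate_normal(mu, np.eye(d), size=n)  # n i.i.d. draws of xi ~ N(mu, I)

xi_bar = samples.mean(axis=0)    # SAA objective vector (also the sufficient statistic)

# Illustrative compact feasible region X = {x in [0,1]^d : sum(x) <= 2}.
A, b = np.ones((1, d)), np.array([2.0])
res = linprog(c=xi_bar, A_ub=A, b_ub=b, bounds=[(0.0, 1.0)] * d)
opt = linprog(c=mu, A_ub=A, b_ub=b, bounds=[(0.0, 1.0)] * d)  # true optimum, for reference
print("SAA solution:", np.round(res.x, 3))
print("true optimality gap:", mu @ res.x - opt.fun)
```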
A quick tour of Statistical Decision Theory

Set of states of nature, modeled by a set Θ.

Set of possible actions to take, modeled by A.

In a particular state of nature θ ∈ Θ, the performance of any action a ∈ A is evaluated by a loss function L(θ, a). Goal: choose an action to minimize the loss.

(Partial/incomplete) information about θ is obtained through a random variable y taking values in a sample space χ. The distribution of y depends on the particular state of nature θ, and is denoted by P_θ.

Decision Rule: Takes y ∈ χ as input and reports an action a ∈ A. Denoted δ : χ → A.
Our problem cast as a statistical decision problem

X ⊆ R^d is a compact set. ξ ∼ N(µ, I).

    min_{x ∈ X}  E_ξ[ F(x, ξ) ] = min_{x ∈ X}  E_ξ[ ξ^T x ] = min_{x ∈ X}  µ^T x

States of Nature: Θ = R^d = {all possible µ ∈ R^d}.

Set of Actions: X ⊆ R^d.

Loss function (writing x(µ̄) for a minimizer of µ̄^T x over X):

    L(µ̄, x̄) = E_{ξ∼N(µ̄,I)}[ F(x̄, ξ) ] − min_{x ∈ X} E_{ξ∼N(µ̄,I)}[ F(x, ξ) ] = µ̄^T x̄ − µ̄^T x(µ̄)

Sample Space: χ = R^d × R^d × · · · × R^d (n times).

Decision Rule: δ : χ → X.

SAA: δ(ξ^1, . . . , ξ^n) ∈ arg min{ ξ̄^T x : x ∈ X }
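A tiny worked instance of this loss (illustrative numbers, not from the talk): take d = 2, X = {(1,0), (0,1)}, and µ̄ = (1, −1).

```latex
% The optimal action is x(\bar{\mu}) = (0,1), with value \bar{\mu}^T x(\bar{\mu}) = -1.
% The loss of the suboptimal action \bar{x} = (1,0) is therefore
\[
L(\bar{\mu}, \bar{x}) \;=\; \bar{\mu}^T \bar{x} \;-\; \bar{\mu}^T x(\bar{\mu})
\;=\; 1 - (-1) \;=\; 2 .
\]
```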
Decision Rules

[Figure: examples of decision rules, drawn as maps from sample outcomes to a feasible region with vertices labeled 1, 2, 3, 4.]
How does one decide between decision rules?

States of nature Θ, actions A, loss function L : Θ × A → R, sample space χ with distributions {P_θ : θ ∈ Θ}.

Given a decision rule δ : χ → A, define the risk function of this decision rule as:

    R_δ(θ) := E_{y∼P_θ}[ L(θ, δ(y)) ]

We say that a decision rule δ′ dominates a decision rule δ if R_{δ′}(θ) ≤ R_δ(θ) for all θ ∈ Θ, and R_{δ′}(θ*) < R_δ(θ*) for some θ* ∈ Θ.

If a decision rule δ is not dominated by any other decision rule, we say that δ is admissible. Otherwise, it is inadmissible.
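A Monte Carlo sketch of the risk comparison that "dominates" formalizes, in the setting of the concrete linear problem. The feasible set (vertices of the unit square), the sample sizes, and the data-ignoring competitor rule are all illustrative assumptions; the point is only that neither rule's risk curve lies below the other's everywhere:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
X = np.array(list(product([0.0, 1.0], repeat=2)))  # feasible set: vertices of unit square

def loss(mu, x):
    # L(mu, x) = mu^T x - min_{x' in X} mu^T x'
    return mu @ x - (X @ mu).min()

def risk(rule, mu, n=5, trials=10000):
    # Monte Carlo estimate of R_delta(mu) = E[ L(mu, delta(xi^1,...,xi^n)) ]
    total = 0.0
    for _ in range(trials):
        samples = rng.normal(loc=mu, scale=1.0, size=(n, 2))
        total += loss(mu, rule(samples))
    return total / trials

def delta_saa(samples):
    xi_bar = samples.mean(axis=0)
    return X[(X @ xi_bar).argmin()]

def delta_constant(samples):
    # a data-ignoring competitor: always play the vertex (0, 0)
    return X[0]

for mu in [np.array([0.1, 0.1]), np.array([-1.0, 2.0])]:
    print(mu, " R_SAA ≈", round(risk(delta_saa, mu), 4),
              " R_const ≈", round(risk(delta_constant, mu), 4))
```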
Is the Sample Average Approximation (SAA) rule admissible?
Admissibility in stochastic optimization

Stochastic optimization setup: F : X × R^m → R, and ξ is a random variable in R^m.

    min_{x ∈ X}  E_ξ[ F(x, ξ) ]

We want to solve this with access to n i.i.d. samples of ξ.

Statistical decision theory view: ξ ∼ N(µ, I); states of nature Θ = R^d = {all possible µ ∈ R^d}. Set of actions A = X. Sample space χ = R^d × R^d × · · · × R^d (n times). Loss function

    L(µ̄, x̄) = E_{ξ∼N(µ̄,I)}[ F(x̄, ξ) ] − min_{x ∈ X} E_{ξ∼N(µ̄,I)}[ F(x, ξ) ]

Given a decision rule δ : χ → X, the risk of δ is

    R_δ(µ) := E_{ξ^1,...,ξ^n}[ L(µ, δ(ξ^1, . . . , ξ^n)) ]
Admissibility in stochastic optimization

Sample Average Approximation (SAA):

    min_{x ∈ X}  (1/n) Σ_{i=1}^n F(x, ξ^i)

Sample Average Approximation (SAA) can be inadmissible!!
Inadmissibility of SAA: Stein's Paradox

Take F(x, ξ) = ‖x − ξ‖², X = R^d, ξ ∼ N(µ, I). Then

    min_{x ∈ R^d}  E_{ξ∼N(µ,I)}[ ‖x − ξ‖² ] = min_{x ∈ R^d}  ‖x − µ‖² + V[ ξ ]

Optimal solution: x(µ̄) = µ̄. Optimal value: V[ ξ ] = d. The loss is L(µ̄, x̄) = ‖x̄ − µ̄‖².

Sample Average Approximation (SAA):

    min_{x ∈ R^d}  (1/n) Σ_{i=1}^n ‖x − ξ^i‖²,    so    δ_SAA(ξ^1, . . . , ξ^n) = ξ̄ := (1/n) Σ_{i=1}^n ξ^i.

Thus SAA is exactly the sample-mean estimator of µ, which Stein famously showed is inadmissible under squared-error loss when d ≥ 3: the James–Stein estimator dominates it.
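A quick simulation of Stein's paradox (a sketch: the shrinkage rule below is the standard James–Stein estimator applied to the sufficient statistic ξ̄, whose covariance is (1/n)·I; dimensions and sample sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, trials = 10, 5, 20000
sigma2 = 1.0 / n                    # covariance of xi_bar is (1/n) * I

for norm_mu in [0.0, 1.0, 5.0]:     # by rotational symmetry, risk depends only on ||mu||
    mu = np.zeros(d); mu[0] = norm_mu
    xi_bar = rng.normal(mu, np.sqrt(sigma2), size=(trials, d))  # xi_bar ~ N(mu, (1/n) I)

    # SAA rule: report xi_bar.  James-Stein: shrink xi_bar toward the origin.
    shrink = 1.0 - (d - 2) * sigma2 / (xi_bar ** 2).sum(axis=1)
    js = shrink[:, None] * xi_bar

    risk_saa = ((xi_bar - mu) ** 2).sum(axis=1).mean()   # should be ~ d/n = 2.0
    risk_js = ((js - mu) ** 2).sum(axis=1).mean()        # strictly smaller for d >= 3
    print(f"||mu|| = {norm_mu}:  R_SAA ≈ {risk_saa:.3f},  R_JS ≈ {risk_js:.3f}")
```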
Inadmissibility of SAA: Stein's Paradox

    min_{x ∈ R^d}  E_{ξ∼N(µ,I)}[ ‖x − ξ‖² ]

Generalized to an arbitrary convex quadratic function with an uncertain linear term in Davarnia and Cornuéjols 2018. Follow-up work from a Bayesian perspective in Davarnia, Kocuk and Cornuéjols 2018.
A class of problems with no Stein's paradox

THEOREM (Basu–Nguyen–Sun 2018). Consider the problem of optimizing an uncertain linear objective ξ ∼ N(µ, I) over a fixed compact set X ⊆ R^d:

    min_{x ∈ X}  E_{ξ∼N(µ,I)}[ ξ^T x ]

The Sample Average Approximation (SAA) rule is admissible.
Main technical ideas/tools

Sufficient Statistic: Let P = {P_θ : θ ∈ Θ} be a family of distributions for a random variable y in a sample space χ. A sufficient statistic for this family is a function T : χ → τ such that the conditional probability P(y | T = t) does not depend on θ.

FACT: Let χ = R^d × · · · × R^d (n times) and P = { N(µ, I) × · · · × N(µ, I) (n times) : µ ∈ R^d }, i.e., (ξ^1, . . . , ξ^n) ∈ χ are i.i.d. samples from the normal distribution N(µ, I). Then T(ξ^1, . . . , ξ^n) = ξ̄ := (1/n) Σ_{i=1}^n ξ^i is a sufficient statistic for P.

THEOREM (Rao–Blackwell, 1940s). If the loss function is convex in the action space, then for any decision rule δ, there exists a rule δ′ which is a function only of a sufficient statistic and R_{δ′} ≤ R_δ.
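A standard one-line instance of Rao–Blackwellization in this Gaussian setting (not on the slides, but a useful sanity check): conditioning a naive rule on the sufficient statistic ξ̄ recovers ξ̄ itself.

```latex
% Rao-Blackwellizing the naive rule \delta(\xi^1,\dots,\xi^n) = \xi^1
% (use only the first sample) against the sufficient statistic \bar{\xi}:
\[
\delta'(\bar{\xi}) \;=\; \mathbb{E}\!\left[\, \xi^1 \mid \bar{\xi} \,\right] \;=\; \bar{\xi},
\]
% by symmetry: \mathbb{E}[\xi^i \mid \bar{\xi}] is the same for every i,
% and these n conditional expectations average to \bar{\xi}.
```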
Main technical ideas/tools

For any decision rule δ, define the function F(µ) = R_δ(µ) − R_{δ_SAA}(µ). It suffices to show that there exists µ̂ ∈ R^d such that F(µ̂) > 0.

First observe: F(0) = 0.

Then compute the Hessian ∇²F(0) and show that it has a strictly positive eigenvalue.

Use the fact from probability theory that for any Lebesgue integrable function f : R^n → R^d, the map

    µ ↦ E_{y∼N(µ,Σ)}[ f(y) ] := (2π)^{−n/2} det(Σ)^{−1/2} ∫_{R^n} f(y) exp( −½ (y − µ)^T Σ^{−1} (y − µ) ) dy

has derivatives of all orders, and these can be computed by differentiating under the integral sign.
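For instance, at first order (a standard Gaussian identity, shown here for scalar-valued f as a sketch of what differentiating under the integral sign yields):

```latex
\[
\nabla_{\mu}\, \mathbb{E}_{y \sim N(\mu,\Sigma)}\!\left[ f(y) \right]
\;=\; \mathbb{E}_{y \sim N(\mu,\Sigma)}\!\left[ f(y)\, \Sigma^{-1} (y - \mu) \right],
\]
% since \nabla_\mu \log p(y; \mu, \Sigma) = \Sigma^{-1}(y - \mu)
% for the Gaussian density p(y; \mu, \Sigma).
```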
Open Questions

- What about nonlinear objectives over compact feasible regions? For example, what if F(x, ξ) = x^T Qx + ξ^T x for some fixed PSD matrix Q, and X is a compact (convex) set?

- What about piecewise linear objectives F(x, ξ)? Recall the News Vendor Problem.

- Objectives coming from machine learning problems, such as neural network training with squared or logistic loss (admissibility of "empirical risk minimization"). Maybe this depends on the hypothesis class that is being learnt?

METATHEOREM (from Gérard Cornuéjols): Admissible if and only if the feasible region is bounded !?
THANK YOU !
Questions/Comments ?