Statistical inference using
stochastic gradient descent
Constantine Caramanis¹, Liu Liu¹, Anastasios (Tasos) Kyrillidis², Tianyang Li¹
¹The University of Texas at Austin
²IBM T.J. Watson Research Center, Yorktown Heights → Rice University
Statistical inference is important
Quantifying uncertainty: Signal? Noise? Skill? Luck?
Frequentist inference: confidence intervals, hypothesis testing
Confidence intervals can be used to detect adversarial attacks.
Outline of This Work
(a) Large-scale problems: point estimates computed via SGD
(b) Confidence intervals computed by bootstrap: too expensive
(c) This talk: we can compute them using SGD
(d) Application to adversarial attacks: implicitly learning the manifold
SGD in ERM – mini-batch SGD
To solve empirical risk minimization (ERM)
f(θ) = (1/n) Σ_{i=1}^n f_i(θ), where f_i(θ) = ℓ_θ(Z_i).
At each step:
Draw S i.i.d. uniformly random indices I_t from [n] (with replacement)
Compute the stochastic gradient g_s(θ_t) = (1/S) Σ_{i∈I_t} ∇f_i(θ_t)
Update θ_{t+1} = θ_t − η g_s(θ_t)
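The update above can be sketched in a few lines of NumPy; the least-squares loss and names like `sgd` are illustrative assumptions, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd(grad_fi, theta0, n, eta, S, steps):
    """At each step, draw S indices with replacement and move against
    the averaged per-sample gradients."""
    theta = theta0.copy()
    iterates = []
    for _ in range(steps):
        idx = rng.integers(0, n, size=S)          # i.i.d. uniform from [n]
        g = grad_fi(theta, idx).mean(axis=0)      # g_s(theta_t)
        theta = theta - eta * g                   # theta_{t+1} = theta_t - eta * g
        iterates.append(theta.copy())
    return np.array(iterates)

# Toy ERM instance: f_i(theta) = (x_i . theta - y_i)^2 / 2.
n, p = 1000, 3
X = rng.normal(size=(n, p))
theta_star = np.array([1.0, -2.0, 0.5])
y = X @ theta_star + 0.1 * rng.normal(size=n)

def grad_fi(theta, idx):
    return (X[idx] @ theta - y[idx])[:, None] * X[idx]   # per-sample gradients

iters = sgd(grad_fi, np.zeros(p), n, eta=0.05, S=10, steps=5000)
print(np.round(iters[-1000:].mean(axis=0), 2))
```

Averaging the late iterates recovers a point close to the minimizer, which is the "point estimate via SGD" part of the talk.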
Asymptotic normality – classical results
M-estimator (statistics): when the number of samples n → ∞,
√n (θ̂ − θ*) ⇝ N(0, H*⁻¹ G* H*⁻¹),
where G* = E_Z[∇_θ ℓ_{θ*}(Z) ∇_θ ℓ_{θ*}(Z)ᵀ] and H* = E_Z[∇²_θ ℓ_{θ*}(Z)].
Stochastic approximation (optimization): when the number of steps t → ∞,
√t ((1/t) Σ_{i=1}^t θ_i − θ̂) ⇝ N(0, H⁻¹ G H⁻¹),
where G = E[g_s(θ̂) g_s(θ̂)ᵀ | θ̂] and H = ∇²f(θ̂).
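The stochastic-approximation limit can be checked empirically on a toy scalar problem; this sketch (with illustrative constants, not from the talk) uses f(θ) = E[(θ − Z)²]/2 with Z ~ N(0, 1), so g_s(θ) = θ − Z, H = 1, G = Var(Z) = 1, and the claimed limit is N(0, 1).

```python
import numpy as np

# Run many independent SGD chains in parallel and look at the variance of
# sqrt(t) * theta_bar_t, which should approach H^{-1} G H^{-1} = 1.
rng = np.random.default_rng(4)
eta, t, runs = 0.05, 2000, 400
Z = rng.normal(size=(runs, t))            # one noise stream per SGD run
theta = np.zeros(runs)
acc = np.zeros(runs)
for i in range(t):
    theta -= eta * (theta - Z[:, i])      # SGD step on every run at once
    acc += theta
scaled = np.sqrt(t) * acc / t             # sqrt(t) * averaged iterates
print(np.var(scaled))                     # should be close to 1
```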
SGD is not only useful for optimization, but also for statistical inference!
Statistical inference using mini-batch SGD
Burn-in (discarded): θ_{−b}, θ_{−b+1}, …, θ_{−1}, θ_0.
Then run R segments of the same SGD chain. Segment i keeps t iterates θ_1^{(i)}, …, θ_t^{(i)}, then discards d iterates θ_{t+1}^{(i)}, …, θ_{t+d}^{(i)}; its average is θ̄_t^{(i)} = (1/t) Σ_{j=1}^t θ_j^{(i)}.
At each step:
Draw S i.i.d. uniformly random indices I_t from [n] (with replacement)
Compute the stochastic gradient g_s(θ_t) = (1/S) Σ_{i∈I_t} ∇f_i(θ_t)
Update θ_{t+1} = θ_t − η g_s(θ_t)
Use an ensemble of i = 1, 2, …, R estimators for statistical inference:
θ̂^{(i)} = θ̄ + (√(S t)/√n) (θ̄_t^{(i)} − θ̄),
where θ̄ denotes the average of the segment averages θ̄_t^{(i)}.
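The pipeline above (burn-in, R segments of t kept iterates with d discarded in between, then the √(St/n) rescaling of segment averages) can be sketched as follows; the linear-regression data and all constants are illustrative assumptions, not the talk's experiments.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy linear-regression data.
n, p = 100, 10
theta_star = rng.normal(size=p)
X = rng.normal(size=(n, p))
y = X @ theta_star + rng.normal(size=n)

def sgd_step(theta, eta=0.1, S=10):
    idx = rng.integers(0, n, size=S)              # S indices, with replacement
    g = (X[idx] @ theta - y[idx]) @ X[idx] / S    # mini-batch gradient
    return theta - eta * g

theta = np.zeros(p)
for _ in range(200):                              # burn-in, discarded
    theta = sgd_step(theta)

R, t, d, S = 40, 500, 50, 10
seg_means = []
for _ in range(R):
    seg_sum = np.zeros(p)
    for _ in range(t):
        theta = sgd_step(theta)
        seg_sum += theta
    seg_means.append(seg_sum / t)                 # theta_bar_t^(i)
    for _ in range(d):                            # d iterates discarded
        theta = sgd_step(theta)
seg_means = np.array(seg_means)

center = seg_means.mean(axis=0)
reps = center + np.sqrt(S * t / n) * (seg_means - center)   # rescaled replicates
half = 1.96 * reps.std(axis=0)
lo, hi = center - half, center + half             # per-coordinate 95% CIs
print(np.mean((lo <= theta_star) & (theta_star <= hi)))
```

On data like this, the printed per-coordinate coverage should typically land near the nominal 95%, in the spirit of the simulation tables below.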
Advantages of SGD inference
empirically no more expensive; uses many fewer operations than bootstrap
can be used when training neural networks with SGD
easy to plug into existing SGD code
Other statistical inference methods
directly computing the inverse Fisher information matrix
resampling: bootstrap, subsampling
These are too computationally expensive, and not suited for “big data”!
Intuition – Ornstein-Uhlenbeck process approximation
In SGD, denote Δ_t = θ_t − θ̂, so that
Δ_{t+1} = Δ_t − η g_s(θ̂ + Δ_t).
Δ_t can be approximated by the Ornstein-Uhlenbeck process
dΔ(T) = −H Δ(T) dT + √η G^{1/2} dB(T),
where B(T) is a standard Brownian motion.
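This approximation can be sanity-checked in the scalar case; the sketch below (illustrative constants, not from the talk) uses f(x) = (H/2)x² with additive gradient noise of variance G, where the OU process predicts a stationary variance of ηG/(2H) for small η.

```python
import numpy as np

# Simulate SGD x_{t+1} = x_t - eta * (H x_t + sqrt(G) * xi_t) and compare the
# empirical stationary variance with the OU prediction eta * G / (2 H).
rng = np.random.default_rng(1)
H, G, eta = 2.0, 1.0, 0.01
x = 0.0
xs = []
for t in range(100_000):
    g = H * x + np.sqrt(G) * rng.normal()   # stochastic gradient at x
    x -= eta * g
    if t >= 5_000:                          # drop burn-in iterates
        xs.append(x)
var_emp = np.var(xs)
var_ou = eta * G / (2 * H)                  # OU stationary variance
print(var_emp, var_ou)                      # the two should be close
```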
Intuition – Ornstein-Uhlenbeck process approximation
Denote θ̄_t = (1/t) Σ_{i=1}^t θ_i. Then √t (θ̄_t − θ̂) can be approximated as
√t (θ̄_t − θ̂) = (1/√t) Σ_{i=1}^t (θ_i − θ̂) = (1/(η√t)) Σ_{i=1}^t (θ_i − θ̂) η ≈ (1/(η√t)) ∫_0^{tη} Δ(T) dT,   (1)
where we use the approximation η ≈ dT. By rearranging terms and multiplying both sides by H⁻¹, we can rewrite the stochastic differential equation as
Δ(T) dT = −H⁻¹ dΔ(T) + √η H⁻¹ G^{1/2} dB(T).
Thus, we have
∫_0^{tη} Δ(T) dT = −H⁻¹ (Δ(tη) − Δ(0)) + √η H⁻¹ G^{1/2} B(tη).   (2)
Plugging (2) into (1), we have
√t (θ̄_t − θ̂) ≈ −(1/(η√t)) H⁻¹ (Δ(tη) − Δ(0)) + (1/√(tη)) H⁻¹ G^{1/2} B(tη).
When Δ(0) = 0, Var[−(1/(η√t)) H⁻¹ (Δ(tη) − Δ(0))] = O(1/(tη)). Since (1/√(tη)) H⁻¹ G^{1/2} B(tη) ∼ N(0, H⁻¹ G H⁻¹), when η → 0 and ηt → ∞, we conclude that
√t (θ̄_t − θ̂) ∼ N(0, H⁻¹ G H⁻¹).
Theoretical guarantee
Theorem. For a differentiable convex function f(θ) = (1/n) Σ_{i=1}^n f_i(θ) with gradient ∇f(θ), let θ̂ ∈ R^p be its minimizer, and denote its Hessian at θ̂ by H := ∇²f(θ̂). Assume that ∀θ ∈ R^p, f satisfies:
(F1) Weak strong convexity: (θ − θ̂)ᵀ ∇f(θ) ≥ α ‖θ − θ̂‖₂², for a constant α > 0,
(F2) Lipschitz gradient continuity: ‖∇f(θ)‖₂ ≤ L ‖θ − θ̂‖₂, for a constant L > 0,
(F3) Bounded Taylor remainder: ‖∇f(θ) − H(θ − θ̂)‖₂ ≤ E ‖θ − θ̂‖₂², for a constant E > 0,
(F4) Bounded Hessian spectrum at θ̂: 0 < λ_L ≤ λ_i(H) ≤ λ_U < ∞, ∀i.
Furthermore, let g_s(θ) be a stochastic gradient of f, satisfying:
(G1) E[g_s(θ) | θ] = ∇f(θ),
(G2) E[‖g_s(θ)‖₂² | θ] ≤ A ‖θ − θ̂‖₂² + B,
(G3) E[‖g_s(θ)‖₂⁴ | θ] ≤ C ‖θ − θ̂‖₂⁴ + D,
(G4) ‖E[g_s(θ) g_s(θ)ᵀ | θ] − G‖₂ ≤ A₁ ‖θ − θ̂‖₂ + A₂ ‖θ − θ̂‖₂² + A₃ ‖θ − θ̂‖₂³ + A₄ ‖θ − θ̂‖₂⁴,
for positive, data-dependent constants A, B, C, D, A_i, for i = 1, …, 4. Assume that ‖θ_1 − θ̂‖₂² = O(η); then for sufficiently small step size η > 0, the averaged SGD sequence θ̄_t = (1/t) Σ_{i=1}^t θ_i satisfies
‖ t E[(θ̄_t − θ̂)(θ̄_t − θ̂)ᵀ] − H⁻¹ G H⁻¹ ‖₂ ≲ √η + 1/(tη) + tη²,
where G = E[g_s(θ̂) g_s(θ̂)ᵀ | θ̂].
Proof idea: H⁻¹ = η Σ_{i≥0} (I − ηH)^i
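The identity behind the proof idea is a geometric (Neumann) series, which converges whenever 0 < η < 1/λ_max(H); a quick numerical check on a random positive-definite matrix (illustrative, not from the talk):

```python
import numpy as np

# Verify H^{-1} = eta * sum_{i>=0} (I - eta*H)^i by truncating the series.
rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
H = A @ A.T + np.eye(4)                    # symmetric positive definite "Hessian"
eta = 0.5 / np.linalg.eigvalsh(H).max()    # safely inside the convergence region

acc = np.zeros_like(H)
term = np.eye(4)
for _ in range(5000):                      # truncate the series at 5000 terms
    acc += term
    term = term @ (np.eye(4) - eta * H)
approx_inv = eta * acc
print(np.max(np.abs(approx_inv - np.linalg.inv(H))))   # truncation error
```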
Comparison with bootstrap
Univariate model estimation: densities of θ_SGD − θ̄_SGD and θ_bootstrap − θ̄_bootstrap, compared against N(0, 1/n), for three models:
(a) Normal: (1/√(2π)) exp(−(x − µ)²/2), µ = 0
(b) Exponential: µ e^{−µx}, µ = 1
(c) Poisson: µ^x e^{−µ}/x!, µ = 1
95% confidence interval coverage simulation
Each entry is (coverage probability, confidence interval width).

η      t = 100         t = 500         t = 2500
0.1    (0.957, 4.41)   (0.955, 4.51)   (0.960, 4.53)
0.02   (0.869, 3.30)   (0.923, 3.77)   (0.918, 3.87)
0.004  (0.634, 2.01)   (0.862, 3.20)   (0.916, 3.70)
(a) Bootstrap (0.941, 4.14), normal approximation (0.928, 3.87)

η      t = 100         t = 500         t = 2500
0.1    (0.949, 4.74)   (0.962, 4.91)   (0.963, 4.94)
0.02   (0.845, 3.37)   (0.916, 4.01)   (0.927, 4.17)
0.004  (0.616, 2.00)   (0.832, 3.30)   (0.897, 3.93)
(b) Bootstrap (0.938, 4.47), normal approximation (0.925, 4.18)
Table 1: Linear regression: dimension = 10, 100 samples. (a) diagonal covariance, (b) non-diagonal covariance.

η      t = 100         t = 500         t = 2500
0.1    (0.872, 0.204)  (0.937, 0.249)  (0.939, 0.258)
0.02   (0.610, 0.112)  (0.871, 0.196)  (0.926, 0.237)
0.004  (0.312, 0.051)  (0.596, 0.111)  (0.86, 0.194)
(a) Bootstrap (0.932, 0.253), normal approximation (0.957, 0.264)

η      t = 100         t = 500         t = 2500
0.1    (0.859, 0.206)  (0.931, 0.255)  (0.947, 0.266)
0.02   (0.600, 0.112)  (0.847, 0.197)  (0.931, 0.244)
0.004  (0.302, 0.051)  (0.583, 0.111)  (0.851, 0.195)
(b) Bootstrap (0.932, 0.245), normal approximation (0.954, 0.256)
Table 2: Logistic regression: dimension = 10, 1000 samples. (a) diagonal covariance, (b) non-diagonal covariance.

Coverage is better when each replicate's average uses a longer consecutive sequence (larger t) and when the step size η is larger.
Adversarial Attacks
Neural network classifiers with very high accuracy on test sets are
extremely susceptible to nearly imperceptible adversarial attacks.
Confidence intervals for mitigating adversarial examples
MNIST – logistic regression
(b) Original “0”: P{0 | image} ≈ 1 − e^{−46}, CI ≈ (1 − e^{−28}, 1 − e^{−64})
(c) Adversarial “0”: P{0 | image} ≈ e^{−17}, CI ≈ (e^{−31}, 1 − e^{−11})
Figure 1: MNIST adversarial perturbation (scaled for display)
Adversarial examples produced by the gradient attack have large confidence intervals!

More Related Content

What's hot

Specific Finite Groups(General)
Specific Finite Groups(General)Specific Finite Groups(General)
Specific Finite Groups(General)
Shane Nicklas
 
ملزمة الرياضيات للصف السادس التطبيقي الفصل الاول الاعداد المركبة 2022
 ملزمة الرياضيات للصف السادس التطبيقي الفصل الاول الاعداد المركبة 2022 ملزمة الرياضيات للصف السادس التطبيقي الفصل الاول الاعداد المركبة 2022
ملزمة الرياضيات للصف السادس التطبيقي الفصل الاول الاعداد المركبة 2022
anasKhalaf4
 
Introducing Zap Q-Learning
Introducing Zap Q-Learning   Introducing Zap Q-Learning
Introducing Zap Q-Learning
Sean Meyn
 
Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms
Reinforcement Learning: Hidden Theory and New Super-Fast AlgorithmsReinforcement Learning: Hidden Theory and New Super-Fast Algorithms
Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms
Sean Meyn
 
Tetsunao Matsuta
Tetsunao MatsutaTetsunao Matsuta
Tetsunao Matsuta
Suurist
 
MUMS Undergraduate Workshop - A Biased Introduction to Global Sensitivity Ana...
MUMS Undergraduate Workshop - A Biased Introduction to Global Sensitivity Ana...MUMS Undergraduate Workshop - A Biased Introduction to Global Sensitivity Ana...
MUMS Undergraduate Workshop - A Biased Introduction to Global Sensitivity Ana...
The Statistical and Applied Mathematical Sciences Institute
 
Application H-matrices for solving PDEs with multi-scale coefficients, jumpin...
Application H-matrices for solving PDEs with multi-scale coefficients, jumpin...Application H-matrices for solving PDEs with multi-scale coefficients, jumpin...
Application H-matrices for solving PDEs with multi-scale coefficients, jumpin...
Alexander Litvinenko
 
Common fixed point theorems for random operators in hilbert space
Common fixed point theorems  for  random operators in hilbert spaceCommon fixed point theorems  for  random operators in hilbert space
Common fixed point theorems for random operators in hilbert space
Alexander Decker
 
Vancouver18
Vancouver18Vancouver18
Vancouver18
Christian Robert
 
Specific Finite Groups(General)
Specific Finite Groups(General)Specific Finite Groups(General)
Specific Finite Groups(General)
Shane Nicklas
 
Specific Finite Groups(General)
Specific Finite Groups(General)Specific Finite Groups(General)
Specific Finite Groups(General)
Shane Nicklas
 
Hiroaki Shiokawa
Hiroaki ShiokawaHiroaki Shiokawa
Hiroaki Shiokawa
Suurist
 
Modeling the Dynamics of SGD by Stochastic Differential Equation
Modeling the Dynamics of SGD by Stochastic Differential EquationModeling the Dynamics of SGD by Stochastic Differential Equation
Modeling the Dynamics of SGD by Stochastic Differential Equation
Mark Chang
 
On maximal and variational Fourier restriction
On maximal and variational Fourier restrictionOn maximal and variational Fourier restriction
On maximal and variational Fourier restriction
VjekoslavKovac1
 
Theta θ(g,x) and pi π(g,x) polynomials of hexagonal trapezoid system tb,a
Theta θ(g,x) and pi π(g,x) polynomials of hexagonal trapezoid system tb,aTheta θ(g,x) and pi π(g,x) polynomials of hexagonal trapezoid system tb,a
Theta θ(g,x) and pi π(g,x) polynomials of hexagonal trapezoid system tb,a
ijcsa
 
Solovay Kitaev theorem
Solovay Kitaev theoremSolovay Kitaev theorem
Solovay Kitaev theorem
JamesMa54
 
Sol7
Sol7Sol7
Iit jee question_paper
Iit jee question_paperIit jee question_paper
Iit jee question_paper
RahulMishra774
 

What's hot (18)

Specific Finite Groups(General)
Specific Finite Groups(General)Specific Finite Groups(General)
Specific Finite Groups(General)
 
ملزمة الرياضيات للصف السادس التطبيقي الفصل الاول الاعداد المركبة 2022
 ملزمة الرياضيات للصف السادس التطبيقي الفصل الاول الاعداد المركبة 2022 ملزمة الرياضيات للصف السادس التطبيقي الفصل الاول الاعداد المركبة 2022
ملزمة الرياضيات للصف السادس التطبيقي الفصل الاول الاعداد المركبة 2022
 
Introducing Zap Q-Learning
Introducing Zap Q-Learning   Introducing Zap Q-Learning
Introducing Zap Q-Learning
 
Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms
Reinforcement Learning: Hidden Theory and New Super-Fast AlgorithmsReinforcement Learning: Hidden Theory and New Super-Fast Algorithms
Reinforcement Learning: Hidden Theory and New Super-Fast Algorithms
 
Tetsunao Matsuta
Tetsunao MatsutaTetsunao Matsuta
Tetsunao Matsuta
 
MUMS Undergraduate Workshop - A Biased Introduction to Global Sensitivity Ana...
MUMS Undergraduate Workshop - A Biased Introduction to Global Sensitivity Ana...MUMS Undergraduate Workshop - A Biased Introduction to Global Sensitivity Ana...
MUMS Undergraduate Workshop - A Biased Introduction to Global Sensitivity Ana...
 
Application H-matrices for solving PDEs with multi-scale coefficients, jumpin...
Application H-matrices for solving PDEs with multi-scale coefficients, jumpin...Application H-matrices for solving PDEs with multi-scale coefficients, jumpin...
Application H-matrices for solving PDEs with multi-scale coefficients, jumpin...
 
Common fixed point theorems for random operators in hilbert space
Common fixed point theorems  for  random operators in hilbert spaceCommon fixed point theorems  for  random operators in hilbert space
Common fixed point theorems for random operators in hilbert space
 
Vancouver18
Vancouver18Vancouver18
Vancouver18
 
Specific Finite Groups(General)
Specific Finite Groups(General)Specific Finite Groups(General)
Specific Finite Groups(General)
 
Specific Finite Groups(General)
Specific Finite Groups(General)Specific Finite Groups(General)
Specific Finite Groups(General)
 
Hiroaki Shiokawa
Hiroaki ShiokawaHiroaki Shiokawa
Hiroaki Shiokawa
 
Modeling the Dynamics of SGD by Stochastic Differential Equation
Modeling the Dynamics of SGD by Stochastic Differential EquationModeling the Dynamics of SGD by Stochastic Differential Equation
Modeling the Dynamics of SGD by Stochastic Differential Equation
 
On maximal and variational Fourier restriction
On maximal and variational Fourier restrictionOn maximal and variational Fourier restriction
On maximal and variational Fourier restriction
 
Theta θ(g,x) and pi π(g,x) polynomials of hexagonal trapezoid system tb,a
Theta θ(g,x) and pi π(g,x) polynomials of hexagonal trapezoid system tb,aTheta θ(g,x) and pi π(g,x) polynomials of hexagonal trapezoid system tb,a
Theta θ(g,x) and pi π(g,x) polynomials of hexagonal trapezoid system tb,a
 
Solovay Kitaev theorem
Solovay Kitaev theoremSolovay Kitaev theorem
Solovay Kitaev theorem
 
Sol7
Sol7Sol7
Sol7
 
Iit jee question_paper
Iit jee question_paperIit jee question_paper
Iit jee question_paper
 

Similar to Statistical Inference Using Stochastic Gradient Descent

Complex analysis notes
Complex analysis notesComplex analysis notes
Complex analysis notes
Prakash Dabhi
 
Ps02 cmth03 unit 1
Ps02 cmth03 unit 1Ps02 cmth03 unit 1
Ps02 cmth03 unit 1
Prakash Dabhi
 
2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...
2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...
2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...
asahiushio1
 
3 grechnikov
3 grechnikov3 grechnikov
3 grechnikov
Yandex
 
MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化
Akira Tanimoto
 
Recurrence Relation for Achromatic Number of Line Graph of Graph
Recurrence Relation for Achromatic Number of Line Graph of GraphRecurrence Relation for Achromatic Number of Line Graph of Graph
Recurrence Relation for Achromatic Number of Line Graph of Graph
IRJET Journal
 
Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9
Daisuke Yoneoka
 
A common random fixed point theorem for rational inequality in hilbert space
A common random fixed point theorem for rational inequality in hilbert spaceA common random fixed point theorem for rational inequality in hilbert space
A common random fixed point theorem for rational inequality in hilbert space
Alexander Decker
 
PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Surviv...
PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Surviv...PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Surviv...
PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Surviv...
The Statistical and Applied Mathematical Sciences Institute
 
Fixed point theorems for random variables in complete metric spaces
Fixed point theorems for random variables in complete metric spacesFixed point theorems for random variables in complete metric spaces
Fixed point theorems for random variables in complete metric spaces
Alexander Decker
 
Ejercicios prueba de algebra de la UTN- widmar aguilar
Ejercicios prueba de algebra de la UTN-  widmar aguilarEjercicios prueba de algebra de la UTN-  widmar aguilar
Ejercicios prueba de algebra de la UTN- widmar aguilar
Widmar Aguilar Gonzalez
 
Radiation
RadiationRadiation
Radiation
Soumith V
 
stochastic processes assignment help
stochastic processes assignment helpstochastic processes assignment help
stochastic processes assignment help
Statistics Homework Helper
 
Ejercicio de fasores
Ejercicio de fasoresEjercicio de fasores
Ejercicio de fasores
dpancheins
 
A common unique random fixed point theorem in hilbert space using integral ty...
A common unique random fixed point theorem in hilbert space using integral ty...A common unique random fixed point theorem in hilbert space using integral ty...
A common unique random fixed point theorem in hilbert space using integral ty...
Alexander Decker
 
A common random fixed point theorem for rational ineqality in hilbert space ...
 A common random fixed point theorem for rational ineqality in hilbert space ... A common random fixed point theorem for rational ineqality in hilbert space ...
A common random fixed point theorem for rational ineqality in hilbert space ...
Alexander Decker
 
ATT00001ATT00002ATT00003ATT00004ATT00005CARD.docx
ATT00001ATT00002ATT00003ATT00004ATT00005CARD.docxATT00001ATT00002ATT00003ATT00004ATT00005CARD.docx
ATT00001ATT00002ATT00003ATT00004ATT00005CARD.docx
ikirkton
 
Sparse Representation of Multivariate Extremes with Applications to Anomaly R...
Sparse Representation of Multivariate Extremes with Applications to Anomaly R...Sparse Representation of Multivariate Extremes with Applications to Anomaly R...
Sparse Representation of Multivariate Extremes with Applications to Anomaly R...
Hayato Watanabe
 
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
Tomonari Masada
 
Cálculo ii howard anton - capítulo 16 [tópicos do cálculo vetorial]
Cálculo ii   howard anton - capítulo 16 [tópicos do cálculo vetorial]Cálculo ii   howard anton - capítulo 16 [tópicos do cálculo vetorial]
Cálculo ii howard anton - capítulo 16 [tópicos do cálculo vetorial]
Henrique Covatti
 

Similar to Statistical Inference Using Stochastic Gradient Descent (20)

Complex analysis notes
Complex analysis notesComplex analysis notes
Complex analysis notes
 
Ps02 cmth03 unit 1
Ps02 cmth03 unit 1Ps02 cmth03 unit 1
Ps02 cmth03 unit 1
 
2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...
2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...
2017-07, Research Seminar at Keio University, Metric Perspective of Stochasti...
 
3 grechnikov
3 grechnikov3 grechnikov
3 grechnikov
 
MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化
 
Recurrence Relation for Achromatic Number of Line Graph of Graph
Recurrence Relation for Achromatic Number of Line Graph of GraphRecurrence Relation for Achromatic Number of Line Graph of Graph
Recurrence Relation for Achromatic Number of Line Graph of Graph
 
Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9
 
A common random fixed point theorem for rational inequality in hilbert space
A common random fixed point theorem for rational inequality in hilbert spaceA common random fixed point theorem for rational inequality in hilbert space
A common random fixed point theorem for rational inequality in hilbert space
 
PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Surviv...
PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Surviv...PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Surviv...
PMED Transition Workshop - A Bayesian Model for Joint Longitudinal and Surviv...
 
Fixed point theorems for random variables in complete metric spaces
Fixed point theorems for random variables in complete metric spacesFixed point theorems for random variables in complete metric spaces
Fixed point theorems for random variables in complete metric spaces
 
Ejercicios prueba de algebra de la UTN- widmar aguilar
Ejercicios prueba de algebra de la UTN-  widmar aguilarEjercicios prueba de algebra de la UTN-  widmar aguilar
Ejercicios prueba de algebra de la UTN- widmar aguilar
 
Radiation
RadiationRadiation
Radiation
 
stochastic processes assignment help
stochastic processes assignment helpstochastic processes assignment help
stochastic processes assignment help
 
Ejercicio de fasores
Ejercicio de fasoresEjercicio de fasores
Ejercicio de fasores
 
A common unique random fixed point theorem in hilbert space using integral ty...
A common unique random fixed point theorem in hilbert space using integral ty...A common unique random fixed point theorem in hilbert space using integral ty...
A common unique random fixed point theorem in hilbert space using integral ty...
 
A common random fixed point theorem for rational ineqality in hilbert space ...
 A common random fixed point theorem for rational ineqality in hilbert space ... A common random fixed point theorem for rational ineqality in hilbert space ...
A common random fixed point theorem for rational ineqality in hilbert space ...
 
ATT00001ATT00002ATT00003ATT00004ATT00005CARD.docx
ATT00001ATT00002ATT00003ATT00004ATT00005CARD.docxATT00001ATT00002ATT00003ATT00004ATT00005CARD.docx
ATT00001ATT00002ATT00003ATT00004ATT00005CARD.docx
 
Sparse Representation of Multivariate Extremes with Applications to Anomaly R...
Sparse Representation of Multivariate Extremes with Applications to Anomaly R...Sparse Representation of Multivariate Extremes with Applications to Anomaly R...
Sparse Representation of Multivariate Extremes with Applications to Anomaly R...
 
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
 
Cálculo ii howard anton - capítulo 16 [tópicos do cálculo vetorial]
Cálculo ii   howard anton - capítulo 16 [tópicos do cálculo vetorial]Cálculo ii   howard anton - capítulo 16 [tópicos do cálculo vetorial]
Cálculo ii howard anton - capítulo 16 [tópicos do cálculo vetorial]
 

More from Center for Transportation Research - UT Austin

Flying with SAVES
Flying with SAVESFlying with SAVES
Regret of Queueing Bandits
Regret of Queueing BanditsRegret of Queueing Bandits
Advances in Millimeter Wave for V2X
Advances in Millimeter Wave for V2XAdvances in Millimeter Wave for V2X
Advances in Millimeter Wave for V2X
Center for Transportation Research - UT Austin
 
Collaborative Sensing and Heterogeneous Networking Leveraging Vehicular Fleets
Collaborative Sensing and Heterogeneous Networking Leveraging Vehicular FleetsCollaborative Sensing and Heterogeneous Networking Leveraging Vehicular Fleets
Collaborative Sensing and Heterogeneous Networking Leveraging Vehicular Fleets
Center for Transportation Research - UT Austin
 
Collaborative Sensing for Automated Vehicles
Collaborative Sensing for Automated VehiclesCollaborative Sensing for Automated Vehicles
Collaborative Sensing for Automated Vehicles
Center for Transportation Research - UT Austin
 
Statistical Inference Using Stochastic Gradient Descent
Statistical Inference Using Stochastic Gradient DescentStatistical Inference Using Stochastic Gradient Descent
Statistical Inference Using Stochastic Gradient Descent
Center for Transportation Research - UT Austin
 
CAV/Mixed Transportation Modeling
CAV/Mixed Transportation ModelingCAV/Mixed Transportation Modeling
CAV/Mixed Transportation Modeling
Center for Transportation Research - UT Austin
 
Real-time Signal Control and Traffic Stability / Improved Models for Managed ...
Real-time Signal Control and Traffic Stability / Improved Models for Managed ...Real-time Signal Control and Traffic Stability / Improved Models for Managed ...
Real-time Signal Control and Traffic Stability / Improved Models for Managed ...
Center for Transportation Research - UT Austin
 
Sharing Novel Data Sources to Promote Innovation Through Collaboration: Case ...
Sharing Novel Data Sources to Promote Innovation Through Collaboration: Case ...Sharing Novel Data Sources to Promote Innovation Through Collaboration: Case ...
Sharing Novel Data Sources to Promote Innovation Through Collaboration: Case ...
Center for Transportation Research - UT Austin
 
UT SAVES: Situation Aware Vehicular Engineering Systems
UT SAVES: Situation Aware Vehicular Engineering SystemsUT SAVES: Situation Aware Vehicular Engineering Systems
UT SAVES: Situation Aware Vehicular Engineering Systems
Center for Transportation Research - UT Austin
 
Regret of Queueing Bandits
Regret of Queueing BanditsRegret of Queueing Bandits
Sharing Novel Data Sources to Promote Innovation through Collaboration: Case ...
Sharing Novel Data Sources to Promote Innovation through Collaboration: Case ...Sharing Novel Data Sources to Promote Innovation through Collaboration: Case ...
Sharing Novel Data Sources to Promote Innovation through Collaboration: Case ...
Center for Transportation Research - UT Austin
 
CAV/Mixed Transportation Modeling
CAV/Mixed Transportation ModelingCAV/Mixed Transportation Modeling
CAV/Mixed Transportation Modeling
Center for Transportation Research - UT Austin
 
Collaborative Sensing for Automated Vehicles
Collaborative Sensing for Automated VehiclesCollaborative Sensing for Automated Vehicles
Collaborative Sensing for Automated Vehicles
Center for Transportation Research - UT Austin
 
Advances in Millimeter Wave for V2X
Advances in Millimeter Wave for V2XAdvances in Millimeter Wave for V2X
Advances in Millimeter Wave for V2X
Center for Transportation Research - UT Austin
 
Status of two projects: Real-time Signal Control and Traffic Stability; Impro...
Status of two projects: Real-time Signal Control and Traffic Stability; Impro...Status of two projects: Real-time Signal Control and Traffic Stability; Impro...
Status of two projects: Real-time Signal Control and Traffic Stability; Impro...
Center for Transportation Research - UT Austin
 
SAVES general overview
SAVES general overviewSAVES general overview
D-STOP Overview April 2018
D-STOP Overview April 2018D-STOP Overview April 2018
Managing Mobility during Design-Build Highway Construction: Successes and Les...
Managing Mobility during Design-Build Highway Construction: Successes and Les...Managing Mobility during Design-Build Highway Construction: Successes and Les...
Managing Mobility during Design-Build Highway Construction: Successes and Les...
Center for Transportation Research - UT Austin
 
The Future of Fly Ash in Texas Concrete
The Future of Fly Ash in Texas ConcreteThe Future of Fly Ash in Texas Concrete
The Future of Fly Ash in Texas Concrete
Center for Transportation Research - UT Austin
 

More from Center for Transportation Research - UT Austin (20)

Flying with SAVES
Flying with SAVESFlying with SAVES
Flying with SAVES
 
Regret of Queueing Bandits
Regret of Queueing BanditsRegret of Queueing Bandits
Regret of Queueing Bandits
 
Advances in Millimeter Wave for V2X
Advances in Millimeter Wave for V2XAdvances in Millimeter Wave for V2X
Advances in Millimeter Wave for V2X
 
Collaborative Sensing and Heterogeneous Networking Leveraging Vehicular Fleets
Collaborative Sensing and Heterogeneous Networking Leveraging Vehicular FleetsCollaborative Sensing and Heterogeneous Networking Leveraging Vehicular Fleets
Collaborative Sensing and Heterogeneous Networking Leveraging Vehicular Fleets
 
Collaborative Sensing for Automated Vehicles
Collaborative Sensing for Automated VehiclesCollaborative Sensing for Automated Vehicles
Collaborative Sensing for Automated Vehicles
 
Statistical Inference Using Stochastic Gradient Descent

  • 1. Statistical inference using stochastic gradient descent. Constantine Caramanis¹, Liu Liu¹, Anastasios (Tasos) Kyrillidis², Tianyang Li¹. ¹The University of Texas at Austin; ²IBM T.J. Watson Research Center, Yorktown Heights → Rice University.
  • 2. Statistical inference is important. Quantifying uncertainty: signal or noise? Skill or luck? Frequentist inference: confidence intervals, hypothesis testing.
  • 3. Statistical inference is important (cont.): confidence intervals can be used to detect adversarial attacks.
  • 4. Outline of this work. (a) Large-scale problems: point estimates computed via SGD. (b) Confidence intervals computed by bootstrap: too expensive. (c) This talk: we can compute confidence intervals using SGD. (d) Application to adversarial attacks: implicitly learning the manifold.
  • 5. SGD in ERM – mini-batch SGD. To solve empirical risk minimization (ERM), minimize f(θ) = (1/n) Σ_{i=1}^n f_i(θ), where f_i(θ) = ℓ_θ(Z_i). At each step: draw S i.i.d. uniformly random indices I_t from [n] (with replacement); compute the stochastic gradient g_s(θ_t) = (1/S) Σ_{i∈I_t} ∇f_i(θ_t); update θ_{t+1} = θ_t − η g_s(θ_t).
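The mini-batch update above can be sketched in a few lines of Python. The least-squares loss, synthetic data, and hyperparameters below are illustrative assumptions, not prescribed by the slides.

```python
import numpy as np

def minibatch_sgd(X, y, eta=0.05, S=8, steps=500, seed=0):
    """Mini-batch SGD on the least-squares ERM objective
    f(theta) = (1/n) * sum_i 0.5 * (x_i' theta - y_i)^2."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    theta = np.zeros(p)
    for _ in range(steps):
        # Draw S indices i.i.d. uniformly from [n], with replacement.
        idx = rng.integers(0, n, size=S)
        # Stochastic gradient: average of the sampled per-example gradients.
        g = X[idx].T @ (X[idx] @ theta - y[idx]) / S
        theta = theta - eta * g
    return theta

# Synthetic data y = X theta* + noise (theta* is assumed for illustration).
rng = np.random.default_rng(1)
theta_star = np.array([1.0, -2.0])
X = rng.normal(size=(1000, 2))
y = X @ theta_star + 0.1 * rng.normal(size=1000)
theta_hat = minibatch_sgd(X, y)
```

With this step size the iterates settle near the empirical minimizer, which is the point estimate the rest of the talk builds inference around.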
  • 6. Asymptotic normality – classical results. M-estimator (statistics): as the number of samples n → ∞, √n(θ̂ − θ*) ⇝ N(0, H*⁻¹ G* H*⁻¹), where G* = E_Z[∇_θ ℓ_{θ*}(Z) ∇_θ ℓ_{θ*}(Z)ᵀ] and H* = E_Z[∇²_θ ℓ_{θ*}(Z)]. Stochastic approximation (optimization): as the number of steps t → ∞, √t((1/t) Σ_{i=1}^t θ_i − θ̂) ⇝ N(0, H⁻¹ G H⁻¹), where G = E[g_s(θ̂) g_s(θ̂)ᵀ | θ̂] and H = ∇²f(θ̂).
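The sandwich covariance H⁻¹GH⁻¹ can be estimated by plugging empirical moments into the formulas above. Here is a sketch for the least-squares loss (where it reduces to the Huber–White robust covariance); the synthetic data and dimensions are assumptions for illustration, not from the slides.

```python
import numpy as np

def sandwich_covariance(X, y, theta_hat):
    """Plug-in estimate of H^{-1} G H^{-1} for the least-squares loss
    f_i(theta) = 0.5 * (x_i' theta - y_i)^2."""
    n = X.shape[0]
    resid = X @ theta_hat - y          # per-sample gradient is resid_i * x_i
    H = X.T @ X / n                    # empirical Hessian, E[x x']
    G = (X * resid[:, None]).T @ (X * resid[:, None]) / n  # E[grad grad']
    H_inv = np.linalg.inv(H)
    return H_inv @ G @ H_inv

# Illustrative synthetic regression problem (assumed, not from the slides).
rng = np.random.default_rng(0)
n, p = 2000, 3
X = rng.normal(size=(n, p))
theta_star = np.array([0.5, -1.0, 2.0])
y = X @ theta_star + 0.3 * rng.normal(size=n)
theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
cov = sandwich_covariance(X, y, theta_hat)
```

Under homoskedastic noise with variance 0.09 and identity design covariance, the diagonal of `cov` should sit near 0.09, matching the classical σ²H⁻¹ formula.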
  • 7. Asymptotic normality – classical results (cont.): SGD is not only useful for optimization, but also useful for statistical inference!
  • 8. Statistical inference using mini-batch SGD. After a burn-in phase θ_{−b}, θ_{−b+1}, …, θ_{−1}, θ_0, split one SGD run into R segments: for each i = 1, 2, …, R, average t consecutive iterates, θ̄_t^(i) = (1/t) Σ_{j=1}^t θ_j^(i), then discard the next d iterates θ_{t+1}^(i), …, θ_{t+d}^(i) before starting the next segment. At each step: draw S i.i.d. uniformly random indices I_t from [n] (with replacement); compute the stochastic gradient g_s(θ_t) = (1/S) Σ_{i∈I_t} ∇f_i(θ_t); update θ_{t+1} = θ_t − η g_s(θ_t). Use the ensemble of i = 1, 2, …, R estimators θ̂^(i) = θ̂ + (√S √t / √n)(θ̄_t^(i) − θ̂) for statistical inference.
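The segment-and-average scheme above can be sketched as follows, for the toy case of estimating a mean (f_i(θ) = ½(θ − Z_i)², so θ̂ is the sample mean). All hyperparameters and the data model are illustrative assumptions.

```python
import numpy as np

def sgd_inference(Z, eta=0.1, S=10, t=400, d=40, R=100, burn_in=400, seed=0):
    """SGD-based confidence interval following the slide's scheme: one long
    SGD run split into R segments of t averaged steps, each followed by d
    discarded steps, after a burn-in phase. Each replicate is rescaled by
    sqrt(S)*sqrt(t)/sqrt(n) around the ERM minimizer theta_hat."""
    rng = np.random.default_rng(seed)
    n = len(Z)
    theta_hat = Z.mean()            # minimizer of (1/n) sum_i 0.5*(theta - Z_i)^2
    theta = 0.0

    def step(th):
        idx = rng.integers(0, n, size=S)   # mini-batch, with replacement
        return th - eta * (th - Z[idx].mean())

    for _ in range(burn_in):
        theta = step(theta)
    reps = []
    for _ in range(R):
        avg = 0.0
        for _ in range(t):                 # t consecutive steps, averaged
            theta = step(theta)
            avg += theta / t
        for _ in range(d):                 # d discarded steps
            theta = step(theta)
        reps.append(theta_hat + np.sqrt(S * t / n) * (avg - theta_hat))
    lo, hi = np.quantile(reps, [0.025, 0.975])
    return theta_hat, (lo, hi)

Z = np.random.default_rng(1).normal(loc=2.0, scale=1.0, size=500)
theta_hat, ci = sgd_inference(Z)
```

The √S√t/√n rescaling matches the mean's sampling noise: √t(θ̄_t − θ̂) has covariance roughly H⁻¹GH⁻¹ ≈ σ²/S here, so the rescaled replicates have spread σ/√n, like θ̂ itself.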
  • 9. Advantages of SGD inference: empirically not more expensive, using many fewer operations than bootstrap; can be used when training neural networks with SGD; easy to plug into existing SGD code. Other statistical inference methods: directly computing the inverse Fisher information matrix; resampling (bootstrap, subsampling).
  • 10. Advantages of SGD inference (cont.): the alternatives (inverse Fisher information, bootstrap, subsampling) are too computationally expensive, not suited for “big data”!
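For contrast with the resampling baseline the slide mentions, a percentile bootstrap for the same toy mean estimator might look like this. It is a sketch; B, the estimator, and the data are illustrative assumptions. The cost argument is visible in the structure: every replicate refits on a full resample of size n, which is expensive when "refit" means re-running an optimizer.

```python
import numpy as np

def bootstrap_ci(Z, B=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean: each of the
    B replicates recomputes the estimator on an n-point resample drawn
    with replacement."""
    rng = np.random.default_rng(seed)
    n = len(Z)
    means = np.array([Z[rng.integers(0, n, size=n)].mean() for _ in range(B)])
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])

Z = np.random.default_rng(1).normal(loc=2.0, scale=1.0, size=500)
lo, hi = bootstrap_ci(Z)
```

Here each replicate touches all n points, versus S·t gradient evaluations per SGD replicate above.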
  • 11. Intuition – Ornstein–Uhlenbeck process approximation. In SGD, denote ∆_t = θ_t − θ̂, so that ∆_{t+1} = ∆_t − η g_s(θ̂ + ∆_t). The process ∆_t can be approximated by the Ornstein–Uhlenbeck process d∆(T) = −H ∆(T) dT + √η G^{1/2} dB(T), where B(T) is a standard Brownian motion.
  • 12. Intuition – Ornstein–Uhlenbeck process approximation (cont.). Denote θ̄_t = (1/t) Σ_{i=1}^t θ_i. Then √t(θ̄_t − θ̂) can be approximated as √t(θ̄_t − θ̂) = (1/√t) Σ_{i=1}^t (θ_i − θ̂) = (1/(η√t)) Σ_{i=1}^t (θ_i − θ̂) η ≈ (1/(η√t)) ∫_0^{tη} ∆(T) dT, (1) where we use the approximation η ≈ dT. Rearranging terms and multiplying both sides by H⁻¹, the stochastic differential equation can be rewritten as ∆(T) dT = −H⁻¹ d∆(T) + √η H⁻¹ G^{1/2} dB(T). Thus ∫_0^{tη} ∆(T) dT = −H⁻¹(∆(tη) − ∆(0)) + √η H⁻¹ G^{1/2} B(tη). (2) Plugging (2) into (1) gives √t(θ̄_t − θ̂) ≈ −(1/(η√t)) H⁻¹(∆(tη) − ∆(0)) + (1/√(tη)) H⁻¹ G^{1/2} B(tη). When ∆(0) = 0, Var(−(1/(η√t)) H⁻¹(∆(tη) − ∆(0))) = O(1/(tη)). Since (1/√(tη)) H⁻¹ G^{1/2} B(tη) ∼ N(0, H⁻¹ G H⁻¹), when η → 0 and ηt → ∞ we conclude that √t(θ̄_t − θ̂) ∼ N(0, H⁻¹ G H⁻¹).
  • 13. Theoretical guarantee. Theorem. For a differentiable convex function f(θ) = (1/n) Σ_{i=1}^n f_i(θ) with gradient ∇f(θ), let θ̂ ∈ ℝ^p be its minimizer, and denote its Hessian at θ̂ by H := ∇²f(θ̂). Assume that for all θ ∈ ℝ^p, f satisfies: (F1) weak strong convexity: (θ − θ̂)ᵀ ∇f(θ) ≥ α‖θ − θ̂‖₂², for a constant α > 0; (F2) Lipschitz gradient continuity: ‖∇f(θ)‖₂ ≤ L‖θ − θ̂‖₂, for a constant L > 0; (F3) bounded Taylor remainder: ‖∇f(θ) − H(θ − θ̂)‖₂ ≤ E‖θ − θ̂‖₂², for a constant E > 0; (F4) bounded Hessian spectrum at θ̂: 0 < λ_L ≤ λ_i(H) ≤ λ_U < ∞, for all i. Furthermore, let g_s(θ) be a stochastic gradient of f satisfying: (G1) E[g_s(θ) | θ] = ∇f(θ); (G2) E[‖g_s(θ)‖₂² | θ] ≤ A‖θ − θ̂‖₂² + B; (G3) E[‖g_s(θ)‖₂⁴ | θ] ≤ C‖θ − θ̂‖₂⁴ + D; (G4) ‖E[g_s(θ) g_s(θ)ᵀ | θ] − G‖₂ ≤ A₁‖θ − θ̂‖₂ + A₂‖θ − θ̂‖₂² + A₃‖θ − θ̂‖₂³ + A₄‖θ − θ̂‖₂⁴, for positive, data-dependent constants A, B, C, D, A_i, i = 1, …, 4. Assume ‖θ₁ − θ̂‖₂² = O(η); then for a sufficiently small step size η > 0, the averaged SGD sequence θ̄_t = (1/t) Σ_{i=1}^t θ_i satisfies ‖t E[(θ̄_t − θ̂)(θ̄_t − θ̂)ᵀ] − H⁻¹ G H⁻¹‖₂ ≲ √η + 1/(tη) + tη², where G = E[g_s(θ̂) g_s(θ̂)ᵀ | θ̂].
  • 14. Theoretical guarantee (cont.). Proof idea: H⁻¹ = η Σ_{i≥0} (I − ηH)^i.
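The proof idea is a Neumann series: H⁻¹ = η Σ_{i≥0} (I − ηH)^i converges whenever 0 < η < 2/λ_U, since then the spectral radius of I − ηH is below 1. A quick numerical check (the matrix H and step size are illustrative):

```python
import numpy as np

# Verify H^{-1} = eta * sum_{i>=0} (I - eta*H)^i by truncating the series.
H = np.array([[2.0, 0.5],
              [0.5, 1.0]])   # symmetric positive definite, lambda_max < 3
eta = 0.2                    # satisfies 0 < eta < 2 / lambda_max(H)

acc = np.zeros_like(H)
term = np.eye(2)
for _ in range(500):         # truncate the infinite sum; terms decay geometrically
    acc += term
    term = term @ (np.eye(2) - eta * H)
approx_inv = eta * acc
H_inv = np.linalg.inv(H)
```

After 500 terms the truncation error is negligible, so `approx_inv` matches `np.linalg.inv(H)` to high precision.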
  • 15. Comparison with bootstrap: univariate model estimation. Figure: histograms of θ_SGD − θ̄_SGD and θ_bootstrap − θ̄_bootstrap, compared against the N(0, 1/n) density. Panels: (a) Normal, (1/√(2π)) exp(−(x − µ)²/2), µ = 0; (b) Exponential, µ e^{−µx}, µ = 1; (c) Poisson, µ^x e^{−µ}/x!, µ = 1.
  • 16. 95% confidence interval coverage simulation. Entries are (coverage probability, confidence interval width).

Table 1: Linear regression, dimension = 10, 100 samples.
(a) Diagonal covariance. Bootstrap: (0.941, 4.14); normal approximation: (0.928, 3.87).

| η     | t = 100       | t = 500       | t = 2500      |
|-------|---------------|---------------|---------------|
| 0.1   | (0.957, 4.41) | (0.955, 4.51) | (0.960, 4.53) |
| 0.02  | (0.869, 3.30) | (0.923, 3.77) | (0.918, 3.87) |
| 0.004 | (0.634, 2.01) | (0.862, 3.20) | (0.916, 3.70) |

(b) Non-diagonal covariance. Bootstrap: (0.938, 4.47); normal approximation: (0.925, 4.18).

| η     | t = 100       | t = 500       | t = 2500      |
|-------|---------------|---------------|---------------|
| 0.1   | (0.949, 4.74) | (0.962, 4.91) | (0.963, 4.94) |
| 0.02  | (0.845, 3.37) | (0.916, 4.01) | (0.927, 4.17) |
| 0.004 | (0.616, 2.00) | (0.832, 3.30) | (0.897, 3.93) |

Table 2: Logistic regression, dimension = 10, 1000 samples.
(a) Diagonal covariance. Bootstrap: (0.932, 0.253); normal approximation: (0.957, 0.264).

| η     | t = 100        | t = 500        | t = 2500       |
|-------|----------------|----------------|----------------|
| 0.1   | (0.872, 0.204) | (0.937, 0.249) | (0.939, 0.258) |
| 0.02  | (0.610, 0.112) | (0.871, 0.196) | (0.926, 0.237) |
| 0.004 | (0.312, 0.051) | (0.596, 0.111) | (0.86, 0.194)  |

(b) Non-diagonal covariance. Bootstrap: (0.932, 0.245); normal approximation: (0.954, 0.256).

| η     | t = 100        | t = 500        | t = 2500       |
|-------|----------------|----------------|----------------|
| 0.1   | (0.859, 0.206) | (0.931, 0.255) | (0.947, 0.266) |
| 0.02  | (0.600, 0.112) | (0.847, 0.197) | (0.931, 0.244) |
| 0.004 | (0.302, 0.051) | (0.583, 0.111) | (0.851, 0.195) |

Results are better when each replicate's average uses a longer consecutive sequence (larger t) and a larger step size η.
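The tables report (coverage probability, interval width) pairs. The same two quantities can be measured in a toy simulation; the sketch below uses the textbook normal-approximation interval on a sample mean rather than the paper's SGD procedure, and all settings are illustrative assumptions.

```python
import numpy as np

def coverage_and_width(n=100, trials=2000, z=1.96, seed=0):
    """Fraction of trials in which the normal-approximation 95% interval
    mean +/- z * s / sqrt(n) covers the true mean, and its average width."""
    rng = np.random.default_rng(seed)
    mu = 0.0
    covered, widths = 0, []
    for _ in range(trials):
        x = rng.normal(loc=mu, scale=1.0, size=n)
        half = z * x.std(ddof=1) / np.sqrt(n)
        covered += abs(x.mean() - mu) <= half
        widths.append(2 * half)
    return covered / trials, float(np.mean(widths))

cov, width = coverage_and_width()
# Coverage should land near the nominal 0.95; width near 2*1.96/sqrt(100).
```

Reading the tables through this lens: coverage below the nominal 0.95 (as with small η and small t) means the intervals are too narrow, which the width column confirms.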
  • 17. Adversarial Attacks Neural network classifiers with very high accuracy on test sets are extremely susceptible to nearly imperceptible adversarial attacks.
  • 20. Confidence intervals for mitigating adversarial examples: MNIST – logistic regression. (b) Original “0”: P{0 | image} ≈ 1 − e⁻⁴⁶, CI ≈ (1 − e⁻²⁸, 1 − e⁻⁶⁴). (c) Adversarial “0”: P{0 | image} ≈ e⁻¹⁷, CI ≈ (e⁻³¹, 1 − e⁻¹¹). Figure 1: MNIST adversarial perturbation (scaled for display). Adversarial examples produced by the gradient attack have large confidence intervals!