QMC: Transition Workshop - Importance Sampling the Union of Rare Events with an Application to Power Systems Analysis - Art Owen, May 9, 2018

SAMSI QMC transition workshop
Art Owen
Stanford University
May 9, 2018

Importance sampling the union of rare events, with bounded error and an application to power systems analysis and another application to genomic testing, plus something about the weight working group
A. B. Owen + Yury Maximov + Michael Chertkov with Kenneth J. Tay and Yue Hui

Acknowledgments
1) Mac Hyman, Ilse Ipsen & SAMSI staff
2) Fred, Frances, Pierre

Weight working group
• We want $\mu = \int f(x)\,dx$
• We use $\hat\mu = \frac{1}{n}\sum_{i=1}^{n} f(x_i)$
The RKHS theory is strong for f in a weighted Sobolev space.
Which weighted space?
For most applications, f belongs to
• all of the RKHSs, or
• none of them.
We had in mind a pipeline
$f \xrightarrow{1} \text{weights} \xrightarrow{2} \text{polynomial lattice rule} \xrightarrow{3} \hat\mu$
We got stuck on 2. Note from after the talk: chatting with Pierre L’Ecuyer it is clear that the sticking point is more like step 1.5, choosing a figure of merit.
Lattice builder + SSJ: Godin, Jemel, L’Ecuyer, Marion, Munger

Interleaving
This is a strategy by Dick & Baldeaux to build a new QMC input by smushing
together digits from two old ones.
You need a net in $[0,1]^{2s}$ to get points in $[0,1]^{s}$.
Should we smush them all, or just some?
Yue Hui has looked into strategies for interleaving some but not all inputs to QMC
Preliminary results:
• small differences from which inputs get interleaved
• enormous gains for a variable with a large additive component
• lesser gains from variables that interact
Tentative finding: Variables with large closed Sobol’ index but not so large total Sobol’ index are promising.

Rare event sampling
Motivation: an electrical grid has N nodes. Power $p_1, p_2, \dots, p_N$
• Random $p_i > 0$, e.g., wind generation,
• Random $p_i < 0$, e.g., consumption,
• Fixed $p_i$ for controllable nodes.
AC phase angles $\theta_i$
• $(\theta_1, \dots, \theta_N) = F(p_1, \dots, p_N)$
• Constraints: $|\theta_i - \theta_j| < \bar\theta$ if $i \sim j$
• Find $P\bigl(\max_{i \sim j} |\theta_i - \theta_j| \ge \bar\theta\bigr)$
Simplified model
• $(p_1, \dots, p_N)$ Gaussian
• $\theta$ linear in $p_1, \dots, p_N$

Gaussian setup
For $x \sim N(0, I_d)$, $H_j = \{x \mid \omega_j^T x \ge \tau_j\}$, find $\mu = P(x \in \cup_{j=1}^{J} H_j)$.
WLOG $\|\omega_j\| = 1$. Ordinarily $\tau_j > 0$. d hundreds. J thousands.
[Figure: two-dimensional illustration with six marked points. Solid: deciles of $\|x\|$. Dashed: $10^{-3}, \dots, 10^{-7}$.]

Basic bounds
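Let $P_j \equiv P(\omega_j^T x \ge \tau_j) = \Phi(-\tau_j)$
then $\mu \le \bar\mu \equiv \sum_{j=1}^{J} P_j$ (union bound)
and $\mu \ge \underline{\mu} \equiv \max_{1 \le j \le J} P_j$
Inclusion-exclusion
For $u \subseteq 1{:}J = \{1, 2, \dots, J\}$, let
$H_u = \cup_{j \in u} H_j$, $H_u(x) \equiv 1\{x \in H_u\}$, $H_u^c(x) \equiv 1 - H_u(x)$
$P_u = P(H_u) = E(H_u(x))$
$\mu = \sum_{|u| > 0} (-1)^{|u|-1} P_u$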
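The two basic bounds can be evaluated before any sampling is done. The following is a minimal Python sketch, assuming unit-norm $\omega_j$ so that $P_j = \Phi(-\tau_j)$; the function names and the example thresholds are illustrative and do not come from the talk.

```python
import math

def upper_tail(tau):
    """P(Z >= tau) for Z ~ N(0, 1), i.e. Phi(-tau), via the complementary error function."""
    return 0.5 * math.erfc(tau / math.sqrt(2.0))

def basic_bounds(taus):
    """Return (max_j P_j, sum_j P_j): the lower bound and the union bound on mu."""
    probs = [upper_tail(t) for t in taus]
    return max(probs), sum(probs)

# Hypothetical thresholds tau_j, chosen only for illustration.
lower, upper = basic_bounds([3.0, 3.5, 4.0])
print(f"{lower:.3e} <= mu <= {upper:.3e}")
```

When one threshold is much smaller than the rest, the two bounds nearly coincide and little is left for sampling to resolve.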

Importance sampling
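For $x \sim p$, seek $\eta = E_p(f(x)) = \int f(x) p(x)\,dx$. Take
$\hat\eta = \frac{1}{n} \sum_{i=1}^{n} \frac{f(x_i) p(x_i)}{q(x_i)}$, $x_i \overset{\mathrm{iid}}{\sim} q$
Unbiased if $q(x) > 0$ whenever $f(x) p(x) \ne 0$.
Variance
$\mathrm{Var}(\hat\eta) = \sigma_q^2 / n$, where (writing $\mu$ for $\eta$)
$\sigma_q^2 = \int \frac{f^2 p^2}{q} - \mu^2 = \cdots = \int \frac{(fp - \mu q)^2}{q}$
Num: seek $q \approx fp/\mu$, i.e., nearly proportional to $fp$
Den: watch out for small q.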
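As a concrete one-dimensional instance of the estimator above, the sketch below estimates $P(Z \ge \tau)$ for $Z \sim N(0,1)$ by sampling from $q = N(\tau, 1)$. The choice of target, the shifted proposal and the function names are assumptions made for illustration only; for this pair the likelihood ratio is $p(x)/q(x) = \exp(-\tau x + \tau^2/2)$.

```python
import math
import random

def upper_tail(tau):
    """Exact P(Z >= tau) for Z ~ N(0, 1), used here only as a reference value."""
    return 0.5 * math.erfc(tau / math.sqrt(2.0))

def shifted_is(tau, n, seed=0):
    """Importance sampling estimate of P(Z >= tau) with proposal q = N(tau, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(tau, 1.0)
        if x >= tau:  # f(x) = 1{x >= tau}
            # Likelihood ratio p(x)/q(x) for p = N(0, 1) and q = N(tau, 1).
            total += math.exp(-tau * x + 0.5 * tau * tau)
    return total / n

estimate = shifted_is(4.0, n=20000, seed=1)
print(estimate, upper_tail(4.0))
```

Plain Monte Carlo with the same n would usually see no exceedances of 4 at all, which is the failure that the proposal q is meant to avoid.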

Mixture sampling
For $\alpha_j \ge 0$, $\sum_j \alpha_j = 1$
$\hat\eta_\alpha = \frac{1}{n} \sum_{i=1}^{n} \frac{f(x_i) p(x_i)}{\sum_j \alpha_j q_j(x_i)}$, $x_i \overset{\mathrm{iid}}{\sim} q_\alpha \equiv \sum_j \alpha_j q_j$
Defensive mixtures
Take $q_0 = p$, $\alpha_0 > 0$. Get $p / q_\alpha \le 1/\alpha_0$. Hesterberg (1995)
Additional refs
Use $\int q_j = 1$ as control variates. O & Zhou (2000).
Optimization over $\alpha$ is convex. He & O (2014).
Multiple IS. Veach & Guibas (1994). Veach’s Oscar.
Elvira, Martino, Luengo, Bugallo (2015) Generalizations.
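The defensive-mixture bound $p/q_\alpha \le 1/\alpha_0$ can be seen numerically. The sketch below is an illustration under assumed choices (a two-component mixture of $p = N(0,1)$ and a shifted $N(\tau,1)$, hypothetical function names); it tracks the largest likelihood ratio encountered alongside the estimate of $P(Z \ge \tau)$.

```python
import math
import random

def normal_pdf(x, mean):
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def defensive_is(tau, alpha0, n, seed=0):
    """Estimate P(Z >= tau) with q_alpha = alpha0 * N(0,1) + (1 - alpha0) * N(tau,1).

    Returns the estimate and the largest ratio p(x)/q_alpha(x) that was observed.
    """
    rng = random.Random(seed)
    total, max_ratio = 0.0, 0.0
    for _ in range(n):
        # Draw the mixture component first, then draw x from it.
        mean = 0.0 if rng.random() < alpha0 else tau
        x = rng.gauss(mean, 1.0)
        p = normal_pdf(x, 0.0)
        q_mix = alpha0 * p + (1.0 - alpha0) * normal_pdf(x, tau)
        ratio = p / q_mix
        max_ratio = max(max_ratio, ratio)
        if x >= tau:
            total += ratio
    return total / n, max_ratio

estimate, max_ratio = defensive_is(3.0, alpha0=0.1, n=20000, seed=2)
print(estimate, max_ratio)  # max_ratio cannot exceed 1/alpha0 = 10
```

The cap on the ratio is what keeps the variance finite for any integrand with finite variance under p, at the cost of spending a fraction $\alpha_0$ of the samples on the nominal distribution.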

Instantons
$\mu_j = \arg\max_{x \in H_j} p(x)$ Chertkov, Pan, Stepanov (2011)
[Figure: the same two-dimensional illustration, with the instantons marked as solid dots.]
• Solid dots $\mu_j$
• Initial thought: $q_j = N(\mu_j, I)$
• and $q_\alpha = \sum_j \alpha_j q_j$
• I.e., mixture of exponential tilting
Conditional sampling
$q_j = \mathcal{L}(x \mid x \in H_j) = p(x) H_j(x) / P_j$

Mixture of conditional sampling
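$\alpha_1, \dots, \alpha_J \ge 0$, $\sum_j \alpha_j = 1$, $q_\alpha = \sum_j \alpha_j q_j$
$\mu = P(x \in \cup_{j=1}^{J} H_j) = P(x \in H_{1:J}) = E\bigl(H_{1:J}(x)\bigr)$
Mixture IS
$\hat\mu_\alpha = \frac{1}{n} \sum_{i=1}^{n} \frac{H_{1:J}(x_i)\, p(x_i)}{\sum_{j=1}^{J} \alpha_j q_j(x_i)}$ (where $q_j = p H_j / P_j$)
$\phantom{\hat\mu_\alpha} = \frac{1}{n} \sum_{i=1}^{n} \frac{H_{1:J}(x_i)}{\sum_{j=1}^{J} \alpha_j H_j(x_i) / P_j}$
with $H_0 \equiv 1$ and $P_0 = 1$.
It would be nice to have $\alpha_j / P_j$ constant.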

ALOE
At Least One (Rare) Event
Like AMIS of Cornuet, Marin, Mira, Robert (2012)
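Take $\alpha_j = \alpha_j^* \propto P_j$. That is
$\alpha_j^* = \frac{P_j}{\sum_{j'=1}^{J} P_{j'}} = \frac{P_j}{\bar\mu}$, ($\bar\mu$ is the union bound)
Then
$\hat\mu_{\alpha^*} = \frac{1}{n} \sum_{i=1}^{n} \frac{H_{1:J}(x_i)}{\sum_{j=1}^{J} \alpha_j^* H_j(x_i) / P_j} = \bar\mu \times \frac{1}{n} \sum_{i=1}^{n} \frac{H_{1:J}(x_i)}{\sum_{j=1}^{J} H_j(x_i)}$
If $x \sim q_{\alpha^*}$ then $H_{1:J}(x) = 1$. So
$\hat\mu_{\alpha^*} = \bar\mu \times \frac{1}{n} \sum_{i=1}^{n} \frac{1}{S(x_i)}$, $S(x) \equiv \sum_{j=1}^{J} H_j(x)$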
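The ALOE estimate above can be sketched in a few lines for Gaussian half-spaces. This is an illustrative sketch rather than the authors' code. It assumes unit-norm $\omega_j$, and it draws from $q_j$ by sampling $z \sim N(0, I)$, sampling a scalar $y$ from $N(0,1)$ conditioned on $y \ge \tau_j$ by inversion, and setting $x = z + (y - \omega_j^T z)\,\omega_j$ so that $\omega_j^T x = y$. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def upper_tail(tau):
    """P_j = Phi(-tau) for a unit-norm half-space with threshold tau."""
    return 0.5 * math.erfc(tau / math.sqrt(2.0))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def sample_given_halfspace(omega, tau, rng):
    """Draw x ~ N(0, I) conditioned on omega^T x >= tau (omega has unit norm)."""
    z = [rng.gauss(0.0, 1.0) for _ in omega]
    u = 1.0 - rng.random()  # uniform on (0, 1], so the argument below stays positive
    # Inversion for the tail: Phi(-y) = u * Phi(-tau), hence y >= tau.
    # Assumes tau is moderate, so that u * Phi(-tau) does not underflow to 0.
    y = -NormalDist().inv_cdf(u * upper_tail(tau))
    shift = y - dot(omega, z)
    return [zi + shift * wi for zi, wi in zip(z, omega)]

def aloe(omegas, taus, n, seed=0):
    """Return (ALOE estimate of mu, union bound) for half-spaces omega_j^T x >= tau_j."""
    rng = random.Random(seed)
    probs = [upper_tail(t) for t in taus]
    union_bound = sum(probs)
    indices = range(len(taus))
    total = 0.0
    for _ in range(n):
        # Pick event j with probability alpha*_j = P_j / union_bound.
        j = rng.choices(indices, weights=probs)[0]
        x = sample_given_halfspace(omegas[j], taus[j], rng)
        # S(x): number of events that occur at x. Event j is counted by construction,
        # which also guards against rounding error at the boundary.
        s = sum(1 for k in indices if k == j or dot(omegas[k], x) >= taus[k])
        total += 1.0 / s
    return union_bound * total / n, union_bound

# Two orthogonal half-spaces in the plane, for which mu = 2P - P^2 exactly.
omegas = [[1.0, 0.0], [0.0, 1.0]]
taus = [2.0, 2.0]
estimate, union_bound = aloe(omegas, taus, n=2000, seed=1)
p = upper_tail(2.0)
print(estimate, 2 * p - p * p, union_bound)
```

Each sample costs one pass over all J constraints to evaluate $S(x)$, so the work is roughly proportional to $nJd$ for dense $\omega_j$.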

Some prior uses
Frigessi & Vercellis (1985)
Combinatorial enumeration.
Naiman & Priebe (2001)
Scan statistics, genetics and imaging.
Naiman, Priebe & Cope (2001)
Scan statistics, marked point processes (e.g., mine fields)
Shi, Siegmund, Yakir (2007)
Scan statistics, linkage analysis
Adler, Blanchet, Liu (2012)
Excursions of random fields over $T \subset R^d$
Nobody seems to have named it.
Thanks to: Bert Zwart, Jose Blanchet, David Siegmund for literature pointers

Theorem
O, Maximov & Chertkov (2017)
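Let $H_1, \dots, H_J$ be events defined by $x \sim p$.
Let $q_j(x) = p(x) H_j(x) / P_j$ for $P_j = P(x \in H_j)$.
Let $x_i \overset{\mathrm{iid}}{\sim} q_{\alpha^*} = \sum_{j=1}^{J} \alpha_j^* q_j$ for $\alpha_j^* = P_j / \bar\mu$.
Take
$\hat\mu = \bar\mu \times \frac{1}{n} \sum_{i=1}^{n} \frac{1}{S(x_i)}$, $S(x_i) = \sum_{j=1}^{J} H_j(x_i)$
Then $E(\hat\mu) = \mu$ and
$\mathrm{Var}(\hat\mu) = \frac{1}{n} \Bigl( \bar\mu \sum_{s=1}^{J} \frac{T_s}{s} - \mu^2 \Bigr) \le \frac{\mu(\bar\mu - \mu)}{n}$
where $T_s \equiv P(S(x) = s)$, $x \sim p$.
The RHS follows because
$\sum_{s=1}^{J} \frac{T_s}{s} \le \sum_{s=1}^{J} T_s = \mu$.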
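The unbiasedness and the variance formula follow from the form of the mixture density. The short derivation below is a sketch using only the definitions in the theorem; it is added for the reader and is not taken from the slides.

```latex
% Mixture density under alpha^*_j = P_j / \bar\mu:
q_{\alpha^*}(x) = \sum_{j=1}^{J} \frac{P_j}{\bar\mu}\,\frac{p(x) H_j(x)}{P_j}
               = \frac{p(x)\, S(x)}{\bar\mu}.
% First moment of one term \bar\mu / S(x):
E_{q_{\alpha^*}}\!\left(\frac{\bar\mu}{S(x)}\right)
  = \int_{S(x)>0} \frac{\bar\mu}{S(x)}\,\frac{p(x) S(x)}{\bar\mu}\,dx
  = P(S(x) > 0) = \mu.
% Second moment of one term:
E_{q_{\alpha^*}}\!\left(\frac{\bar\mu^2}{S(x)^2}\right)
  = \bar\mu \int_{S(x)>0} \frac{p(x)}{S(x)}\,dx
  = \bar\mu \sum_{s=1}^{J} \frac{T_s}{s}.
% Hence n Var(\hat\mu) = \bar\mu \sum_s T_s / s - \mu^2.
```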

Remarks
With S(x) equal to the number of rare events at x,
$\hat\mu = \bar\mu \times \frac{1}{n} \sum_{i=1}^{n} \frac{1}{S(x_i)}$
$1 \le S(x) \le J \implies \frac{\bar\mu}{J} \le \hat\mu \le \bar\mu$
Robustness
The usual problem with rare event estimation is getting no rare events in n tries. Then $\hat\mu = 0$.
Here the corresponding failure is never seeing $S \ge 2$ rare events. Then $\hat\mu = \bar\mu$, an upper bound and probably fairly accurate if all $S(x_i) = 1$.
Conditioning vs sampling from $N(\mu_j, I)$
Avoids wasting samples outside the failure zone.
Avoids awkward likelihood ratios.

General Gaussians
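For $y \sim N(\eta, \Sigma)$ let the event be $\gamma_j^T y \ge \kappa_j$.
Same as $x^T \omega_j \ge \tau_j$ for $x \sim N(0, I)$ and
$\omega_j = \frac{\gamma_j^T \Sigma^{1/2}}{\sqrt{\gamma_j^T \Sigma \gamma_j}}$ and $\tau_j = \frac{\kappa_j - \gamma_j^T \eta}{\sqrt{\gamma_j^T \Sigma \gamma_j}}$
Optimizations
Later we will want to vary $\eta$.
That changes $\tau_j$ but not $\omega_j$.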
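The reduction to standard form can be sketched as follows. One assumption differs from the slide: the sketch uses a Cholesky factor $L$ with $LL^T = \Sigma$ in place of the symmetric square root $\Sigma^{1/2}$, so that $y = \eta + Lx$. This gives a rotated $\omega_j$ but the same $\tau_j$ and the same event probabilities. Function names and the example numbers are hypothetical.

```python
import math

def cholesky(sigma):
    """Lower-triangular L with L L^T = sigma (sigma symmetric positive definite)."""
    d = len(sigma)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for k in range(i + 1):
            acc = sigma[i][k] - sum(L[i][m] * L[k][m] for m in range(k))
            L[i][k] = math.sqrt(acc) if i == k else acc / L[k][k]
    return L

def standardize(gamma, kappa, eta, sigma):
    """Map {gamma^T y >= kappa}, y ~ N(eta, sigma), to {omega^T x >= tau}, x ~ N(0, I)."""
    L = cholesky(sigma)
    d = len(gamma)
    # v = L^T gamma, so gamma^T y = gamma^T eta + v^T x when y = eta + L x.
    v = [sum(L[i][k] * gamma[i] for i in range(d)) for k in range(d)]
    scale = math.sqrt(sum(vk * vk for vk in v))  # equals sqrt(gamma^T sigma gamma)
    omega = [vk / scale for vk in v]
    tau = (kappa - sum(g * e for g, e in zip(gamma, eta))) / scale
    return omega, tau

# Illustrative numbers only.
omega, tau = standardize(gamma=[1.0, 1.0], kappa=5.0,
                         eta=[1.0, -1.0], sigma=[[4.0, 1.0], [1.0, 2.0]])
print(omega, tau)
```

Because $\eta$ enters only through $\tau_j$, the factorization and the $\omega_j$ can be computed once and reused while $\eta$ is varied.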

More bounds
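For event count $S(x) = \sum_{j=1}^{J} H_j(x)$ and $x \sim N(0, I)$:
$E(S) = \sum_{j=1}^{J} P_j = \bar\mu$
$P(S > 0) = E\bigl(\max_{1 \le j \le J} H_j(x)\bigr) = \mu$
$E(S^{-1} \mid S > 0) = \sum_{s=1}^{J} \frac{P(S = s)}{s\, P(S > 0)} = \frac{1}{\mu} \sum_{s=1}^{J} \frac{P(S = s)}{s}$
Therefore
$n \times \mathrm{Var}(\hat\mu) = \bar\mu \sum_{s=1}^{J} \frac{P(S = s)}{s} - \mu^2$
$= \bar\mu \times \mu \times E(S^{-1} \mid S > 0) - \mu^2$
$= \mu^2 \times E(S \mid S > 0) \times E(S^{-1} \mid S > 0) - \mu^2$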

Bounds continued
$\frac{\mathrm{Var}(\hat\mu)}{\mu^2} = \frac{1}{n} \bigl( E(S \mid S > 0) \times E(S^{-1} \mid S > 0) - 1 \bigr)$
Lemma
Let S be a random variable in $\{1, 2, \dots, J\}$ for $J \in \mathbb{N}$. Then
$E(S)\, E(S^{-1}) \le \frac{J + J^{-1} + 2}{4}$
by Cauchy-Schwarz, with equality if and only if $S \sim U\{1, J\}$.
Corollary
$\mathrm{Var}(\hat\mu) \le \frac{\mu^2}{n} \cdot \frac{J + J^{-1} - 2}{4}$.
For J > 2 there must be a sharper bound for half-spaces and a Gaussian.
Thanks to Yanbo Tang and Jeffrey Negrea for an improved proof of the lemma.
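The lemma is easy to probe numerically. The sketch below reads $U\{1, J\}$ as the distribution putting mass 1/2 on each of the two points 1 and J (the reading under which equality holds), checks equality there, and checks the inequality on randomly generated distributions. The helper names are hypothetical.

```python
import random

def lemma_bound(J):
    """Right-hand side of the lemma: (J + 1/J + 2) / 4."""
    return (J + 1.0 / J + 2.0) / 4.0

def moment_product(pmf):
    """E(S) * E(1/S) for a pmf given as {s: P(S = s)} on {1, ..., J}."""
    mean = sum(s * p for s, p in pmf.items())
    mean_inverse = sum(p / s for s, p in pmf.items())
    return mean * mean_inverse

J = 10
two_point = {1: 0.5, J: 0.5}  # mass 1/2 at each endpoint
print(moment_product(two_point), lemma_bound(J))

# Random distributions on {1, ..., J} stay below the bound.
rng = random.Random(0)
for _ in range(1000):
    weights = [rng.random() for _ in range(J)]
    total = sum(weights)
    pmf = {s: weights[s - 1] / total for s in range(1, J + 1)}
    assert moment_product(pmf) <= lemma_bound(J) + 1e-12
```

Subtracting 1 from the bound gives the factor $(J + J^{-1} - 2)/4$ in the corollary.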

Numerical comparison
R package mvtnorm of Genz & Bretz (2009) gets
$P(a \le y \le b) = P(\cap_j \{a_j \le y_j \le b_j\})$ for $y \sim N(\eta, \Sigma)$
Their code makes sophisticated use of quasi-Monte Carlo.
Adaptive, up to 25,000 evals in FORTRAN.
It was not designed for rare events.
It computes an intersection.
Usage
Pack $\omega_j$ into $\Omega$ and $\tau_j$ into $T$
$1 - \mu = P(\Omega^T x \le T) = P(y \le T)$, $y \sim N(0, \Omega^T \Omega)$
It can handle up to 1000 inputs, i.e., $J \le 1000$
Upshot
ALOE works (much) better for rare events.
mvtnorm works better for non-rare events.

Circumscribed polygon
$\mathcal{P}(J, \tau)$: regular J sided polygon in $R^2$ outside circle of radius $\tau > 0$.
a priori bounds
$\mu = P(x \in \mathcal{P}^c) \le P(\chi^2_{(2)} \ge \tau^2) = \exp(-\tau^2/2)$
This is pretty tight. A trigonometric argument gives
$1 \ge \frac{\mu}{\exp(-\tau^2/2)} \ge 1 - \Bigl( \frac{J \tan(\pi/J) - \pi}{2\pi} \Bigr) \tau^2 \doteq 1 - \frac{\pi^2 \tau^2}{6 J^2}$
Let’s use J = 360
So $\mu \doteq \exp(-\tau^2/2)$ for reasonable $\tau$.
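The polygon test case and its a priori bounds can be set up as below. The orientation of the polygon (outward normals at angles $2\pi j/J$) is an assumption, since the slide does not fix it, and the function names are hypothetical. The sketch compares the exact trigonometric factor with its approximation $1 - \pi^2\tau^2/(6J^2)$.

```python
import math

def polygon_halfspaces(J, tau):
    """Unit normals and thresholds of a regular J-gon circumscribing the circle of radius tau."""
    omegas = [(math.cos(2.0 * math.pi * j / J), math.sin(2.0 * math.pi * j / J))
              for j in range(J)]
    return omegas, [tau] * J

def ratio_lower_bound(J, tau):
    """Lower bound on mu / exp(-tau^2 / 2) from the trigonometric argument."""
    return 1.0 - (J * math.tan(math.pi / J) - math.pi) * tau ** 2 / (2.0 * math.pi)

def ratio_lower_bound_approx(J, tau):
    """Leading-order approximation 1 - pi^2 tau^2 / (6 J^2)."""
    return 1.0 - math.pi ** 2 * tau ** 2 / (6.0 * J ** 2)

J, tau = 360, 6.0
omegas, taus = polygon_halfspaces(J, tau)
print(ratio_lower_bound(J, tau), ratio_lower_bound_approx(J, tau))
print(math.exp(-tau ** 2 / 2.0))  # upper bound on mu
```

With J = 360 the lower and upper bounds on $\mu$ agree to about three significant digits even at $\tau = 6$, which is why $\exp(-\tau^2/2)$ serves as the reference value.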

IS vs MVN
ALOE had n = 1000. MVN had up to 25,000. 100 random repeats.
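$\tau$ | $\mu$ | $E((\hat\mu_{\mathrm{ALOE}}/\mu - 1)^2)$ | $E((\hat\mu_{\mathrm{MVN}}/\mu - 1)^2)$
2 | $1.35 \times 10^{-1}$ | 0.000399 | $9.42 \times 10^{-8}$
3 | $1.11 \times 10^{-2}$ | 0.000451 | $9.24 \times 10^{-7}$
4 | $3.35 \times 10^{-4}$ | 0.000549 | $2.37 \times 10^{-2}$
5 | $3.73 \times 10^{-6}$ | 0.000600 | $1.81 \times 10^{0}$
6 | $1.52 \times 10^{-8}$ | 0.000543 | $4.39 \times 10^{-1}$
7 | $2.29 \times 10^{-11}$ | 0.000559 | $3.62 \times 10^{-1}$
8 | $1.27 \times 10^{-14}$ | 0.000540 | $1.34 \times 10^{-1}$
For $\tau = 5$ MVN had a few outliers.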

Polygon again
[Figure: two histograms of the relative estimate (x-axis: relative estimate; y-axis: frequency). Left panel: ALOE, x-axis ticks from 0.96 to 1.04. Right panel: pmvnorm, x-axis ticks from 1 to 6.]
Results of 100 estimates of $P(x \notin \mathcal{P}(360, 6))$, divided by $\exp(-6^2/2)$.

More examples
Similar results for high dimensions.
ALOE very effective on rare events.
MVN can go > 1000× the union bound.

Power systems
Network with N nodes called busses and M edges.
Typically M/N is not very large.
Power pi at bus i. Each bus is Fixed or Random or Slack:
slack bus S has $p_S = -\sum_{i \ne S} p_i$.
$p = (p_F^T, p_R^T, p_S)^T \in R^N$
Randomness driven by $p_R \sim N(\eta_R, \Sigma_{RR})$
Inductances $B_{ij}$
A Laplacian: $B_{ij} \ne 0$ if $i \sim j$ in the network. $B_{ii} = -\sum_{j \ne i} B_{ij}$.
Taylor approximation
$\theta = B^{+} p$ (pseudo inverse)
Therefore $\theta$ is Gaussian as are all $\theta_i - \theta_j$ for $i \sim j$.

Examples
Our examples come from MATPOWER (a Matlab toolbox).
Zimmerman, Murillo-Sánchez & Thomas (2011)
We used n = 10,000.
Polish winter peak grid
2383 busses, d = 326 random busses, J = 5772 phase constraints.
$\bar\theta$ | $\hat\mu$ | se/$\hat\mu$ | $\underline{\mu}$ | $\bar\mu$
$\pi/4$ | $3.7 \times 10^{-23}$ | 0.0024 | $3.6 \times 10^{-23}$ | $4.2 \times 10^{-23}$
$\pi/5$ | $2.6 \times 10^{-12}$ | 0.0022 | $2.6 \times 10^{-12}$ | $2.9 \times 10^{-12}$
$\pi/6$ | $3.9 \times 10^{-7}$ | 0.0024 | $3.9 \times 10^{-7}$ | $4.4 \times 10^{-7}$
$\pi/7$ | $2.0 \times 10^{-3}$ | 0.0027 | $2.0 \times 10^{-3}$ | $2.4 \times 10^{-3}$
$\bar\theta$ is the phase constraint, $\hat\mu$ is the ALOE estimate, se is the estimated standard error, $\underline{\mu}$ is the largest single event probability and $\bar\mu$ is the union bound.

Pegase 2869
Fliscounakis et al. (2013) “large part of the European system”
N = 2869 busses. d = 509 random busses. J = 7936 phase constraints.
Results
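$\bar\theta$ | $\hat\mu$ | se/$\hat\mu$ | $\underline{\mu}$ | $\bar\mu$
$\pi/2$ | $3.5 \times 10^{-20}$ | 0* | $3.3 \times 10^{-20}$ | $3.5 \times 10^{-20}$
$\pi/3$ | $8.9 \times 10^{-10}$ | $5.0 \times 10^{-5}$ | $7.7 \times 10^{-10}$ | $8.9 \times 10^{-10}$
$\pi/4$ | $4.3 \times 10^{-6}$ | $1.8 \times 10^{-3}$ | $3.5 \times 10^{-6}$ | $4.6 \times 10^{-6}$
$\pi/5$ | $2.9 \times 10^{-3}$ | $3.5 \times 10^{-3}$ | $1.8 \times 10^{-3}$ | $4.1 \times 10^{-3}$
Notes
* $\bar\theta = \pi/2$ is unrealistically large. We got $\hat\mu = \bar\mu$. All 10,000 samples had S = 1.
Some sort of Wilson interval or Bayesian approach could help.
One half space was sampled 9408 times, another 592 times.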

Other models
IEEE case 14 and IEEE case 300 and Pegase 1354 were all dominated by one failure mode so $\mu \doteq \bar\mu$ and no sampling is needed.
Another model had random power corresponding to wind generators but phase failures were not rare events in that model.
The Pegase 13659 model was too large for our computer. The Laplacian had 37,250 rows and columns.
The Pegase 9241 model was large and slow and phase failure was not a rare event.
Caveats
We used a DC approximation to AC power flow (which is common) and the phase estimates were based on a Taylor approximation.

Genomics (in progress)
A similar problem comes up with false discoveries.
Sketch
Measure G genes on n people.
Expression $X = (X_1\ X_2\ \cdots\ X_G) \in R^{n \times G}$
And a phenotype: $Y \in R^n$
Exploration
Which genes (if any) correlate with Y?
What could go wrong?
Try G = 10,000 tests at $\alpha$ = 5% significance.
Expect at least 500 discoveries by chance.
Bonferroni
So · · · test at level $\alpha/G$, e.g., $5 \times 10^{-6}$.
Seems conservative

Multiple testing
FD = # False discoveries
TD = # True discoveries
False discovery proportion and rate
$\mathrm{FDP} = \begin{cases} \frac{\mathrm{FD}}{\mathrm{FD} + \mathrm{TD}}, & \text{denom not } 0 \\ 0, & \text{else} \end{cases}$
FDR = E(FDP)
If Y Gaussian and independent of X,
then $|X_g^T Y| > \tau$ brings a false discovery.
Benjamini-Hochberg
Controls FDR at e.g., 10% for independent tests
(and some dependent cases)
Correlations among genes $\implies$ False discoveries come in bursts.
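For reference, the Benjamini-Hochberg step-up rule mentioned above can be sketched as follows: sort the m p-values, find the largest rank k with $p_{(k)} \le kq/m$, and reject the k smallest. The function name and the example p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues, q=0.10):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        # Step-up: keep the largest rank whose p-value is under its threshold.
        if pvalues[i] <= rank * q / m:
            k = rank
    return sorted(order[:k])

# Illustrative p-values only.
pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvalues, q=0.05))
```

Because the rule is step-up, one small p-value late in the ordering can pull in every smaller one, which is one way that correlated tests produce false discoveries in bursts.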

Testing for association
Center and scale columns $X_g$.
Gene g is significantly associated with Y if
$X_g^T Y \ge \tau$ or $X_g^T Y \le -\tau$
For Gaussian Y we have similar geometry to the power grid problem
Using ALOE
We get $P(\mathrm{FD} \ge k)$, with X held fixed and Y random.
Power systems case was k = 1.

Bursts from Benjamini-Hochberg
We will use actual X (Duchenne muscular dystrophy data)
Gaussian noise Y
Thanks to Jingshu Wang
Target false discovery rate 10%.
$\log_2(1 + \#\mathrm{FD})$ | 0 | 1 | 2 | 3 | 4 | 6 | 7 | 8 | 9 | 10 | 11
3,756 genes, 30 subj | 909 | 50 | 18 | 7 | 3 | 3 | 1 | 4 | 2 | 1 | 2
3,756 Indep. p-values | 894 | 93 | 13 | | | | | | | |
12,625 genes, 23 subj | 929 | 30 | 19 | 6 | 3 | 1 | 1 | 3 | 1 | 5 | 2
12,625 Indep. p-values | 902 | 85 | 12 | 1 | | | | | | |
Table 1: In 1000 simulations of BH with $Y \sim N(0, I_n)$, and real data X or indep p-values.

[Figure: Duchenne muscular dystrophy data. Benjamini-Hochberg at FDR 10%, 1000 null simulations. x-axis: Rank; y-axis: # B-H discoveries. Legend: Independent p-values; 12625 genes, 23 subj; 3756 genes, 30 subj.]

Duchenne data p = 0.005
3756 genes
Black: ALOE estimates (5 times)
Red: binomial reference
Source: Kenneth Tay

Duchenne data $p = 5 \times 10^{-5}$
3756 genes
Black: ALOE estimates
Red: binomial reference
Source: Kenneth Tay

Complete null
In the complete null, no genes associate with Y .
What if some do?
Does that raise or lower the number of false discoveries?

Null domination
$P(\mathrm{FD} \ge k \mid \mathrm{TD} = 0) \ge P(\mathrm{FD} \ge k \mid \mathrm{TD} > 0)$
Dudoit and van der Laan show t-tests have asymptotic null domination
K. Tay simulations show: not even close for small n and big G.

Generalization I
Replace half spaces by convex sets
[Figure: two-dimensional illustration with convex sets in place of half spaces.]
Algorithm will go through
Variance bounds too
Coefﬁcient of variation: ok unless P(blue) P(red)

Generalization II
Replace Gaussian by log concave density
[Figure: two-dimensional illustration with a log concave density in place of the Gaussian.]
We can find the instantons
We may have to do location shifts (instead of conditional sampling)

Generalization III
Sharper power systems models are nonlinear. Let
$H_j \equiv \{x \mid g_j(x) \ge \tau_j\}$
• Boundary not known a priori
• Sampling $x_i$ reveals $g_j(x_i)$
• Kriging for $g_j$.

Generalization IV
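To minimize cost subject to $P(H_u) \le \varepsilon$ we want to estimate
$\frac{\partial E(H_u)}{\partial \tau_j}$
After some algebra
$\frac{\partial \mu}{\partial \tau_j} = -\varphi(\tau_j) \bigl( 1 - E(H_{-j}(x) \mid \omega_j^T x = \tau_j) \bigr)$
$= -\varphi(\tau_j)\, E(H_{-j}^c(x) \mid \omega_j^T x = \tau_j)$
$= -\varphi(\tau_j)\, P(\text{No other events} \mid H_j \text{ barely happened})$
The goal is to get noisy estimates from the same data used to estimate $\mu$.
Notes
1) For large $\tau_j$ most $\omega_j^T x \ge \tau_j$ satisfy $\omega_j^T x \doteq \tau_j$
2) Adapt the sampling to ensure that several $H_j$ happen even if $\tau_j$ is large.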

Thanks
• Weight working group + Yue Hui
• Michael Chertkov and Yury Maximov, co-authors
• Center for Nonlinear Studies at LANL, for hospitality
• Kenneth Tay for genomic examples
• Jingshu Wang for Duchenne muscular dystrophy data
• Alan Genz for discussions on mvtnorm
• Bert Zwart, Jose Blanchet, David Siegmund for references
• Yanbo Tang and Jeffrey Negrea, improved Lemma proof
• NSF DMS-1407397 & DMS-1521145
• DOE/GMLC 2.0 Project: Emergency monitoring and controls through new technologies and analytics
• SAMSI, Ilse Ipsen, David Banks
• Sue, Kerem, Thomas, Rita, Rebecca, Rick
