Optimal Subsampling Strategy for Logistic Regression
Qianshun Cheng and Tian Tian
University of Illinois at Chicago
Background
Introduction
Massive data sets arise more and more frequently in modern scientific research.
How to extract useful information from massive data has become a central research problem. Two common strategies:
– Truncate and merge
– Subsampling-based algorithms
Advantages and disadvantages of subsampling-based algorithms
Advantages
– Efficiently downsize data
– Easy computation and implementation
Disadvantages
– Sampling errors
– Limited efficiency in extracting information
Motivation for our strategy
Is there a way to better preserve the majority information
contained in the full data?
Logistic regression model
Unknown parameter β = (β0, · · · , βm)^T;
Binary response Yi at feature vector Xi is modeled as follows,
Prob(Yi = 1 | Xi) = P(Xi, β) = exp(Xi^T β) / (1 + exp(Xi^T β)), i = 1, ..., n.  (1)
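Model (1) can be evaluated directly; a minimal sketch with NumPy, using a numerically stable form of the logistic function:

```python
import numpy as np

def logistic_prob(X, beta):
    """Prob(Y = 1 | X) = exp(X^T beta) / (1 + exp(X^T beta)), computed stably."""
    eta = X @ beta  # linear predictor X_i^T beta for each row
    # algebraically equal to exp(eta)/(1+exp(eta)), but avoids overflow for large |eta|
    return np.where(eta >= 0,
                    1.0 / (1.0 + np.exp(-eta)),
                    np.exp(eta) / (1.0 + np.exp(eta)))

# example: two points with an intercept column
X = np.array([[1.0, 0.0], [1.0, 2.0]])
beta = np.array([0.0, 1.0])
p = logistic_prob(X, beta)
```

Here `logistic_prob` is an illustrative helper name, not part of the poster's method.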
(locally) D-optimal designs
D-optimal designs: How to assign feature value Xi’s such
that the determinant of the information matrix with respect to
β can be maximized?
Theorem (Yang, Zhang and Huang, 2011):
Under logistic model (1), a D-optimal design with respect to β is
ξ* = {(C*_{l1}, 1/2^m), (C*_{l2}, 1/2^m), l = 1, · · · , 2^{m−1}}
where C*_{lj} = (1, a_{l,1}, · · · , a_{l,m−1}, (−1)^{j−1} c*), j = 1, 2.
– c* minimizes the function f(c) = c^{−2} (Ψ(c))^{−(m+1)}, where Ψ(c) = [P′(c)]² / (P(c)(1 − P(c)));
– a_{l,k} is a boundary point of the design space in the k-th dimension, k = 1, · · · , m − 1.
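As a numerical illustration, c* can be found by one-dimensional minimization. For the logistic link, P′(c) = P(c)(1 − P(c)), so Ψ(c) reduces to P(c)(1 − P(c)). A sketch assuming SciPy (the function names are ours, not the poster's):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def P(c):
    """Logistic cdf P(c) = 1 / (1 + exp(-c))."""
    return 1.0 / (1.0 + np.exp(-c))

def Psi(c):
    """For the logistic link P'(c) = P(c)(1 - P(c)), so
    Psi(c) = [P'(c)]^2 / (P(c)(1 - P(c))) simplifies to P(c)(1 - P(c))."""
    return P(c) * (1.0 - P(c))

def c_star(m):
    """Minimize f(c) = c^{-2} (Psi(c))^{-(m+1)} over c > 0."""
    f = lambda c: c**-2 * Psi(c)**-(m + 1)
    res = minimize_scalar(f, bounds=(1e-3, 10.0), method="bounded")
    return res.x

cs = c_star(7)  # c* for the parameter dimension used in the simulations
```

The minimizer is interior, so the choice of bracketing interval is not critical.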
Subsampling Algorithm
Algorithm
(I). Given data set {(Yi, X_i^T), i = 1, · · · , n}, choose a subsample of size r0 by random sampling;
(II). Fit the data and obtain an initial estimate β̂ = (β̂0, · · · , β̂m);
(III). Obtain B = {i | min{|ci − c*|, |ci + c*|} ≤ δ} by calculating ci = X_i^T β̂;
(IV). From {(Yi, X_i^T), i ∈ B}, pick r1/(2(m−1)) Xi's and Xj's, where the Xi's and Xj's are the r1/(2(m−1)) largest and smallest values, respectively, among the first-dimension components;
(V). Remove the chosen points from set B, and then continue to the next dimension. Collect data after the maxima and minima in each of the m − 1 dimensions have been searched for and located;
(VI). The newly collected r1 data points serve as the starting subsample for the next iteration, where the above steps are repeated.
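One iteration of steps (II)–(V) can be sketched as follows, assuming NumPy; `fit` stands in for any logistic-MLE routine (a hypothetical placeholder), X carries an intercept in column 0, and the sweep runs over the first m − 1 feature columns:

```python
import numpy as np

def one_iteration(X, Y, idx0, c_star, delta, r1, fit):
    """One pass of steps (II)-(V); a sketch, assuming |B| stays well
    above 2 * quota so the extremes picked in each dimension are distinct."""
    beta_hat = fit(X[idx0], Y[idx0])              # step (II): pilot estimate
    ci = X @ beta_hat                             # step (III): c_i = X_i^T beta_hat
    B = set(np.where(np.minimum(np.abs(ci - c_star),
                                np.abs(ci + c_star)) <= delta)[0].tolist())
    m = X.shape[1] - 1                            # number of feature dimensions
    quota = r1 // (2 * (m - 1))                   # picks per extreme per dimension
    chosen = []
    for k in range(1, m):                         # steps (IV)-(V): sweep m - 1 dims
        pool = np.fromiter(B, dtype=int)
        order = pool[np.argsort(X[pool, k])]      # sort remaining B by dimension k
        picked = np.concatenate([order[:quota], order[-quota:]])
        chosen.extend(picked.tolist())
        B.difference_update(picked.tolist())      # remove chosen points from B
    return np.asarray(chosen)                     # ~r1 points for the next round
```

In practice `fit` would be a Newton or IRLS logistic solver; any implementation returning β̂ works here.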
Simulation settings for small sample size
scenarios
Total sample size n = 10000.
Starting subsample size r0 = 200.
Parameter dimension m = 7.
True parameter value β = (0.5, · · · , 0.5).
Variance–covariance structure Σ is compound symmetry with diagonal entries 1 and off-diagonal entries 0.5.
Feature distributions considered:
– NzNormal
– MzNormal
– Mixed Normal
– T3
Simulation settings for large sample size
scenarios
Total sample size n = 500000.
Starting subsample size r0 = 1000.
Other settings same as above.
Feature distributions considered:
– Mixed Normal
– T3
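Under these settings, one simulated data set can be generated as below; a sketch assuming NumPy, where mean = 0 corresponds to the MzNormal case (a nonzero mean gives NzNormal, and the Mixed Normal and T3 settings would swap in different feature distributions):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, m, beta, rho=0.5, mean=0.0):
    """Generate one data set: features from a multivariate normal with
    compound-symmetry covariance (1 on the diagonal, rho off-diagonal),
    responses drawn from logistic model (1)."""
    Sigma = np.full((m, m), rho) + (1.0 - rho) * np.eye(m)
    Xf = rng.multivariate_normal(np.full(m, mean), Sigma, size=n)
    X = np.column_stack([np.ones(n), Xf])        # prepend intercept column
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))        # Prob(Y_i = 1 | X_i)
    Y = rng.binomial(1, p)
    return X, Y

# small-sample-size setting: n = 10000, m = 7, beta = (0.5, ..., 0.5)
X, Y = simulate(10000, 7, np.full(8, 0.5))
```

The same function covers the large-sample-size scenario by passing n = 500000.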
Simulation results
Simulation results (small sample size)
[Figure: MSE (log scale) versus second-stage subsample size r1 from 600 to 1000, comparing the New Algorithm, mVc, and Random Sampling under four feature distributions: (a) MzNormal, (b) NzNormal, (c) MixNormal, (d) T3.]
Simulation results (large sample size)
[Figure: MSE (log scale) versus second-stage subsample size r1 from 1000 to 5000, comparing the New Algorithm, mVc, and Random Sampling under two feature distributions: (a) MixNormal, (b) T3.]
Ongoing Work
Incorporate LEV algorithm into sampling.
Incorporate higher order terms or interaction terms into
model building.
Incorporate model selection/averaging problem into current
structure.
Email: qcheng5@uic.edu, ttian3@uic.edu CCASA Student Showcase 2016 MSCS, UIC