Seminar0917

Contents
Random sampling on DB records
Preliminary
Theorems
Extra Slides
Student’s t distribution with the degrees of
freedom 2 and its applications
Toshiyuki Shimono
DG Lab
Data-driven Mathematical Science
2018-09-18 Tue 11:00

Contents
1 Random sampling on DB records
2 Preliminary
    Terminology
    Basic Stochastic Distributions
    Student’s T distribution
    Snedecor’s F distribution
3 Theorems
    Logarithmic Variances
4 Extra Slides
    Transcendental Functions

Application to Statistical Disclosure Control
Direction: Analyze data in a company/government.
Transaction data keep accumulating (a recent trend).
Often external experts handle/analyse the data.
Keeping the data confidential is necessary.
Statistical Disclosure Control is necessary.
Multiplicative noise is often useful on numerical data.
▷ Additive noise is also used, but may not be useful enough.
Only a few distributions seem to have been employed so far.
▷ N(µ, σ²) and U[a, b] are mentioned by [Privacy-preserving data mining, Agrawal, Srikant, ACM SIGMOD, 2000].
▷ The gamma and log-normal distributions are also used in [Privacy protection and quantile estimation from noise multiplied data, Sinha, Nayak, Zayatz, Sankhya B, 2012].
|T(2)|, |T(1)|, F(2, 2) may be more useful.
They are also usable for weighted random sampling.

Adding/multiplying noise approximately preserves some statistical properties such as the sum and the average. We also want to preserve the “weighted random sampling” property.

Why Weighted Random Sampling on a Table?
Human eyes can only see sampled records of a table.
▶ A table may contain thousands, millions, or billions of records: too many for human eyes.
Without randomness, sampling only leads to a biased view.
▶ Without randomness one often sees only:
    the beginning,
    the end, or
    the eye-catching records.
▷ [Sampling Techniques, W. G. Cochran, 1977] covers this topic.
Weight (such as price) helps to avoid trivial sampling.
▶ Weighted random sampling retrieves records with probability proportional to an auxiliary variable such as price.
▶ Simple random sampling often retrieves low-priced records of little importance.

Table: Word frequency table of ”Hamlet”. Simple vs. weighted random sampling.

Simple random sampling:
word     count | word         count | word         count | word         count | word      count
OPHELIA     67 | stuff            3 | lament           2 | Looking          1 | Mourners      1
doth        23 | chief            3 | translate        2 | ’take            1 | strokes       1
use         15 | ambassadors      3 | Excellent        2 | frowningly       1 | drains        1
devil        9 | puff’d           2 | revolution       1 | east             1 | scent         1
home         6 | plague           2 | Pinch            1 | profanely        1 | warning       1
touch        6 | venom            2 | access           1 | struggling       1 | betimes       1
season       5 | spokes           2 | bravery          1 | nerve            1 | hent          1
get          5 | lunacy           2 | quietly          1 | amities          1 | assure        1
ha           4 | Lady             2 | counterfeit      1 | Know             1 | Stay’d        1
neck         3 | Drown’d          2 | consider’d       1 | toys             1 | moods         1

Weighted random sampling:
word     count | word         count | word         count | word         count | word         count
the        995 | And            263 | more            90 | many            18 | parts            3
and        706 | this           248 | at              75 | command         10 | ways             3
to         635 | me             234 | well            65 | hell            10 | antique          2
of         630 | him            197 | let             60 | honour          10 | yesternight      1
I          546 | he             178 | speak           55 | Reads            5 | constantly       1
my         441 | HORATIO        128 | go              52 | Follow           5 | emulate          1
HAMLET     407 | do             127 | night           47 | stir             5 | honour’s         1
it         361 | what           116 | into            27 | knew             5 | really           1
not        299 | all            108 | Good            25 | ourself          3 | revolution       1
that       266 | our            107 | Ghost           25 | white            3 | riotous          1
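A minimal Python sketch of the contrast shown in the table, using only a handful of rows copied from it. The function names `simple_pick` and `weighted_pick` are hypothetical, and sampling with replacement is an assumption of this sketch (the slides do not say how the table was drawn).

```python
import random

# A few (word, count) rows copied from the Hamlet table; not the full data.
WORDS = [("the", 995), ("HAMLET", 407), ("OPHELIA", 67), ("devil", 9), ("hent", 1)]

def simple_pick(rng, k):
    """Simple random sampling: every row is equally likely (with replacement)."""
    return [rng.choice(WORDS)[0] for _ in range(k)]

def weighted_pick(rng, k):
    """Weighted random sampling: rows are drawn in proportion to their count."""
    weights = [count for _, count in WORDS]
    return [word for word, _ in rng.choices(WORDS, weights=weights, k=k)]

rng = random.Random(0)
k = 20_000
simple_share = simple_pick(rng, k).count("the") / k      # near 1/5
weighted_share = weighted_pick(rng, k).count("the") / k  # near 995/1479
print(round(simple_share, 2), round(weighted_share, 2))
```

With equal probabilities the rare word "hent" shows up as often as "the"; with count weights the frequent words dominate, which is the pattern in the lower half of the table.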

The Procedure – To Be Realized
The Analyzing Procedure
1 Prepare a table T to be analyzed.
2 Apply noise to a sensitive variable (column) v of T.
3 The ”expert” gets the transformed table T′ with v′.
4 The expert applies various analyses to T′:
    1 Performs several analyses on T′ as usual.
    2 The numerical sum of v′ may well reflect the sum of v.
    3 Random sampling of T′ by the weight v is possible!
      Note: v is hidden. Only v′ can be seen by the expert.
5 The data provider can judge the ability of the expert without showing the precise numerical values of v.

Terminology
Variate : a (random) variate is a particular outcome of a random variable.
iid : independent and identically distributed.

Basic Stochastic Distributions
U[a, b] : uniform distribution between a and b.
N(µ, σ²) : Gaussian distribution with mean µ and variance σ².
χ²(ν) : chi-squared distribution with ν degrees of freedom, obtained as $z_1^2 + \cdots + z_\nu^2$ with $z_i \overset{\text{iid}}{\sim} N(0, 1^2)$.
T(ν) : obtained as $z / \sqrt{q/\nu}$ with independent $z \sim N(0, 1)$ and $q \sim \chi^2(\nu)$.
    T(1) is also called the Cauchy distribution.
F(ν1, ν2) : obtained as $\dfrac{q_1/\nu_1}{q_2/\nu_2}$ with independent $q_1 \sim \chi^2(\nu_1)$ and $q_2 \sim \chi^2(\nu_2)$.

Student’s T distribution (1908)
T(ν) : Student’s T distribution with ν = 1, 2, 3, ... degrees of freedom.
T(ν) can be obtained as $\sqrt{\nu}\, z_0 \Big/ \sqrt{\sum_{i=1}^{\nu} z_i^2}$ with $z_i \overset{\text{iid}}{\sim} N(0, 1^2)$.
T(1) and T(2) are easily obtained:
T(1) : $\tan(\pi u)$ from $u \sim U[0, 1]$.
T(2) : $\sqrt{2}\, u \big/ \sqrt{1 - u^2}$ from $u \sim U[-1, 1]$.
|T(1)| and |T(2)| appear in this presentation; they are obtained by taking the absolute value of the variates.
[Figure: plot of T(1) and |T(1)|; x-axis from -3 to 3, y-axis from 0.0 to 0.8.]
[Figure: plot of T(2) and |T(2)|; x-axis from -3 to 3, y-axis from 0.0 to 0.8.]
F(2, 2) also appears; it is explained on the next slide.
[Figure: plot of F(2, 2); x-axis from -3 to 3, y-axis from 0.0 to 0.8.]
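The two closed-form generators for T(1) and T(2) can be tried out directly. The following is a small Python sketch, not the author's code: the helper names and the reference CDF `t2_cdf` (the standard closed form for 2 degrees of freedom) are introduced here for illustration only.

```python
import math
import random

def sample_t1(rng):
    """T(1) (Cauchy) variate: tan(pi * u) with u ~ U[0, 1]."""
    return math.tan(math.pi * rng.random())

def sample_t2(rng):
    """T(2) variate: sqrt(2) * u / sqrt(1 - u^2) with u ~ U[-1, 1]."""
    u = rng.uniform(-1.0, 1.0)
    while abs(u) == 1.0:      # avoid division by zero at the endpoints
        u = rng.uniform(-1.0, 1.0)
    return math.sqrt(2.0) * u / math.sqrt(1.0 - u * u)

def t2_cdf(t):
    """Closed-form CDF of T(2), used only as a reference value."""
    return 0.5 + t / (2.0 * math.sqrt(2.0 + t * t))

rng = random.Random(0)
n = 100_000
# P[|T(1)| <= 1] is 1/2, because |tan| <= 1 on half of each period.
share_t1 = sum(abs(sample_t1(rng)) <= 1.0 for _ in range(n)) / n
# P[|T(2)| <= 1] is 2 * t2_cdf(1) - 1 = 1/sqrt(3).
share_t2 = sum(abs(sample_t2(rng)) <= 1.0 for _ in range(n)) / n
print(share_t1, share_t2)
```

Both generators need a single uniform draw per variate, which is what makes T(1) and T(2) convenient as noise sources.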

Snedecor’s F distribution (1934)
F(ν1, ν2) : F distribution with ν1 and ν2 degrees of freedom.
F(ν1, ν2) can be obtained as $\dfrac{q_1/\nu_1}{q_2/\nu_2}$ with independent $q_1 \sim \chi^2(\nu_1)$ and $q_2 \sim \chi^2(\nu_2)$.
Density : $\sqrt{\dfrac{(\nu_1 x)^{\nu_1}\, \nu_2^{\nu_2}}{(\nu_1 x + \nu_2)^{\nu_1+\nu_2}}}\; x^{-1} \Big/ B\!\left(\dfrac{\nu_1}{2}, \dfrac{\nu_2}{2}\right)$ for $x \ge 0$.
Only F(2, 2) appears in this presentation.
Easily obtained as $u/(1 - u)$ from $u \sim U[0, 1]$.
Density : $0$ for $x < 0$, and $1/(1 + x)^2$ for $x \ge 0$.
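As a quick check of the F(2, 2) recipe, here is a Python sketch that draws u/(1 − u) and compares the empirical distribution with the CDF x/(1 + x), which follows from integrating the density 1/(1 + x)². The function names are assumptions made for this illustration.

```python
import random

def sample_f22(rng):
    """F(2, 2) variate: u / (1 - u) with u uniform on [0, 1)."""
    u = rng.random()          # in [0, 1), so 1 - u is never zero
    return u / (1.0 - u)

def f22_cdf(x):
    """CDF obtained by integrating the density 1 / (1 + x)^2 from 0 to x."""
    return x / (1.0 + x) if x >= 0.0 else 0.0

rng = random.Random(0)
n = 100_000
xs = [sample_f22(rng) for _ in range(n)]
for q in (0.5, 1.0, 3.0):
    empirical = sum(x <= q for x in xs) / n
    print(q, round(empirical, 3), round(f22_cdf(q), 3))
```

The median is 1, and the variate and its reciprocal have the same distribution, since replacing u by 1 − u turns u/(1 − u) into (1 − u)/u.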

A proposition about |T(2)|
Theorem
For any $v_1, v_2 > 0$:
$\mathrm{Prob}[\, v_1 x_1 > v_2 x_2 \,] : \mathrm{Prob}[\, v_1 x_1 < v_2 x_2 \,] = v_1 : v_2$,
that is, $\mathrm{Prob}[\, v_1 x_1 > v_2 x_2 \,] = v_1/(v_1+v_2)$ and $\mathrm{Prob}[\, v_1 x_1 < v_2 x_2 \,] = v_2/(v_1+v_2)$,
where $x_1, x_2 \overset{\text{iid}}{\sim} |T(2)|$.
cf. Bradley-Terry model (1952): Prob[ ”player i” beats ”player j” ] = $v_i/(v_i + v_j)$. Applied to food preferences, sports team strengths.
The ratio of the two variates decomposes as follows:
$x_1/x_2 \sim |T(2)| \times_\perp |T(2)|^{-1}$
    $\equiv |T(1)| \times_\perp F(2,2)^{1/2}$
    $\equiv |T(1)| \times_\perp |T(1)|^{1/2} \times_\perp F(2,2)^{1/4}$
    $\equiv |T(1)| \times_\perp |T(1)|^{1/2} \times_\perp |T(1)|^{1/4} \times_\perp F(2,2)^{1/8}$
    $\equiv \cdots$
    $\equiv F(2,2)$.
Notation:
$\times_\perp$ means the multiplication of independent variates.
A superscript means an exponent: −1 is the reciprocal of the variate, 1/2 is its square root, and 1/4, 1/8 are its 4th and 8th roots.
Note: $T(1) \equiv T(1)^{-1}$, $F(2,2) \equiv F(2,2)^{-1}$, $T(2) \not\equiv T(2)^{-1}$.
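The theorem can be checked by simulation. The sketch below is a Monte Carlo illustration in Python, not a proof; `abs_t2` and `win_rate` are hypothetical helper names, and |T(2)| is drawn with the closed-form generator for T(2) restricted to u in [0, 1).

```python
import math
import random

def abs_t2(rng):
    """|T(2)| variate: sqrt(2) * u / sqrt(1 - u^2) with u uniform on [0, 1)."""
    u = rng.random()
    return math.sqrt(2.0) * u / math.sqrt(1.0 - u * u)

def win_rate(v1, v2, n, rng):
    """Monte Carlo estimate of Prob[v1 * x1 > v2 * x2] for iid |T(2)| noise."""
    wins = sum(v1 * abs_t2(rng) > v2 * abs_t2(rng) for _ in range(n))
    return wins / n

rng = random.Random(0)
for v1, v2 in [(1.0, 1.0), (3.0, 1.0), (1.0, 9.0)]:
    # Second printed number is the estimate, third is v1 / (v1 + v2).
    print(v1, v2, round(win_rate(v1, v2, 100_000, rng), 3), v1 / (v1 + v2))
```

The estimates should sit close to v1/(v1 + v2), the same form as the Bradley-Terry winning probability.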

Combinations of the distributions
to enable weighted random sampling (WRS)
and to enable Pr[v1x1 > v2x2] : Pr[v1x1 < v2x2] = v1 : v2

Dist. of x1      Dist. of x2                               var[ log x1 ]
|T(2)|           |T(2)|                                    π²/6
F(2,2)^(1/2)     |T(1)|                                    π²/12
F(2,2)^(1/4)     |T(1)| ×⊥ |T(1)|^(1/2)                    π²/48
F(2,2)^(1/8)     |T(1)| ×⊥ |T(1)|^(1/2) ×⊥ |T(1)|^(1/4)    π²/192
· · ·            · · ·
|T(1)|           F(2,2)^(1/2)                              π²/4
|T(1)|^(1/2)     |T(1)| ×⊥ F(2,2)^(1/4)                    π²/16
|T(1)|^(1/4)     |T(1)|^(1/2) ×⊥ |T(1)| ×⊥ F(2,2)^(1/8)    π²/64
· · ·            · · ·

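Proof outline:
Relations such as $\Gamma(2z) = 2^{2z-1}\, \pi^{-1/2}\, \Gamma(z)\, \Gamma(z + 1/2)$ and
$B(x, y) = \dfrac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)} = 2 \displaystyle\int_0^{\pi/2} \sin^{2x-1} t \, \cos^{2y-1} t \, dt$ are used.
$E[X^m]$ for $m \in \mathbb{R}$ is calculated (wherever it is finite) for each distribution X such as F(2, 2), |T(1)|, |T(2)|, which gives
$\Gamma(1+m)\,\Gamma(1-m)$, $\Gamma\!\left(\tfrac{1+m}{2}\right)\Gamma\!\left(\tfrac{1-m}{2}\right)\big/\pi$, and $\dfrac{\sqrt{2}^{\,m}}{\sqrt{\pi}}\, \Gamma\!\left(\tfrac{1+m}{2}\right)\Gamma\!\left(\tfrac{2-m}{2}\right)$, respectively.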
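One way to read this outline, sketched here under the assumption $|m| < 1$ so that every moment involved is finite: multiply the $m$-th and $(-m)$-th moments of $|T(2)|$ and apply the duplication formula $\Gamma(z)\Gamma(z+\tfrac12) = 2^{1-2z}\sqrt{\pi}\,\Gamma(2z)$ twice, with $z = \tfrac{1+m}{2}$ and $z = \tfrac{1-m}{2}$.

```latex
\begin{align*}
E\!\left[(x_1/x_2)^m\right]
 &= E\!\left[|T(2)|^{m}\right]\, E\!\left[|T(2)|^{-m}\right] \\
 &= \frac{\sqrt{2}^{\,m}}{\sqrt{\pi}}\,
    \Gamma\!\left(\tfrac{1+m}{2}\right)\Gamma\!\left(\tfrac{2-m}{2}\right)
    \cdot
    \frac{\sqrt{2}^{\,-m}}{\sqrt{\pi}}\,
    \Gamma\!\left(\tfrac{1-m}{2}\right)\Gamma\!\left(\tfrac{2+m}{2}\right) \\
 &= \frac{1}{\pi}
    \Bigl[\Gamma\!\left(\tfrac{1+m}{2}\right)\Gamma\!\left(\tfrac{1+m}{2}+\tfrac12\right)\Bigr]
    \Bigl[\Gamma\!\left(\tfrac{1-m}{2}\right)\Gamma\!\left(\tfrac{1-m}{2}+\tfrac12\right)\Bigr] \\
 &= \frac{1}{\pi}\cdot 2^{-m}\sqrt{\pi}\,\Gamma(1+m)\cdot 2^{m}\sqrt{\pi}\,\Gamma(1-m) \\
 &= \Gamma(1+m)\,\Gamma(1-m).
\end{align*}
```

The last line is the moment function listed for F(2, 2), which is how the ratio of two independent |T(2)| variates can be identified with F(2, 2).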

Equivalency Lemma
Lemma
Assume the variables $x_1, x_2 > 0$ (almost surely) are independent. Then,
$\mathrm{Prob}_{x_1, x_2}[\, v_1 x_1 > v_2 x_2 \,] = \dfrac{v_1}{v_1 + v_2}$ for all $v_1, v_2 > 0$
⇔ $\dfrac{x_1}{x_1 + x_2} = \dfrac{1}{1 + (x_2/x_1)} \sim U[0, 1]$.
Proof: Maybe trivial.
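For readers who want the step spelled out, a short sketch of the argument, writing $W = x_1/(x_1+x_2)$ and $w = v_2/(v_1+v_2)$ (both symbols are introduced here for convenience):

```latex
\begin{align*}
\mathrm{Prob}[\, v_1 x_1 > v_2 x_2 \,]
 &= \mathrm{Prob}\!\left[\frac{x_2}{x_1} < \frac{v_1}{v_2}\right]
  = \mathrm{Prob}\!\left[\frac{1}{1 + x_2/x_1} > \frac{v_2}{v_1+v_2}\right]
  = \mathrm{Prob}[\, W > w \,], \\
\frac{v_1}{v_1+v_2} &= 1 - w .
\end{align*}
```

As $v_1, v_2$ range over the positive reals, $w$ ranges over $(0, 1)$, so the left-hand condition says $\mathrm{Prob}[W > w] = 1 - w$ for every $w \in (0,1)$, which is the statement $W \sim U[0, 1]$.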

[Figure: histogram of 1/(1 + abs(rt(n, 1)/rt(n, 1))); x-axis from 0.0 to 1.0, y-axis ”Frequency” from 0 to 40000.]
[Figure: histogram of 1/(1 + abs(rt(n, 2)/rt(n, 2))); x-axis from 0.0 to 1.0, y-axis ”Frequency” from 0 to 40000.]
[Figure: histogram of 1/(1 + abs(rt(n, 3)/rt(n, 3))); x-axis from 0.0 to 1.0, y-axis ”Frequency” from 0 to 40000.]
[Figure: histogram of 1/(1 + abs(rnorm(n)/rnorm(n))); x-axis from 0.0 to 1.0, y-axis ”Frequency” from 0 to 40000.]
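The four histograms can be approximated without plotting. The sketch below is a Python stand-in for the R expressions in the figure titles (an assumption of this illustration; the slides use R's `rt` and `rnorm`). It builds T(ν) as z / sqrt(q/ν) and prints the share of 1/(1 + |a/b|) in ten equal-width bins; `sample_t` and `bin_shares` are hypothetical names.

```python
import math
import random

def sample_t(nu, rng):
    """T(nu) variate built as z / sqrt(q / nu), with q a sum of nu squared normals."""
    z = rng.gauss(0.0, 1.0)
    q = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
    return z / math.sqrt(q / nu)

def bin_shares(draw, n, rng, bins=10):
    """Share of W = 1 / (1 + |a / b|) falling in each of `bins` equal-width bins."""
    counts = [0] * bins
    for _ in range(n):
        w = 1.0 / (1.0 + abs(draw(rng) / draw(rng)))
        counts[min(int(w * bins), bins - 1)] += 1
    return [c / n for c in counts]

rng = random.Random(0)
n = 50_000
for label, draw in [("T(1)", lambda r: sample_t(1, r)),
                    ("T(2)", lambda r: sample_t(2, r)),
                    ("T(3)", lambda r: sample_t(3, r)),
                    ("N(0,1)", lambda r: r.gauss(0.0, 1.0))]:
    print(label, [round(s, 3) for s in bin_shares(draw, n, rng)])
```

Only the T(2) row should look flat (about 0.1 per bin); T(1) piles up at both ends and the normal case in the middle, in line with the Equivalency Lemma singling out |T(2)|.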

Logarithmic Variances
Theorem
Var( F(2, 2) ) = ∞
Var( |T(1)| ) = ∞
Var( |T(2)| ) = ∞
Theorem
Var(log F(2, 2)) = π²/3
Var(log |T(1)|) = π²/4
Var(log |T(2)|) = π²/6
log above : take the log of the variate, which forms a new distribution.
Note: consistent with the previous theorem.
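The three logarithmic variances can be estimated numerically. The following Python sketch uses the closed-form generators from the earlier slides; the helper names and the sample size are choices made for this illustration.

```python
import math
import random

def open_unit(rng):
    """Uniform draw from the open interval (0, 1), so that logs stay finite."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

def log_f22(rng):
    u = open_unit(rng)
    return math.log(u / (1.0 - u))            # log of F(2, 2) = u / (1 - u)

def log_abs_t1(rng):
    return math.log(abs(math.tan(math.pi * open_unit(rng))))   # log |T(1)|

def log_abs_t2(rng):
    u = open_unit(rng)
    return math.log(math.sqrt(2.0) * u / math.sqrt(1.0 - u * u))  # log |T(2)|

def sample_variance(draw, n, rng):
    """Unbiased sample variance of n draws."""
    xs = [draw(rng) for _ in range(n)]
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

rng = random.Random(0)
for name, draw, target in [("log F(2,2)", log_f22, math.pi ** 2 / 3),
                           ("log |T(1)|", log_abs_t1, math.pi ** 2 / 4),
                           ("log |T(2)|", log_abs_t2, math.pi ** 2 / 6)]:
    print(name, round(sample_variance(draw, 200_000, rng), 3), round(target, 3))
```

Because x1/x2 ~ F(2, 2) for independent |T(2)| variates, the two |T(2)| log-variances must add up to π²/3, which is why π²/6 is consistent with the previous theorem.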

[Figure: histogram of log(abs(rf(n, 2, 2))); x-axis ticks at log .001, log .01, log .1, log 1, log 10, log 100, log 1000; y-axis from 0.0 to 0.4.]
[Figure: histogram of log(abs(rt(n, 1))); same axes.]
[Figure: histogram of log(abs(rt(n, 2))); same axes.]

Why is WRS of T by v-weight possible?
For fixed $0 \le v_1 \ll v_2$, if the random variables $x_1, x_2 \ge 0$ satisfy $x_1/x_2 \sim F(2,2)$:
$\mathrm{Prob}\!\left[\dfrac{v_1 x_1}{x_2} > v_2\right] = \dfrac{v_1}{v_1 + v_2} \approx \dfrac{v_1}{v_2}$.
Thus, under the condition $X \times_\perp X'^{-1} \equiv F(2,2)$, along with $x(i) \overset{\text{iid}}{\sim} X$ and $x'(i) \overset{\text{iid}}{\sim} X'$,
calculate $v''(i) := v'(i)/x'(i)$ where $v'(i) := v(i) \times x(i)$.
Define $S_n$ by $\#S_n = n$ and $v''(i) \ge v''(j)$ for all $i \in S_n$, $j \notin S_n$.
Then $\{T(i) \mid i \in S_n\}$ is a sample from T approximately by v-weight.
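The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's implementation: it takes X = X′ = |T(2)| (one of the combinations listed earlier, for which x/x′ ~ F(2, 2)), uses made-up weights, and the names `mask` and `top_n_sample` are hypothetical.

```python
import math
import random

def abs_t2(rng):
    """|T(2)| variate: sqrt(2) * u / sqrt(1 - u^2) with u uniform on (0, 1)."""
    u = rng.random()
    while u == 0.0:           # keep the noise strictly positive
        u = rng.random()
    return math.sqrt(2.0) * u / math.sqrt(1.0 - u * u)

def mask(v, rng):
    """Data provider side: v'(i) = v(i) * x(i) with x(i) ~ |T(2)|."""
    return [vi * abs_t2(rng) for vi in v]

def top_n_sample(v_masked, n, rng):
    """Expert side: v''(i) = v'(i) / x'(i), then keep the n largest v''."""
    v2 = [vm / abs_t2(rng) for vm in v_masked]
    order = sorted(range(len(v2)), key=v2.__getitem__, reverse=True)
    return order[:n]

rng = random.Random(0)
v = [1.0] * 20_000 + [4.0] * 20_000      # hidden weights: light and heavy records
chosen = top_n_sample(mask(v, rng), 2_000, rng)
light = sum(1 for i in chosen if v[i] == 1.0)
heavy = len(chosen) - light
print(light, heavy, heavy / light)
```

The expert never sees `v`, only the masked column, yet heavy records should be selected roughly four times as often as light ones. The match is approximate because the inclusion probability behaves like v/(v + t) for an effective threshold t, which is close to v/t only when v ≪ t.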

Summary

A curious relation between |T(2)| and the Bradley-Terry model.
It is derived from various decompositions of F(2, 2).
The application to weighted random sampling, which helps with the initial understanding of a DB.

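Transcendental Functions
Gamma function :
$\Gamma(z) = \displaystyle\int_0^\infty e^{-t}\, t^{z-1}\, dt = \int_0^1 (-\log t)^{z-1}\, dt$ for $\mathrm{Re}\, z > 0$,
$\Gamma(z) = \displaystyle\lim_{n\to\infty} n^z \Big/ \Bigl( z \prod_{k=1}^{n} \bigl(1 + \tfrac{z}{k}\bigr) \Bigr)$ for $z \in \mathbb{C}$ other than $0, -1, -2, \ldots$
Beta function :
$B(x, y) = \displaystyle\int_0^1 t^{x-1} (1 - t)^{y-1}\, dt$ for $\mathrm{Re}\, x > 0$, $\mathrm{Re}\, y > 0$,
$B(x, y) = \Gamma(x)\Gamma(y)/\Gamma(x + y)$ for $x, y \in \mathbb{C}$.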
