Student’s t distribution with the degrees of
freedom 2 and its applications
Toshiyuki Shimono
DG Lab
Data-driven Mathematical Science
2018-09-18 Tue 11:00
Contents
1 Random sampling on DB records
2 Preliminary
    Terminology
    Basic Stochastic Distributions
    Student’s T distribution
    Snedecor’s F distribution
3 Theorems
    Logarithmic Variances
4 Extra Slides
    Transcendental Functions
Application to Statistical Disclosure Control
Direction: Analyze data in a company/government.
Transaction data are accumulated (due to recent trend).
Often external experts handle/analyse the data.
Keeping the data confidential is necessary.
Statistical Disclosure Control is necessary.
Multiplicative noise is often useful on numerical data.
▷ Additive noise is also used, but may not be useful enough.
Only a few distributions seem to have been employed so far.
▷ $N(\mu, \sigma^2)$ and $U[a, b]$ are mentioned by [Privacy-preserving data mining, Agrawal, Srikant, ACM SIGMOD, 2000].
▷ Gamma dist. and log-normal dist. are also mentioned by [Privacy protection and quantile estimation from noise multiplied data, Sinha, Nayak, Zayatz, Sankhya B, 2012].
|T(2)|, |T(1)|, F(2, 2) may be more useful.
They are also utilizable for weighted random sampling.
Adding or multiplying noise preserves some statistical properties, such as the sum and the average. We also want to preserve the "weighted random sampling" property.
Why Weighted Random Sampling on a Table?
Human eyes can only see sampled records of a table.
▶ A table may contain thousands, millions, or billions of records: too many for human eyes.
Without randomness, looking at records leads only to a biased view.
▶ Without randomness one often sees only:
    the beginning,
    the end, or
    the eye-catching records.
▷ [Sampling Techniques, W. G. Cochran, 1977] covers this topic.
Weight (such as price) helps to avoid trivial sampling.
▶ Weighted random sampling retrieves records with probability proportional to an auxiliary variable such as price.
▶ Simple random sampling often retrieves low-priced records, which carry little importance.
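The contrast between simple and weighted random sampling can be illustrated with a minimal Python sketch. The toy table `records`, the helper names, and the prices are all hypothetical and not taken from the talk. Note that `random.choices` samples with replacement, which is an assumption made here for brevity.

```python
import random

# Hypothetical toy table of (record name, price); not data from the slides.
records = [(f"cheap-{i}", 1) for i in range(95)] + [(f"pricey-{i}", 100) for i in range(5)]


def simple_sample(table, k, rng):
    """Simple random sampling without replacement: every record equally likely."""
    return rng.sample(table, k)


def weighted_sample(table, k, rng):
    """Weighted sampling with replacement: probability proportional to price."""
    weights = [price for _, price in table]
    return rng.choices(table, weights=weights, k=k)


def pricey_share(sample):
    """Fraction of sampled records that are high-priced."""
    return sum(1 for name, _ in sample if name.startswith("pricey")) / len(sample)


rng = random.Random(0)
print("simple  :", pricey_share(simple_sample(records, 20, rng)))
print("weighted:", pricey_share(weighted_sample(records, 20, rng)))
```

In this toy table the five high-priced records hold 500 of the 595 units of total weight, so weighted sampling should return them in roughly 84% of draws, whereas simple sampling returns them about 5% of the time.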
Table: Word frequency table of "Hamlet". Simple vs. Weighted.

Simple random sampling:
word        count   word        count   word        count   word        count   word        count
OPHELIA        67   stuff           3   lament          2   Looking         1   Mourners        1
doth           23   chief           3   translate       2   ’take           1   strokes         1
use            15   ambassadors     3   Excellent       2   frowningly      1   drains          1
devil           9   puff’d          2   revolution      1   east            1   scent           1
home            6   plague          2   Pinch           1   profanely       1   warning         1
touch           6   venom           2   access          1   struggling      1   betimes         1
season          5   spokes          2   bravery         1   nerve           1   hent            1
get             5   lunacy          2   quietly         1   amities         1   assure          1
ha              4   Lady            2   counterfeit     1   Know            1   Stay’d          1
neck            3   Drown’d         2   consider’d      1   toys            1   moods           1

Weighted random sampling:
word        count   word        count   word        count   word        count   word        count
the           995   And           263   more           90   many           18   parts           3
and           706   this          248   at             75   command        10   ways            3
to            635   me            234   well           65   hell           10   antique         2
of            630   him           197   let            60   honour         10   yesternight     1
I             546   he            178   speak          55   Reads           5   constantly      1
my            441   HORATIO       128   go             52   Follow          5   emulate         1
HAMLET        407   do            127   night          47   stir            5   honour’s        1
it            361   what          116   into           27   knew            5   really          1
not           299   all           108   Good           25   ourself         3   revolution      1
that          266   our           107   Ghost          25   white           3   riotous         1
The Procedure: To Be Realized
The Analyzing Procedure
1 Prepare a table T to be analyzed.
2 Apply noise to a sensitive variable (column) v of T.
3 The "expert" gets the transformed table T′ with v′.
4 The expert applies various analyses to T′.
    1 Several analyses can be performed on T′ as usual.
    2 The numerical sum of v′ may well reflect the sum of v.
    3 Random sampling of T′ by the weight v is possible!
      Note: v is hidden. Only v′ can be seen by the expert.
5 The data provider can judge the ability of the expert without showing the precise numerical values of v.
Terminology
Variate : a (random) variate is a particular outcome of a random variable.
iid : independent and identically distributed.
Basic Stochastic Distributions
$U[a, b]$ : uniform distribution between a and b.
$N(\mu, \sigma^2)$ : Gaussian dist. with mean $\mu$ and variance $\sigma^2$.
$\chi^2(\nu)$ : chi-squared dist. with $\nu$ degrees of freedom, obtained as $z_1^2 + \cdots + z_\nu^2$ with $z_i \overset{iid}{\sim} N(0, 1^2)$.
$T(\nu)$ : obtained as $z \big/ \sqrt{q/\nu}$ with independent $z \sim N(0, 1)$, $q \sim \chi^2(\nu)$.
$T(1)$ is also called the Cauchy distribution.
$F(\nu_1, \nu_2)$ : obtained as $\dfrac{q_1/\nu_1}{q_2/\nu_2}$ with independent $q_1 \sim \chi^2(\nu_1)$, $q_2 \sim \chi^2(\nu_2)$.
Student’s T distribution (1908)
$T(\nu)$ : Student’s T distribution with $\nu = 1, 2, 3, \ldots$ degrees of freedom.
$T(\nu)$ can be obtained as $\sqrt{\nu}\, z_0 \Big/ \sqrt{\sum_{i=1}^{\nu} z_i^2}$ with $z_i \overset{iid}{\sim} N(0, 1^2)$.
T(1) and T(2) are easily obtained.
T(1) : $\tan(\pi u)$ from $u \sim U[0, 1]$.
T(2) : $\sqrt{2}\, u \big/ \sqrt{1 - u^2}$ from $u \sim U[-1, 1]$.
|T(1)| and |T(2)| appear in this presentation; they are obtained by taking the absolute value of the variates.
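The two closed-form generators above can be tried out with a short Python sketch. The function names and the median check are illustrative assumptions, not part of the talk. The medians used for comparison follow from the respective distribution functions: 1 for |T(1)| and $\sqrt{2/3}$ for |T(2)|.

```python
import math
import random


def abs_t1(rng):
    """|T(1)| variate: |tan(pi * u)| with u ~ U[0, 1]."""
    return abs(math.tan(math.pi * rng.random()))


def abs_t2(rng):
    """|T(2)| variate: sqrt(2) * u / sqrt(1 - u^2) with u ~ U[0, 1).

    Drawing u from [0, 1) instead of [-1, 1] yields the absolute value directly.
    """
    u = rng.random()
    return math.sqrt(2.0) * u / math.sqrt(1.0 - u * u)


def sample_median(draw, n, rng):
    """Median of n variates produced by draw(rng)."""
    xs = sorted(draw(rng) for _ in range(n))
    return xs[n // 2]


rng = random.Random(0)
print(sample_median(abs_t1, 100_001, rng))  # theoretical median of |T(1)| is 1
print(sample_median(abs_t2, 100_001, rng))  # theoretical median of |T(2)| is sqrt(2/3)
```

Medians are compared rather than means or variances because, as stated later in the talk, the variances of these distributions are infinite.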
[Figure: T(1) and |T(1)| plotted over −3 ≤ x ≤ 3, vertical axis 0.0 to 0.8.]
[Figure: T(2) and |T(2)| plotted over −3 ≤ x ≤ 3, vertical axis 0.0 to 0.8.]
F(2, 2) also appears; it is explained on the next slide.
[Figure: F(2, 2) plotted over −3 ≤ x ≤ 3, vertical axis 0.0 to 0.8.]
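Snedecor’s F distribution (1934)
$F(\nu_1, \nu_2)$ : F-distribution with $\nu_1$ and $\nu_2$ degrees of freedom.
$F(\nu_1, \nu_2)$ can be obtained as $\dfrac{q_1/\nu_1}{q_2/\nu_2}$ with independent $q_1 \sim \chi^2(\nu_1)$, $q_2 \sim \chi^2(\nu_2)$.
Density : $\sqrt{\dfrac{(\nu_1 x)^{\nu_1}\, \nu_2^{\nu_2}}{(\nu_1 x + \nu_2)^{\nu_1 + \nu_2}}}\; x^{-1} \Big/ B\!\left(\dfrac{\nu_1}{2}, \dfrac{\nu_2}{2}\right)$ for $x \ge 0$.
Only F(2, 2) appears in this presentation.
Easily obtained by : $u/(1 - u)$ from $u \sim U[0, 1]$.
Density : $0$ for $x < 0$; $1/(1 + x)^2$ for $x \ge 0$.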
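A minimal Python sketch of the F(2, 2) generator follows. The names `f22` and `f22_cdf` are hypothetical. The distribution function $x/(1+x)$ used for comparison is obtained here by integrating the density $1/(1+x)^2$ from 0 to x; it is not stated on the slide.

```python
import random


def f22(rng):
    """F(2, 2) variate: u / (1 - u) with u ~ U[0, 1)."""
    u = rng.random()
    return u / (1.0 - u)


def f22_cdf(x):
    """Distribution function x / (1 + x), from integrating 1 / (1 + x)^2 over [0, x]."""
    return x / (1.0 + x) if x > 0 else 0.0


rng = random.Random(0)
xs = [f22(rng) for _ in range(100_000)]
for q in (0.5, 1.0, 3.0, 9.0):
    empirical = sum(1 for x in xs if x <= q) / len(xs)
    print(q, round(empirical, 3), round(f22_cdf(q), 3))
```

The empirical fractions should track the second column closely; in particular the median of F(2, 2) is 1.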
A proposition about |T(2)|
Theorem
For any $v_1, v_2 > 0$ :
$\mathrm{Prob}[\, v_1 x_1 > v_2 x_2 \,] : \mathrm{Prob}[\, v_1 x_1 < v_2 x_2 \,] = v_1 : v_2$,
that is, $\mathrm{Prob}[\, v_1 x_1 > v_2 x_2 \,] = v_1/(v_1 + v_2)$ and $\mathrm{Prob}[\, v_1 x_1 < v_2 x_2 \,] = v_2/(v_1 + v_2)$,
where $x_1, x_2 \overset{iid}{\sim} |T(2)|$.
cf. Bradley-Terry model (1952): $\mathrm{Prob}[\,\text{"player } i\text{" beats "player } j\text{"}\,] = \dfrac{v_i}{v_i + v_j}$. Applied to food preferences, sports team strengths.
The ratio $x_1/x_2$ satisfies:
$x_1/x_2 \sim |T(2)| \times_\perp |T(2)|^{-1}$
$\equiv |T(1)| \times_\perp F(2, 2)^{1/2}$
$\equiv |T(1)| \times_\perp |T(1)|^{1/2} \times_\perp F(2, 2)^{1/4}$
$\equiv |T(1)| \times_\perp |T(1)|^{1/2} \times_\perp |T(1)|^{1/4} \times_\perp F(2, 2)^{1/8}$
$\equiv \cdots$
$\equiv F(2, 2)$.
Notation:
$\times_\perp$ means the multiplication of independent variates.
A superscript means an exponent: $-1$ is the reciprocal of the variate, $1/2$ its square root, and $1/4$, $1/8$ its 4th and 8th roots.
Note: $T(1) \equiv T(1)^{-1}$, $F(2, 2) \equiv F(2, 2)^{-1}$, $T(2) \not\equiv T(2)^{-1}$.
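The theorem can be checked numerically with a small Monte Carlo sketch in Python. The function names and the chosen weight pairs are illustrative assumptions. The |T(2)| generator is the closed form from the Student’s T slide, restricted to $u \in [0, 1)$ so that the absolute value comes out directly.

```python
import math
import random


def abs_t2(rng):
    """|T(2)| variate: sqrt(2) * u / sqrt(1 - u^2) with u ~ U[0, 1)."""
    u = rng.random()
    return math.sqrt(2.0) * u / math.sqrt(1.0 - u * u)


def win_rate(v1, v2, n, rng):
    """Monte Carlo estimate of Prob[v1*x1 > v2*x2] with x1, x2 iid ~ |T(2)|."""
    wins = sum(1 for _ in range(n) if v1 * abs_t2(rng) > v2 * abs_t2(rng))
    return wins / n


rng = random.Random(0)
for v1, v2 in [(1, 1), (3, 1), (1, 9)]:
    # Second column: simulated frequency; third column: v1 / (v1 + v2).
    print((v1, v2), win_rate(v1, v2, 100_000, rng), v1 / (v1 + v2))
```

With 100,000 trials the simulated frequency should agree with $v_1/(v_1 + v_2)$ to about two decimal places.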
Combinations of the distributions to enable WRS and to enable $\Pr[v_1 x_1 > v_2 x_2] : \Pr[v_1 x_1 < v_2 x_2] = v_1 : v_2$

Dist. of $x_1$      Dist. of $x_2$                                                      var[ log $x_1$ ]
$|T(2)|$            $|T(2)|$                                                            $\pi^2/6$
$F(2,2)^{1/2}$      $|T(1)|$                                                            $\pi^2/12$
$F(2,2)^{1/4}$      $|T(1)| \times_\perp |T(1)|^{1/2}$                                  $\pi^2/48$
$F(2,2)^{1/8}$      $|T(1)| \times_\perp |T(1)|^{1/2} \times_\perp |T(1)|^{1/4}$        $\pi^2/192$
· · ·               · · ·
$|T(1)|$            $F(2,2)^{1/2}$                                                      $\pi^2/4$
$|T(1)|^{1/2}$      $|T(1)| \times_\perp F(2,2)^{1/4}$                                  $\pi^2/16$
$|T(1)|^{1/4}$      $|T(1)|^{1/2} \times_\perp |T(1)| \times_\perp F(2,2)^{1/8}$        $\pi^2/64$
· · ·               · · ·
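Proof outline:
Relations such as $\Gamma(2z) = 2^{2z-1} \pi^{-1/2}\, \Gamma(z)\, \Gamma(z + 1/2)$ and $B(x, y) = \dfrac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)} = 2 \int_0^{\pi/2} \sin^{2x-1} t \, \cos^{2y-1} t \, dt$ are used.
$E[X^m]$ for $m \in \mathbb{R}$ is calculated for each distribution X such as F(2, 2), |T(1)|, |T(2)|, which are $\Gamma(1+m)\,\Gamma(1-m)$, $\Gamma\!\left(\frac{1+m}{2}\right) \Gamma\!\left(\frac{1-m}{2}\right) / \pi$, $\dfrac{\sqrt{2}^{\,m}}{\sqrt{\pi}}\, \Gamma\!\left(\frac{1+m}{2}\right) \Gamma\!\left(\frac{2-m}{2}\right)$, respectively.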
Equivalence Lemma
Lemma
Assume variables $x_1, x_2 > 0$ (a.s.) are independent. Then,
$\mathrm{Prob}_{x_1, x_2}[\, v_1 x_1 > v_2 x_2 \,] = \dfrac{v_1}{v_1 + v_2}$ for $\forall v_1, v_2 > 0$
$\iff \dfrac{x_1}{x_1 + x_2} = \dfrac{1}{1 + (x_2/x_1)} \sim U[0, 1]$
Proof: Maybe trivial.
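One possible way to spell out the argument behind the lemma is sketched below. The symbols $U$ and $t$ are introduced here for illustration and do not appear on the slide.

```latex
\begin{align*}
U &:= \frac{x_1}{x_1 + x_2}, \qquad t := \frac{v_2}{v_1 + v_2} \in (0, 1), \\
v_1 x_1 > v_2 x_2
  &\iff (v_1 + v_2)\, x_1 > v_2\, (x_1 + x_2)
  \iff U > t, \\
\mathrm{Prob}[\, v_1 x_1 > v_2 x_2 \,] = \frac{v_1}{v_1 + v_2} = 1 - t
  &\iff \mathrm{Prob}[\, U > t \,] = 1 - t .
\end{align*}
```

As $(v_1, v_2)$ ranges over all positive pairs, $t$ ranges over all of $(0, 1)$, so requiring the left-hand identity for every $v_1, v_2 > 0$ is the same as requiring $\mathrm{Prob}[U > t] = 1 - t$ for every $t \in (0, 1)$, which is the statement $U \sim U[0, 1]$.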
[Figure: histogram of 1/(1 + abs(rt(n, 1)/rt(n, 1))); horizontal axis 0.0 to 1.0, vertical axis: frequency, 0 to 40000.]
[Figure: histogram of 1/(1 + abs(rt(n, 2)/rt(n, 2))); horizontal axis 0.0 to 1.0, vertical axis: frequency, 0 to 40000.]
[Figure: histogram of 1/(1 + abs(rt(n, 3)/rt(n, 3))); horizontal axis 0.0 to 1.0, vertical axis: frequency, 0 to 40000.]
[Figure: histogram of 1/(1 + abs(rnorm(n)/rnorm(n))); horizontal axis 0.0 to 1.0, vertical axis: frequency, 0 to 40000.]
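The four histograms can be approximated with a Python sketch that mirrors the R expressions in the figure titles. The helper names, the bin count, and the sample size are assumptions made for illustration; `t_variate` builds T(ν) from normal variates as in the definition on the Basic Stochastic Distributions slide. By the lemma and the theorem, only the ν = 2 case is expected to give a flat histogram.

```python
import math
import random


def t_variate(nu, rng):
    """T(nu) variate built as z / sqrt(q / nu), with q a sum of nu squared normals."""
    z = rng.gauss(0.0, 1.0)
    q = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
    return z / math.sqrt(q / nu)


def histogram(draw, n, rng, bins=10):
    """Bin counts of 1 / (1 + |a / b|) on [0, 1] for independent a, b from draw."""
    counts = [0] * bins
    for _ in range(n):
        a, b = draw(rng), draw(rng)
        y = 1.0 / (1.0 + abs(a / b))
        counts[min(int(y * bins), bins - 1)] += 1
    return counts


rng = random.Random(0)
cases = {
    "T(1)": lambda r: t_variate(1, r),
    "T(2)": lambda r: t_variate(2, r),
    "T(3)": lambda r: t_variate(3, r),
    "N(0,1)": lambda r: r.gauss(0.0, 1.0),
}
for name, draw in cases.items():
    print(name, histogram(draw, 20_000, rng))
```

The T(1) counts should pile up at both ends, while the T(3) and normal counts should pile up in the middle; the T(2) counts should be close to equal across bins.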
Logarithmic Variances
Theorem
Var( F(2, 2) ) = ∞
Var( |T(1)| ) = ∞
Var( |T(2)| ) = ∞
Theorem
Var(log F(2, 2)) = $\pi^2/3$
Var(log |T(1)|) = $\pi^2/4$
Var(log |T(2)|) = $\pi^2/6$
Here "log" means taking the logarithm of the variate, which forms a new distribution.
Note: consistent with the previous theorem.
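The three logarithmic variances can be estimated by simulation. The Python sketch below uses the closed-form generators from the earlier slides; the helper `open_unit`, which excludes 0 so that every logarithm is finite, and the other function names are assumptions made for illustration.

```python
import math
import random


def open_unit(rng):
    """Uniform variate on the open interval (0, 1), so logarithms stay finite."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def f22(rng):
    """F(2, 2) variate: u / (1 - u)."""
    u = open_unit(rng)
    return u / (1.0 - u)


def abs_t1(rng):
    """|T(1)| variate: |tan(pi * u)|."""
    return abs(math.tan(math.pi * open_unit(rng)))


def abs_t2(rng):
    """|T(2)| variate: sqrt(2) * u / sqrt(1 - u^2)."""
    u = open_unit(rng)
    return math.sqrt(2.0) * u / math.sqrt(1.0 - u * u)


def log_variance(draw, n, rng):
    """Population variance of log(draw(rng)) over n variates."""
    logs = [math.log(draw(rng)) for _ in range(n)]
    mean = sum(logs) / n
    return sum((x - mean) ** 2 for x in logs) / n


rng = random.Random(0)
for name, draw, target in [
    ("F(2,2)", f22, math.pi ** 2 / 3),
    ("|T(1)|", abs_t1, math.pi ** 2 / 4),
    ("|T(2)|", abs_t2, math.pi ** 2 / 6),
]:
    print(name, log_variance(draw, 50_000, rng), target)
```

The consistency note can be read off the targets: the log-variances of two independent |T(2)| variates add up to $\pi^2/6 + \pi^2/6 = \pi^2/3$, the log-variance of F(2, 2).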
[Figure: histogram of log(abs(rf(n, 2, 2))); horizontal axis marked at log .001, log .01, log .1, log 1, log 10, log 100, log 1000; vertical axis 0.0 to 0.4.]
[Figure: histogram of log(abs(rt(n, 1))); horizontal axis marked at log .001, log .01, log .1, log 1, log 10, log 100, log 1000; vertical axis 0.0 to 0.4.]
[Figure: histogram of log(abs(rt(n, 2))); horizontal axis marked at log .001, log .01, log .1, log 1, log 10, log 100, log 1000; vertical axis 0.0 to 0.4.]
Why is WRS of T by v-weight possible?
For fixed $0 \le v_1 \ll v_2$, if random variables $x_1, x_2 \ge 0$ satisfy $\dfrac{x_1}{x_2} \sim F(2, 2)$ :
$\mathrm{Prob}\!\left[\dfrac{v_1 x_1}{x_2} > v_2\right] = \dfrac{v_1}{v_1 + v_2} \approx \dfrac{v_1}{v_2}$.
Thus, under the condition $X \times_\perp X' = F(2, 2)$, along with $x(i) \overset{iid}{\sim} X$ and $x'(i) \overset{iid}{\sim} X'$, calculate $v''(i) := v'(i)/x'(i)$ where $v'(i) := v(i) \times x(i)$.
Define $S_n$ by $\#S_n = n$ and $v''(i) \ge v''(j)$ for all $i \in S_n$, $j \notin S_n$.
Then $\{T(i) \mid i \in S_n\}$ is a sample from T approximately by v-weight.
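The sampling steps above can be sketched in Python under explicit assumptions: X is taken to be |T(1)| and X′ to be $F(2,2)^{1/2}$ (one of the pairs listed in the combinations table), the data provider is assumed to apply x and the expert to apply x′, and the toy weights (900 records with v = 1, 100 records with v = 10) are hypothetical. All function names are illustrative.

```python
import math
import random


def open_unit(rng):
    """Uniform variate on the open interval (0, 1)."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def abs_t1(rng):
    """|T(1)| variate: |tan(pi * u)|."""
    return abs(math.tan(math.pi * open_unit(rng)))


def sqrt_f22(rng):
    """F(2, 2)^(1/2) variate: sqrt(u / (1 - u))."""
    u = open_unit(rng)
    return math.sqrt(u / (1.0 - u))


def mask(values, rng):
    """Assumed data-provider step: v'(i) = v(i) * x(i) with x(i) ~ |T(1)|."""
    return [v * abs_t1(rng) for v in values]


def top_n_indices(masked, n, rng):
    """Assumed expert step: v''(i) = v'(i) / x'(i), then keep the n largest."""
    scores = [vp / sqrt_f22(rng) for vp in masked]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]


def heavy_count(values, n, rng):
    """Number of selected records whose hidden weight is at least 10."""
    chosen = top_n_indices(mask(values, rng), n, rng)
    return sum(1 for i in chosen if values[i] >= 10)


# Hypothetical hidden weights: 900 light records and 100 heavy records.
values = [1.0] * 900 + [10.0] * 100
rng = random.Random(0)
trials = 50
print(sum(heavy_count(values, 10, rng) for _ in range(trials)) / trials)
```

Heavy records make up 10% of the rows but slightly more than half of the total weight, so about 5 of the 10 selected records should be heavy on average, against about 1 under simple random sampling. The expert never sees `values`, only the masked column.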
Summary
A curious relation between |T(2)| and the Bradley-Terry model.
It is derived from various decompositions of F(2, 2).
The application to weighted random sampling, which helps the initial understanding of a DB.
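Transcendental Functions
Gamma function :
$\Gamma(z) = \int_0^\infty e^{-t}\, t^{z-1}\, dt = \int_0^1 (-\log t)^{z-1}\, dt$ for $\mathrm{Re}\, z > 0$,
$\Gamma(z) = \lim_{n \to \infty} n^z \Big/ \left( z \prod_{k=1}^{n} \left(1 + \dfrac{z}{k}\right) \right)$ for $z \in \mathbb{C}$ other than $0, -1, -2, \ldots$
Beta function :
$B(x, y) = \int_0^1 t^{x-1} (1 - t)^{y-1}\, dt$ for $\mathrm{Re}\, x > 0$, $\mathrm{Re}\, y > 0$,
$B(x, y) = \Gamma(x)\Gamma(y)/\Gamma(x + y)$ for $x, y \in \mathbb{C}$.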
企業等に蓄積されたデータを分析するための処理機能の提案Toshiyuki Shimono
 
新入社員の頃に教えて欲しかったようなことなど
新入社員の頃に教えて欲しかったようなことなど新入社員の頃に教えて欲しかったようなことなど
新入社員の頃に教えて欲しかったようなことなどToshiyuki Shimono
 
ページャ lessを使いこなす
ページャ lessを使いこなすページャ lessを使いこなす
ページャ lessを使いこなすToshiyuki Shimono
 
Guiを使わないテキストデータ処理
Guiを使わないテキストデータ処理Guiを使わないテキストデータ処理
Guiを使わないテキストデータ処理Toshiyuki Shimono
 
データ全貌把握の方法170324
データ全貌把握の方法170324データ全貌把握の方法170324
データ全貌把握の方法170324Toshiyuki Shimono
 
Macで開発環境を整える170420
Macで開発環境を整える170420Macで開発環境を整える170420
Macで開発環境を整える170420Toshiyuki Shimono
 
大きなテキストデータを閲覧するには
大きなテキストデータを閲覧するには大きなテキストデータを閲覧するには
大きなテキストデータを閲覧するにはToshiyuki Shimono
 

More from Toshiyuki Shimono (20)

国際産業数理・応用数理会議のポスター(作成中)
国際産業数理・応用数理会議のポスター(作成中)国際産業数理・応用数理会議のポスター(作成中)
国際産業数理・応用数理会議のポスター(作成中)
 
インターネット等からデータを自動収集するソフトウェアに必要な補助機能とその実装
インターネット等からデータを自動収集するソフトウェアに必要な補助機能とその実装インターネット等からデータを自動収集するソフトウェアに必要な補助機能とその実装
インターネット等からデータを自動収集するソフトウェアに必要な補助機能とその実装
 
extracting only a necessary file from a zip file
extracting only a necessary file from a zip fileextracting only a necessary file from a zip file
extracting only a necessary file from a zip file
 
A Hacking Toolset for Big Tabular Files -- JAPAN.PM 2021
A Hacking Toolset for Big Tabular Files -- JAPAN.PM 2021A Hacking Toolset for Big Tabular Files -- JAPAN.PM 2021
A Hacking Toolset for Big Tabular Files -- JAPAN.PM 2021
 
新型コロナの感染者数 全国の状況 2021年2月上旬まで
新型コロナの感染者数 全国の状況 2021年2月上旬まで新型コロナの感染者数 全国の状況 2021年2月上旬まで
新型コロナの感染者数 全国の状況 2021年2月上旬まで
 
Multiplicative Decompositions of Stochastic Distributions and Their Applicat...
 Multiplicative Decompositions of Stochastic Distributions and Their Applicat... Multiplicative Decompositions of Stochastic Distributions and Their Applicat...
Multiplicative Decompositions of Stochastic Distributions and Their Applicat...
 
Theory to consider an inaccurate testing and how to determine the prior proba...
Theory to consider an inaccurate testing and how to determine the prior proba...Theory to consider an inaccurate testing and how to determine the prior proba...
Theory to consider an inaccurate testing and how to determine the prior proba...
 
Interpreting Multiple Regression via an Ellipse Inscribed in a Square Extensi...
Interpreting Multiple Regressionvia an Ellipse Inscribed in a Square Extensi...Interpreting Multiple Regressionvia an Ellipse Inscribed in a Square Extensi...
Interpreting Multiple Regression via an Ellipse Inscribed in a Square Extensi...
 
BigQueryを使ってみた(2018年2月)
BigQueryを使ってみた(2018年2月)BigQueryを使ってみた(2018年2月)
BigQueryを使ってみた(2018年2月)
 
既存分析ソフトへ
データを投入する前に
簡便な分析するためのソフトの作り方の提案
既存分析ソフトへ
データを投入する前に
簡便な分析するためのソフトの作り方の提案既存分析ソフトへ
データを投入する前に
簡便な分析するためのソフトの作り方の提案
既存分析ソフトへ
データを投入する前に
簡便な分析するためのソフトの作り方の提案
 
To Make Graphs Such as Scatter Plots Numerically Readable (PacificVis 2018, K...
To Make Graphs Such as Scatter Plots Numerically Readable (PacificVis 2018, K...To Make Graphs Such as Scatter Plots Numerically Readable (PacificVis 2018, K...
To Make Graphs Such as Scatter Plots Numerically Readable (PacificVis 2018, K...
 
To Make Graphs Such as Scatter Plots Numerically Readable (PacificVis 2018, K...
To Make Graphs Such as Scatter Plots Numerically Readable (PacificVis 2018, K...To Make Graphs Such as Scatter Plots Numerically Readable (PacificVis 2018, K...
To Make Graphs Such as Scatter Plots Numerically Readable (PacificVis 2018, K...
 
Make Accumulated Data in Companies Eloquent by SQL Statement Constructors (PDF)
Make Accumulated Data in Companies Eloquent by SQL Statement Constructors (PDF)Make Accumulated Data in Companies Eloquent by SQL Statement Constructors (PDF)
Make Accumulated Data in Companies Eloquent by SQL Statement Constructors (PDF)
 
企業等に蓄積されたデータを分析するための処理機能の提案
企業等に蓄積されたデータを分析するための処理機能の提案企業等に蓄積されたデータを分析するための処理機能の提案
企業等に蓄積されたデータを分析するための処理機能の提案
 
新入社員の頃に教えて欲しかったようなことなど
新入社員の頃に教えて欲しかったようなことなど新入社員の頃に教えて欲しかったようなことなど
新入社員の頃に教えて欲しかったようなことなど
 
ページャ lessを使いこなす
ページャ lessを使いこなすページャ lessを使いこなす
ページャ lessを使いこなす
 
Guiを使わないテキストデータ処理
Guiを使わないテキストデータ処理Guiを使わないテキストデータ処理
Guiを使わないテキストデータ処理
 
データ全貌把握の方法170324
データ全貌把握の方法170324データ全貌把握の方法170324
データ全貌把握の方法170324
 
Macで開発環境を整える170420
Macで開発環境を整える170420Macで開発環境を整える170420
Macで開発環境を整える170420
 
大きなテキストデータを閲覧するには
大きなテキストデータを閲覧するには大きなテキストデータを閲覧するには
大きなテキストデータを閲覧するには
 

Recently uploaded

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Data Warehouse , Data Cube Computation
Data Warehouse   , Data Cube ComputationData Warehouse   , Data Cube Computation
Data Warehouse , Data Cube Computationsit20ad004
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 

Recently uploaded (20)

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Data Warehouse , Data Cube Computation
Data Warehouse   , Data Cube ComputationData Warehouse   , Data Cube Computation
Data Warehouse , Data Cube Computation
 
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 

Seminar0917

  • 1. Contents Random sampling on DB records Preliminary Theorems Extra Slides Student’s t distribution with the degrees of freedom 2 and its applications Toshiyuki Shimono DG Lab Data-driven Mathematical Science 2018-09-18 Tue 11:00 1 / 21
  • 2. Contents: 1. Random sampling on DB records. 2. Preliminary (Terminology; Basic Stochastic Distributions; Student’s T distribution; Snedecor’s F distribution). 3. Theorems (Logarithmic Variances). 4. Extra Slides (Transcendental Functions).
  • 3-10. Application to Statistical Disclosure Control
    Direction: analyze data held by a company or government.
    Transaction data are accumulating (a recent trend).
    External experts often handle and analyze the data.
    Keeping the data confidential is necessary, so Statistical Disclosure Control is necessary.
    Multiplicative noise is often useful on numerical data.
      ▷ Additive noise is also used, but may not be useful enough.
    Only a few distributions seem to have been employed so far.
      ▷ $N(\mu, \sigma^2)$ and $U[a, b]$ are mentioned in [Privacy-preserving data mining, Agrawal, Srikant, ACM SIGMOD, 2000].
      ▷ The gamma and log-normal distributions are also used in [Privacy protection and quantile estimation from noise multiplied data, Sinha, Nayak, Zayatz, Sankhya B, 2012].
    $|T(2)|$, $|T(1)|$, and $F(2, 2)$ may be more useful. They can also be used for weighted random sampling.
  • 11. Adding or multiplying noise preserves some statistical properties, such as the sum and the average. We also want to preserve the “weighted random sampling” property.
  • 12-15. Why Weighted Random Sampling on a Table?
    Human eyes can only see sampled records of a table.
      ▶ A table may contain thousands, millions, or billions of records, far too many for human eyes.
    Without randomness, the sampled records give only a biased view.
      ▶ Without randomness, one often sees only the beginning or the end of the table, or only the eye-catching records.
      ▷ [Sampling Techniques, W. G. Cochran, 1977] covers this topic.
    A weight (such as price) helps to avoid trivial sampling.
      ▶ Weighted random sampling retrieves records with probability proportional to an auxiliary variable such as price.
      ▶ Simple random sampling often retrieves low-priced records, which carry little importance.
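The contrast between simple and weighted random sampling can be sketched in a few lines of Python. This is only an illustration: the four-record table and its prices are hypothetical, and `random.choices` samples with replacement, which the slides do not specify.

```python
import random

# Hypothetical toy table of (item, price) records; not taken from the slides.
records = [("pencil", 1), ("notebook", 3), ("bag", 40), ("laptop", 900)]

def simple_sample(rows, k, rng):
    """Draw k rows uniformly at random, with replacement."""
    return rng.choices(rows, k=k)

def weighted_sample(rows, k, rng):
    """Draw k rows with probability proportional to the price column, with replacement."""
    weights = [price for _, price in rows]
    return rng.choices(rows, weights=weights, k=k)

rng = random.Random(0)
simple = simple_sample(records, 10_000, rng)
weighted = weighted_sample(records, 10_000, rng)

# Share of draws that hit the most expensive record under each scheme.
laptop_share_simple = sum(name == "laptop" for name, _ in simple) / len(simple)
laptop_share_weighted = sum(name == "laptop" for name, _ in weighted) / len(weighted)
```

Under simple sampling the expensive record is drawn about one time in four, while under price-weighted sampling it dominates the sample (its weight is 900 out of a total of 944).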
  • 16-17. Table: Word frequency table of “Hamlet”. Simple vs. weighted random sampling.

    Simple random sampling:
    word     count   word         count   word         count   word        count   word      count
    OPHELIA     67   stuff            3   lament           2   Looking         1   Mourners      1
    doth        23   chief            3   translate        2   ’take           1   strokes       1
    use         15   ambassadors      3   Excellent        2   frowningly      1   drains        1
    devil        9   puff’d           2   revolution       1   east            1   scent         1
    home         6   plague           2   Pinch            1   profanely       1   warning       1
    touch        6   venom            2   access           1   struggling      1   betimes       1
    season       5   spokes           2   bravery          1   nerve           1   hent          1
    get          5   lunacy           2   quietly          1   amities         1   assure        1
    ha           4   Lady             2   counterfeit      1   Know            1   Stay’d        1
    neck         3   Drown’d          2   consider’d       1   toys            1   moods         1

    Weighted random sampling:
    word     count   word         count   word         count   word        count   word         count
    the        995   And            263   more            90   many           18   parts            3
    and        706   this           248   at              75   command        10   ways             3
    to         635   me             234   well            65   hell           10   antique          2
    of         630   him            197   let             60   honour         10   yesternight      1
    I          546   he             178   speak           55   Reads           5   constantly       1
    my         441   HORATIO        128   go              52   Follow          5   emulate          1
    HAMLET     407   do             127   night           47   stir            5   honour’s         1
    it         361   what           116   into            27   knew            5   really           1
    not        299   all            108   Good            25   ourself         3   revolution       1
    that       266   our            107   Ghost           25   white           3   riotous          1
  • 18-25. The Procedure (To Be Realized): The Analyzing Procedure
    1. Prepare a table T to be analyzed.
    2. Apply noise to a sensitive variable (column) v of T.
    3. The “expert” gets the transformed table T′ with v′.
    4. The expert applies various analyses to T′:
       1. Several analyses can be performed on T′ as usual.
       2. The numerical sum of v′ may well reflect the sum of v.
       3. Random sampling of T′ by the weight v is possible! Note: v is hidden. Only v′ can be seen by the expert.
    5. The data provider can judge the ability of the expert without showing the precise numerical values of v.
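Steps 1 to 3 of the procedure could look roughly like the following Python sketch. It assumes that the noise is multiplicative and drawn from $|T(2)|$, one of the candidate distributions named in this talk; the table layout, the column name `price`, and the helper names `abs_t2` and `mask_column` are hypothetical.

```python
import math
import random

def abs_t2(rng):
    """One |T(2)| variate: sqrt(2) * u / sqrt(1 - u^2) with u uniform on [0, 1).

    Taking u on [0, 1) instead of [-1, 1] yields the absolute value directly,
    and u < 1 keeps the denominator strictly positive.
    """
    u = rng.random()
    return math.sqrt(2.0) * u / math.sqrt(1.0 - u * u)

def mask_column(rows, column, rng):
    """Return a copy of the table with one column multiplied by independent |T(2)| noise."""
    return [{**row, column: row[column] * abs_t2(rng)} for row in rows]

# Hypothetical table T with a sensitive column v = "price".
table = [
    {"id": 1, "price": 120.0},
    {"id": 2, "price": 15.0},
    {"id": 3, "price": 980.0},
]

rng = random.Random(42)
masked = mask_column(table, "price", rng)  # this is T', the only table the expert sees
```

The original table is left unchanged, so the data provider keeps v while handing over only the masked copy.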
  • 26. Terminology
    Variate: a (random) variate is a particular outcome of a random variable.
    iid: independent and identically distributed.
  • 27. Basic Stochastic Distributions (basic probability distributions)
    $U[a, b]$: uniform distribution between $a$ and $b$.
    $N(\mu, \sigma^2)$: Gaussian distribution with mean $\mu$ and variance $\sigma^2$.
    $\chi^2(\nu)$: chi-squared distribution with $\nu$ degrees of freedom, obtained as $z_1^2 + \cdots + z_\nu^2$ with $z_i \overset{\text{iid}}{\sim} N(0, 1^2)$.
    $T(\nu)$: obtained as $\dfrac{z}{\sqrt{q/\nu}}$ with $z \sim N(0, 1)$ and $q \sim \chi^2(\nu)$ independent. $T(1)$ is also called the Cauchy distribution.
    $F(\nu_1, \nu_2)$: obtained as $\dfrac{q_1/\nu_1}{q_2/\nu_2}$ with $q_1 \sim \chi^2(\nu_1)$ and $q_2 \sim \chi^2(\nu_2)$ independent.
  • 28-30. Student’s T distribution (1908)
    $T(\nu)$: Student’s T distribution with $\nu = 1, 2, 3, \ldots$ degrees of freedom.
    $T(\nu)$ can be obtained as $\dfrac{\sqrt{\nu}\, z_0}{\sqrt{\sum_{i=1}^{\nu} z_i^2}}$ with $z_i \overset{\text{iid}}{\sim} N(0, 1^2)$.
    $T(1)$ and $T(2)$ are easily obtained:
      $T(1)$: $\tan(\pi u)$ from $u \sim U[0, 1]$.
      $T(2)$: $\dfrac{\sqrt{2}\, u}{\sqrt{1 - u^2}}$ from $u \sim U[-1, 1]$.
    $|T(1)|$ and $|T(2)|$ appear in this presentation; they are obtained by taking the absolute value of the variates.
  • 31-33. Student’s T distribution (1908), figures: density plots of $T(1)$ with $|T(1)|$, of $T(2)$ with $|T(2)|$, and of $F(2, 2)$, each drawn for $x$ from $-3$ to $3$ with the density axis from 0.0 to 0.8. $F(2, 2)$ also appears in this presentation and is explained on the next slide.
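The two closed-form generators for $T(1)$ and $T(2)$ can be tried out with the short Python sketch below. It is a sanity check under stated assumptions, not part of the talk: the function names are hypothetical, and the reference probabilities $P(|T(1)| \le 1) = 1/2$ and $P(|T(2)| \le 1) = 1/\sqrt{3}$ come from the standard closed-form CDFs of these two distributions.

```python
import math
import random

def sample_t1(rng):
    """T(1) (Cauchy) variate via tan(pi * u) with u uniform on [0, 1)."""
    return math.tan(math.pi * rng.random())

def sample_t2(rng):
    """T(2) variate via sqrt(2) * u / sqrt(1 - u^2) with u uniform on (-1, 1)."""
    u = 2.0 * rng.random() - 1.0
    while abs(u) >= 1.0:  # redraw the endpoint, which would divide by zero
        u = 2.0 * rng.random() - 1.0
    return math.sqrt(2.0) * u / math.sqrt(1.0 - u * u)

rng = random.Random(2018)
n = 100_000

# Empirical probability that the absolute value is at most 1.
share_t1 = sum(abs(sample_t1(rng)) <= 1.0 for _ in range(n)) / n
share_t2 = sum(abs(sample_t2(rng)) <= 1.0 for _ in range(n)) / n

# Reference values from the closed-form CDFs.
expected_t1 = 0.5
expected_t2 = 1.0 / math.sqrt(3.0)
```

Both generators need only one uniform variate and one elementary function call per sample, which is what makes $T(1)$ and $T(2)$ convenient as noise sources.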
  • 34-36. Snedecor’s F distribution (1934)
    $F(\nu_1, \nu_2)$: F distribution with $\nu_1$ and $\nu_2$ degrees of freedom.
    $F(\nu_1, \nu_2)$ can be obtained as $\dfrac{q_1/\nu_1}{q_2/\nu_2}$ with $q_1 \sim \chi^2(\nu_1)$ and $q_2 \sim \chi^2(\nu_2)$.
    Density: $\sqrt{\dfrac{(\nu_1 x)^{\nu_1}\, \nu_2^{\nu_2}}{(\nu_1 x + \nu_2)^{\nu_1 + \nu_2}}}\; x^{-1} \Big/ B\!\left(\dfrac{\nu_1}{2}, \dfrac{\nu_2}{2}\right)$ for $x \ge 0$.
    Only $F(2, 2)$ appears in this presentation.
      Easily obtained as $u/(1 - u)$ from $u \sim U[0, 1]$.
      Density: $0$ for $x < 0$ and $1/(1 + x)^2$ for $x \ge 0$.
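As a rough check that the shortcut $u/(1-u)$ agrees with the chi-squared definition of $F(2, 2)$, the Python sketch below draws from both constructions and compares them with the CDF $x/(1+x)$, which follows from integrating the density $1/(1+x)^2$. The function names and the sample size are assumptions made for illustration.

```python
import random

def f22_from_uniform(rng):
    """F(2, 2) variate via u / (1 - u); u lies in [0, 1), so 1 - u > 0."""
    u = rng.random()
    return u / (1.0 - u)

def f22_from_chi2(rng):
    """F(2, 2) variate via (q1 / 2) / (q2 / 2), each q a sum of two squared N(0, 1) draws."""
    q1 = rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2
    q2 = rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2
    return (q1 / 2.0) / (q2 / 2.0)

def cdf_f22(x):
    """CDF of F(2, 2) for x >= 0, obtained by integrating 1 / (1 + t)^2 from 0 to x."""
    return x / (1.0 + x)

rng = random.Random(1934)
n = 100_000

# Empirical P(F <= 3) under each construction; the CDF gives 3 / 4.
share_uniform = sum(f22_from_uniform(rng) <= 3.0 for _ in range(n)) / n
share_chi2 = sum(f22_from_chi2(rng) <= 3.0 for _ in range(n)) / n
```

The uniform construction uses one random number per variate, whereas the chi-squared construction uses four Gaussian draws.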
  • 37. Contents Random sampling on DB records Preliminary Theorems Extra Slides Logarithmic Variances A proposition about |T(2) | Theorem For any v1, v2 > 0 : Prob [ v1x1 > v2x2 ] : Prob [ v1x1 < v2x2 ] = v1 : v2 where x1, x2 iid ∼ |T(2) |. 12 / 21
  • 38. Contents Random sampling on DB records Preliminary Theorems Extra Slides Logarithmic Variances A proposition about |T(2) | Theorem For any v1, v2 > 0 : Prob [ v1x1 > v2x2 ] = v1/(v1+v2) : Prob [ v1x1 < v2x2 ] = v2/(v1+v2) = v1 : v2 where x1, x2 iid ∼ |T(2) |. 12 / 21
  • 39. Contents Random sampling on DB records Preliminary Theorems Extra Slides Logarithmic Variances A proposition about |T(2) | Theorem For any v1, v2 > 0 : Prob [ v1x1 > v2x2 ] = v1/(v1+v2) : Prob [ v1x1 < v2x2 ] = v2/(v1+v2) = v1 : v2 where x1, x2 iid ∼ |T(2) |. — cf. Bradley-Terry model (1952): 12 / 21
  • 40. Contents Random sampling on DB records Preliminary Theorems Extra Slides Logarithmic Variances A proposition about |T(2) | Theorem For any v1, v2 > 0 : Prob [ v1x1 > v2x2 ] = v1/(v1+v2) : Prob [ v1x1 < v2x2 ] = v2/(v1+v2) = v1 : v2 where x1, x2 iid ∼ |T(2) |. — cf. Bradley-Terry model (1952): Prob [ ”player i ” beats ”player j ” ] = vi vi + vj Applied to food preferences, sports team strengths. 12 / 21
  • 41-47. A proposition about |T(2)| (continued). The condition of the theorem, $\mathrm{Prob}[v_1 x_1 > v_2 x_2] : \mathrm{Prob}[v_1 x_1 < v_2 x_2] = v_1 : v_2$ for any $v_1, v_2 > 0$, can be stated through the ratio $x_1/x_2$:
    $x_1/x_2 \sim |T(2)| \times_\perp |T(2)|^{-1}$
    $\equiv |T(1)| \times_\perp F(2,2)^{1/2}$
    $\equiv |T(1)| \times_\perp |T(1)|^{1/2} \times_\perp F(2,2)^{1/4}$
    $\equiv |T(1)| \times_\perp |T(1)|^{1/2} \times_\perp |T(1)|^{1/4} \times_\perp F(2,2)^{1/8}$
    $\equiv \cdots \equiv F(2,2)$.
    Notation: $\times_\perp$ denotes multiplication of independent variates. A superscript is an exponent applied to the variate: $-1$ is the reciprocal, $1/2$ the square root, and $1/4$, $1/8$ the 4th and 8th roots. Note: $T(1) \equiv T(1)^{-1}$, $F(2,2) \equiv F(2,2)^{-1}$, $T(2) \not\equiv T(2)^{-1}$.
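    A rough numerical check of this chain: the sketch below draws from the first three expressions and compares each empirical CDF with the F(2, 2) CDF, x / (1 + x). It is an illustration under assumptions, not the author's code: the samplers use the standard representations |T(1)| = |Z1 / Z2|, |T(2)| = |Z| / sqrt(E) and F(2, 2) = E1 / E2 (Z standard normal, E exponential with mean 1), and all function names are hypothetical.

    ```python
    import math
    import random


    def abs_t1(rng):
        """|T(1)|: absolute value of a ratio of two independent standard normals."""
        return abs(rng.gauss(0.0, 1.0) / rng.gauss(0.0, 1.0))


    def abs_t2(rng):
        """|T(2)| as |Z| / sqrt(E), with E ~ Exp(1) = chi2(2) / 2."""
        return abs(rng.gauss(0.0, 1.0)) / math.sqrt(rng.expovariate(1.0))


    def f22(rng):
        """F(2, 2) as a ratio of two independent Exp(1) variates."""
        return rng.expovariate(1.0) / rng.expovariate(1.0)


    CHAIN = {
        "|T(2)| / |T(2)|": lambda r: abs_t2(r) / abs_t2(r),
        "|T(1)| * F^(1/2)": lambda r: abs_t1(r) * f22(r) ** 0.5,
        "|T(1)| * |T(1)|^(1/2) * F^(1/4)": lambda r: abs_t1(r) * abs_t1(r) ** 0.5 * f22(r) ** 0.25,
    }


    def max_cdf_gap(draw, n=100_000, seed=1):
        """Largest gap between the empirical CDF and the F(2, 2) CDF x / (1 + x)."""
        rng = random.Random(seed)
        xs = sorted(draw(rng) for _ in range(n))
        return max(abs((i + 1) / n - x / (1.0 + x)) for i, x in enumerate(xs))


    if __name__ == "__main__":
        for name, draw in CHAIN.items():
            print(name, round(max_cdf_gap(draw), 4))
    ```

    Gaps of a few thousandths are what sampling noise alone produces at this sample size, so small values are consistent with all three expressions following F(2, 2).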
  • 48. Combinations of distributions that enable WRS, that is, that give $\Pr[v_1 x_1 > v_2 x_2] : \Pr[v_1 x_1 < v_2 x_2] = v_1 : v_2$. Each row lists the distribution of $x_1$; the distribution of $x_2$; $\mathrm{var}[\log x_1]$.
    $|T(2)|$; $|T(2)|$; $\pi^2/6$
    $F(2,2)^{1/2}$; $|T(1)|$; $\pi^2/12$
    $F(2,2)^{1/4}$; $|T(1)| \times_\perp |T(1)|^{1/2}$; $\pi^2/48$
    $F(2,2)^{1/8}$; $|T(1)| \times_\perp |T(1)|^{1/2} \times_\perp |T(1)|^{1/4}$; $\pi^2/192$
    $\cdots$
    $|T(1)|$; $F(2,2)^{1/2}$; $\pi^2/4$
    $|T(1)|^{1/2}$; $|T(1)| \times_\perp F(2,2)^{1/4}$; $\pi^2/16$
    $|T(1)|^{1/4}$; $|T(1)|^{1/2} \times_\perp |T(1)| \times_\perp F(2,2)^{1/8}$; $\pi^2/64$
    $\cdots$
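    The log variances of such pairings can be computed exactly from two rules: variances of logarithms add across independent factors, and raising a variate to the power a multiplies its log variance by a². The sketch below works in units of π², starting from the talk's values Var(log F(2, 2)) = π²/3, Var(log |T(1)|) = π²/4 and Var(log |T(2)|) = π²/6. The encoding of each distribution as a list of (name, exponent) pairs is an assumption made for this illustration.

    ```python
    from fractions import Fraction

    # Coefficients c with Var(log X) = c * pi^2, taken from the log-variance theorem.
    BASE = {"F22": Fraction(1, 3), "T1": Fraction(1, 4), "T2": Fraction(1, 6)}


    def log_var(factors):
        """Var(log X) in units of pi^2 for X = product of independent powers.

        `factors` is a list of (name, exponent) pairs, e.g. [("T1", 1), ("F22", 1/4)].
        """
        return sum(BASE[name] * Fraction(a) ** 2 for name, a in factors)


    H, Q, E = Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)

    # (x1, x2) pairings: x1 / x2 is meant to follow F(2, 2) in every row.
    PAIRS = [
        ([("T2", 1)], [("T2", 1)]),
        ([("F22", H)], [("T1", 1)]),
        ([("F22", Q)], [("T1", 1), ("T1", H)]),
        ([("F22", E)], [("T1", 1), ("T1", H), ("T1", Q)]),
        ([("T1", 1)], [("F22", H)]),
        ([("T1", H)], [("T1", 1), ("F22", Q)]),
        ([("T1", Q)], [("T1", H), ("T1", 1), ("F22", E)]),
    ]

    if __name__ == "__main__":
        for x1, x2 in PAIRS:
            # The two coefficients should sum to 1/3, the coefficient of F(2, 2).
            print(log_var(x1), "+", log_var(x2), "=", log_var(x1) + log_var(x2))
    ```

    Every pairing sums to 1/3, matching Var(log F(2, 2)) = π²/3, which is a necessary condition for the ratio x1 / x2 to follow F(2, 2).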
  • 49. Proof outline: Relations such as $\Gamma(2z) = 2^{2z-1}\,\pi^{-1/2}\,\Gamma(z)\,\Gamma(z + \tfrac{1}{2})$ and $B(x, y) = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)} = 2\int_0^{\pi/2} \sin^{2x-1} t \, \cos^{2y-1} t \, dt$ are used. $E[X^m]$ for $m \in \mathbb{R}$ is calculated for each distribution $X$ among $F(2,2)$, $|T(1)|$, $|T(2)|$, giving $\Gamma(1+m)\,\Gamma(1-m)$, $\Gamma(\tfrac{1+m}{2})\,\Gamma(\tfrac{1-m}{2})/\pi$, and $\frac{(\sqrt{2})^{m}}{\sqrt{\pi}}\,\Gamma(\tfrac{1+m}{2})\,\Gamma(\tfrac{2-m}{2})$, respectively.
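    These moment formulas can be compared with simulation for a small exponent. The sketch below uses m = 0.25, chosen so that the Monte Carlo averages have finite variance; the samplers rely on the standard representations F(2, 2) = E1 / E2, |T(1)| = |Z1 / Z2| and |T(2)| = |Z| / sqrt(E), and the function names are hypothetical.

    ```python
    import math
    import random


    def moment_formulas(m):
        """Closed forms for E[X^m] as listed in the proof outline."""
        g = math.gamma
        return {
            "F(2,2)": g(1 + m) * g(1 - m),
            "|T(1)|": g((1 + m) / 2) * g((1 - m) / 2) / math.pi,
            "|T(2)|": math.sqrt(2) ** m / math.sqrt(math.pi) * g((1 + m) / 2) * g((2 - m) / 2),
        }


    def moment_estimates(m, n=200_000, seed=2):
        """Monte Carlo averages of X^m for the same three distributions."""
        rng = random.Random(seed)
        sums = {"F(2,2)": 0.0, "|T(1)|": 0.0, "|T(2)|": 0.0}
        for _ in range(n):
            sums["F(2,2)"] += (rng.expovariate(1.0) / rng.expovariate(1.0)) ** m
            sums["|T(1)|"] += abs(rng.gauss(0.0, 1.0) / rng.gauss(0.0, 1.0)) ** m
            sums["|T(2)|"] += (abs(rng.gauss(0.0, 1.0)) / math.sqrt(rng.expovariate(1.0))) ** m
        return {name: total / n for name, total in sums.items()}


    if __name__ == "__main__":
        theory = moment_formulas(0.25)
        estimate = moment_estimates(0.25)
        for name in theory:
            print(name, round(estimate[name], 4), "formula:", round(theory[name], 4))
    ```

    At m = 0 all three formulas reduce to 1, as they must for a probability distribution.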
  • 50-51. Equivalency Lemma. Lemma: Assume the variables $x_1, x_2 > 0$ (a.s.) are independent. Then $\mathrm{Prob}_{x_1, x_2}[v_1 x_1 > v_2 x_2] = \frac{v_1}{v_1 + v_2}$ for all $v_1, v_2 > 0$ $\iff$ $\frac{x_1}{x_1 + x_2} = \frac{1}{1 + (x_2/x_1)} \sim U[0, 1]$. Proof: Maybe trivial.
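    The equivalence can be spelled out in a few lines. The symbols $U$ and $u$ below are shorthand introduced for this sketch.

    ```latex
    % For x_1, x_2 > 0 and v_1, v_2 > 0:
    \begin{align*}
    v_1 x_1 > v_2 x_2
      &\iff (v_1 + v_2)\, x_1 > v_2\, (x_1 + x_2) \\
      &\iff U := \frac{x_1}{x_1 + x_2} > \frac{v_2}{v_1 + v_2} =: u .
    \end{align*}
    % As (v_1, v_2) ranges over all positive pairs, u ranges over (0, 1), and
    % v_1 / (v_1 + v_2) = 1 - u. Hence the left-hand condition of the lemma reads
    \[
      \mathrm{Prob}[\, U > u \,] = 1 - u \quad \text{for all } u \in (0, 1),
    \]
    % which is exactly the statement U \sim U[0, 1].
    ```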
  • 52. [Figure: histogram of 1/(1 + abs(rt(n, 1)/rt(n, 1))); x-axis from 0.0 to 1.0, y-axis: Frequency.]
  • 53. [Figure: histogram of 1/(1 + abs(rt(n, 2)/rt(n, 2))); x-axis from 0.0 to 1.0, y-axis: Frequency.]
  • 54. [Figure: histogram of 1/(1 + abs(rt(n, 3)/rt(n, 3))); x-axis from 0.0 to 1.0, y-axis: Frequency.]
  • 55. [Figure: histogram of 1/(1 + abs(rnorm(n)/rnorm(n))); x-axis from 0.0 to 1.0, y-axis: Frequency.]
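    The four histograms can be approximated without plotting by tabulating bin fractions. The sketch below is a standard-library Python stand-in for the R expressions in the figure titles; `rt` here is a hypothetical helper that mimics a single draw of R's `rt(n, df)` through the representation Z / sqrt(chi2(df) / df), with the chi-square variate drawn as a Gamma(df/2, scale 2) variate.

    ```python
    import math
    import random


    def rt(rng, df):
        """One draw from Student's t with `df` degrees of freedom."""
        chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) = Gamma(shape df/2, scale 2)
        return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / df)


    CASES = {
        "t(1)": lambda rng: rt(rng, 1),
        "t(2)": lambda rng: rt(rng, 2),
        "t(3)": lambda rng: rt(rng, 3),
        "normal": lambda rng: rng.gauss(0.0, 1.0),
    }


    def bin_fractions(draw, n=100_000, bins=10, seed=3):
        """Fractions of 1 / (1 + |a / b|) falling into equal-width bins on [0, 1]."""
        rng = random.Random(seed)
        counts = [0] * bins
        for _ in range(n):
            u = 1.0 / (1.0 + abs(draw(rng) / draw(rng)))  # two independent draws
            counts[min(int(u * bins), bins - 1)] += 1
        return [c / n for c in counts]


    if __name__ == "__main__":
        for name, draw in CASES.items():
            print(name, [round(f, 3) for f in bin_fractions(draw)])
    ```

    Only the t(2) case should give roughly equal fractions in every bin, in line with the lemma; with one degree of freedom the outer bins are over-represented, and with normal variates they are under-represented.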
  • 56-58. Logarithmic Variances. Theorem: $\mathrm{Var}(F(2,2)) = \infty$, $\mathrm{Var}(|T(1)|) = \infty$, $\mathrm{Var}(|T(2)|) = \infty$.
  • 59. Logarithmic Variances. Theorem: $\mathrm{Var}(\log F(2,2)) = \pi^2/3$, $\mathrm{Var}(\log |T(1)|) = \pi^2/4$, $\mathrm{Var}(\log |T(2)|) = \pi^2/6$. Here $\log X$ denotes the distribution obtained by taking the logarithm of the variate $X$. Note: this is consistent with the previous theorem, since log variances add over independent factors: $\pi^2/6 + \pi^2/6 = \pi^2/3$ for $|T(2)| \times_\perp |T(2)|^{-1} \equiv F(2,2)$.
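    One way to obtain these values, sketched here as an assumption about the route rather than the author's stated proof, is to differentiate the logarithm of the moment formulas from the proof outline twice at $m = 0$. The trigamma values $\psi'(1) = \pi^2/6$ and $\psi'(1/2) = \pi^2/2$ are standard.

    ```latex
    % Since E[X^m] = E[e^{m \log X}] is the moment generating function of \log X:
    \[
      \mathrm{Var}(\log X) = \left. \frac{d^2}{dm^2} \log E[X^m] \right|_{m=0},
      \qquad \psi'(z) := \frac{d^2}{dz^2} \log \Gamma(z).
    \]
    \begin{align*}
    F(2,2):\quad & E[X^m] = \Gamma(1+m)\,\Gamma(1-m)
      && \Rightarrow\ \psi'(1) + \psi'(1) = \frac{\pi^2}{3}, \\
    |T(1)|:\quad & E[X^m] = \tfrac{1}{\pi}\,\Gamma\!\left(\tfrac{1+m}{2}\right)\Gamma\!\left(\tfrac{1-m}{2}\right)
      && \Rightarrow\ \tfrac{1}{4}\psi'\!\left(\tfrac12\right) + \tfrac{1}{4}\psi'\!\left(\tfrac12\right) = \frac{\pi^2}{4}, \\
    |T(2)|:\quad & E[X^m] = \tfrac{(\sqrt{2})^m}{\sqrt{\pi}}\,\Gamma\!\left(\tfrac{1+m}{2}\right)\Gamma\!\left(\tfrac{2-m}{2}\right)
      && \Rightarrow\ \tfrac{1}{4}\psi'\!\left(\tfrac12\right) + \tfrac{1}{4}\psi'(1) = \frac{\pi^2}{8} + \frac{\pi^2}{24} = \frac{\pi^2}{6}.
    \end{align*}
    % The factor (\sqrt{2})^m contributes a term linear in m to \log E[X^m],
    % so it drops out of the second derivative.
    ```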
  • 60. [Figure: histogram of log(abs(rf(n, 2, 2))); x-axis ticks from log .001 to log 1000, y-axis from 0.0 to 0.4.]
  • 61. [Figure: histogram of log(abs(rt(n, 1))); x-axis ticks from log .001 to log 1000, y-axis from 0.0 to 0.4.]
  • 62. [Figure: histogram of log(abs(rt(n, 2))); x-axis ticks from log .001 to log 1000, y-axis from 0.0 to 0.4.]
  • 63-66. Why is WRS of T by v-weight possible? For fixed $0 \le v_1 \ll v_2$, if random variables $x_1, x_2 \ge 0$ satisfy $x_1/x_2 \sim F(2,2)$, then $\mathrm{Prob}[\, v_1 x_1 / x_2 > v_2 \,] = \frac{v_1}{v_1 + v_2} \approx \frac{v_1}{v_2}$. Thus, under the condition $X \times_\perp X' = F(2,2)$, along with $x(i) \overset{\mathrm{iid}}{\sim} X$ and $x'(i) \overset{\mathrm{iid}}{\sim} X'$, calculate $v''(i) := v'(i)/x'(i)$ where $v'(i) := v(i) \times x(i)$. Define $S_n$ by $\#S_n = n$ and $v''(i) \ge v''(j)$ for all $i \in S_n$ and all $j \notin S_n$. Then $\{T(i) \mid i \in S_n\}$ is, approximately, a sample from $T$ weighted by $v$.
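    The selection rule above can be sketched as follows. This is a minimal illustration under assumptions, not the author's implementation: it takes both X and X′ to be |T(2)|, so that x(i)/x′(i) follows F(2, 2) by the earlier theorem, and it draws |T(2)| as |Z| / sqrt(E) with E exponential with mean 1. The function names and the toy weights are hypothetical.

    ```python
    import heapq
    import math
    import random


    def abs_t2(rng):
        """Draw |T(2)| as |Z| / sqrt(E), with E ~ Exp(1) = chi2(2) / 2."""
        return abs(rng.gauss(0.0, 1.0)) / math.sqrt(rng.expovariate(1.0))


    def weighted_sample(weights, n, rng):
        """Return the indices of the n records with the largest keys v''(i)."""
        keys = []
        for i, v in enumerate(weights):
            v1 = v * abs_t2(rng)   # v'(i)  := v(i) * x(i)
            v2 = v1 / abs_t2(rng)  # v''(i) := v'(i) / x'(i)
            keys.append((v2, i))
        return [i for _, i in heapq.nlargest(n, keys)]


    def heavy_share(trials=300, n=10, seed=4):
        """Share of weight-3 records among those selected, for a toy table of
        500 weight-1 and 500 weight-3 records."""
        rng = random.Random(seed)
        weights = [1.0] * 500 + [3.0] * 500
        picked = [i for _ in range(trials) for i in weighted_sample(weights, n, rng)]
        return sum(weights[i] == 3.0 for i in picked) / len(picked)


    if __name__ == "__main__":
        # Weight-proportional sampling would select weight-3 records about 75% of the time.
        print(round(heavy_share(), 3))
    ```

    Because n is small relative to the table, the n-th largest key is far above every weight, which is the regime $v_1 \ll v_2$ in which the inclusion probability is approximately proportional to the weight.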
  • 67. Summary: A curious relation between |T(2)| and the Bradley-Terry model. It is derived from various decompositions of F(2, 2). An application to weighted random sampling, which helps in gaining an initial understanding of a DB.
  • 68. Transcendental Functions. Gamma function: $\Gamma(z) = \int_0^\infty e^{-t}\, t^{z-1}\, dt = \int_0^1 (-\log t)^{z-1}\, dt$ for $\operatorname{Re} z > 0$, and $\Gamma(z) = \lim_{n \to \infty} n^z \Big/ \Big( z \prod_{k=1}^{n} \big(1 + \tfrac{z}{k}\big) \Big)$ for $z \in \mathbb{C}$ other than $0, -1, -2, \dots$. Beta function: $B(x, y) = \int_0^1 t^{x-1} (1 - t)^{y-1}\, dt$ for $\operatorname{Re} x > 0$, $\operatorname{Re} y > 0$, and $B(x, y) = \Gamma(x)\Gamma(y)/\Gamma(x + y)$ for $x, y \in \mathbb{C}$.
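    A small numerical sanity check of these definitions, for real arguments only. The sketch compares a finite-n version of the limit formula (with the product starting at k = 1 and an extra factor z in the denominator) against `math.gamma`, and a midpoint-rule value of the Beta integral against the Gamma-ratio formula. The function names, the truncation level and the step count are arbitrary choices made for illustration.

    ```python
    import math


    def gamma_euler(z, n=200_000):
        """Finite-n limit formula: n^z / (z * prod_{k=1..n} (1 + z/k))."""
        log_prod = sum(math.log1p(z / k) for k in range(1, n + 1))
        return math.exp(z * math.log(n) - log_prod) / z


    def beta_integral(x, y, steps=20_000):
        """Midpoint rule for the integral of t^(x-1) * (1-t)^(y-1) over [0, 1]."""
        h = 1.0 / steps
        total = 0.0
        for k in range(steps):
            t = (k + 0.5) * h
            total += t ** (x - 1) * (1.0 - t) ** (y - 1)
        return h * total


    def beta_gamma(x, y):
        """B(x, y) through the Gamma-function identity."""
        return math.gamma(x) * math.gamma(y) / math.gamma(x + y)


    if __name__ == "__main__":
        print(gamma_euler(2.5), math.gamma(2.5))
        print(beta_integral(2.0, 3.0), beta_gamma(2.0, 3.0))
    ```

    The limit formula converges slowly, with error shrinking roughly like 1/n, so agreement to a few decimal places is all that should be expected from it.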