Presentation slides used at a research workshop in Korea.
1. On Information Geometry and its Applications
Masaki Asano (M1)
Osaka City University
My advisor is Prof. Ohnita
July 24, 2012
Masaki Asano (Osaka City University) On Information Geometry and its Applications July 24, 2012 1 / 16
2. Contents of this talk
1 Statistical model, Fisher metric and α-connection
2 Statistical manifold
3 Applications
3. Statistical model, Fisher metric and α-connection
Statistical model
$(\Omega, \mathcal{F}, P)$: a probability space
$\Xi$: an open domain of $\mathbb{R}^n$ (a parameter space)
Definition 1.1
$S$ is a statistical model or parametric model on $\Omega$
$\overset{\mathrm{def}}{\iff}$ $S$ is a set of probability densities with parameter $\xi \in \Xi$ such that
$$S = \left\{ p(x;\xi) \;\middle|\; \int_\Omega p(x;\xi)\,dx = 1,\ p(x;\xi) > 0,\ \xi \in \Xi \subset \mathbb{R}^n \right\},$$
where $P(A) = \int_A p(x;\xi)\,dx$ $(A \in \mathcal{F})$.
We assume $S$ is a smooth manifold with local coordinate system $\Xi$.
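As a quick sanity check of Definition 1.1, the sketch below verifies numerically that the normal densities of Example 1.5 are positive and integrate to 1 for a few parameter values. The helper names, the midpoint rule and the truncation of the real line to $[\mu - 12\sigma, \mu + 12\sigma]$ are illustrative assumptions, not part of the talk.

```python
import math

def normal_density(x, mu, sigma):
    """p(x; mu, sigma) of the normal family, xi = (mu, sigma) with sigma > 0."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

def total_mass(mu, sigma, n=4000, width=12.0):
    """Midpoint-rule approximation of the integral of p(x; mu, sigma) dx.

    The real line is truncated to [mu - width*sigma, mu + width*sigma];
    the neglected tails are negligible for width = 12."""
    lo = mu - width * sigma
    dx = 2 * width * sigma / n
    return sum(normal_density(lo + (k + 0.5) * dx, mu, sigma) for k in range(n)) * dx

# Normalization should hold for every xi in the parameter space R x R_+.
for mu, sigma in [(0.0, 1.0), (2.0, 0.5), (-1.0, 3.0)]:
    print(mu, sigma, total_mass(mu, sigma))
```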
6. Statistical model, Fisher metric and α-connection
Fisher metric
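For simplicity, we write
$E_\xi[f] = \int_\Omega f(x)\,p(x;\xi)\,dx$ (the expectation of $f(x)$ with respect to $p(x;\xi)$),
$l_\xi = l(x;\xi) = \log p(x;\xi)$ (the log-likelihood of $p(x;\xi)$),
$\partial_i = \dfrac{\partial}{\partial \xi^i}$.
Definition 1.2 (Fisher information matrix)
$g = (g_{ij})$ is the Fisher information matrix of $S$
$\overset{\mathrm{def}}{\iff}$ $g_{ij}(\xi) := E_\xi\left[\partial_i l_\xi\, \partial_j l_\xi\right]$.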
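Definition 1.2 can be evaluated numerically. The sketch below approximates $g_{ij}(\xi)$ for the normal family of Example 1.5 by replacing $\partial_i l_\xi$ with central finite differences in $\xi$ and $E_\xi$ with a midpoint-rule integral over a truncated interval. The function names, the step `h` and the grid size `n` are illustrative choices, not from the talk.

```python
import math

def normal_logp(x, xi):
    """l(x; xi) = log p(x; mu, sigma) for the normal family, xi = (mu, sigma)."""
    mu, sigma = xi
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - (x - mu) ** 2 / (2 * sigma**2)

def fisher_matrix(logp, xi, lo, hi, n=4000, h=1e-5):
    """Approximate g_ij(xi) = E_xi[d_i l * d_j l] on [lo, hi].

    The scores d_i l are central finite differences in xi; the expectation
    is a midpoint-rule integral against p = exp(l)."""
    dim = len(xi)
    g = [[0.0] * dim for _ in range(dim)]
    dx = (hi - lo) / n
    for k in range(n):
        x = lo + (k + 0.5) * dx
        p = math.exp(logp(x, xi))
        score = []
        for i in range(dim):
            up, down = list(xi), list(xi)
            up[i] += h
            down[i] -= h
            score.append((logp(x, up) - logp(x, down)) / (2 * h))
        for i in range(dim):
            for j in range(dim):
                g[i][j] += score[i] * score[j] * p * dx
    return g

# xi = (mu, sigma) = (0, 2); integrate over mu +/- 12 sigma.
g = fisher_matrix(normal_logp, (0.0, 2.0), -24.0, 24.0)
print(g)
```

For $\sigma = 2$ the entries should come out close to $\mathrm{diag}(1/4, 1/2)$, i.e. the closed form $\frac{1}{\sigma^2}\mathrm{diag}(1, 2)$ quoted in Example 1.5.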
8. Statistical model, Fisher metric and α-connection
Proposition 1.3
The following conditions are equivalent.
• $g$ is positive definite.
• $\{\partial_1 p_\xi, \dots, \partial_n p_\xi\}$ are linearly independent (as functions of $x$).
• $\{\partial_1 l_\xi, \dots, \partial_n l_\xi\}$ are linearly independent (as functions of $x$).
We assume that one of the above conditions is satisfied and that $g_{ij}(\xi)$ is finite for all $i, j, \xi$.
$\Longrightarrow$ $g$ defines a Riemannian metric on $S$.
$\Longrightarrow$ This metric $g$ is called the Fisher metric.
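To illustrate Proposition 1.3, the sketch below compares two two-parameter normal models: the usual coordinates $(\mu, \sigma)$, and a hypothetical redundant parametrization $N(a + b, 1)$ in which $\partial_a l_\xi = \partial_b l_\xi$, so the scores are linearly dependent. A Cholesky attempt serves as the positive-definiteness test; the quadrature, the tolerance and the model B parametrization are assumptions made only for this example.

```python
import math

def expect_normal(f, mu, sigma, n=4000, width=12.0):
    """E[f(x)] for x ~ N(mu, sigma^2), midpoint rule on [mu - width*sigma, mu + width*sigma]."""
    lo = mu - width * sigma
    dx = 2 * width * sigma / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * dx
        p = math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)
        total += f(x) * p * dx
    return total

def fisher_from_scores(scores, mu, sigma):
    """g_ij = E[s_i(x) s_j(x)] for score functions s_i = d_i l under N(mu, sigma^2)."""
    n = len(scores)
    return [[expect_normal(lambda x, i=i, j=j: scores[i](x) * scores[j](x), mu, sigma)
             for j in range(n)] for i in range(n)]

def is_positive_definite(g, tol=1e-9):
    """Try a Cholesky factorisation g = L L^T; a non-positive pivot means g is not PD."""
    n = len(g)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = g[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

# Model A: coordinates (mu, sigma) at (0, 1); scores d_mu l and d_sigma l.
mu, sigma = 0.0, 1.0
scores_a = [lambda x: (x - mu) / sigma**2,
            lambda x: -1 / sigma + (x - mu) ** 2 / sigma**3]
g_a = fisher_from_scores(scores_a, mu, sigma)

# Model B (hypothetical, redundant): N(a + b, 1), so d_a l = d_b l = x - (a + b).
a, b = 0.5, -0.5
scores_b = [lambda x: x - (a + b), lambda x: x - (a + b)]
g_b = fisher_from_scores(scores_b, a + b, 1.0)

# Quadratic form t(c) g c with c = (1, -1), which vanishes for model B.
c = [1.0, -1.0]
quad_b = sum(c[i] * c[j] * g_b[i][j] for i in range(2) for j in range(2))
print(is_positive_definite(g_a), is_positive_definite(g_b), quad_b)
```

Model A gives $g \approx \mathrm{diag}(1, 2)$, which is positive definite; model B gives a singular $g$ because $\partial_a l_\xi - \partial_b l_\xi = 0$.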
12. Statistical model, Fisher metric and α-connection
α-connection
Definition 1.4
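For $\alpha \in \mathbb{R}$, we define the $\alpha$-connection $\nabla^{(\alpha)}$ by the following formula:
$$g\left(\nabla^{(\alpha)}_{\partial_i}\partial_j, \partial_k\right) = E_\xi\left[\left(\partial_i\partial_j l_\xi + \frac{1-\alpha}{2}\,\partial_i l_\xi\,\partial_j l_\xi\right)\partial_k l_\xi\right].$$
By the definition of the $\alpha$-connection,
(1) $\nabla^{(0)}$ is the Levi-Civita connection of the Fisher metric $g$.
(2) $\nabla^{(\alpha)}$ is torsion-free for all $\alpha$,
i.e. $T^{(\alpha)}(X, Y) := \nabla^{(\alpha)}_X Y - \nabla^{(\alpha)}_Y X - [X, Y] \equiv 0$.
(3) $(\nabla^{(\alpha)}_X g)(Y, Z) = (\nabla^{(\alpha)}_Y g)(X, Z)$.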
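For the normal family the formula in Definition 1.4 can be checked term by term. The sketch below writes $x = \mu + \sigma z$ with $z \sim N(0, 1)$, uses the closed-form derivatives of $l_\xi$ in the coordinates $(\mu, \sigma)$, and compares $\Gamma^{(0)}_{ij,k} = g(\nabla^{(0)}_{\partial_i}\partial_j, \partial_k)$ with the Christoffel symbols of the first kind of $g = \mathrm{diag}(1, 2)/\sigma^2$, which is property (1). The quadrature rule and all function names are illustrative assumptions.

```python
import math
from itertools import product

def std_normal_expect(f, n=4000, width=12.0):
    """E[f(z)] for z ~ N(0, 1), midpoint rule on [-width, width]."""
    dz = 2 * width / n
    total = 0.0
    for k in range(n):
        z = -width + (k + 0.5) * dz
        total += f(z) * math.exp(-z * z / 2) / math.sqrt(2 * math.pi) * dz
    return total

def alpha_christoffel(alpha, sigma):
    """Gamma^(alpha)_{ij,k} = E[(d_i d_j l + (1 - alpha)/2 d_i l d_j l) d_k l]
    for N(mu, sigma^2); index 0 is mu, index 1 is sigma, and x = mu + sigma z."""
    def dl(i, z):  # d_mu l = z / sigma, d_sigma l = (z^2 - 1) / sigma
        return z / sigma if i == 0 else (z * z - 1) / sigma

    def ddl(i, j, z):  # second derivatives d_i d_j l
        if i == j == 0:
            return -1 / sigma**2
        if i == j == 1:
            return (1 - 3 * z * z) / sigma**2
        return -2 * z / sigma**2

    c = (1 - alpha) / 2
    return [[[std_normal_expect(
                lambda z, i=i, j=j, k=k: (ddl(i, j, z) + c * (dl(i, z) * dl(j, z))) * dl(k, z))
              for k in range(2)] for j in range(2)] for i in range(2)]

def levi_civita_first_kind(sigma):
    """(d_i g_jk + d_j g_ik - d_k g_ij) / 2 for g = diag(1, 2) / sigma^2 (Example 1.5)."""
    # dg[i][j][k] = d_i g_jk; only the sigma-derivatives of the diagonal entries are nonzero.
    dg = [[[0.0, 0.0], [0.0, 0.0]],
          [[-2 / sigma**3, 0.0], [0.0, -4 / sigma**3]]]
    return [[[(dg[i][j][k] + dg[j][i][k] - dg[k][i][j]) / 2
              for k in range(2)] for j in range(2)] for i in range(2)]

sigma = 1.5
gamma0 = alpha_christoffel(0.0, sigma)
lc = levi_civita_first_kind(sigma)
err = max(abs(gamma0[i][j][k] - lc[i][j][k]) for i, j, k in product(range(2), repeat=3))
print(err)
```

The printed discrepancy should be at the level of rounding error. Property (2) appears as the symmetry of `gamma0[i][j][k]` in `i, j`, inherited from the symmetry of $\partial_i\partial_j l_\xi$.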
14. Statistical model, Fisher metric and α-connection
Example 1.5 (Normal distribution)
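$\xi = (\mu, \sigma) \in \Xi = \mathbb{R} \times \mathbb{R}_+$ (upper half-plane)
$\mu$: mean $(-\infty < \mu < \infty)$, $\sigma$: standard deviation $(0 < \sigma < \infty)$,
$$S = \left\{ p(x;\xi) \;\middle|\; p(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \right\}.$$
The Fisher metric: $g = \dfrac{1}{\sigma^2}\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$. The curvature of $S$: $-\dfrac{1}{2}$.
Hyperbolic plane:
$(x, y) \in H = \{(x, y) \in \mathbb{R}^2 \mid y > 0\}$
The Poincaré metric: $g = \dfrac{1}{y^2}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. The curvature of $H$: $-1$.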
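The two curvature values above can be reproduced numerically. Both metrics are diagonal, so the sketch below uses the standard formula for the Gaussian curvature of an orthogonal metric $E\,du^2 + G\,dv^2$, evaluated with nested central differences; the step size and the helper names are assumptions for this example.

```python
import math

def partial(f, u, v, wrt, h=1e-4):
    """Central difference of f(u, v) in the variable named by wrt ('u' or 'v')."""
    if wrt == "u":
        return (f(u + h, v) - f(u - h, v)) / (2 * h)
    return (f(u, v + h) - f(u, v - h)) / (2 * h)

def gaussian_curvature(E, G, u, v):
    """Gaussian curvature of ds^2 = E du^2 + G dv^2 (orthogonal coordinates):
    K = -1/(2 sqrt(EG)) * [ d/du(G_u / sqrt(EG)) + d/dv(E_v / sqrt(EG)) ]."""
    def root(a, b):
        return math.sqrt(E(a, b) * G(a, b))

    def u_term(a, b):
        return partial(G, a, b, "u") / root(a, b)

    def v_term(a, b):
        return partial(E, a, b, "v") / root(a, b)

    return -(partial(u_term, u, v, "u") + partial(v_term, u, v, "v")) / (2 * root(u, v))

# Fisher metric of the normal family in (mu, sigma): E = 1/sigma^2, G = 2/sigma^2.
k_normal = gaussian_curvature(lambda m, s: 1 / s**2, lambda m, s: 2 / s**2, 0.0, 1.5)
# Poincare metric on the upper half-plane in (x, y): E = G = 1/y^2.
k_poincare = gaussian_curvature(lambda x, y: 1 / y**2, lambda x, y: 1 / y**2, 0.3, 2.0)
print(k_normal, k_poincare)
```

The value $-1/2$ can also be seen directly: the Fisher metric is $\frac{1}{\sigma^2}(d\mu^2 + 2\,d\sigma^2)$, and the substitution $\mu = \sqrt{2}\,x$ turns it into $\frac{2}{\sigma^2}(dx^2 + d\sigma^2)$, twice the Poincaré metric; multiplying a metric by 2 divides its curvature by 2.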
16. Statistical manifold
Statistical manifold
$(M, g)$: a Riemannian manifold
$\nabla$: a torsion-free affine connection on $M$,
i.e. $T(X, Y) := \nabla_X Y - \nabla_Y X - [X, Y] \equiv 0$
Definition 2.1
$(M, \nabla, g)$ is a statistical manifold
$\overset{\mathrm{def}}{\iff}$ $\nabla g$ is a totally symmetric $(0, 3)$-tensor field.
One can check that the connection $\nabla$ satisfies the analogues of properties (1)-(3) of the $\alpha$-connection $\nabla^{(\alpha)}$.
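Definition 2.1 can be tested on the normal family of Example 1.5 with its $\alpha$-connection. The sketch below computes $(\nabla^{(\alpha)}_{\partial_i} g)(\partial_j, \partial_k) = \partial_i g_{jk} - \Gamma^{(\alpha)}_{ij,k} - \Gamma^{(\alpha)}_{ik,j}$ exactly: the derivatives of $l_\xi$ are stored as polynomials in $z = (x - \mu)/\sigma$ with rational coefficients, and expectations use the Gaussian moments $E[z^{2m}] = (2m-1)!!$. It then checks that the result is totally symmetric. The polynomial helpers are hypothetical names for this example, and $\partial_i g_{jk}$ is taken from the closed-form Fisher metric of Example 1.5.

```python
from fractions import Fraction
from itertools import permutations, product

def poly_mul(p, q):
    """Product of polynomials in z stored as coefficient lists (index = power of z)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for a, ca in enumerate(p):
        for b, cb in enumerate(q):
            out[a + b] += ca * cb
    return out

def poly_add(p, q):
    """Sum of two coefficient lists of possibly different lengths."""
    n = max(len(p), len(q))
    return [(p[k] if k < len(p) else 0) + (q[k] if k < len(q) else 0) for k in range(n)]

def gauss_expect(p):
    """E[p(z)] for z ~ N(0, 1), using E[z^k] = (k-1)!! for even k and 0 for odd k."""
    total = Fraction(0)
    for k, coeff in enumerate(p):
        if k % 2 == 0:
            moment = 1
            for m in range(k - 1, 0, -2):
                moment *= m
            total += coeff * moment
    return total

def nabla_g(alpha, sigma):
    """C_ijk = d_i g_jk - Gamma_{ij,k} - Gamma_{ik,j}; index 0 is mu, index 1 is sigma."""
    s, alpha = Fraction(sigma), Fraction(alpha)
    dl = [[0, 1 / s], [-1 / s, 0, 1 / s]]                 # d_mu l, d_sigma l in z
    ddl = [[[-1 / s**2], [0, -2 / s**2]],
           [[0, -2 / s**2], [1 / s**2, 0, -3 / s**2]]]    # d_i d_j l in z
    half = (1 - alpha) / 2
    gamma = [[[gauss_expect(poly_mul(
                  poly_add(ddl[i][j], [half * t for t in poly_mul(dl[i], dl[j])]), dl[k]))
               for k in range(2)] for j in range(2)] for i in range(2)]
    # d_i g_jk from g = diag(1, 2)/sigma^2 (Example 1.5): only sigma-derivatives survive.
    dg = [[[0, 0], [0, 0]], [[-2 / s**3, 0], [0, -4 / s**3]]]
    return [[[dg[i][j][k] - gamma[i][j][k] - gamma[i][k][j]
              for k in range(2)] for j in range(2)] for i in range(2)]

C = nabla_g(Fraction(1, 2), Fraction(3, 2))
symmetric = all(C[i][j][k] == C[p][q][r]
                for i, j, k in product(range(2), repeat=3)
                for p, q, r in permutations((i, j, k)))
print(symmetric, C[1][1][1])  # True 32/27
```

In this example the tensor equals $\alpha\,E_\xi[\partial_i l_\xi\,\partial_j l_\xi\,\partial_k l_\xi]$, so it vanishes for $\alpha = 0$, consistent with $\nabla^{(0)}$ being metric (property (1)).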
19. Applications
Application 1
Statistical Models
Theorem (Hông Vân Lê (2005)) ⇑ ⇓ Theorem (Hông Vân Lê (2005))
Statistical Manifolds
21. Applications
Application 2
Information geometry is related to ...
• statistics,
• information theory,
• (almost) complex geometry,
• symplectic geometry,
• contact geometry,
• Wasserstein geometry, etc.
However, I am not yet sure of the details.
These are topics I want to study in the future.
24. Applications
Thank you for your attention!!
25. Applications
References
• S. Amari and H. Nagaoka, Methods of Information Geometry, Translations of Mathematical Monographs, AMS, 2000.
• M. Gromov, Partial Differential Relations, Springer-Verlag, Berlin, 1986.
• H. V. Lê, Statistical manifolds are statistical models, J. Geom. 84 (2005), no. 1-2, 83-93.
• J. Nash, $C^1$-isometric imbeddings, Ann. of Math. 60 (1954), 383-396.
26. Applications
For Question
Proof of Proposition 1.3.
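For any $n$-dimensional vector $c = {}^t(c_1, c_2, \dots, c_n)$ (${}^t$ denotes the transpose),
$${}^t c\, g\, c = \sum_{i,j} c_i c_j\, g_{ij}(\xi) = E_\xi\left[\left(\sum_i c_i\, \partial_i l_\xi\right)^2\right] \ge 0.$$
Hence ${}^t c\, g\, c = 0$ if and only if $\sum_i c_i\, \partial_i l_\xi = 0$ ($p_\xi$-almost everywhere), so $g$ is positive definite exactly when $\partial_1 l_\xi, \dots, \partial_n l_\xi$ are linearly independent. Since $\partial_i p_\xi = p_\xi\, \partial_i l_\xi$ and $p_\xi > 0$, the same holds for $\partial_1 p_\xi, \dots, \partial_n p_\xi$.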
27. Applications
For Question
Proof of Theorem (roughly).
To prove this theorem, one shows that any statistical manifold can be immersed into a statistical manifold that generalizes the space of positive probability distributions.
We use the following two important immersion theorems.
• The Nash Immersion Theorem
• The Gromov Immersion Theorem
29. Applications
For Question
Theorem 3.1 (THE NASH IMMERSION THEOREM (1954))
Any smooth Riemannian manifold $(M^m, g)$ can be isometrically immersed into $\left(\mathbb{R}^N, \sum_{i=1}^{N} dx_i^2\right)$ for some $N$ depending on $M^m$.
Theorem 3.2 (THE GROMOV IMMERSION THEOREM (1986))
Suppose that $M^m$ is given with a smooth symmetric 3-form $T$. Then there exists an immersion $f : M^m \to \mathbb{R}^{N_1(m)}$ with
$$N_1(m) = 3\left(m + \binom{m+1}{2} + \binom{m+2}{3}\right)$$
such that $f^*\left(\sum_{i=1}^{N_1(m)} dx_i^3\right) = T$.
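Theorem 3.2 gives an explicit target dimension. The short sketch below only evaluates the formula for $N_1(m)$ for small $m$; it does not construct the immersion itself, and the function name is illustrative.

```python
from math import comb

def gromov_dimension(m):
    """N_1(m) = 3 * (m + C(m+1, 2) + C(m+2, 3)) from Theorem 3.2."""
    return 3 * (m + comb(m + 1, 2) + comb(m + 2, 3))

print([gromov_dimension(m) for m in range(1, 5)])  # [9, 27, 57, 102]
```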