• Kernel Bayes' Rule
- Appendix
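RKHS
• Gaussian kernel:
$$k(x, y) = \exp\left(-\frac{\|x - y\|^2}{\sigma^2}\right)$$
• Feature map: $\phi(x) = k(\cdot, x) : \mathcal{X} \to \mathcal{H}_X$
• Reproducing property:
$$f(x) = \langle f, k(\cdot, x)\rangle_{\mathcal{H}_X}, \quad \forall f \in \mathcal{H}_X.$$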
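Here is a minimal NumPy sketch of the Gaussian kernel and the reproducing property above. It assumes f has the finite expansion $f = \sum_i a_i k(\cdot, z_i)$; the points $z_i$, the coefficients $a_i$, and the helper names are illustrative assumptions. For such an f, the RKHS inner product with $k(\cdot, x)$ reduces to $\sum_i a_i k(z_i, x)$, which is exactly $f(x)$.

```python
import numpy as np


def gaussian_kernel(A, B, sigma=1.0):
    """Gram matrix with entries k(a, b) = exp(-||a - b||^2 / sigma^2)."""
    A, B = np.atleast_2d(A), np.atleast_2d(B)
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / sigma**2)


def rkhs_inner(a, Za, b, Zb, sigma=1.0):
    """Inner product of sum_i a_i k(., za_i) and sum_j b_j k(., zb_j): a^T K(Za, Zb) b."""
    return float(a @ gaussian_kernel(Za, Zb, sigma) @ b)


rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 2))  # expansion points z_i of f
a = rng.normal(size=5)       # f = sum_i a_i k(., z_i)
x = np.array([[0.3, -0.1]])  # evaluation point

f_at_x = float(a @ gaussian_kernel(Z, x)[:, 0])  # direct evaluation f(x)
inner = rkhs_inner(a, Z, np.array([1.0]), x)      # inner product of f and k(., x)
print(f_at_x, inner)  # the two values agree (reproducing property)
```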
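Kernel mean embedding
• A point x is represented in the RKHS through the Dirac measure $\delta_x$:
$$\int k(\cdot, y)\, d\delta_x(y) = k(\cdot, x).$$
• For $X \sim p_X$, with $k(x_1, x_2) = \langle \phi(x_1), \phi(x_2)\rangle_{\mathcal{H}_X}$, the distribution $p_X$ is embedded as
$$\mu_X := \int k(\cdot, x)\, dp_X(x) = \int \phi(x)\, dp_X(x).$$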
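• Example: for $k(x, x') = e^{x x'}$ the embedding is the moment-generating function of P, so it encodes all of the moments:
$$\mu_X(t) = \mathbb{E}_{X\sim P}[k(t, X)] = \int e^{tx}\, dP(x) = \int \left(1 + tx + \frac{t^2x^2}{2} + \frac{t^3x^3}{3!} + \cdots\right) dP(x)$$
$$= 1 + t\,\mathbb{E}_{X\sim P}[X] + \frac{t^2}{2}\mathbb{E}_{X\sim P}[X^2] + \frac{t^3}{3!}\mathbb{E}_{X\sim P}[X^3] + \cdots$$
• Empirical estimate from a sample $x_1, \dots, x_n$:
$$\hat\mu_X := \frac{1}{n}\sum_{i=1}^n k(\cdot, x_i)$$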
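With the exponential kernel, the empirical embedding $\hat\mu_X(t)$ is the empirical moment-generating function. The NumPy sketch below evaluates it and recovers the first two sample moments by central differences at t = 0. The helper `empirical_embedding` and the sample values are hypothetical.

```python
import numpy as np


def empirical_embedding(samples, kernel):
    """Return the function t -> mu_hat(t) = (1/n) sum_i k(t, x_i)."""
    samples = np.asarray(samples, dtype=float)
    return lambda t: float(np.mean(kernel(t, samples)))


def exp_kernel(t, x):
    """k(t, x) = exp(t * x) for scalar inputs (vectorized over x)."""
    return np.exp(t * x)


x = np.array([0.5, -1.0, 2.0, 0.25])
mu_hat = empirical_embedding(x, exp_kernel)

# mu_hat is the empirical moment-generating function, so its derivatives
# at t = 0 are the sample moments; central differences approximate them.
h = 1e-4
first_moment = (mu_hat(h) - mu_hat(-h)) / (2 * h)
second_moment = (mu_hat(h) - 2 * mu_hat(0.0) + mu_hat(-h)) / h**2
print(mu_hat(0.0), first_moment, x.mean(), second_moment, np.mean(x**2))
```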
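• Two-sample test (Gretton et al., NIPS)
- Maximum mean discrepancy (MMD)
- Given $X_1, \dots, X_\ell \sim P$ and $Y_1, \dots, Y_n \sim Q$, test
$$H_0 : P = Q \quad \text{vs.} \quad H_1 : P \neq Q$$
using the RKHS distance between the embeddings $\mu_P$ and $\mu_Q$:
$$\mathrm{MMD}^2(P, Q) := \|\mu_P - \mu_Q\|^2_{\mathcal{H}_k}$$
• See also Yoshikawa et al., NIPS.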
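Plugging the empirical embeddings into this definition gives the biased (V-statistic) estimate $\widehat{\mathrm{MMD}}^2 = \mathrm{mean}(K_{XX}) + \mathrm{mean}(K_{YY}) - 2\,\mathrm{mean}(K_{XY})$. The NumPy sketch below uses a Gaussian kernel; the bandwidth σ = 1 and the synthetic samples are assumptions. It only computes the statistic. Deciding between $H_0$ and $H_1$ would also need a threshold, for example from a permutation procedure.

```python
import numpy as np


def gaussian_gram(A, B, sigma=1.0):
    """Gram matrix exp(-||a - b||^2 / sigma^2) between the rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)


def mmd2_biased(X, Y, sigma=1.0):
    """Biased estimate of ||mu_hat_P - mu_hat_Q||^2 from samples X ~ P, Y ~ Q."""
    return (
        gaussian_gram(X, X, sigma).mean()
        + gaussian_gram(Y, Y, sigma).mean()
        - 2.0 * gaussian_gram(X, Y, sigma).mean()
    )


rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 1))        # sample from P = N(0, 1)
Y_same = rng.normal(0.0, 1.0, size=(200, 1))   # Q = P
Y_shift = rng.normal(1.5, 1.0, size=(200, 1))  # Q = N(1.5, 1), so P != Q
print(mmd2_biased(X, Y_same), mmd2_biased(X, Y_shift))
```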
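Cross-covariance operator
• The cross-covariance operator $C_{YX} : \mathcal{H}_X \to \mathcal{H}_Y$ is defined by
$$\langle g, C_{YX} f\rangle_{\mathcal{H}_Y} = \mathbb{E}[f(X)g(Y)], \quad \forall f \in \mathcal{H}_X,\ g \in \mathcal{H}_Y.$$
• (From Song et al.) The product kernel
$$k((x_1, y_1), (x_2, y_2)) = k_X(x_1, x_2)\,k_Y(y_1, y_2)$$
is the reproducing kernel of $\mathcal{H}_X \otimes \mathcal{H}_Y$. For $X, Y \sim p$, with feature maps $\phi(x)$ for $\mathcal{H}_X$ and $\psi(y)$ for $\mathcal{H}_Y$,
$$C_{YX} = \mathbb{E}[\psi(Y) \otimes \phi(X)] \in \mathcal{H}_Y \otimes \mathcal{H}_X.$$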
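Conditional mean embedding (Song et al.)
• Embedding of $p(Y \mid X = x)$:
$$\mu_{Y|X=x} = \mathbb{E}[\psi(Y) \mid X = x] = C_{YX}C_{XX}^{-1}\phi(x)$$
• Regularized empirical estimate ($\lambda > 0$):
$$\hat\mu_{Y|X=x} = \hat C_{YX}(\hat C_{XX} + \lambda I)^{-1}\phi(x), \qquad \mathbb{E}[g(Y) \mid X = x] \approx \langle g, \hat\mu_{Y|X=x}\rangle_{\mathcal{H}_Y}, \quad \forall g \in \mathcal{H}_Y.$$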
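In terms of the sample $(x_i, y_i)_{i=1}^n$, the estimate above is $\hat\mu_{Y|X=x} = \sum_i w_i(x)\psi(y_i)$ with $w(x) = (G_X + n\lambda I)^{-1}k_X(x)$, where $(G_X)_{ij} = k_X(x_i, x_j)$. The NumPy sketch below makes several assumed choices: a Gaussian kernel, σ = 0.5, λ = 10⁻³, and synthetic data y = sin x + noise. Applying the weights to g(y) = y is only an illustration, because this g does not lie in the Gaussian RKHS.

```python
import numpy as np


def gaussian_gram(A, B, sigma=1.0):
    """Gram matrix exp(-||a - b||^2 / sigma^2) between the rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)


def cme_weights(X, x_query, lam=1e-3, sigma=1.0):
    """Weights W[:, j] = (G_X + n*lam*I)^{-1} k_X(x_query_j), so that
    mu_hat_{Y|X=x_j} = sum_i W[i, j] psi(y_i)."""
    n = X.shape[0]
    G_X = gaussian_gram(X, X, sigma)
    k_q = gaussian_gram(X, x_query, sigma)  # shape (n, number of queries)
    return np.linalg.solve(G_X + n * lam * np.eye(n), k_q)


rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(300, 1))
Y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)  # synthetic data
x_query = np.array([[0.5], [1.0]])

W = cme_weights(X, x_query, lam=1e-3, sigma=0.5)
# Plug-in estimate of E[g(Y) | X = x] = sum_i W[i, j] g(y_i), here with g(y) = y
cond_mean = W.T @ Y
print(cond_mean, np.sin(x_query[:, 0]))
```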
Bayes' rule
$$p^\pi(Y \mid X = x) \propto \pi(Y)\,p(X = x \mid Y)$$
• MCMC, SMC, ABC, etc.
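Kernel Bayes' Rule (Fukumizu et al., NIPS 2011)
• Goal: given a prior $\pi(Y)$ and the likelihood $p(X = x \mid Y)$, estimate the kernel embedding $\mu^\pi_Y(X = x)$ of the posterior $p^\pi(Y \mid X = x)$.
• Applying the conditional mean embedding to the joint distribution $p^\pi$:
$$\mu^\pi_Y(X = x) = C^\pi_{YX}\,(C^\pi_{XX})^{-1}\phi(x), \qquad C^\pi_{YX} = \mathbb{E}_{p^\pi(Y, X)}[\psi(Y) \otimes \phi(X)]$$
• However, $p^\pi(Y, X) = \pi(Y)\,p(X \mid Y) \neq p(Y, X)$, while the available sample comes from $p(Y, X)$. As a result, $C^\pi_{YX}$ and $C^\pi_{XX}$ cannot be estimated directly.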
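• Theorem (proof in Appendix): with the prior embedding $\pi_Y$,
$$C^\pi_{YX} = C_{(YX)Y}\,C_{YY}^{-1}\pi_Y, \qquad C^\pi_{XX} = C_{(XX)Y}\,C_{YY}^{-1}\pi_Y$$
• Estimator: $\hat C^\pi_{XX}$ need not be positive semidefinite. The regularized inverse $(B + \delta I)^{-1}z$ with $B = \hat C^\pi_{XX}$ is therefore replaced by $(B^2 + \delta I)^{-1}Bz$:
$$\hat\mu^\pi_Y(X = x) = \hat C^\pi_{YX}\left([\hat C^\pi_{XX}]^2 + \delta I\right)^{-1}\hat C^\pi_{XX}\,\phi(x)$$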
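Written with Gram matrices, the estimator above becomes a set of weights on the sample: $\hat\mu^\pi_Y(X = x) = \sum_i w_i\psi(y_i)$ with $w = \Lambda G_X((\Lambda G_X)^2 + \delta I)^{-1}\Lambda\,k_X(x)$. Here $\Lambda = \mathrm{diag}(\beta)$ and $\beta = (G_Y + n\lambda I)^{-1}\tilde G_Y\tilde\alpha$ are the kernel sum rule weights. This NumPy sketch is derived from the slide's formula and is not the authors' code. The Gaussian kernels, bandwidths, λ, δ, and synthetic data are all assumptions. The helper name `kbr_weights` is hypothetical.

```python
import numpy as np


def gaussian_gram(A, B, sigma=1.0):
    """Gram matrix exp(-||a - b||^2 / sigma^2) between the rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)


def kbr_weights(X, Y, Y_prior, alpha_prior, x_obs, lam=1e-2, delta=1e-2,
                sigma_x=1.0, sigma_y=1.0):
    """Weights w with mu_hat^pi_Y(X = x_obs) = sum_i w_i psi(y_i).

    X, Y: joint sample (x_i, y_i) ~ p(X, Y), arrays of shape (n, d_x), (n, d_y).
    Y_prior, alpha_prior: prior estimate pi_hat_Y = sum_j alpha_j psi(y~_j).
    """
    n = X.shape[0]
    G_X = gaussian_gram(X, X, sigma_x)
    G_Y = gaussian_gram(Y, Y, sigma_y)
    G_Y_tilde = gaussian_gram(Y, Y_prior, sigma_y)  # (G~_Y)_ij = k_Y(y_i, y~_j)
    # Kernel sum rule: C^pi_XX = sum_i beta_i phi(x_i) (x) phi(x_i)
    beta = np.linalg.solve(G_Y + n * lam * np.eye(n), G_Y_tilde @ alpha_prior)
    Lam = np.diag(beta)
    LG = Lam @ G_X
    # ([C^pi_XX]^2 + delta I)^{-1} C^pi_XX phi(x) expressed in sample coordinates
    k_x = gaussian_gram(X, x_obs, sigma_x)[:, 0]
    u = np.linalg.solve(LG @ LG + delta * np.eye(n), Lam @ k_x)
    return LG @ u


rng = np.random.default_rng(0)
n = 100
Y = rng.normal(0.0, 1.0, size=(n, 1))          # latent variable sample
X = Y + 0.3 * rng.normal(size=(n, 1))          # observation model p(X | Y)
Y_prior = rng.normal(1.0, 0.5, size=(50, 1))   # sample representing the prior pi
alpha_prior = np.full(50, 1.0 / 50)
x_obs = np.array([[0.8]])

w = kbr_weights(X, Y, Y_prior, alpha_prior, x_obs)
print("posterior mean estimate:", w @ Y[:, 0])
```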
• Grünewälder et al., ICML 2012
• RegBayes (Zhu et al., JMLR 2014)
• Thresholding regularization, RegBayes
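• The Bayes posterior is the solution of the following optimization problem (proof in Appendix):
$$\min_{p(Y|X=x)} \ \mathrm{KL}\big(p(Y \mid X = x)\,\|\,\pi(Y)\big) - \int \log p(X = x \mid Y)\, dp(Y \mid X = x) \quad \text{s.t. } p(Y \mid X = x) \in \mathcal{P}_{\mathrm{prob}}$$
• RegBayes (Zhu et al., JMLR 2014) adds posterior regularization through slack variables $\xi$ with a penalty $U(\xi)$:
$$\min_{p(Y|X=x),\,\xi} \ \mathrm{KL}\big(p(Y \mid X = x)\,\|\,\pi(Y)\big) - \int \log p(X = x \mid Y)\, dp(Y \mid X = x) + U(\xi) \quad \text{s.t. } p(Y \mid X = x) \in \mathcal{P}_{\mathrm{prob}}(\xi)$$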
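[Diagram: $\varepsilon[\mu] \to \varepsilon_S[\mu] \to \hat\varepsilon_S[\mu] \to \hat\varepsilon^+_S[\mu] \to \hat\varepsilon_{\lambda,n}[\mu]$; steps labeled Thm 1, Prop., Thresholding Regularization, Thm 2, and RegBayes]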
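• The posterior embedding $\mu^\pi_Y(X = x) \in \mathcal{H}_Y$ of $p^\pi(Y \mid X = x)$ satisfies
$$\langle h, \mu^\pi_Y(X = x)\rangle_{\mathcal{H}_Y} = \mathbb{E}_{p^\pi(Y|X=x)}[h(Y)], \quad \forall h \in \mathcal{H}_Y.$$
It is learned as a function $\mu : \mathcal{X} \to \mathcal{H}_Y$ by minimizing
$$\varepsilon[\mu] := \sup_{\|h\|_{\mathcal{H}_Y} \le 1} \mathbb{E}_X\left[\left(\mathbb{E}_Y[h(Y) \mid X] - \langle h, \mu(X)\rangle_{\mathcal{H}_Y}\right)^2\right],$$
where $\mathbb{E}_X[\cdot]$ is taken over $p^\pi(X)$ and $\mathbb{E}_Y[\cdot \mid X]$ over $p^\pi(Y \mid X)$.
• $\varepsilon[\mu]$ contains the unknown conditional expectation, so it cannot be evaluated from the sample $(x_i, y_i)_{i=1}^n$. It is replaced by the surrogate
$$\varepsilon_S[\mu] = \mathbb{E}_{(X,Y)\sim p^\pi(X,Y)}\left[\|\psi(Y) - \mu(X)\|^2_{\mathcal{H}_Y}\right].$$
• Theorem 1 (proof in Appendix): for every $\mu : \mathcal{X} \to \mathcal{H}_Y$, $\varepsilon[\mu] \le \varepsilon_S[\mu]$.
• The minimizers coincide:
$$\mu^* := \arg\min_{\mu \in \mathcal{H}_K}\varepsilon[\mu] = \arg\min_{\mu \in \mathcal{H}_K}\varepsilon_S[\mu] \quad p_X\text{-a.s.}$$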
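• Estimating $\varepsilon_S[\mu]$: assume $f(x, y) := \|\psi(y) - \mu(x)\|^2_{\mathcal{H}_Y} \in \mathcal{H}_X \otimes \mathcal{H}_Y$. Then, with $\mu_{(X,Y)}$ the embedding of $p^\pi(X, Y)$,
$$\varepsilon_S[\mu] = \mathbb{E}_{(X,Y)}\left[\|\psi(Y) - \mu(X)\|^2_{\mathcal{H}_Y}\right] = \langle f, \mu_{(X,Y)}\rangle_{\mathcal{H}_X \otimes \mathcal{H}_Y}.$$
Estimating $\mu_{(X,Y)}$ with the kernel sum rule gives
$$\hat\mu_{(X,Y)} = \hat C_{(X,Y)Y}(\hat C_{YY} + \lambda I)^{-1}\hat\pi_Y, \qquad \hat\varepsilon_S[\mu] = \langle \hat\mu_{(X,Y)}, f\rangle_{\mathcal{H}_X \otimes \mathcal{H}_Y}.$$
• Proposition 1 (proof in Appendix): for the prior estimate $\hat\pi_Y = \sum_{i=1}^{\ell}\tilde\alpha_i\psi(\tilde y_i)$,
$$\hat\varepsilon_S[\mu] = \sum_{i=1}^n \beta_i\|\psi(y_i) - \mu(x_i)\|^2_{\mathcal{H}_Y},$$
where $\beta = (G_Y + n\lambda I)^{-1}\tilde G_Y\tilde\alpha$, $(G_Y)_{ij} = k_Y(y_i, y_j)$, and $(\tilde G_Y)_{ij} = k_Y(y_i, \tilde y_j)$.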
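The weights β of Proposition 1 take a single linear solve. The NumPy sketch below assumes a Gaussian $k_Y$, and its helper names are hypothetical. As a sanity check, when the prior sample equals the $y_i$ with uniform $\tilde\alpha = 1/n$ and λ is tiny, β is close to $1/n$. In that case $\hat\varepsilon_S$ reduces to the ordinary empirical mean of the squared residuals.

```python
import numpy as np


def gaussian_gram(A, B, sigma=1.0):
    """Gram matrix exp(-||a - b||^2 / sigma^2) between the rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)


def prop1_beta(Y, Y_prior, alpha_prior, lam=1e-3, sigma=1.0):
    """beta = (G_Y + n*lam*I)^{-1} G~_Y alpha~ (Proposition 1)."""
    n = Y.shape[0]
    G_Y = gaussian_gram(Y, Y, sigma)
    G_Y_tilde = gaussian_gram(Y, Y_prior, sigma)  # (G~_Y)_ij = k_Y(y_i, y~_j)
    return np.linalg.solve(G_Y + n * lam * np.eye(n), G_Y_tilde @ alpha_prior)


def eps_s_hat(beta, sq_residuals):
    """eps_hat_S[mu] = sum_i beta_i ||psi(y_i) - mu(x_i)||^2, given the squared residual norms."""
    return float(beta @ sq_residuals)


# Sanity check: the prior sample is the y_i themselves, with uniform weights.
Y = np.arange(5.0)[:, None]
beta = prop1_beta(Y, Y, np.full(5, 0.2), lam=1e-6)
print(beta)  # close to 0.2 in every entry
```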
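Thresholding Regularization
• The weights $\beta_i$ of the Proposition can be negative, so $\hat\varepsilon_S[\mu]$ is not guaranteed to be nonnegative. Thresholding regularization (Nishiyama et al., arXiv) keeps only the nonnegative part:
$$\beta_i^+ := \max(0, \beta_i).$$
• Theorem 2 (consistency): with $\hat\varepsilon^+_S[\mu] := \sum_{i=1}^n \beta_i^+\|\psi(y_i) - \mu(x_i)\|^2_{\mathcal{H}_Y}$,
$$\left|\hat\varepsilon^+_S[\mu] - \varepsilon_S[\mu]\right| \xrightarrow{p} 0.$$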
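Minimizing $\hat\varepsilon^+_S[\mu]$
• Add a regularizer to $\hat\varepsilon^+_S[\mu]$:
$$\hat\varepsilon_{\lambda,n}[\mu] = \sum_{i=1}^n \beta_i^+\|\psi(y_i) - \mu(x_i)\|^2_{\mathcal{H}_Y} + \lambda_n\|\mu\|^2_{\mathcal{H}_K}$$
• Proposition (proof in Appendix): $\hat\mu_{\lambda,n} = \arg\min_\mu \hat\varepsilon_{\lambda,n}[\mu]$ is given by
$$\hat\mu_{\lambda,n}(x) = \Psi(K_X + \lambda_n\Lambda^+)^{-1}K_{:x},$$
where $\Psi = (\psi(y_1), \dots, \psi(y_n))$, $(K_X)_{ij} = k_X(x_i, x_j)$, $\Lambda^+ = \mathrm{diag}(1/\beta_1^+, \dots, 1/\beta_n^+)$, and $K_{:x} = (k_X(x, x_1), \dots, k_X(x, x_n))^T$. Samples with $\beta_i^+ = 0$ are discarded beforehand.
• Theorem: $\varepsilon_S[\hat\mu_{\lambda_n,n}] \xrightarrow{P} \min_\mu \varepsilon_S[\mu]$.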
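Putting thresholding and the Proposition together, the coefficients of $\hat\mu_{\lambda,n}$ in the basis $\psi(y_i)$ are $C(x) = (K_X + \lambda_n\Lambda^+)^{-1}K_{:x}$, computed on the samples with $\beta_i^+ > 0$. The NumPy sketch below uses a Gaussian $k_X$; the helper names and example weights are assumptions. With all $\beta_i^+ = 1/n$ we get $\Lambda^+ = nI$, and the formula reduces to the conditional mean embedding weights $(K_X + n\lambda_n I)^{-1}K_{:x}$.

```python
import numpy as np


def gaussian_gram(A, B, sigma=1.0):
    """Gram matrix exp(-||a - b||^2 / sigma^2) between the rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)


def thresholded(beta):
    """beta_i^+ = max(0, beta_i)."""
    return np.maximum(beta, 0.0)


def prop2_coefficients(X, beta_plus, x_query, lam_n, sigma=1.0):
    """Return (keep, C) with C = (K_X + lam_n Lambda^+)^{-1} K_{:x} on the kept samples,
    so mu_hat(x_j) = sum_i C[i, j] psi(y_i) over the samples with beta_i^+ > 0."""
    keep = beta_plus > 0
    Xk = X[keep]
    K_X = gaussian_gram(Xk, Xk, sigma)
    Lam_plus = np.diag(1.0 / beta_plus[keep])
    K_q = gaussian_gram(Xk, x_query, sigma)
    return keep, np.linalg.solve(K_X + lam_n * Lam_plus, K_q)


rng = np.random.default_rng(1)
X = rng.normal(size=(6, 1))
beta = np.array([0.3, -0.05, 0.2, 0.25, 0.0, 0.3])  # illustrative weights
keep, C = prop2_coefficients(X, thresholded(beta), np.array([[0.0]]), lam_n=0.1)
print(keep, C.ravel())
```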
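RegBayes
• Thresholding regularization combined with RegBayes: additional pairs $(x_i, t_i)$, $i = m+1, \dots, n$, enter the loss as an extra penalty with weight $\gamma > 0$:
$$L := \sum_{i=1}^m \beta_i^+\|\psi(y_i) - \mu(x_i)\|^2_{\mathcal{H}_Y} + \lambda\|\mu\|^2_{\mathcal{H}_K} + \gamma\sum_{i=m+1}^n \|\mu(x_i) - \psi(t_i)\|^2_{\mathcal{H}_Y}.$$
• Proposition: the minimizer of $L$ is
$$\hat\mu_{\mathrm{reg}}(x) = \Psi(K_X + \lambda\Lambda^+)^{-1}K_{:x},$$
where $\Psi = (\psi(y_1), \dots, \psi(y_m), \psi(t_{m+1}), \dots, \psi(t_n))$, $(K_X)_{ij} = k_X(x_i, x_j)$, $\Lambda^+ = \mathrm{diag}(1/\beta_1^+, \dots, 1/\beta_m^+, 1/\gamma, \dots, 1/\gamma)$, and $K_{:x} = (k_X(x, x_1), \dots, k_X(x, x_n))^T$.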
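Here is a NumPy sketch of $\hat\mu_{\mathrm{reg}}$ under stated assumptions. Scalar targets stand in for $\psi(y_i)$ and $\psi(t_i)$, the kernel is Gaussian, and γ is an assumed single weight on the extra pairs. The helper name `regbayes_coefficients` is hypothetical. The final check confirms that the closed form sets the gradient of L with respect to the expansion coefficients to zero.

```python
import numpy as np


def gaussian_gram(A, B, sigma=1.0):
    """Gram matrix exp(-||a - b||^2 / sigma^2) between the rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)


def regbayes_coefficients(X, targets, beta_plus, gamma, lam, sigma=1.0):
    """Solve (K_X + lam * Lambda^+) c = targets with
    Lambda^+ = diag(1/beta_1^+, ..., 1/beta_m^+, 1/gamma, ..., 1/gamma).
    The first m = len(beta_plus) rows of X are the weighted pairs; the rest are
    the extra pairs. Returns (c, K_X, w), where w holds the per-pair loss weights."""
    n, m = X.shape[0], len(beta_plus)
    w = np.concatenate([beta_plus, np.full(n - m, gamma)])
    K_X = gaussian_gram(X, X, sigma)
    c = np.linalg.solve(K_X + lam * np.diag(1.0 / w), targets)
    return c, K_X, w


rng = np.random.default_rng(2)
X = rng.uniform(-2.0, 2.0, size=(7, 1))
targets = np.cos(X[:, 0])                    # scalar stand-ins for psi(y_i), psi(t_i)
beta_plus = np.array([0.2, 0.1, 0.3, 0.25])  # m = 4 thresholded weights (illustrative)
lam = 0.1
c, K_X, w = regbayes_coefficients(X, targets, beta_plus, gamma=0.5, lam=lam)

# Gradient of L(c) = sum_i w_i (t_i - (K c)_i)^2 + lam * c^T K c at the solution
grad = -2.0 * K_X @ (w * (targets - K_X @ c)) + 2.0 * lam * K_X @ c
print(np.max(np.abs(grad)))  # numerically zero
```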
Experiment
• Camera position recovery
• Camera trajectory $\{(x_t, y_t)\}_{t=1}^m$ generated by
$$\theta_{t+1} = \theta_t + 0.2 + \mathcal{N}(0, 4\times 10^{-1}), \qquad r_{t+1} = \max\big(R_1, \min(R_2, r_t + \mathcal{N}(0, 1))\big),$$
$$x_{t+1} = r_{t+1}\cos\theta_{t+1}, \qquad y_{t+1} = r_{t+1}\sin\theta_{t+1}.$$
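The following is a minimal simulation of the trajectory above, built on a few assumptions. The radius is clipped to $[R_1, R_2]$. The second argument of $\mathcal{N}$ is read as a variance, so the angle noise has standard deviation $\sqrt{0.4}$. The initial radius is the midpoint of the interval. The name `simulate_camera_path` is hypothetical.

```python
import numpy as np


def simulate_camera_path(m, R1, R2, theta0=0.0, seed=0):
    """Simulate {(x_t, y_t)}_{t=1}^m with the radius clipped to [R1, R2]."""
    rng = np.random.default_rng(seed)
    r, theta = 0.5 * (R1 + R2), theta0  # assumed initial state
    xs, ys = np.empty(m), np.empty(m)
    for t in range(m):
        theta = theta + 0.2 + rng.normal(0.0, np.sqrt(0.4))  # N(0, 0.4) read as a variance
        r = max(R1, min(R2, r + rng.normal(0.0, 1.0)))
        xs[t], ys[t] = r * np.cos(theta), r * np.sin(theta)
    return xs, ys


for R1, R2 in [(0.0, 10.0), (5.0, 7.0)]:  # the two settings used in the slides
    xs, ys = simulate_camera_path(500, R1, R2)
    radii = np.hypot(xs, ys)
    print(R1, R2, radii.min(), radii.max())
```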
• Methods: KBR, thresholding regularization (KBR), and RegBayes, which additionally uses supervision data.
• Settings: $R_1 = 0, R_2 = 10$ and $R_1 = 5, R_2 = 7$.
• Images $I_t$ are observed along the trajectory of camera positions $(x_t, y_t)$.
• Prediction:
$$p((x_{t+1}, y_{t+1}) \mid I_1, \dots, I_T), \qquad p((x_{t+1}, y_{t+1}) \mid I_1, \dots, I_T, I_{t+1})$$
References
• Arthur Gretton, Karsten Borgwardt, Malte Rasch, Bernhard Schölkopf, and Alex Smola. A kernel method for the two-sample problem. Advances in Neural Information Processing Systems 19, 2007.
• Le Song, Jonathan Huang, Alex Smola, and Kenji Fukumizu. Hilbert space embeddings of conditional distributions with applications to dynamical systems. ICML 2009.
• Le Song, Kenji Fukumizu, and Arthur Gretton. Kernel embeddings of conditional distributions: A unified kernel framework for nonparametric inference in graphical models. IEEE Signal Processing Magazine, 2013.
• Kenji Fukumizu, Le Song, and Arthur Gretton. Kernel Bayes' rule. NIPS 2011.
• Yang Song, Jun Zhu, and Yong Ren. Kernel Bayesian inference with posterior regularization. NIPS 2016.
• Yuya Yoshikawa, Tomoharu Iwata, Hiroshi Sawada, and Takeshi Yamada. Cross-domain matching for bag-of-words data via kernel embeddings of latent distributions. NIPS 2015.
• Yu Nishiyama, Abdeslam Boularias, Arthur Gretton, and Kenji Fukumizu. Hilbert space embeddings of POMDPs. arXiv preprint.
• Steffen Grünewälder, Guy Lever, Luca Baldassarre, Sam Patterson, Arthur Gretton, and Massimiliano Pontil. Conditional mean embeddings as regressors. ICML 2012.
• Jun Zhu, Ning Chen, and Eric P. Xing. Bayesian inference with posterior regularization and applications to infinite latent SVMs. JMLR, 2014.
Kernel Bayesian Inference with Posterior Regularization (Appendix)
Yuchi Matsuoka
March 18, 2017
1 Preliminaries
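Let $(\mathcal{X}, \mathcal{B}_X)$ be a measurable space with probability distribution $p_X$, and let $\mathcal{H}_X$ be the RKHS with kernel $k(\cdot, \cdot)$. The kernel mean embedding of $p_X$ is¹
$$\mu_X = \mathbb{E}_{p_X}[\phi(X)] \in \mathcal{H}_X, \qquad \phi(X) = k(X, \cdot).$$
For $f \in \mathcal{H}_X$,
$$\mathbb{E}_{p_X}[f(X)] = \mathbb{E}_{p_X}[\langle f, \phi(X)\rangle] = \langle f, \mu_X\rangle.$$
A kernel is universal if its RKHS $\mathcal{H}$ is dense in $C(\mathcal{X})$ with respect to the sup norm.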
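2 Cross-covariance operators
Let $(\mathcal{X}, \mathcal{B}_X)$ and $(\mathcal{Y}, \mathcal{B}_Y)$ be measurable spaces with RKHSs $\mathcal{H}_X, \mathcal{H}_Y$ and feature maps $\phi(x), \psi(y)$, and let $(X, Y)$ have joint distribution $p$ on $\mathcal{X} \times \mathcal{Y}$. The cross-covariance operator is $C_{XY} = \mathbb{E}_p[\phi(X) \otimes \psi(Y)]$. It is the embedding $\mu_{(XY)}$ of $p$ in the RKHS $\mathcal{H}_X \otimes \mathcal{H}_Y$ of the product kernel $k((x_1, y_1), (x_2, y_2)) = k_X(x_1, x_2)\,k_Y(y_1, y_2)$.

Theorem 1. Suppose $C_{XX}$ is injective, $\mu_X \in \mathcal{R}(C_{XX})$, and $\mathbb{E}[g(Y) \mid X = \cdot] \in \mathcal{H}_X$ for every $g \in \mathcal{H}_Y$. Then
$$\mu_Y = C_{YX}C_{XX}^{-1}\mu_X, \qquad \mu_{Y|X=x} = \mathbb{E}[\psi(Y) \mid X = x] = C_{YX}C_{XX}^{-1}\phi(x).$$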
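Given an i.i.d. sample $\{x_i\}_{i=1}^N$ from $p_X$ (respectively $\{(x_i, y_i)\}_{i=1}^N$ from $p$), $\mu_X$ and $C_{XY}$ are estimated by
$$\hat\mu_X = \frac{1}{N}\sum_{i=1}^N \phi(x_i), \qquad \hat C_{XY} = \frac{1}{N}\sum_{i=1}^N \phi(x_i) \otimes \psi(y_i).$$
Both estimators converge in RKHS norm at rate $O_p(N^{-1/2})$.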
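With linear kernels the feature maps are the identity, so the estimators above become an ordinary mean vector and an ordinary matrix. This NumPy sketch uses synthetic data and linear kernels, both of which are assumptions. It checks the identities $\langle f, \hat\mu_X\rangle = \frac{1}{N}\sum_i f(x_i)$ and $\langle f, \hat C_{XY}g\rangle = \frac{1}{N}\sum_i f(x_i)g(y_i)$.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
Xs = rng.normal(size=(N, 3))                    # phi(x) = x (linear kernel on R^3)
Ys = Xs[:, :2] + 0.1 * rng.normal(size=(N, 2))  # psi(y) = y (linear kernel on R^2)

mu_hat_X = Xs.mean(axis=0)  # (1/N) sum_i phi(x_i)
C_hat_XY = Xs.T @ Ys / N    # (1/N) sum_i phi(x_i) (outer) psi(y_i), a 3 x 2 matrix

f = np.array([1.0, -2.0, 0.5])  # f(x) = f . x, an element of H_X
g = np.array([0.3, 1.0])        # g(y) = g . y, an element of H_Y

# The inner product of f and mu_hat_X is the sample mean of f(x_i) ...
print(f @ mu_hat_X, np.mean(Xs @ f))
# ... and f . (C_hat_XY g) is the sample mean of f(x_i) g(y_i).
print(f @ C_hat_XY @ g, np.mean((Xs @ f) * (Ys @ g)))
```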
¹ We assume the kernel is bounded, i.e., $\sup_x k_X(x, x) < \infty$.
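3 Theorem 1
Theorem 1 (Song et al., 2009, Equation 6). Let $m_\Pi \in \mathcal{H}_X$ be the kernel mean of a distribution $\Pi$ on $\mathcal{X}$ with density $\pi$. Let $m_{Q_Y} \in \mathcal{H}_Y$ be the kernel mean of $Q_Y$, the $\mathcal{Y}$-marginal of $q(x, y) = \pi(x)p(y \mid x)$. Suppose $C_{XX}$ is injective, $m_\Pi \in \mathcal{R}(C_{XX})$, and $\mathbb{E}[g(Y) \mid X = \cdot] \in \mathcal{H}_X$ for every $g \in \mathcal{H}_Y$. Then
$$m_{Q_Y} = C_{YX}C_{XX}^{-1}m_\Pi.$$
Here $C_{XX}^{-1}m_\Pi$ denotes the element $f \in \mathcal{H}_X$ with $C_{XX}f = m_\Pi$.

Proof. Let $f \in \mathcal{H}_X$ satisfy $C_{XX}f = m_\Pi$. For any $g \in \mathcal{H}_Y$,
$$\langle C_{YX}f, g\rangle = \langle f, C_{XY}g\rangle = \langle f, C_{XX}\mathbb{E}[g(Y) \mid X = \cdot]\rangle$$
$$= \langle C_{XX}f, \mathbb{E}[g(Y) \mid X = \cdot]\rangle = \langle m_\Pi, \mathbb{E}[g(Y) \mid X = \cdot]\rangle = \langle m_{Q_Y}, g\rangle.$$
The second equality holds because, for every $h \in \mathcal{H}_X$, $\langle h, C_{XY}g\rangle = \mathbb{E}[h(X)g(Y)] = \mathbb{E}[h(X)\,\mathbb{E}[g(Y) \mid X]] = \langle h, C_{XX}\mathbb{E}[g(Y) \mid X = \cdot]\rangle$. It remains to show the last equality, $\langle m_\Pi, \mathbb{E}[g(Y) \mid X = \cdot]\rangle = \langle m_{Q_Y}, g\rangle$. Recall $\langle f, m_X\rangle = \mathbb{E}[f(X)]$. Let $(X, Y) \sim p(x, y)$, $U \sim \pi(u)$, $(Z, W) \sim q(x, y) = \pi(x)p(y \mid x)$, and $q_Y(y) = \int q(x, y)\,dx$. Then
$$\langle m_{Q_Y}, g\rangle_{\mathcal{H}_Y} = \mathbb{E}[g(W)] = \int g(w)\,q_Y(w)\,dw,$$
$$\langle m_\Pi, \mathbb{E}[g(Y) \mid X = \cdot]\rangle = \mathbb{E}_U\big[\mathbb{E}_Y[g(Y) \mid X = U]\big] = \int_{\mathcal{X}}\left(\int_{\mathcal{Y}} g(y)\,p(y \mid u)\,dy\right)\pi(u)\,du$$
$$= \int_{\mathcal{Y}}\left(\int_{\mathcal{X}} g(y)\,q(u, y)\,du\right)dy = \int_{\mathcal{Y}} g(y)\,q_Y(y)\,dy.$$
Hence $m_{Q_Y} = C_{YX}C_{XX}^{-1}m_\Pi$. Taking $m_\Pi = k_X(\cdot, x)$ gives $\mathbb{E}[k_Y(\cdot, Y) \mid X = x] = C_{YX}C_{XX}^{-1}k_X(\cdot, x)$. ✷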
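4 Kernel Bayes' rule
Let $\pi(Y)$ be a prior on $Y$ and $p(X = x \mid Y)$ the likelihood. The posterior given $x$ is
$$p^\pi(Y \mid X = x) \propto \pi(Y)\,p(X = x \mid Y),$$
which is the conditional of the joint distribution $p^\pi(X, Y) = \pi(Y)\,p(X \mid Y)$. Given the prior embedding $\pi_Y$ and $C_{XY}$, the goal is the posterior embedding $\mu^\pi_Y(X = x)$.

The likelihood $p(X \mid Y)$ is represented by $C_{XY}$ of the joint distribution $p$ on $\mathcal{X} \times \mathcal{Y}$. Apply Thm. 1 to $p^\pi$, with $C^\pi_{YX}$ the cross-covariance operator under $p^\pi$ and $C^\pi_{XX}$ the covariance operator of its marginal $p^\pi_X$:
$$\mu^\pi_Y(X = x) = C^\pi_{YX}\,(C^\pi_{XX})^{-1}\phi(x).$$
$C^\pi_{YX}$ is the embedding $\mu_{(YX)} \in \mathcal{H}_Y \otimes \mathcal{H}_X$ of $p^\pi$, so by Thm. 1
$$\mu_{(YX)} = C_{(YX)Y}C_{YY}^{-1}\pi_Y, \qquad \text{where } C_{(YX)Y} := \mathbb{E}[\psi(Y) \otimes \phi(X) \otimes \psi(Y)].$$
Likewise, $C^\pi_{XX}$ is obtained as
$$\mu_{(XX)} = C_{(XX)Y}C_{YY}^{-1}\pi_Y.$$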
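5 Regularized Bayesian inference (RegBayes)
Let $\mathcal{P}_{\mathrm{prob}}$ be the set of probability distributions. The Bayes posterior is the solution of
$$\min_{p(Y|X=x)} \ \mathrm{KL}\big(p(Y \mid X = x)\,\|\,\pi(Y)\big) - \int \log p(X = x \mid Y)\,dp(Y \mid X = x) \quad \text{s.t. } p(Y \mid X = x) \in \mathcal{P}_{\mathrm{prob}}.$$
Proof. Write $p^\pi(X = x) = \int \pi(Y)\,p(X = x \mid Y)\,dY$ for the evidence. Then
$$\mathrm{KL}\big(p(Y \mid X = x)\,\|\,\pi(Y)\big) - \int \log p(X = x \mid Y)\,dp(Y \mid X = x)$$
$$= \int \log\frac{p(Y \mid X = x)}{\pi(Y)}\,dp(Y \mid X = x) - \int \log p(X = x \mid Y)\,dp(Y \mid X = x)$$
$$= \int \log\frac{p(Y \mid X = x)}{\pi(Y)\,p(X = x \mid Y)}\,dp(Y \mid X = x)$$
$$= \int \log\frac{p(Y \mid X = x)}{\pi(Y)\,p(X = x \mid Y)/p^\pi(X = x)}\,dp(Y \mid X = x) - \int \log p^\pi(X = x)\,dp(Y \mid X = x)$$
$$= \mathrm{KL}\left(p(Y \mid X = x)\,\Big\|\,\frac{\pi(Y)\,p(X = x \mid Y)}{p^\pi(X = x)}\right) - \log p^\pi(X = x).$$
The last term does not depend on $p(Y \mid X = x)$, and the KL divergence attains its minimum of zero exactly when its two arguments coincide. Therefore
$$\arg\min_{p(Y|X=x)} \ \mathrm{KL}\big(p(Y \mid X = x)\,\|\,\pi(Y)\big) - \int \log p(X = x \mid Y)\,dp(Y \mid X = x) = \frac{\pi(Y)\,p(X = x \mid Y)}{p^\pi(X = x)}. \quad ✷$$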
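For a discrete latent variable, the identity above can be checked numerically. The objective equals $\mathrm{KL}(q \,\|\, \text{posterior}) - \log p^\pi(X = x)$, so it is $-\log p^\pi(X = x)$ at the Bayes posterior and no smaller anywhere else. This NumPy sketch uses an illustrative three-state prior and likelihood, and the name `regbayes_objective` is hypothetical.

```python
import numpy as np


def regbayes_objective(q, prior, lik_x):
    """KL(q || prior) - sum_y q(y) log p(X = x | y) for a discrete latent Y."""
    support = q > 0
    kl = np.sum(q[support] * np.log(q[support] / prior[support]))
    return kl - np.sum(q[support] * np.log(lik_x[support]))


prior = np.array([0.5, 0.3, 0.2])  # pi(Y) on three states (illustrative)
lik_x = np.array([0.1, 0.6, 0.3])  # p(X = x | Y) for the observed x
evidence = prior @ lik_x           # p^pi(X = x)
posterior = prior * lik_x / evidence

best = regbayes_objective(posterior, prior, lik_x)
print(best, -np.log(evidence))  # the minimum value is -log p^pi(X = x)

rng = np.random.default_rng(0)
others = [regbayes_objective(rng.dirichlet(np.ones(3)), prior, lik_x) for _ in range(200)]
print(min(others) - best)  # nonnegative gap: no random distribution does better
```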
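6 Vector-valued regression
Consider the regression objective with RKHS-valued outputs
$$E(f) := \sum_{j=1}^n \|y_j - f(x_j)\|^2_{\mathcal{H}_Y} + \lambda\|f\|^2_{\mathcal{H}_K}, \qquad \text{where } y_j \in \mathcal{H}_Y,\ f : \mathcal{X} \to \mathcal{H}_Y.$$
The outputs of $f$ lie in the RKHS $\mathcal{H}_Y$, and $f$ itself is taken from a vector-valued RKHS $\mathcal{H}_K$.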
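6.0.1 Vector-valued regression and RKHSs
Given i.i.d. data $\{(x_i, v_i)\}_{i \le m} \subset \mathcal{X} \times \mathcal{V}$, where $(\mathcal{V}, \langle\cdot,\cdot\rangle_{\mathcal{V}})$ is a Hilbert space, the vector-valued regression problem is to find $f : \mathcal{X} \to \mathcal{V}$ minimizing
$$\mathbb{E}_{(X,V)}\left[\|f(X) - V\|^2_{\mathcal{V}}\right].$$
[Definition] A Hilbert space $(\mathcal{H}, \langle\cdot,\cdot\rangle_\Gamma)$ of functions $h : \mathcal{X} \to \mathcal{V}$ is a vector-valued RKHS, denoted $\mathcal{H}_\Gamma$, if for all $x \in \mathcal{X}$ and $v \in \mathcal{V}$ the linear functional $h \mapsto \langle v, h(x)\rangle_{\mathcal{V}}$ is continuous.
By the Riesz representation theorem², for each $x \in \mathcal{X}$ there is a linear map $\Gamma_x : \mathcal{V} \to \mathcal{H}_\Gamma$ such that, for all $v \in \mathcal{V}$ and $h \in \mathcal{H}_\Gamma$,
$$\langle v, h(x)\rangle_{\mathcal{V}} = \langle h, \Gamma_x v\rangle_\Gamma.$$
The reproducing kernel of $\mathcal{H}_\Gamma$ is the operator-valued function $\Gamma(x, x') \in \mathcal{L}(\mathcal{V})$, where $\mathcal{L}(\mathcal{V})$ is the space of bounded linear operators from $\mathcal{V}$ to $\mathcal{V}$. It is defined by
$$\Gamma(x, x')v = (\Gamma_{x'}v)(x) \in \mathcal{V}.$$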
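² (Riesz representation theorem) Let $\mathcal{H}$ be a Hilbert space over $\mathbb{R}$ with dual $\mathcal{H}^*$. For every $\varphi \in \mathcal{H}^*$ there is a unique $y_\varphi \in \mathcal{H}$ such that $\varphi(x) = \langle x, y_\varphi\rangle$ for all $x \in \mathcal{H}$.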
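[Proposition 2.1] The kernel $\Gamma : \mathcal{X} \times \mathcal{X} \to \mathcal{L}(\mathcal{V})$ satisfies
(1) $\Gamma(x, x') = \Gamma(x', x)^*$;
(2) for every $n \in \mathbb{N}$ and $\{(x_i, v_i)\}_{i \le n} \subset \mathcal{X} \times \mathcal{V}$, $\sum_{i,j \le n}\langle v_i, \Gamma(x_i, x_j)v_j\rangle_{\mathcal{V}} \ge 0$.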
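In practice, $\mathbb{E}_{(X,V)}[\|f(X) - V\|^2_{\mathcal{V}}]$ is replaced by the empirical loss $\sum_{i=1}^n \|v_i - f(x_i)\|^2_{\mathcal{V}}$. The function $f$ is restricted to the RKHS $\mathcal{H}_\Gamma$ and regularized in its norm:
$$\hat\epsilon_\lambda(f) := \sum_{i=1}^n \|v_i - f(x_i)\|^2_{\mathcal{V}} + \lambda\|f\|^2_\Gamma.$$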
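Theorem 2.2 (adapted from G. Lever, S. Grünewälder, et al., 2012). The minimizer $f^*$ of $\hat\epsilon_\lambda$ over $\mathcal{H}_\Gamma$ is
$$f^* = \sum_{i=1}^n \Gamma_{x_i}c_i,$$
where the coefficients $\{c_i\}_{i \le n}$, $c_i \in \mathcal{V}$, solve
$$\sum_{i=1}^n \left(\Gamma(x_j, x_i) + \lambda\delta_{ji}\right)c_i = v_j, \qquad 1 \le j \le n.$$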
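For the separable kernel $\Gamma(x, x') = k(x, x')I$ with $\mathcal{V} = \mathbb{R}^d$, the linear system of Theorem 2.2 becomes $(K + \lambda I)C = V$, where row $i$ of $C$ is $c_i$. The minimizer is $f^*(x) = \sum_i k(x, x_i)c_i$. This NumPy sketch assumes a Gaussian $k$, and its helper names and data are illustrative. With a tiny λ it nearly interpolates the training outputs.

```python
import numpy as np


def gaussian_gram(A, B, sigma=1.0):
    """Gram matrix exp(-||a - b||^2 / sigma^2) between the rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)


def fit_vector_valued(X, V, lam=1e-2, sigma=1.0):
    """Minimize sum_i ||v_i - f(x_i)||^2 + lam ||f||^2 with Gamma(x, x') = k(x, x') I."""
    K = gaussian_gram(X, X, sigma)
    C = np.linalg.solve(K + lam * np.eye(X.shape[0]), V)  # row i is c_i

    def f_star(x_query):
        # f*(x) = sum_i Gamma(x, x_i) c_i = sum_i k(x, x_i) c_i
        return gaussian_gram(np.atleast_2d(x_query), X, sigma) @ C

    return f_star


X = np.linspace(0.0, 4.0, 5)[:, None]
V = np.column_stack([np.sin(X[:, 0]), np.cos(X[:, 0])])  # outputs in V = R^2
f_star = fit_vector_valued(X, V, lam=1e-8)
print(np.round(f_star(X) - V, 6))  # near-interpolation for tiny lambda
```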
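7 $\varepsilon[\mu] \le \epsilon_s[\mu]$
Proof.
$$\varepsilon[\mu] := \sup_{\|h\|_{\mathcal{H}_Y} \le 1} \mathbb{E}_X\left[\left(\mathbb{E}_Y[h(Y) \mid X] - \langle h, \mu(X)\rangle_{\mathcal{H}_Y}\right)^2\right]$$
$$= \sup_{\|h\|_{\mathcal{H}_Y} \le 1} \mathbb{E}_X\left[\left(\mathbb{E}_Y[\langle h, \psi(Y)\rangle_{\mathcal{H}_Y} \mid X] - \langle h, \mu(X)\rangle_{\mathcal{H}_Y}\right)^2\right]$$
$$\le \sup_{\|h\|_{\mathcal{H}_Y} \le 1} \mathbb{E}_{X,Y}\left[\langle h, \psi(Y) - \mu(X)\rangle^2_{\mathcal{H}_Y}\right]$$
$$\le \sup_{\|h\|_{\mathcal{H}_Y} \le 1} \|h\|^2_{\mathcal{H}_Y}\,\mathbb{E}_{X,Y}\left[\|\psi(Y) - \mu(X)\|^2_{\mathcal{H}_Y}\right]$$
$$= \mathbb{E}_{X,Y}\left[\|\psi(Y) - \mu(X)\|^2_{\mathcal{H}_Y}\right] = \epsilon_s[\mu].$$
The first inequality is Jensen's inequality for the conditional expectation, $(\mathbb{E}_Y[\langle h, \psi(Y) - \mu(X)\rangle \mid X])^2 \le \mathbb{E}_Y[\langle h, \psi(Y) - \mu(X)\rangle^2 \mid X]$. The second is the Cauchy-Schwarz inequality. ✷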
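8 Proposition 1
Proposition 1. Let $(X, Y)$ take values in $\mathcal{X} \times \mathcal{Y}$, with prior $\pi(Y)$ on $Y$ and likelihood $p(X \mid Y)$. Let $\mathcal{H}_X$ be the RKHS with kernel $k_X$ and feature map $\phi(x)$, $\mathcal{H}_Y$ the RKHS with kernel $k_Y$ and feature map $\psi(y)$, and $\phi(x, y)$ the feature map of $\mathcal{H}_X \otimes \mathcal{H}_Y$. Let $\hat\pi_Y = \sum_{i=1}^{\ell}\tilde\alpha_i\psi(\tilde y_i)$ be an estimate of $\pi_Y$, and let $\{(x_i, y_i)\}_{i=1}^n$ be a sample representing $p(X \mid Y)$. If $f(x, y) = \|\psi(y) - \mu(x)\|^2_{\mathcal{H}_Y}$ satisfies $f \in \mathcal{H}_X \otimes \mathcal{H}_Y$, then
$$\hat\epsilon_s[\mu] = \sum_{i=1}^n \beta_i\|\psi(y_i) - \mu(x_i)\|^2_{\mathcal{H}_Y},$$
where $\beta = (\beta_1, \dots, \beta_n)^T$ is given by
$$\beta = (G_Y + n\lambda I)^{-1}\tilde G_Y\tilde\alpha, \qquad (G_Y)_{ij} = k_Y(y_i, y_j), \quad (\tilde G_Y)_{ij} = k_Y(y_i, \tilde y_j), \quad \tilde\alpha = (\tilde\alpha_1, \dots, \tilde\alpha_\ell)^T.$$
Proof (cf. K. Fukumizu, 2016, Kernel Bayes' Rule, Proposition 4).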
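Let $\Phi_{X,Y} = (\phi(x_1, y_1), \dots, \phi(x_n, y_n))$. We show below that
$$\hat\mu_{(X,Y)} = \Phi_{X,Y}\beta = \Phi_{X,Y}(G_Y + n\lambda I)^{-1}\tilde G_Y\tilde\alpha.$$
Then, by the reproducing property in $\mathcal{H}_X \otimes \mathcal{H}_Y$,
$$\hat\epsilon_s[\mu] = \langle \hat\mu_{(X,Y)}, f\rangle_{\mathcal{H}_X\otimes\mathcal{H}_Y} = \langle \Phi_{X,Y}(G_Y + n\lambda I)^{-1}\tilde G_Y\tilde\alpha, f\rangle_{\mathcal{H}_X\otimes\mathcal{H}_Y} = \langle \Phi_{X,Y}\beta, f\rangle_{\mathcal{H}_X\otimes\mathcal{H}_Y} = \sum_{i=1}^n \beta_i\|\psi(y_i) - \mu(x_i)\|^2_{\mathcal{H}_Y}.$$
To derive $\hat\mu_{(X,Y)}$, recall $\hat\mu_{(X,Y)} = \hat C_{(X,Y)Y}(\hat C_{YY} + \lambda I)^{-1}\hat\pi_Y$ and set $h = (\hat C_{YY} + \lambda I)^{-1}\hat\pi_Y$. Decompose
$$h = \sum_{i=1}^n a_i\psi(y_i) + h_\perp,$$
where $h_\perp$ is orthogonal to $\mathrm{span}\{\psi(y_1), \dots, \psi(y_n)\}$. Since $\hat C_{YY}h_\perp = 0$, the equation $(\hat C_{YY} + \lambda I)h = \hat\pi_Y$ reads
$$\frac{1}{n}\sum_{i,j \le n} a_i k_Y(y_i, y_j)\psi(y_j) + \lambda\sum_{i \le n} a_i\psi(y_i) + \lambda h_\perp = \sum_{i \le \ell}\tilde\alpha_i\psi(\tilde y_i).$$
Taking the inner product with $\psi(y_k)$ for $k = 1, \dots, n$ removes $h_\perp$ and gives, with $a = (a_1, \dots, a_n)^T$,
$$\frac{1}{n}G_Y^2 a + \lambda G_Y a = \tilde G_Y\tilde\alpha \iff \frac{1}{n}(G_Y + n\lambda I)G_Y a = \tilde G_Y\tilde\alpha \iff \frac{1}{n}G_Y a = (G_Y + n\lambda I)^{-1}\tilde G_Y\tilde\alpha.$$
Finally, since $\hat C_{(X,Y)Y} = \frac{1}{n}\sum_{i \le n}\phi(x_i, y_i) \otimes \psi(y_i)$,
$$\hat\mu_{(X,Y)} = \hat C_{(X,Y)Y}h = \frac{1}{n}\Phi_{X,Y}G_Y a = \Phi_{X,Y}(G_Y + n\lambda I)^{-1}\tilde G_Y\tilde\alpha. \quad ✷$$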
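9 Proposition 2
Proposition 2. Suppose $\beta_i^+ \ne 0$ for all $i$. Let $\mu \in \mathcal{H}_K$, where $\mathcal{H}_K$ is the vector-valued RKHS with kernel $K(x_i, x_j) = k_X(x_i, x_j)I$ and $I : \mathcal{H}_Y \to \mathcal{H}_Y$ is the identity. Then the minimizer of $\hat\epsilon_{\lambda,n}$ is
$$\hat\mu_{\lambda,n}(x) = \Psi(K_X + \lambda_n\Lambda^+)^{-1}K_{:x},$$
where $\Psi = (\psi(y_1), \dots, \psi(y_n))$, $(K_X)_{ij} = k_X(x_i, x_j)$, $\Lambda^+ = \mathrm{diag}(1/\beta_1^+, \dots, 1/\beta_n^+)$, $K_{:x} = (k_X(x, x_1), \dots, k_X(x, x_n))^T$, and $\lambda_n > 0$ is the regularization parameter.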
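Proof. A pair $(x_i, y_i)$ with $\beta_i^+ = 0$ does not contribute to the loss, so we may drop it and assume $\beta_i^+ \ne 0$ for all $i$. Write $\mu = \mu_0 + g$, where $\mu_0 = \sum_{i=1}^n K_{x_i}c_i$ has coefficients $c_i \in \mathcal{H}_Y$ chosen below and $g \in \mathcal{H}_K$ is arbitrary. Expanding $\hat\epsilon_{\lambda,n}[\mu]$,
$$\hat\epsilon_{\lambda,n}[\mu] = \sum_{i=1}^n \beta_i^+\|\psi(y_i) - \mu(x_i)\|^2_{\mathcal{H}_Y} + \lambda_n\|\mu\|^2_{\mathcal{H}_K}$$
$$= \sum_{i=1}^n \beta_i^+\|\psi(y_i) - (\mu_0(x_i) + g(x_i))\|^2_{\mathcal{H}_Y} + \lambda_n\|\mu_0 + g\|^2_{\mathcal{H}_K}$$
$$= \sum_{i=1}^n \beta_i^+\|\psi(y_i) - \mu_0(x_i)\|^2 + \lambda_n\|\mu_0\|^2 + \sum_{i=1}^n \beta_i^+\|g(x_i)\|^2 + \lambda_n\|g\|^2 + 2\lambda_n\langle \mu_0, g\rangle - 2\sum_{i=1}^n \beta_i^+\langle g(x_i), \psi(y_i) - \mu_0(x_i)\rangle.$$
Choose the $c_i$ so that, for every $i$,
$$\psi(y_i) - \sum_{j=1}^n k_X(x_i, x_j)c_j = \frac{\lambda_n}{\beta_i^+}c_i.$$
By the reproducing property, $\langle \mu_0, g\rangle_{\mathcal{H}_K} = \sum_i \langle c_i, g(x_i)\rangle_{\mathcal{H}_Y}$, and the left-hand side above equals $\psi(y_i) - \mu_0(x_i)$. The cross terms therefore cancel:
$$\lambda_n\langle \mu_0, g\rangle - \sum_{i=1}^n \beta_i^+\langle g(x_i), \psi(y_i) - \mu_0(x_i)\rangle = 0.$$
Hence
$$\hat\epsilon_{\lambda,n}[\mu] = \hat\epsilon_{\lambda,n}[\mu_0] + \sum_{i=1}^n \beta_i^+\|g(x_i)\|^2 + \lambda_n\|g\|^2 \ge \hat\epsilon_{\lambda,n}[\mu_0],$$
so $\mu_0$ is the minimizer. In matrix form, the condition on $c = (c_1, \dots, c_n)^T$ is $(K_X + \lambda_n\Lambda^+)c = \Psi$, and therefore
$$\mu_0(x) = \sum_{i=1}^n k_X(x, x_i)c_i = \Psi(K_X + \lambda_n\Lambda^+)^{-1}K_{:x}. \quad ✷$$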
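The decomposition in this proof can be checked numerically. Perturbing the closed-form coefficients by any $g = \sum_i K_{x_i}d_i$ raises the objective by exactly $\sum_i \beta_i^+ g(x_i)^2 + \lambda_n\|g\|^2$. This NumPy sketch assumes scalar outputs ($\mathcal{H}_Y = \mathbb{R}$, standing in for $\psi(y_i)$), a Gaussian $k_X$, and illustrative positive weights. The helper `prop2_objective` is hypothetical.

```python
import numpy as np


def gaussian_gram(A, B, sigma=1.0):
    """Gram matrix exp(-||a - b||^2 / sigma^2) between the rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)


def prop2_objective(c, K, targets, beta_plus, lam_n):
    """eps_hat_{lam,n}[mu] for mu = sum_i K_{x_i} c_i with scalar outputs:
    sum_i beta_i^+ (t_i - (K c)_i)^2 + lam_n * c^T K c."""
    resid = targets - K @ c
    return float(beta_plus @ resid**2 + lam_n * c @ K @ c)


rng = np.random.default_rng(0)
X = np.linspace(-2.0, 2.0, 8)[:, None]
targets = np.sin(X[:, 0])                   # scalar stand-ins for psi(y_i)
beta_plus = rng.uniform(0.05, 0.3, size=8)  # positive thresholded weights (illustrative)
lam_n = 0.1
K = gaussian_gram(X, X)

# Closed form: (K_X + lam_n Lambda^+) c = targets
c_star = np.linalg.solve(K + lam_n * np.diag(1.0 / beta_plus), targets)
base = prop2_objective(c_star, K, targets, beta_plus, lam_n)

# For mu = mu_0 + g with g = sum_i K_{x_i} d_i, the cross terms vanish:
# objective = base + sum_i beta_i^+ g(x_i)^2 + lam_n ||g||^2
d = rng.normal(size=8)
lhs = prop2_objective(c_star + d, K, targets, beta_plus, lam_n)
rhs = base + beta_plus @ (K @ d) ** 2 + lam_n * d @ K @ d
print(lhs, rhs)
```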

Kansai NIPS+ reading group: presentation slides
