The detailed derivation of the derivatives in Table 2 of Marginalized Denoising Auto-encoders for Nonlinear Representations by M. Chen, K. Weinberger, F. Sha, and Y. Bengio
http://www.cse.wustl.edu/~mchen/papers/deepmsda.pdf
Tomonari MASADA @ Nagasaki University
October 14, 2014
The derivative $\partial z_h / \partial \tilde{x}_d$ can be obtained as follows:
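$$
z = \sigma(W\tilde{x} + b) = \frac{1}{1 + \exp(-W\tilde{x} - b)} \tag{1}
$$

$$
\begin{aligned}
\Rightarrow \frac{\partial z_h}{\partial \tilde{x}_d}
&= \frac{\partial}{\partial \tilde{x}_d} \frac{1}{1 + \exp\left(-\sum_d w_{hd}\tilde{x}_d - b_h\right)} \\
&= \frac{w_{hd}\exp\left(-\sum_d w_{hd}\tilde{x}_d - b_h\right)}{\left\{1 + \exp\left(-\sum_d w_{hd}\tilde{x}_d - b_h\right)\right\}^2} \\
&= \frac{1}{1 + \exp\left(-\sum_d w_{hd}\tilde{x}_d - b_h\right)}\left\{1 - \frac{1}{1 + \exp\left(-\sum_d w_{hd}\tilde{x}_d - b_h\right)}\right\} w_{hd} \\
&= z_h(1 - z_h)\, w_{hd}
\end{aligned}
\tag{2}
$$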
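Equation (2) can be sanity-checked numerically. The sketch below is a minimal pure-Python illustration (standard library only), not code from the note or the paper: it evaluates (1) for a small hypothetical $W$, $b$ and $\tilde{x}$, forms the closed-form Jacobian $z_h(1 - z_h)w_{hd}$, and compares it with a central finite-difference estimate. The helper names (`encode`, `jacobian_closed_form`, `jacobian_finite_diff`) and all numeric values are assumptions made for illustration; $W$ is stored as $H$ rows of length $D$ so that `W[h][d]` plays the role of $w_{hd}$.

```python
import math


def sigmoid(a):
    """Logistic function 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))


def encode(W, b, x_tilde):
    """Eq. (1): z_h = sigmoid(sum_d w_hd x~_d + b_h), with W stored as H rows of length D."""
    return [sigmoid(sum(w_hd * x_d for w_hd, x_d in zip(row, x_tilde)) + b_h)
            for row, b_h in zip(W, b)]


def jacobian_closed_form(W, b, x_tilde):
    """Eq. (2): dz_h / dx~_d = z_h (1 - z_h) w_hd."""
    z = encode(W, b, x_tilde)
    return [[z_h * (1.0 - z_h) * w_hd for w_hd in row] for row, z_h in zip(W, z)]


def jacobian_finite_diff(W, b, x_tilde, eps=1e-6):
    """Central-difference estimate of dz_h / dx~_d, used only as a check."""
    J = [[0.0] * len(x_tilde) for _ in W]
    for d in range(len(x_tilde)):
        x_plus = list(x_tilde)
        x_plus[d] += eps
        x_minus = list(x_tilde)
        x_minus[d] -= eps
        z_plus, z_minus = encode(W, b, x_plus), encode(W, b, x_minus)
        for h in range(len(W)):
            J[h][d] = (z_plus[h] - z_minus[h]) / (2.0 * eps)
    return J


# Hypothetical toy values: H = 2 hidden units, D = 3 inputs.
W = [[0.5, -1.0, 0.25], [1.5, 0.3, -0.7]]
b = [0.1, -0.2]
x_tilde = [1.0, 0.0, 2.0]

J_exact = jacobian_closed_form(W, b, x_tilde)
J_approx = jacobian_finite_diff(W, b, x_tilde)
max_err = max(abs(e - a) for row_e, row_a in zip(J_exact, J_approx)
              for e, a in zip(row_e, row_a))
print(f"max |closed form - finite difference| = {max_err:.2e}")
```

Central differences are used because their truncation error is $O(\epsilon^2)$, so the agreement is limited mainly by floating-point rounding.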
For the cross-entropy loss, we obtain the following:
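$$
\begin{aligned}
\ell\bigl(x, f(\tilde{x})\bigr)
&= -x^\top \log \sigma(W^\top z + b') - (1 - x)^\top \log\left\{1 - \sigma(W^\top z + b')\right\} \\
&= -x^\top \log\left\{\frac{1}{1 + \exp(-W^\top z - b')}\right\} - (1 - x)^\top \log\left\{\frac{\exp(-W^\top z - b')}{1 + \exp(-W^\top z - b')}\right\} \\
&= x^\top \log\left\{1 + \exp(-W^\top z - b')\right\} - (1 - x)^\top(-W^\top z - b') + (1 - x)^\top \log\left\{1 + \exp(-W^\top z - b')\right\} \\
&= (1 - x)^\top(W^\top z + b') + \mathbf{1}^\top \log\left\{1 + \exp(-W^\top z - b')\right\} \\
&= \sum_d (1 - x_d)\Bigl(\sum_h w_{hd} z_h + b'_d\Bigr) + \sum_d \log\Bigl\{1 + \exp\Bigl(-\sum_h w_{hd} z_h - b'_d\Bigr)\Bigr\}
\end{aligned}
\tag{3}
$$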
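The algebra in (3) can be checked by evaluating its first and last lines on the same inputs. In this hypothetical pure-Python sketch, `cross_entropy_direct` computes $-x^\top \log y - (1 - x)^\top \log(1 - y)$ with $y = \sigma(W^\top z + b')$, and `cross_entropy_simplified` computes the final sum in (3); the two should agree up to rounding. The function names and toy values are illustrative assumptions, and $W$ is again stored as an $H \times D$ nested list.

```python
import math


def decoder_preactivations(W, b_prime, z):
    """a_d = sum_h w_hd z_h + b'_d, the entries of W^T z + b' (W stored as H x D)."""
    return [sum(W[h][d] * z[h] for h in range(len(W))) + b_prime[d]
            for d in range(len(W[0]))]


def cross_entropy_direct(W, b_prime, z, x):
    """First line of eq. (3): -x^T log y - (1 - x)^T log(1 - y), y = sigmoid(a)."""
    y = [1.0 / (1.0 + math.exp(-a_d)) for a_d in decoder_preactivations(W, b_prime, z)]
    return -sum(x_d * math.log(y_d) + (1.0 - x_d) * math.log(1.0 - y_d)
                for x_d, y_d in zip(x, y))


def cross_entropy_simplified(W, b_prime, z, x):
    """Last line of eq. (3): sum_d (1 - x_d) a_d + sum_d log(1 + exp(-a_d))."""
    a = decoder_preactivations(W, b_prime, z)
    return sum((1.0 - x_d) * a_d + math.log1p(math.exp(-a_d)) for x_d, a_d in zip(x, a))


# Hypothetical toy values: H = 2 hidden units, D = 3 outputs.
W = [[0.5, -1.0, 0.25], [1.5, 0.3, -0.7]]
b_prime = [0.05, -0.1, 0.2]
z = [0.6, 0.3]
x = [1.0, 0.0, 1.0]

direct = cross_entropy_direct(W, b_prime, z, x)
simplified = cross_entropy_simplified(W, b_prime, z, x)
print(f"direct = {direct:.12f}, simplified = {simplified:.12f}")
```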
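$$
\Rightarrow \frac{\partial \ell}{\partial z_h}
= \sum_d (1 - x_d)\, w_{hd} - \sum_d \frac{w_{hd}\exp\left(-\sum_h w_{hd} z_h - b'_d\right)}{1 + \exp\left(-\sum_h w_{hd} z_h - b'_d\right)}
\tag{4}
$$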
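A numerical check of (4) in the same spirit: the sketch compares the closed-form gradient with a central finite difference of the loss (3). It also evaluates the equivalent form $\sum_d w_{hd}(y_d - x_d)$, which follows from (4) because $\exp(-a)/\{1 + \exp(-a)\} = 1 - 1/\{1 + \exp(-a)\}$, with $y_d$ denoting the $d$-th entry of $\sigma(W^\top z + b')$. All helper names and values are hypothetical, chosen only for illustration.

```python
import math


def decoder_preactivations(W, b_prime, z):
    """a_d = sum_h w_hd z_h + b'_d, the entries of W^T z + b' (W stored as H x D)."""
    return [sum(W[h][d] * z[h] for h in range(len(W))) + b_prime[d]
            for d in range(len(W[0]))]


def cross_entropy(W, b_prime, z, x):
    """Eq. (3): sum_d (1 - x_d) a_d + sum_d log(1 + exp(-a_d))."""
    a = decoder_preactivations(W, b_prime, z)
    return sum((1.0 - x_d) * a_d + math.log1p(math.exp(-a_d)) for x_d, a_d in zip(x, a))


def grad_eq4(W, b_prime, z, x):
    """Eq. (4): sum_d (1 - x_d) w_hd - sum_d w_hd exp(-a_d) / (1 + exp(-a_d))."""
    a = decoder_preactivations(W, b_prime, z)
    return [sum((1.0 - x_d) * w_hd for x_d, w_hd in zip(x, row))
            - sum(w_hd * math.exp(-a_d) / (1.0 + math.exp(-a_d)) for w_hd, a_d in zip(row, a))
            for row in W]


def grad_residual_form(W, b_prime, z, x):
    """Eq. (4) rewritten via exp(-a) / (1 + exp(-a)) = 1 - y: sum_d w_hd (y_d - x_d)."""
    y = [1.0 / (1.0 + math.exp(-a_d)) for a_d in decoder_preactivations(W, b_prime, z)]
    return [sum(w_hd * (y_d - x_d) for w_hd, y_d, x_d in zip(row, y, x)) for row in W]


def grad_finite_diff(W, b_prime, z, x, eps=1e-6):
    """Central-difference estimate of d(loss) / dz_h, used only as a check."""
    grad = []
    for h in range(len(z)):
        z_plus = list(z)
        z_plus[h] += eps
        z_minus = list(z)
        z_minus[h] -= eps
        diff = cross_entropy(W, b_prime, z_plus, x) - cross_entropy(W, b_prime, z_minus, x)
        grad.append(diff / (2.0 * eps))
    return grad


# Hypothetical toy values: H = 2 hidden units, D = 3 outputs.
W = [[0.5, -1.0, 0.25], [1.5, 0.3, -0.7]]
b_prime = [0.05, -0.1, 0.2]
z = [0.6, 0.3]
x = [1.0, 0.0, 1.0]

g_exact = grad_eq4(W, b_prime, z, x)
g_residual = grad_residual_form(W, b_prime, z, x)
g_approx = grad_finite_diff(W, b_prime, z, x)
print(g_exact, g_residual, g_approx)
```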
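$$
\begin{aligned}
\Rightarrow \frac{\partial^2 \ell}{\partial z_h^2}
&= \frac{\partial}{\partial z_h}\left\{-\sum_d \frac{w_{hd}\exp\left(-\sum_h w_{hd} z_h - b'_d\right)}{1 + \exp\left(-\sum_h w_{hd} z_h - b'_d\right)}\right\} \\
&= \sum_d \frac{w_{hd}^2 \exp\left(-\sum_h w_{hd} z_h - b'_d\right)}{1 + \exp\left(-\sum_h w_{hd} z_h - b'_d\right)}
 - \sum_d \frac{w_{hd}^2 \left\{\exp\left(-\sum_h w_{hd} z_h - b'_d\right)\right\}^2}{\left\{1 + \exp\left(-\sum_h w_{hd} z_h - b'_d\right)\right\}^2} \\
&= \sum_d \frac{w_{hd}^2 \exp\left(-\sum_h w_{hd} z_h - b'_d\right)}{\left\{1 + \exp\left(-\sum_h w_{hd} z_h - b'_d\right)\right\}^2} \\
&= \sum_d \left(\frac{1}{1 + \exp\left(-\sum_h w_{hd} z_h - b'_d\right)}\right)\left(1 - \frac{1}{1 + \exp\left(-\sum_h w_{hd} z_h - b'_d\right)}\right) w_{hd}^2 \\
&= \sum_d y_d(1 - y_d)\, w_{hd}^2
\end{aligned}
\tag{5}
$$

The first sum in (4) does not depend on $z_h$, so only the second sum contributes. Here $y_d = 1/\{1 + \exp(-\sum_h w_{hd} z_h - b'_d)\}$ is the $d$-th entry of the reconstruction $\sigma(W^\top z + b')$.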
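Equation (5) can be checked the same way, by differencing the closed-form gradient (4) along $z_h$. The sketch below (pure Python, hypothetical helper names and toy values) repeats the comparison for two different targets $x$, since (5) predicts that the second derivative does not depend on $x$.

```python
import math


def decoder_preactivations(W, b_prime, z):
    """a_d = sum_h w_hd z_h + b'_d, the entries of W^T z + b' (W stored as H x D)."""
    return [sum(W[h][d] * z[h] for h in range(len(W))) + b_prime[d]
            for d in range(len(W[0]))]


def grad_eq4(W, b_prime, z, x):
    """Eq. (4): sum_d (1 - x_d) w_hd - sum_d w_hd exp(-a_d) / (1 + exp(-a_d))."""
    a = decoder_preactivations(W, b_prime, z)
    return [sum((1.0 - x_d) * w_hd for x_d, w_hd in zip(x, row))
            - sum(w_hd * math.exp(-a_d) / (1.0 + math.exp(-a_d)) for w_hd, a_d in zip(row, a))
            for row in W]


def second_derivative_eq5(W, b_prime, z):
    """Eq. (5): sum_d y_d (1 - y_d) w_hd^2 with y_d = sigmoid(a_d); x does not appear."""
    y = [1.0 / (1.0 + math.exp(-a_d)) for a_d in decoder_preactivations(W, b_prime, z)]
    return [sum(y_d * (1.0 - y_d) * w_hd ** 2 for y_d, w_hd in zip(y, row)) for row in W]


def second_derivative_fd(W, b_prime, z, x, eps=1e-6):
    """Central difference of the closed-form gradient (4) along each z_h."""
    result = []
    for h in range(len(z)):
        z_plus = list(z)
        z_plus[h] += eps
        z_minus = list(z)
        z_minus[h] -= eps
        g_plus = grad_eq4(W, b_prime, z_plus, x)[h]
        g_minus = grad_eq4(W, b_prime, z_minus, x)[h]
        result.append((g_plus - g_minus) / (2.0 * eps))
    return result


# Hypothetical toy values: H = 2 hidden units, D = 3 outputs.
W = [[0.5, -1.0, 0.25], [1.5, 0.3, -0.7]]
b_prime = [0.05, -0.1, 0.2]
z = [0.6, 0.3]

exact = second_derivative_eq5(W, b_prime, z)
approx_x1 = second_derivative_fd(W, b_prime, z, [1.0, 0.0, 1.0])
approx_x2 = second_derivative_fd(W, b_prime, z, [0.0, 1.0, 0.5])
print(exact, approx_x1, approx_x2)
```

Because each $y_d(1 - y_d)$ lies in $(0, 1/4]$, the values in (5) are positive whenever some $w_{hd} \neq 0$.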
For the squared loss, we obtain the following:
$$
\ell\bigl(x, f(\tilde{x})\bigr) = \left\|x - (W^\top z + b')\right\|^2 = \sum_d \Bigl\{x_d - \Bigl(\sum_h w_{hd} z_h + b'_d\Bigr)\Bigr\}^2 \tag{6}
$$
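$$
\Rightarrow \frac{\partial \ell}{\partial z_h}
= \frac{\partial}{\partial z_h} \sum_d \Bigl\{x_d - \Bigl(\sum_h w_{hd} z_h + b'_d\Bigr)\Bigr\}^2
= -2 \sum_d w_{hd} \Bigl\{x_d - \Bigl(\sum_h w_{hd} z_h + b'_d\Bigr)\Bigr\}
\tag{7}
$$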
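For the squared loss the reconstruction $W^\top z + b'$ enters (6) without a sigmoid. The following pure-Python sketch compares the gradient (7) with a central finite difference of (6); the helper names and toy values are hypothetical and serve only as an illustration.

```python
def decoder_outputs(W, b_prime, z):
    """Linear reconstruction W^T z + b' from eq. (6), with W stored as H x D."""
    return [sum(W[h][d] * z[h] for h in range(len(W))) + b_prime[d]
            for d in range(len(W[0]))]


def squared_loss(W, b_prime, z, x):
    """Eq. (6): sum_d {x_d - (sum_h w_hd z_h + b'_d)}^2."""
    return sum((x_d - r_d) ** 2 for x_d, r_d in zip(x, decoder_outputs(W, b_prime, z)))


def grad_eq7(W, b_prime, z, x):
    """Eq. (7): -2 sum_d w_hd {x_d - (sum_h w_hd z_h + b'_d)}."""
    r = decoder_outputs(W, b_prime, z)
    return [-2.0 * sum(w_hd * (x_d - r_d) for w_hd, x_d, r_d in zip(row, x, r))
            for row in W]


def grad_fd(W, b_prime, z, x, eps=1e-6):
    """Central-difference estimate of d(loss) / dz_h, used only as a check."""
    grad = []
    for h in range(len(z)):
        z_plus = list(z)
        z_plus[h] += eps
        z_minus = list(z)
        z_minus[h] -= eps
        diff = squared_loss(W, b_prime, z_plus, x) - squared_loss(W, b_prime, z_minus, x)
        grad.append(diff / (2.0 * eps))
    return grad


# Hypothetical toy values: H = 2 hidden units, D = 3 outputs.
W = [[0.5, -1.0, 0.25], [1.5, 0.3, -0.7]]
b_prime = [0.05, -0.1, 0.2]
z = [0.6, 0.3]
x = [1.0, 0.0, 1.0]

g_exact = grad_eq7(W, b_prime, z, x)
g_approx = grad_fd(W, b_prime, z, x)
print(g_exact, g_approx)
```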
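$$
\Rightarrow \frac{\partial^2 \ell}{\partial z_h^2}
= \frac{\partial}{\partial z_h}\left[-2 \sum_d w_{hd} \Bigl\{x_d - \Bigl(\sum_h w_{hd} z_h + b'_d\Bigr)\Bigr\}\right]
= 2 \sum_d w_{hd}^2
\tag{8}
$$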
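Because (7) is linear in $z_h$, (8) gives the constant $2\sum_d w_{hd}^2$, independent of both $z$ and $x$. The sketch below (hypothetical helpers and toy values, pure Python) checks this by differencing (7) at two unrelated points $(z, x)$.

```python
def decoder_outputs(W, b_prime, z):
    """Linear reconstruction W^T z + b' from eq. (6), with W stored as H x D."""
    return [sum(W[h][d] * z[h] for h in range(len(W))) + b_prime[d]
            for d in range(len(W[0]))]


def grad_eq7(W, b_prime, z, x):
    """Eq. (7): -2 sum_d w_hd {x_d - (sum_h w_hd z_h + b'_d)}."""
    r = decoder_outputs(W, b_prime, z)
    return [-2.0 * sum(w_hd * (x_d - r_d) for w_hd, x_d, r_d in zip(row, x, r))
            for row in W]


def second_derivative_eq8(W):
    """Eq. (8): 2 sum_d w_hd^2, with no dependence on z or x."""
    return [2.0 * sum(w_hd ** 2 for w_hd in row) for row in W]


def second_derivative_fd(W, b_prime, z, x, eps=1e-4):
    """Central difference of the closed-form gradient (7) along each z_h."""
    result = []
    for h in range(len(z)):
        z_plus = list(z)
        z_plus[h] += eps
        z_minus = list(z)
        z_minus[h] -= eps
        g_plus = grad_eq7(W, b_prime, z_plus, x)[h]
        g_minus = grad_eq7(W, b_prime, z_minus, x)[h]
        result.append((g_plus - g_minus) / (2.0 * eps))
    return result


# Hypothetical toy values: H = 2 hidden units, D = 3 outputs.
W = [[0.5, -1.0, 0.25], [1.5, 0.3, -0.7]]
b_prime = [0.05, -0.1, 0.2]

exact = second_derivative_eq8(W)
approx_a = second_derivative_fd(W, b_prime, [0.6, 0.3], [1.0, 0.0, 1.0])
approx_b = second_derivative_fd(W, b_prime, [-2.0, 5.0], [0.3, -1.2, 4.0])
print(exact, approx_a, approx_b)
```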