Gradient descent is an iterative optimization algorithm that minimizes a cost function J(θ) by repeatedly stepping in the direction of steepest descent, i.e., along the negative gradient: θ := θ − α ∇J(θ). The step size of each update is controlled by the learning rate α (alpha). Potential limitations include getting stuck in a local minimum rather than reaching the global minimum when the cost function is non-convex, slow convergence when features have very different scales, and oscillation or divergence when α is too large. Feature scaling (e.g., standardizing each feature) and reducing α help address these issues. Gradient descent is typically considered converged when the decrease in the cost function between iterations falls below a small threshold, such as 0.001.
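
A minimal sketch of this loop in Python may make the update rule and the convergence test concrete. It assumes a differentiable cost with an analytic gradient; the names gradient_descent, grad, cost, alpha, and tol are illustrative, not from any particular library.

```python
import numpy as np

def gradient_descent(grad, cost, theta0, alpha=0.1, tol=1e-3, max_iters=10_000):
    """Minimize `cost` by stepping against its gradient `grad`.

    Stops when the decrease in cost between iterations falls below
    `tol` -- the convergence criterion described above.
    """
    theta = np.asarray(theta0, dtype=float)
    prev_cost = cost(theta)
    for _ in range(max_iters):
        theta = theta - alpha * grad(theta)   # step along the negative gradient
        curr_cost = cost(theta)
        if abs(prev_cost - curr_cost) < tol:  # cost barely changed: converged
            break
        prev_cost = curr_cost
    return theta

# Example: minimize J(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta_min = gradient_descent(grad=lambda t: 2 * (t - 3),
                             cost=lambda t: (t - 3) ** 2,
                             theta0=np.array([0.0]))
print(theta_min)  # approaches [3.0]
```

Rerunning this with a large alpha (say 1.5) illustrates the instability mentioned above: the iterates overshoot the minimum and the cost grows instead of shrinking.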