The document describes the support vector machine (SVM) algorithm for classification. It discusses how an SVM finds the optimal separating hyperplane between two classes by maximizing the margin between them, and introduces support vectors, Lagrange multipliers, and kernels. It also summarizes the sequential minimal optimization (SMO) algorithm, which breaks the quadratic optimization problem of SVM training into small subproblems, each optimizing two Lagrange multipliers at a time.
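10. Fisher's linear discriminant (4)
• Find the direction $w$ that maximizes the Fisher criterion
$$J(w) = \frac{w^T S_B w}{w^T S_W w}$$
• with the between-class and within-class scatter matrices ($m_+$, $m_-$ are the means of the samples with $d(x) = 1$ and $d(x) = -1$):
$$S_B = (m_+ - m_-)(m_+ - m_-)^T$$
$$S_W = \sum_{x:\, d(x)=1} (x - m_+)(x - m_+)^T + \sum_{x:\, d(x)=-1} (x - m_-)(x - m_-)^T$$
• Setting $\partial J(w) / \partial w = 0$ and applying the quotient rule $(f/g)' = (f'g - fg')/g^2$ gives
$$(w^T S_B w)\, S_W w = (w^T S_W w)\, S_B w$$
• The factor 2 from differentiating each quadratic form cancels. Since $S_B w = (m_+ - m_-)\,[(m_+ - m_-)^T w]$ always points along $m_+ - m_-$, and only the direction of $w$ matters,
$$w \propto S_W^{-1} (m_+ - m_-)$$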
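The result can be checked numerically. The following NumPy sketch (function names are illustrative, not from the slides) builds $S_B$ and $S_W$ from two sample arrays, computes $w \propto S_W^{-1}(m_+ - m_-)$ by solving a linear system, and evaluates $J(w)$. It assumes $S_W$ is invertible.

```python
import numpy as np

def scatter_matrices(X_pos, X_neg):
    """S_B and S_W for samples with d(x) = 1 (X_pos) and d(x) = -1 (X_neg), one sample per row."""
    m_pos, m_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    diff = m_pos - m_neg
    S_B = np.outer(diff, diff)                 # (m+ - m-)(m+ - m-)^T
    D_pos, D_neg = X_pos - m_pos, X_neg - m_neg
    S_W = D_pos.T @ D_pos + D_neg.T @ D_neg    # sum of (x - m)(x - m)^T within each class
    return S_B, S_W, diff

def fisher_criterion(w, X_pos, X_neg):
    """J(w) = (w^T S_B w) / (w^T S_W w); invariant to the scale of w."""
    S_B, S_W, _ = scatter_matrices(X_pos, X_neg)
    return float(w @ S_B @ w) / float(w @ S_W @ w)

def fisher_direction(X_pos, X_neg):
    """Unit vector proportional to S_W^{-1} (m+ - m-)."""
    _, S_W, diff = scatter_matrices(X_pos, X_neg)
    w = np.linalg.solve(S_W, diff)             # solve S_W w = m+ - m- instead of inverting S_W
    return w / np.linalg.norm(w)
```

Solving $S_W w = m_+ - m_-$ with `np.linalg.solve` avoids forming $S_W^{-1}$ explicitly; the final normalization only fixes the scale, which $J(w)$ ignores.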
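21. Soft margin
• When the two classes cannot be separated perfectly, introduce a slack variable $\xi_i$ for each training sample and relax the margin constraints to
$$y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \text{where } \xi_i \ge 0 \quad (i = 1, \ldots, l)$$
• Then minimize
$$\frac{1}{2}\, w \cdot w + C \sum_{i=1}^{l} \xi_i$$
where the constant $C > 0$ sets the trade-off between a wide margin and the total amount of slack.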
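For fixed $w$ and $b$, the smallest slacks satisfying both constraints are $\xi_i = \max(0,\ 1 - y_i(w \cdot x_i + b))$, so the objective can be evaluated directly. A minimal NumPy sketch with hypothetical function names:

```python
import numpy as np

def minimal_slacks(w, b, X, y):
    """Smallest xi_i with y_i (w . x_i + b) >= 1 - xi_i and xi_i >= 0."""
    margins = y * (X @ w + b)
    return np.maximum(0.0, 1.0 - margins)

def primal_objective(w, b, X, y, C):
    """(1/2) w . w + C * sum_i xi_i, with each xi_i at its smallest feasible value."""
    xi = minimal_slacks(w, b, X, y)
    return 0.5 * float(w @ w) + C * float(xi.sum())
```

With $X = ((2, 0), (0.5, 0), (-1, 0))$, $y = (1, -1, -1)$, $w = (1, 0)$ and $b = 0$, only the second sample violates the margin ($\xi_2 = 1.5$), so with $C = 1$ the objective is $0.5 + 1.5 = 2.0$.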
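22. Soft margin: Lagrangian (1)
• Introduce Lagrange multipliers $\Lambda = (\alpha_1, \ldots, \alpha_l)$ for the margin constraints and $R = (r_1, \ldots, r_l)$ for the constraints $\xi_i \ge 0$ (all $\alpha_i, r_i \ge 0$). The Lagrangian $L$ is
$$L(w, \xi, b, \Lambda, R) = \frac{1}{2}\, w \cdot w + C \sum_{i=1}^{l} \xi_i - \sum_{i=1}^{l} \alpha_i \left[ y_i (x_i \cdot w + b) - 1 + \xi_i \right] - \sum_{i=1}^{l} r_i \xi_i$$
• At the optimum $w_0, b_0, \xi_i^0$, the KKT conditions require the partial derivatives of $L$ with respect to $w$, $b$, and $\xi_i$ to be 0:
$$\left.\frac{\partial L(w, \xi, b, \Lambda, R)}{\partial w}\right|_{w = w_0} = w_0 - \sum_{i=1}^{l} \alpha_i y_i x_i = 0$$
$$\left.\frac{\partial L(w, \xi, b, \Lambda, R)}{\partial b}\right|_{b = b_0} = -\sum_{i=1}^{l} \alpha_i y_i = 0$$
$$\left.\frac{\partial L(w, \xi, b, \Lambda, R)}{\partial \xi_i}\right|_{\xi_i = \xi_i^0} = C - \alpha_i - r_i = 0$$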
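The first and third conditions express $w_0$ and $R$ in terms of $\Lambda$, and the second is a constraint on $\Lambda$ alone. A small NumPy sketch of that bookkeeping (the helper name is hypothetical):

```python
import numpy as np

def primal_from_multipliers(alpha, X, y, C):
    """Apply the stationarity conditions of the soft-margin Lagrangian."""
    w0 = (alpha * y) @ X            # dL/dw = 0    ->  w_0 = sum_i alpha_i y_i x_i
    r = C - alpha                   # dL/dxi_i = 0 ->  r_i = C - alpha_i
    b_condition = float(alpha @ y)  # dL/db = 0 requires sum_i alpha_i y_i = 0
    return w0, r, b_condition
```

Because every $r_i$ must also be non-negative, a negative entry of `r` signals an $\alpha_i$ larger than $C$.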
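23. Soft margin: dual problem (2)
• Substituting these conditions into $L$ eliminates $w$, $\xi$, and $b$, leaving a function of $\Lambda$ alone:
$$L(w, \xi, b, \Lambda, R) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j\, x_i \cdot x_j$$
• $C$ and $\xi_i$ drop out, so this is the same expression as for the hard-margin SVM.
• The only change is that each $\alpha_i$ is bounded above by $C$: from $C - \alpha_i - r_i = 0$ and $r_i \ge 0$, it follows that $0 \le \alpha_i \le C$.
• Instead of optimizing over $w, b$ directly, we therefore find the $\Lambda$ that maximizes this expression subject to
$$\sum_{i=1}^{l} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C$$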
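A minimal sketch of the dual objective and its constraints, assuming NumPy and the plain inner product $x_i \cdot x_j$ used above (function names are illustrative):

```python
import numpy as np

def dual_objective(alpha, X, y):
    """sum_i alpha_i - (1/2) sum_i sum_j alpha_i alpha_j y_i y_j x_i . x_j"""
    K = X @ X.T                       # K_ij = x_i . x_j
    v = alpha * y
    return float(alpha.sum() - 0.5 * v @ K @ v)

def is_dual_feasible(alpha, y, C, tol=1e-9):
    """Check sum_i alpha_i y_i = 0 and 0 <= alpha_i <= C (up to tol)."""
    return (abs(float(alpha @ y)) <= tol
            and bool(np.all(alpha >= -tol))
            and bool(np.all(alpha <= C + tol)))
```

For the samples $x = (1, 0)$ with $y = 1$ and $x = (-1, 0)$ with $y = -1$, $\Lambda = (0.5, 0.5)$ gives a dual value of $0.5$, which equals $\frac{1}{2} w_0 \cdot w_0$ for $w_0 = \sum_i \alpha_i y_i x_i = (1, 0)$. The same $\Lambda$ is infeasible if $C = 0.4$.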
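25. SMO (Sequential Minimal Optimization)
• Training an SVM means finding the $\Lambda = (\alpha_1, \alpha_2, \ldots, \alpha_l)$ that maximizes the dual objective.
• A general-purpose quadratic programming solver optimizes all $\alpha_i$ at once, which becomes expensive for large training sets: with 6000 samples, for example, the matrix of inner products $x_i \cdot x_j$ has 6000 × 6000 entries.
• SMO instead selects two multipliers $(\alpha_i, \alpha_j)$ at a time and optimizes them while keeping all the others fixed.
• Two is the smallest usable number: the constraint $\sum_i \alpha_i y_i = 0$ pins any single $\alpha_i$ once the others are fixed, whereas a pair can still move along a line, and this two-variable subproblem has a closed-form solution.
• Below, the objective that SMO maximizes is written $L_D$:
$$L_D = L(w, \xi, b, \Lambda, R) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j\, x_i \cdot x_j$$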
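The size argument is simple arithmetic. Assuming each entry is stored as an 8-byte double (an assumption, not stated on the slide), the full matrix of inner products for 6000 samples takes 288 MB, counting 1 MB as $10^6$ bytes:

```python
def gram_matrix_entries(n_samples):
    """Number of inner products x_i . x_j over all pairs (i, j)."""
    return n_samples * n_samples

def gram_matrix_megabytes(n_samples, bytes_per_entry=8):
    """Memory for the full matrix, assuming 8-byte (float64) entries and 1 MB = 10**6 bytes."""
    return gram_matrix_entries(n_samples) * bytes_per_entry / 1e6

print(gram_matrix_entries(6000))    # 36000000
print(gram_matrix_megabytes(6000))  # 288.0
```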
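26. Optimizing two variables (1)
• Maximize $L_D$ with respect to $\alpha_1, \alpha_2$ only, keeping the other multipliers fixed.
• Write $\alpha_1^{old}, \alpha_2^{old}$ for the current values and $\alpha_1^{new}, \alpha_2^{new}$ for the updated values, and define
$$E_i \equiv w^{old} \cdot x_i + b^{old} - y_i$$
$$\eta \equiv 2K_{12} - K_{11} - K_{22}, \quad \text{where } K_{ij} = x_i \cdot x_j$$
• The updated $\alpha_2$ is
$$\alpha_2^{new} = \alpha_2^{old} - \frac{y_2 (E_1 - E_2)}{\eta}$$
• Derivation: because $\sum_{i=1}^{l} \alpha_i y_i = 0$ and all other $\alpha_i$ are fixed, $\gamma \equiv \alpha_1 + s\alpha_2 = \text{Const.}$ with $s = y_1 y_2$. Substituting $\alpha_1 = \gamma - s\alpha_2$ makes $L_D$ a function of $\alpha_2$ alone, and solving $L_D' = 0$ gives the update above.
• $\eta = 2K_{12} - K_{11} - K_{22} = -|x_2 - x_1|^2 \le 0$, so $L_D$ is concave in $\alpha_2$; when $\eta < 0$ the stationary point is its maximum.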
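The update can be checked numerically: along the line $\alpha_1 + s\alpha_2 = \gamma$, no nearby point should give a larger $L_D$ than the one reached by $\alpha_2^{new}$. This sketch assumes NumPy and a linear kernel with $w^{old} = \sum_i \alpha_i y_i x_i$; the function names are hypothetical, and no box constraint is applied yet.

```python
import numpy as np

def dual_objective(alpha, K, y):
    """L_D = sum_i alpha_i - (1/2) sum_ij alpha_i alpha_j y_i y_j K_ij."""
    v = alpha * y
    return float(alpha.sum() - 0.5 * v @ K @ v)

def unclipped_pair_update(alpha, b, X, y, i1, i2):
    """Stationary point of L_D along alpha_1 + s*alpha_2 = gamma (linear kernel)."""
    K = X @ X.T                                  # K_ij = x_i . x_j
    w = (alpha * y) @ X                          # w_old = sum_i alpha_i y_i x_i
    E = X @ w + b - y                            # E_i = w_old . x_i + b_old - y_i
    eta = 2 * K[i1, i2] - K[i1, i1] - K[i2, i2]  # equals -|x_2 - x_1|^2
    if eta >= 0:
        raise ValueError("eta must be negative (x_1 and x_2 must differ)")
    s = y[i1] * y[i2]
    a2_new = alpha[i2] - y[i2] * (E[i1] - E[i2]) / eta
    a1_new = alpha[i1] - s * (a2_new - alpha[i2])  # keeps gamma fixed
    new_alpha = alpha.copy()
    new_alpha[i1], new_alpha[i2] = a1_new, a2_new
    return new_alpha
```

The threshold $b$ cancels in $E_1 - E_2$, so it does not affect $\alpha_2^{new}$.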
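28. Optimizing two variables (3)
• Because $0 \le \alpha_i \le C$ and $\gamma = \alpha_1 + s\alpha_2$ is fixed, $\alpha_2$ must lie in an interval $L \le \alpha_2 \le H$ determined by $s$ and $\gamma$:
• If $y_1 = y_2$ ($s = 1$):
$$L = \max(0,\ \alpha_1^{old} + \alpha_2^{old} - C), \quad H = \min(C,\ \alpha_1^{old} + \alpha_2^{old})$$
• If $y_1 \ne y_2$ ($s = -1$):
$$L = \max(0,\ \alpha_2^{old} - \alpha_1^{old}), \quad H = \min(C,\ C + \alpha_2^{old} - \alpha_1^{old})$$
• The unconstrained $\alpha_2^{new}$ is therefore clipped to this interval:
$$\alpha_2^{new,clipped} = \begin{cases} H, & \text{if } \alpha_2^{new} \ge H \\ \alpha_2^{new}, & \text{if } L < \alpha_2^{new} < H \\ L, & \text{if } \alpha_2^{new} \le L \end{cases}$$
• Since $L_D$ is concave in $\alpha_2$, $\alpha_2^{new,clipped}$ maximizes $L_D$ over $[L, H]$.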
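The interval and the clipping rule translate directly into code; a plain-Python sketch with illustrative function names:

```python
def alpha2_bounds(a1_old, a2_old, y1, y2, C):
    """Interval [L, H] for alpha_2 given 0 <= alpha_i <= C and fixed gamma."""
    if y1 == y2:
        # s = 1: alpha_1 + alpha_2 is fixed
        return max(0.0, a1_old + a2_old - C), min(C, a1_old + a2_old)
    # s = -1: alpha_1 - alpha_2 is fixed
    return max(0.0, a2_old - a1_old), min(C, C + a2_old - a1_old)

def clip_alpha2(a2_new, L, H):
    """Project the unconstrained alpha_2 onto [L, H]."""
    if a2_new >= H:
        return H
    if a2_new <= L:
        return L
    return a2_new
```

For example, with $\alpha_1^{old} = 0.25$, $\alpha_2^{old} = 0.5$ and $C = 1$, the interval is $[0, 0.75]$ when $y_1 = y_2$ and $[0.25, 1]$ when $y_1 \ne y_2$.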
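30. Clipping $\alpha_2$
[Figure: cases (A), (B), (C), (D); markers show the unconstrained point $(\alpha_1^{new}, \alpha_2^{new})$ and the clipped point $(\alpha_1^{new}, \alpha_2^{new,clipped})$]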
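31. Two-variable update: procedure
1. Compute $\eta = 2K_{12} - K_{11} - K_{22}$.
2. If $\eta < 0$, update the multipliers:
(a) $\alpha_2^{new} = \alpha_2^{old} + \frac{y_2 (E_2 - E_1)}{\eta}$
(b) Clip $\alpha_2^{new}$ to $[L, H]$ to obtain $\alpha_2^{new,clipped}$.
(c) $\alpha_1^{new} = \alpha_1^{old} - s(\alpha_2^{new,clipped} - \alpha_2^{old})$
3. If $\eta = 0$, $L_D$ is linear (degree 1) in $\alpha_2$, so set $\alpha_2$ to whichever of $L$ and $H$ gives the larger $L_D$, and obtain $\alpha_1$ as in 2(c).
4. After updating $\alpha_{1,2}$, update $w$ and $b$:
• Choose $b^{new}$ so that $E^{new} = 0$ for a training sample $(x, y)$, where $E(x, y) \equiv w \cdot x + b - y$:
$$w^{new} = w^{old} + (\alpha_1^{new} - \alpha_1^{old})\, y_1 x_1 + (\alpha_2^{new,clipped} - \alpha_2^{old})\, y_2 x_2$$
$$E^{new}(x, y) = E^{old}(x, y) + y_1 (\alpha_1^{new} - \alpha_1^{old})\, x_1 \cdot x + y_2 (\alpha_2^{new,clipped} - \alpha_2^{old})\, x_2 \cdot x - b^{old} + b^{new}$$
$$b^{new} = b^{old} - E^{old}(x, y) - y_1 (\alpha_1^{new} - \alpha_1^{old})\, x_1 \cdot x - y_2 (\alpha_2^{new,clipped} - \alpha_2^{old})\, x_2 \cdot x$$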
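Putting steps 1 to 4 together, the following NumPy sketch performs one two-variable update for a linear kernel. Two parts are assumptions rather than content of the slides: the rule for choosing between the thresholds computed at $x_1$ and at $x_2$ follows Platt's original SMO paper, and the hypothetical `train_smo_naive` simply cycles over all pairs instead of using SMO's heuristics for choosing $(\alpha_1, \alpha_2)$.

```python
import numpy as np

def dual_objective(alpha, K, y):
    """L_D = sum_i alpha_i - (1/2) sum_ij alpha_i alpha_j y_i y_j K_ij."""
    v = alpha * y
    return float(alpha.sum() - 0.5 * v @ K @ v)

def smo_pair_step(alpha, b, X, y, C, i1, i2, eps=1e-12):
    """One update of (alpha_i1, alpha_i2) and b, following steps 1 to 4 (linear kernel)."""
    if i1 == i2:
        return alpha, b
    K = X @ X.T
    E = X @ ((alpha * y) @ X) + b - y            # E_i = w_old . x_i + b_old - y_i
    a1, a2, y1, y2 = alpha[i1], alpha[i2], y[i1], y[i2]
    s = y1 * y2
    if s > 0:                                    # interval [L, H] for alpha_2
        L, H = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    else:
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    if H - L < eps:
        return alpha, b
    eta = 2 * K[i1, i2] - K[i1, i1] - K[i2, i2]  # step 1
    if eta < -eps:                               # step 2 (a), (b)
        a2_new = min(H, max(L, a2 + y2 * (E[i2] - E[i1]) / eta))
    else:                                        # step 3: pick the better endpoint
        def ld_at(a2_try):
            trial = alpha.copy()
            trial[i1], trial[i2] = a1 - s * (a2_try - a2), a2_try
            return dual_objective(trial, K, y)
        a2_new = H if ld_at(H) > ld_at(L) else L
    if abs(a2_new - a2) < eps:                   # alpha_2 did not move: skip
        return alpha, b
    a1_new = a1 - s * (a2_new - a2)              # step 2 (c)
    d1, d2 = a1_new - a1, a2_new - a2
    # Step 4: b_new making E_new = 0 at x_1 (b1) or at x_2 (b2)
    b1 = b - E[i1] - y1 * d1 * K[i1, i1] - y2 * d2 * K[i2, i1]
    b2 = b - E[i2] - y1 * d1 * K[i1, i2] - y2 * d2 * K[i2, i2]
    if 0 < a1_new < C:                           # assumption: Platt's choice of b
        b_new = b1
    elif 0 < a2_new < C:
        b_new = b2
    else:
        b_new = 0.5 * (b1 + b2)
    new_alpha = alpha.copy()
    new_alpha[i1], new_alpha[i2] = a1_new, a2_new
    return new_alpha, b_new

def train_smo_naive(X, y, C, passes=20):
    """Hypothetical driver: cycle over all ordered pairs; no working-set heuristics."""
    alpha, b = np.zeros(len(y)), 0.0
    for _ in range(passes):
        for i1 in range(len(y)):
            for i2 in range(len(y)):
                alpha, b = smo_pair_step(alpha, b, X, y, C, i1, i2)
    return alpha, b
```

On the points $(2, 2), (3, 3)$ with $y = 1$ and $(-2, -2), (-3, -3)$ with $y = -1$ and $C = 10$, this sketch reaches $\Lambda = (0.0625, 0, 0.0625, 0)$, $w = (0.25, 0.25)$, $b = 0$: the maximum-margin line through the origin, with $(2, 2)$ and $(-2, -2)$ as support vectors. Each step never decreases $L_D$, because it maximizes $L_D$ over an interval that contains the current $\alpha_2$.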