# Datamining 6th svm

## on Dec 27, 2010

## Datamining 6th svm Presentation Transcript

• k-NN
• Yes, No
• Training Data, Test Data
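• Training data: $x_i$ is the $i$-th data point and $y_i$ is its class, $1$ or $-1$:
  $$(x_i, y_i) \quad (i = 1, \dots, l,\ x_i \in \mathbb{R}^n,\ y_i \in \{1, -1\})$$
• Find $w, b$ such that
  $$y_i (w \cdot x_i + b) > 0 \quad (i = 1, \dots, l)$$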
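• Linear classification: test whether $w \cdot x + b \ge 0$.
• Decision function $d(x)$ for a point $x$:
  $$d(x) = \begin{cases} 1, & \text{if } w \cdot x + b \ge 0 \\ -1, & \text{otherwise} \end{cases}$$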
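The decision function $d(x)$ can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function name `decide` and the example values of `w` and `b` are assumptions chosen for the demonstration.

```python
def decide(w, b, x):
    """Return the class d(x): 1 if w . x + b >= 0, otherwise -1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical 2-D example: the boundary is the line x1 + x2 - 1 = 0.
w, b = [1.0, 1.0], -1.0
print(decide(w, b, [2.0, 2.0]))  # a point on the positive side
print(decide(w, b, [0.0, 0.0]))  # a point on the negative side
```

Points exactly on the boundary are assigned to class 1, because the condition is $w \cdot x + b \ge 0$.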
• Fisher's linear discriminant (1)
  • 2 classes
  • Line $aw + b$
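• Fisher's linear discriminant (2)
  • Class means:
    $$m_+ = \frac{\sum_{d(x)=1} x}{|\{x \mid d(x) = 1\}|}, \quad m_- = \frac{\sum_{d(x)=-1} x}{|\{x \mid d(x) = -1\}|}$$
  • Distance between the class means projected onto $w$ (should be large):
    $$|(m_+ - m_-) \cdot w|$$
  • Within-class spread along $w$ for the boundary $w \cdot x + b = 0$ (should be small):
    $$\sum_{d(x)=1} ((x - m_+) \cdot w)^2 + \sum_{d(x)=-1} ((x - m_-) \cdot w)^2$$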
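• Fisher's linear discriminant (3)
  • With $|w| = 1$, find the $w$ that maximizes $J(w)$:
    $$J(w) = \frac{|(m_+ - m_-) \cdot w|^2}{\sum_{d(x)=1} ((x - m_+) \cdot w)^2 + \sum_{d(x)=-1} ((x - m_-) \cdot w)^2}$$
  • $J(w)$ fixes only the direction $w$ of $w \cdot x + b$; it does not depend on $b$.
  • The $w$ that maximizes $J(w)$ is found by setting the derivative of $J(w)$ with respect to $w$ to 0.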
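• Fisher's linear discriminant (4)
  • Write $J(w)$ in matrix form:
    $$J(w) = \frac{w^T S_B w}{w^T S_W w}$$
    $$S_B = (m_+ - m_-)(m_+ - m_-)^T$$
    $$S_W = \sum_{d(x)=1} (x - m_+)(x - m_+)^T + \sum_{d(x)=-1} (x - m_-)(x - m_-)^T$$
  • Set $\partial J(w) / \partial w = 0$ and use the quotient rule $\left(\frac{f}{g}\right)' = \frac{f'g - fg'}{g^2}$:
    $$(w^T S_B w) S_W w = (w^T S_W w) S_B w$$
  • Since $S_B w \propto m_+ - m_-$, if $S_W$ is invertible:
    $$w \propto S_W^{-1} (m_+ - m_-)$$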
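The closed-form result $w \propto S_W^{-1}(m_+ - m_-)$ can be checked numerically. The sketch below is restricted to 2-D points so that the $2 \times 2$ inverse can be written out by hand; the function name `fisher_direction` and the sample points are assumptions made for this illustration.

```python
def fisher_direction(pos, neg):
    """Return the unit vector w proportional to S_W^{-1} (m+ - m-) for 2-D points."""
    def mean(pts):
        n = len(pts)
        return [sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n]

    m_pos, m_neg = mean(pos), mean(neg)

    # Within-class scatter S_W = sum of (x - m)(x - m)^T over both classes.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for pts, m in ((pos, m_pos), (neg, m_neg)):
        for p in pts:
            d = [p[0] - m[0], p[1] - m[1]]
            for r in range(2):
                for c in range(2):
                    s[r][c] += d[r] * d[c]

    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det == 0:
        raise ValueError("S_W is singular")

    diff = [m_pos[0] - m_neg[0], m_pos[1] - m_neg[1]]
    # Closed-form 2x2 inverse of S_W applied to (m+ - m-).
    w = [(s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
         (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det]
    norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
    return [w[0] / norm, w[1] / norm]


# Hypothetical data: two unit squares separated along the first axis.
pos = [(2.0, 0.0), (3.0, 1.0), (2.0, 1.0), (3.0, 0.0)]
neg = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
print(fisher_direction(pos, neg))
```

For this symmetric data $S_W$ is a multiple of the identity, so the direction is simply that of $m_+ - m_-$, which lies along the first axis.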
• SVM (Support Vector Machine)
• Margin $\rho(w, b)$:
  $$\rho(w, b) = \min_{\{x_i \mid y_i = 1\}} \frac{x_i \cdot w}{|w|} - \max_{\{x_i \mid y_i = -1\}} \frac{x_i \cdot w}{|w|}$$
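• The maximum margin is $2/|w_0|$
  • Separating hyperplane: $w_0 \cdot x + b_0 = 0$
  • Class $1$ satisfies $w_0 \cdot x + b_0 \ge 1$, with boundary $w_0 \cdot x + b_0 = 1$
  • Class $-1$ satisfies $w_0 \cdot x + b_0 \le -1$, with boundary $w_0 \cdot x + b_0 = -1$
  • With $w_0, b_0$ scaled so that the closest points lie on $w_0 \cdot x + b_0 = \pm 1$:
    $$\rho(w_0, b_0) = \min_{\{x_i \mid y_i = 1\}} \frac{x_i \cdot w_0}{|w_0|} - \max_{\{x_i \mid y_i = -1\}} \frac{x_i \cdot w_0}{|w_0|} = \frac{1 - b_0}{|w_0|} - \frac{-1 - b_0}{|w_0|} = \frac{2}{|w_0|}$$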
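The identity $\rho(w_0, b_0) = 2/|w_0|$ can be verified on a small example. The sketch below evaluates the margin definition directly; the function name `margin` and the data are assumptions, with the points placed so that the closest ones satisfy $w_0 \cdot x + b_0 = \pm 1$ for $w_0 = (2, 0)$, $b_0 = -4$.

```python
import math


def margin(w, points, labels):
    """rho(w, b): min over y=1 of x.w/|w| minus max over y=-1 of x.w/|w|."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    proj = [sum(wi * xi for wi, xi in zip(w, x)) / norm for x in points]
    pos = [p for p, y in zip(proj, labels) if y == 1]
    neg = [p for p, y in zip(proj, labels) if y == -1]
    return min(pos) - max(neg)


# Hypothetical data: (2.5, 0) and (1.5, 0) lie on w0.x + b0 = +1 and -1.
w0 = [2.0, 0.0]
points = [(2.5, 0.0), (3.0, 1.0), (1.5, 0.0), (0.0, 2.0)]
labels = [1, 1, -1, -1]
print(margin(w0, points, labels), 2.0 / math.sqrt(sum(c * c for c in w0)))
```

Note that $b$ does not appear in the computation: it cancels in the difference, as in the derivation on the slide.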
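• Maximizing the margin $2/|w_0|$ is equivalent to minimizing $w_0 \cdot w_0$
  • Under the constraints $y_i (w_0 \cdot x_i + b_0) \ge 1 \ (i = 1, \dots, l)$, find the $w_0$ that minimizes $w_0 \cdot w_0$.
  • This is a quadratic programming problem: the objective is of degree 2 and the constraints are of degree 1.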
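• Lagrangian formulation (1)
  • Under the constraints
    $$y_i (w_0 \cdot x_i + b_0) \ge 1 \quad (i = 1, \dots, l) \tag{1}$$
    find the $w_0$ that minimizes $w_0 \cdot w_0$.
  • Introduce Lagrange multipliers $\Lambda = (\alpha_1, \dots, \alpha_l) \ (\alpha_i \ge 0)$:
    $$L(w, b, \Lambda) = \frac{|w|^2}{2} - \sum_{i=1}^{l} \alpha_i \left( y_i (x_i \cdot w + b) - 1 \right)$$
  • Minimize $L$ with respect to $w, b$ and maximize it with respect to $\Lambda$.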
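• Lagrangian formulation (2)
  • At $w = w_0$, $b = b_0$ the derivatives of $L(w, b, \Lambda)$ vanish:
    $$\left. \frac{\partial L(w, b, \Lambda)}{\partial w} \right|_{w = w_0} = w_0 - \sum_{i=1}^{l} \alpha_i y_i x_i = 0, \qquad \left. \frac{\partial L(w, b, \Lambda)}{\partial b} \right|_{b = b_0} = - \sum_{i=1}^{l} \alpha_i y_i = 0 \tag{2}$$
  • Hence
    $$w_0 = \sum_{i=1}^{l} \alpha_i y_i x_i, \qquad \sum_{i=1}^{l} \alpha_i y_i = 0$$
  • Substituting $w = w_0$, $b = b_0$:
    $$L(w_0, b_0, \Lambda) = \frac{1}{2} w_0 \cdot w_0 - \sum_{i=1}^{l} \alpha_i \left[ y_i (x_i \cdot w_0 + b_0) - 1 \right] = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$$
  • The result no longer contains $w$ or $b$; it is a function of $\Lambda$ only.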
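• SVM
  • Under constraints that do not involve $w, b$,
    $$\sum_{i=1}^{l} \alpha_i y_i = 0, \quad \alpha_i \ge 0 \tag{3}$$
    find the $\Lambda$ that maximizes
    $$L(w_0, b_0, \Lambda) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$$
  • This is the SVM.
  • $w_0$ is obtained from $\Lambda$ by (2): $w_0 = \sum_{i=1}^{l} \alpha_i y_i x_i$.
  • By (2), only the $x_i$ with $\alpha_i \ne 0$ contribute to $w_0$.
  • $b_0$ follows from the KKT condition: $\alpha_i \left[ y_i (x_i \cdot w_0 + b_0) - 1 \right] = 0$.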
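Once $\Lambda$ is known, $w_0$ and $b_0$ follow mechanically from $w_0 = \sum_i \alpha_i y_i x_i$ and the KKT condition. The sketch below assumes the $\alpha_i$ have already been found by some solver; the function name `recover_w_b` and the three sample points with their multipliers are hypothetical values chosen so the result can be checked by hand.

```python
def recover_w_b(alphas, xs, ys):
    """Recover w0 = sum_i alpha_i y_i x_i and b0 from one support vector."""
    dim = len(xs[0])
    w = [sum(a * y * x[k] for a, x, y in zip(alphas, xs, ys)) for k in range(dim)]
    # KKT: alpha_i > 0 implies y_i (x_i . w0 + b0) = 1, so b0 = y_i - x_i . w0
    # (using 1 / y_i = y_i for labels in {1, -1}).
    for a, x, y in zip(alphas, xs, ys):
        if a > 0:
            b = y - sum(wk * xk for wk, xk in zip(w, x))
            return w, b
    raise ValueError("no support vector: all alpha_i are zero")


# Hypothetical solution: the third point has alpha = 0 and does not affect w0.
xs = [(2.0, 0.0), (0.0, 0.0), (3.0, 1.0)]
ys = [1, -1, 1]
alphas = [0.5, 0.5, 0.0]
print(recover_w_b(alphas, xs, ys))
```

Here the multipliers satisfy $\sum_i \alpha_i y_i = 0$, and the recovered hyperplane is $x_1 - 1 = 0$, midway between the two points with nonzero $\alpha_i$.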
• [Figure: panels (A) and (B)]
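• Mapping $x$ to $\Phi(x)$
  • The objective depends on the data only through the inner products $x_i \cdot x_j$:
    $$L(w_0, b_0, \Lambda) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$$
  • Replacing $x$ by $\Phi(x)$:
    $$L(w_0, b_0, \Lambda) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \, \Phi(x_i) \cdot \Phi(x_j)$$
  • The separating surface becomes
    $$\Phi(x) \cdot w_0 + b_0 = \sum_{i=1}^{l} \alpha_i y_i \, \Phi(x) \cdot \Phi(x_i) + b_0 = 0$$
  • Only inner products of $\Phi$ are needed.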
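• Kernel
  • $K(x, y) = \Phi(x) \cdot \Phi(y)$
  • Example:
    $$\Phi((x_1, x_2)) = (x_1^2, \sqrt{2} x_1 x_2, x_2^2, \sqrt{2} x_1, \sqrt{2} x_2, 1)$$
    $$\Phi((x_1, x_2)) \cdot \Phi((y_1, y_2)) = (x_1 y_1)^2 + 2 x_1 y_1 x_2 y_2 + (x_2 y_2)^2 + 2 x_1 y_1 + 2 x_2 y_2 + 1 = (x_1 y_1 + x_2 y_2 + 1)^2 = ((x_1, x_2) \cdot (y_1, y_2) + 1)^2$$
  • The inner product in the 6-dimensional space is obtained without computing $\Phi$ explicitly.
  • Common kernels:
    • Polynomial: $(x \cdot y + 1)^d$
    • RBF: $\exp(-\|x - y\|^2 / 2\sigma^2)$
    • Sigmoid: $\tanh(\kappa \, x \cdot y - \delta)$
    • $\sigma$, $\kappa$, $\delta$ are parameters
  • Mercer's condition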
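The equality between the explicit 6-dimensional inner product and $(x \cdot y + 1)^2$ is easy to confirm numerically. The following is a minimal sketch; the function names `phi`, `dot`, `poly_kernel`, and `rbf_kernel` and the test points are assumptions for this illustration.

```python
import math


def phi(x):
    """Explicit 6-dimensional feature map for a 2-D point (x1, x2)."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [x1 * x1, r2 * x1 * x2, x2 * x2, r2 * x1, r2 * x2, 1.0]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, y, d=2):
    """Polynomial kernel (x . y + 1)^d."""
    return (dot(x, y) + 1.0) ** d


def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


# Hypothetical points: both sides should agree up to rounding error.
x, y = (1.0, 2.0), (3.0, -1.0)
print(dot(phi(x), phi(y)), poly_kernel(x, y))
```

The kernel side needs one 2-dimensional inner product, while the explicit side builds two 6-dimensional vectors first; the gap grows quickly with the degree $d$ and the input dimension.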
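• Soft margin
  • Introduce slack variables $\xi_i$ and relax the constraints:
    $$y_i (w \cdot x_i + b) \ge 1 - \xi_i \quad \text{where } \xi_i \ge 0 \ (i = 1, \dots, l)$$
  • Minimize
    $$\frac{1}{2} w \cdot w + C \sum_{i=1}^{l} \xi_i$$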
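For a fixed $w, b$, the smallest slack that satisfies both $y_i (w \cdot x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$ is $\xi_i = \max(0, 1 - y_i (w \cdot x_i + b))$. The sketch below uses that to evaluate the soft-margin objective; the function name `soft_margin_objective` and the sample data are assumptions.

```python
def soft_margin_objective(w, b, xs, ys, C):
    """Return (objective, slacks) for 1/2 w.w + C * sum(xi_i)."""
    slacks = []
    for x, y in zip(xs, ys):
        score = sum(wk * xk for wk, xk in zip(w, x)) + b
        # Smallest xi_i >= 0 with y_i (w . x_i + b) >= 1 - xi_i.
        slacks.append(max(0.0, 1.0 - y * score))
    objective = 0.5 * sum(wk * wk for wk in w) + C * sum(slacks)
    return objective, slacks


# Hypothetical data: the third point is labelled 1 but lies on the wrong side.
xs = [(2.0, 0.0), (0.0, 0.0), (0.5, 0.0)]
ys = [1, -1, 1]
print(soft_margin_objective([1.0, 0.0], -1.0, xs, ys, C=10.0))
```

A larger $C$ makes each unit of slack more expensive, which pushes the solution toward the hard-margin case.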
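• Soft margin (1)
  • Introduce Lagrange multipliers $\Lambda = (\alpha_1, \dots, \alpha_l)$, $R = (r_1, \dots, r_l)$ and define $L$:
    $$L(w, \xi, b, \Lambda, R) = \frac{1}{2} w \cdot w + C \sum_{i=1}^{l} \xi_i - \sum_{i=1}^{l} \alpha_i \left[ y_i (x_i \cdot w + b) - 1 + \xi_i \right] - \sum_{i=1}^{l} r_i \xi_i$$
  • At $w_0, b_0, \xi_i^0$ the partial derivatives of $L$ with respect to $w, b, \xi_i$ are 0 (KKT conditions):
    $$\left. \frac{\partial L(w, \xi, b, \Lambda, R)}{\partial w} \right|_{w = w_0} = w_0 - \sum_{i=1}^{l} \alpha_i y_i x_i = 0$$
    $$\left. \frac{\partial L(w, \xi, b, \Lambda, R)}{\partial b} \right|_{b = b_0} = - \sum_{i=1}^{l} \alpha_i y_i = 0$$
    $$\left. \frac{\partial L(w, \xi, b, \Lambda, R)}{\partial \xi_i} \right|_{\xi = \xi_i^0} = C - \alpha_i - r_i = 0$$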
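• Soft margin (2)
  • Substituting these into $L$:
    $$L(w, \xi, b, \Lambda, R) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$$
  • $C$ and $\xi$ drop out, so the objective is the same as for the hard-margin SVM.
  • The difference is that $\alpha_i$ is bounded by $C$: from $C - \alpha_i - r_i = 0$ and $r_i \ge 0$, $0 \le \alpha_i \le C$.
  • Under constraints that do not involve $w, b$,
    $$\sum_{i=1}^{l} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C$$
    find the $\Lambda$ that maximizes
    $$L(w, \xi, b, \Lambda, R) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$$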
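• Reference: Karush-Kuhn-Tucker (KKT) conditions
  • Problem: minimize $f(x)$ subject to $g_i(x) \le 0$, where $x = (x_1, x_2, \dots, x_n)$.
  • KKT conditions:
    $$\frac{\partial f(x)}{\partial x_j} + \sum_{i=1}^{m} \lambda_i \frac{\partial g_i(x)}{\partial x_j} = 0, \quad j = 1, 2, \dots, n$$
    $$\lambda_i g_i(x) = 0, \quad \lambda_i \ge 0, \quad g_i(x) \le 0, \quad i = 1, 2, \dots, m$$
  • When $f(x)$ and $g_i(x)$ are convex and differentiable, an $x, \lambda$ satisfying the KKT conditions gives the minimum of $f(x)$.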
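A one-variable worked example may make the KKT conditions concrete. The problem below is not from the slides; $f(x) = x^2$ and $g(x) = 1 - x$ are chosen only because every step can be checked by hand.

```latex
\begin{align*}
&\text{minimize } f(x) = x^2 \quad \text{subject to } g(x) = 1 - x \le 0 \\
&\text{stationarity: } \frac{\partial f}{\partial x} + \lambda \frac{\partial g}{\partial x} = 2x - \lambda = 0 \\
&\text{remaining conditions: } \lambda (1 - x) = 0, \quad \lambda \ge 0, \quad 1 - x \le 0 \\
&\lambda = 0 \;\Rightarrow\; x = 0, \text{ which violates } 1 - x \le 0 \\
&x = 1 \;\Rightarrow\; \lambda = 2 \ge 0, \text{ so } x = 1,\ f(x) = 1 \text{ is the minimum}
\end{align*}
```

The constraint is active at the solution ($g(x) = 0$) and its multiplier is nonzero, which is the same pattern as a support vector with $\alpha_i \ne 0$.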
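• SMO (Sequential Minimal Optimization)
  • An optimization algorithm for SVM training: find $\Lambda = (\alpha_1, \alpha_2, \dots, \alpha_l)$.
  • There is one $\alpha_i$ per training point, so 6000 training points give 6000 variables.
  • Select 2 variables $(\alpha_i, \alpha_j)$ and optimize only those 2.
  • SMO repeats this 2-variable optimization to maximize $L_D$:
    $$L_D = L(w, \xi, b, \Lambda, R) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$$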
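The quantity $L_D$ that SMO maximizes can be evaluated directly from its definition. The sketch below is a plain double loop over all pairs, written for clarity rather than speed; the function name `dual_objective` and the two-point example are assumptions.

```python
def dual_objective(alphas, xs, ys):
    """L_D = sum_i alpha_i - 1/2 sum_i sum_j alpha_i alpha_j y_i y_j x_i . x_j."""
    total = sum(alphas)
    quad = 0.0
    for ai, xi, yi in zip(alphas, xs, ys):
        for aj, xj, yj in zip(alphas, xs, ys):
            quad += ai * aj * yi * yj * sum(p * q for p, q in zip(xi, xj))
    return total - 0.5 * quad


# Hypothetical two-point problem with alpha = (0.5, 0.5).
xs = [(2.0, 0.0), (0.0, 0.0)]
ys = [1, -1]
print(dual_objective([0.5, 0.5], xs, ys))
print(dual_objective([0.0, 0.0], xs, ys))
```

The double loop touches every pair $(i, j)$, which is the cost that motivates updating only 2 multipliers at a time.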
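• 2-variable optimization (1)
  • Update $\alpha_1, \alpha_2$ so that $L_D$ increases.
  • Old values $\alpha_1^{old}, \alpha_2^{old}$; new values $\alpha_1^{new}, \alpha_2^{new}$.
  • Definitions:
    $$E_i \equiv w^{old} \cdot x_i + b^{old} - y_i$$
    $$\eta \equiv 2 K_{12} - K_{11} - K_{22}, \quad \text{where } K_{ij} = x_i \cdot x_j$$
  • Update rule:
    $$\alpha_2^{new} = \alpha_2^{old} - \frac{y_2 (E_1^{old} - E_2^{old})}{\eta}$$
  • Derivation: from $\sum_{i=1}^{l} \alpha_i y_i = 0$, $\gamma \equiv \alpha_1 + s \alpha_2 = \text{Const.}$ with $s = y_1 y_2$; substitute into $L_D$ and solve $L_D' = 0$.
  • $\eta = 2 K_{12} - K_{11} - K_{22} = -|x_2 - x_1|^2 \le 0$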
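• 2-variable optimization (2)
  • $\alpha_1, \alpha_2$ must stay on the line $\gamma \equiv \alpha_1 + s \alpha_2 = \text{Const.}$
  • $\alpha_1^{new}, \alpha_2^{new}$ must lie between $0$ and $C$.
  • $\alpha_2^{new}$ is therefore clipped, giving $\alpha_2^{new,clipped}$.
  • [Figure: panels (A) and (B)]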
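• 2-variable optimization (3)
  • If $y_1 = y_2$ ($s = 1$):
    $$L = \max(0, \alpha_1^{old} + \alpha_2^{old} - C), \quad H = \min(C, \alpha_1^{old} + \alpha_2^{old})$$
  • If $y_1 \ne y_2$ ($s = -1$):
    $$L = \max(0, \alpha_2^{old} - \alpha_1^{old}), \quad H = \min(C, C + \alpha_2^{old} - \alpha_1^{old})$$
  • The range $L \le \alpha_2 \le H$ is determined by $s$ and $\gamma$.
  • Clipped $\alpha_2$:
    $$\alpha_2^{new,clipped} = \begin{cases} H, & \text{if } \alpha_2^{new} \ge H \\ \alpha_2^{new}, & \text{if } L < \alpha_2^{new} < H \\ L, & \text{if } \alpha_2^{new} \le L \end{cases}$$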
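The bounds $L$, $H$ and the clipping rule translate directly into code. This is a minimal sketch; the function names `clip_bounds` and `clip` and the numeric values are assumptions for this illustration.

```python
def clip_bounds(alpha1_old, alpha2_old, y1, y2, C):
    """Return (L, H), the feasible range for the new alpha_2."""
    if y1 == y2:  # s = 1
        low = max(0.0, alpha1_old + alpha2_old - C)
        high = min(C, alpha1_old + alpha2_old)
    else:  # s = -1
        low = max(0.0, alpha2_old - alpha1_old)
        high = min(C, C + alpha2_old - alpha1_old)
    return low, high


def clip(alpha2_new, low, high):
    """Clip the unconstrained alpha_2 into [L, H]."""
    return min(high, max(low, alpha2_new))


# Hypothetical values with C = 1.
low, high = clip_bounds(0.25, 0.875, 1, 1, 1.0)
print(low, high, clip(1.5, low, high), clip(0.0, low, high))
```

Because $\alpha_1$ is afterwards recomputed from $\gamma = \alpha_1 + s \alpha_2$, keeping $\alpha_2$ inside $[L, H]$ is enough to keep both multipliers inside $[0, C]$.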
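• [Figure: the range $L \le \alpha_2 \le H$, panels (A), (B), (C), (D)]
• [Figure: clipping $\alpha_2$, panels (A), (B), (C), (D); legend: $(\alpha_1^{new}, \alpha_2^{new})$ and clipped: $(\alpha_1^{new}, \alpha_2^{new,clipped})$]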
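• 2-variable update procedure
  1. Compute $\eta = 2 K_{12} - K_{11} - K_{22}$.
  2. If $\eta < 0$, update $\alpha$:
     (a) $\alpha_2^{new} = \alpha_2^{old} + \dfrac{y_2 (E_2^{old} - E_1^{old})}{\eta}$
     (b) Clip $\alpha_2^{new}$ to obtain $\alpha_2^{new,clipped}$.
     (c) $\alpha_1^{new} = \alpha_1^{old} - s (\alpha_2^{new,clipped} - \alpha_2^{old})$
  3. If $\eta = 0$, $L_D$ is of degree 1 in $\alpha_2$, so set $\alpha_2$ to $L$ or $H$; $\alpha_1$ is then given by 2(c).
  4. Update the remaining quantities from the new $\alpha_1, \alpha_2$; $b^{new}$ is chosen so that $E^{new} = 0$:
     $$w^{new} = w^{old} + (\alpha_1^{new} - \alpha_1^{old}) y_1 x_1 + (\alpha_2^{new,clipped} - \alpha_2^{old}) y_2 x_2$$
     $$E^{new}(x, y) = E^{old}(x, y) + y_1 (\alpha_1^{new} - \alpha_1^{old}) x_1 \cdot x + y_2 (\alpha_2^{new,clipped} - \alpha_2^{old}) x_2 \cdot x - b^{old} + b^{new}$$
     $$b^{new} = b^{old} - E^{old}(x, y) - y_1 (\alpha_1^{new} - \alpha_1^{old}) x_1 \cdot x - y_2 (\alpha_2^{new,clipped} - \alpha_2^{old}) x_2 \cdot x$$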
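The update procedure can be sketched as a single function for the linear kernel $K_{ij} = x_i \cdot x_j$. This is an illustration under stated assumptions, not a full SMO implementation: the function name `smo_step` is hypothetical, the $\eta = 0$ case of step 3 is left unhandled, and $b^{new}$ is chosen so that $E^{new} = 0$ at the point $(x_1, y_1)$, which is one possible choice of $(x, y)$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def smo_step(x1, y1, a1, x2, y2, a2, w, b, C):
    """One two-variable update; returns (a1_new, a2_clipped, w_new, b_new)."""
    e1 = dot(w, x1) + b - y1
    e2 = dot(w, x2) + b - y2
    s = y1 * y2
    eta = 2.0 * dot(x1, x2) - dot(x1, x1) - dot(x2, x2)  # step 1
    if eta >= 0:
        # eta = 0 (step 3) is not handled in this sketch: values stay unchanged.
        return a1, a2, w, b
    if s == 1:
        low, high = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    else:
        low, high = max(0.0, a2 - a1), min(C, C + a2 - a1)
    a2_new = a2 + y2 * (e2 - e1) / eta        # step 2(a)
    a2_clip = min(high, max(low, a2_new))     # step 2(b)
    a1_new = a1 - s * (a2_clip - a2)          # step 2(c)
    d1, d2 = (a1_new - a1) * y1, (a2_clip - a2) * y2
    w_new = [wk + d1 * p + d2 * q for wk, p, q in zip(w, x1, x2)]
    # Step 4, assuming (x, y) = (x1, y1): make E_new vanish at that point.
    b_new = b - e1 - d1 * dot(x1, x1) - d2 * dot(x2, x1)
    return a1_new, a2_clip, w_new, b_new


# Hypothetical start: two points, all multipliers and parameters at zero.
print(smo_step((2.0, 0.0), 1, 0.0, (0.0, 0.0), -1, 0.0, [0.0, 0.0], 0.0, 10.0))
```

On this two-point problem a single step already reaches $\alpha = (0.5, 0.5)$, $w = (1, 0)$, $b = -1$, the maximum-margin solution; with more data the step is repeated over many pairs.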
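• Choosing the $\alpha_i$ to update
  • How to select $\alpha_1$ and $\alpha_2$
  • $\alpha_1$: a point that violates the KKT conditions; points that satisfy the KKT conditions need no update
  • From the 2nd pass on, look among the points with $0 < \alpha_i < C$
  • $\alpha_2$: the point that increases $L_D$ the most, approximated by the largest $|E_1 - E_2|$
  • The choice of $E_2$ depends on the sign of $E_1$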
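The $|E_1 - E_2|$ criterion for the second multiplier amounts to a one-line search over cached errors. The sketch below covers only that part of the selection; the function name `choose_second` and the error values are assumptions, and the KKT check for the first multiplier is not included.

```python
def choose_second(i1, errors):
    """Return the index j != i1 that maximizes |E_i1 - E_j|."""
    candidates = [j for j in range(len(errors)) if j != i1]
    return max(candidates, key=lambda j: abs(errors[i1] - errors[j]))


# Hypothetical cached errors E_i for four training points.
errors = [0.5, -1.0, 0.2, 1.0]
print(choose_second(0, errors))
```

Since $E_1$ is positive in this example, the partner chosen is the point with the most negative error.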
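• SVM training with SMO
  • Points with $\alpha \ne 0$
  • Select 2 of the $\alpha$ and optimize those 2
  • Select the 2 $\alpha$ using $|E_2 - E_1|$
  • $L_D$ and the KKT conditions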
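• Other problems
  • 3 or more classes: combine 2-class classifiers (A vs. B)
  • Regression problem: e.g. a value from 0 to 100 divided into 0 to 10, 10 to 20, ...
  • Only 1 class available (Web example): One Class SVM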