Data Mining, Lecture 6: SVM


  1. Review: the k-NN classifier, which labels test data Yes/No by comparison against stored training data.
  2. [Figure-only slide.]
  3. Problem setting: training data (xi, yi) (i = 1, …, l, xi ∈ Rⁿ, yi ∈ {1, −1}), where yi labels example xi as positive (1) or negative (−1). The data are linearly separable if there exist w, b with
     yi(w · xi + b) > 0 (i = 1, …, l).
  4. Decision function: given a separating hyperplane w · x + b = 0, classify x by
     d(x) = 1 if w · x + b ≥ 0, d(x) = −1 otherwise.
  5. Fisher's linear discriminant (1): project the two classes onto a direction w and separate them with the hyperplane w · x + b = 0.
  6. Fisher's linear discriminant (2): class means
     m+ = (Σ_{d(x)=1} x) / |{x | d(x) = 1}|,   m− = (Σ_{d(x)=−1} x) / |{x | d(x) = −1}|.
     Between-class separation along w: |(m+ − m−) · w|.
     Within-class scatter around the projected means, for the hyperplane w · x + b = 0:
     Σ_{d(x)=1} ((x − m+) · w)² + Σ_{d(x)=−1} ((x − m−) · w)².
  7. Fisher's linear discriminant (3): with |w| = 1, choose w to maximize the criterion
     J(w) = |(m+ − m−) · w|² / (Σ_{d(x)=1} ((x − m+) · w)² + Σ_{d(x)=−1} ((x − m−) · w)²),
     i.e. large between-class separation relative to within-class scatter of the projection w · x + b. The threshold b is fixed afterwards, once w is known. Maximize J(w) by setting its derivative with respect to w to 0 (next slide).
  8. Fisher's linear discriminant (4): in matrix form,
     J(w) = (wᵀ S_B w) / (wᵀ S_W w),
     S_B = (m+ − m−)(m+ − m−)ᵀ,
     S_W = Σ_{d(x)=1} (x − m+)(x − m+)ᵀ + Σ_{d(x)=−1} (x − m−)(x − m−)ᵀ.
     Setting ∂J(w)/∂w = 0 with the quotient rule (f/g)′ = (f′g − fg′)/g² gives
     (wᵀ S_B w) S_W w = (wᵀ S_W w) S_B w.
     Since S_B w is proportional to m+ − m−, the solution is w ∝ S_W⁻¹(m+ − m−).
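As a minimal NumPy sketch of this recipe (the toy data and the midpoint choice of threshold b are my assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two illustrative Gaussian clouds: class +1 and class -1
X_pos = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(50, 2))
X_neg = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))

m_pos, m_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)

# Within-class scatter S_W, summed over both classes
S_W = (X_pos - m_pos).T @ (X_pos - m_pos) + (X_neg - m_neg).T @ (X_neg - m_neg)

# Fisher direction: w proportional to S_W^{-1} (m+ - m-)
w = np.linalg.solve(S_W, m_pos - m_neg)
w /= np.linalg.norm(w)

# One common choice (an assumption here): threshold midway between the means
b = -w @ (m_pos + m_neg) / 2.0
print("w =", w, "b =", b)
```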
  9. SVM (Support Vector Machine)
  10. Margin: for a separating hyperplane (w, b), the margin ρ(w, b) is the gap between the closest positive and closest negative points, measured along w:
      ρ(w, b) = min_{xi: yi=1} (xi · w / |w|) − max_{xi: yi=−1} (xi · w / |w|).
  11. Canonical hyperplane: rescale (w, b) to (w0, b0) so that the closest points on either side satisfy w0 · x + b0 = ±1. Then
      ρ(w0, b0) = min_{xi: yi=1} (xi · w0 / |w0|) − max_{xi: yi=−1} (xi · w0 / |w0|)
                = (1 − b0)/|w0| − (−1 − b0)/|w0| = 2/|w0|.
  12. Maximizing the margin 2/|w0| is equivalent to minimizing w0 · w0 subject to
      yi(w0 · xi + b) ≥ 1 (i = 1, …, l):
      a quadratic objective under linear inequality constraints, i.e. a quadratic program.
  13. Solving the constrained minimization (1): minimize w0 · w0 subject to
      yi(w0 · xi + b) ≥ 1 (i = 1, …, l)   (1)
      via Lagrange multipliers. With Λ = (α1, …, αl), αi ≥ 0,
      L(w, b, Λ) = |w|²/2 − Σ_{i=1}^l αi (yi(xi · w + b) − 1).
      L is minimized with respect to w, b and maximized with respect to Λ (a saddle point).
  14. Solving the constrained minimization (2): at the optimum w = w0, b = b0, the derivatives of L(w, b, Λ) vanish:
      ∂L(w, b, Λ)/∂w |_{w=w0} = w0 − Σ_{i=1}^l αi yi xi = 0
      ∂L(w, b, Λ)/∂b |_{b=b0} = −Σ_{i=1}^l αi yi = 0   (2)
      hence w0 = Σ_{i=1}^l αi yi xi and Σ_{i=1}^l αi yi = 0.
      Substituting back at w = w0, b = b0:
      L(w0, b0, Λ) = (1/2) w0 · w0 − Σ_{i=1}^l αi [yi(xi · w0 + b0) − 1]
                   = Σ_{i=1}^l αi − (1/2) Σ_{i=1}^l Σ_{j=1}^l αi αj yi yj xi · xj.
      w and b have been eliminated; only Λ remains.
  15. SVM dual problem: rather than solving for w, b directly, maximize over Λ
      L(w0, b0, Λ) = Σ_{i=1}^l αi − (1/2) Σ_{i=1}^l Σ_{j=1}^l αi αj yi yj xi · xj   (3)
      subject to Σ_{i=1}^l αi yi = 0, αi ≥ 0.
      • From the optimal Λ, recover w0 via (2) (w0 = Σ_{i=1}^l αi yi xi).
      • By (2), only the xi with αi ≠ 0 contribute to w; these are the support vectors (KKT).
      • KKT complementarity condition: αi [yi(xi · w0 + b0) − 1] = 0.
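The dual solution can be inspected with an off-the-shelf solver; a sketch using scikit-learn's SVC (my choice of tool, not the slides'), with a large C to approximate the hard-margin problem:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([2, 2], 0.5, (20, 2)),
               rng.normal([-2, -2], 0.5, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)

# Large C approximates the hard-margin SVM
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w0 = clf.coef_[0]              # w0 = sum_i alpha_i y_i x_i
b0 = clf.intercept_[0]
sv = clf.support_vectors_      # the points with alpha_i != 0

# KKT: support vectors lie on the margin, y_i (w0·x_i + b0) = 1
print(sv @ w0 + b0)            # approximately +/-1
print("margin =", 2 / np.linalg.norm(w0))
```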
  16. [Figure: two example datasets, (A) and (B).]
  17. Nonlinear separation via feature mapping: the dual objective
      L(w0, b0, Λ) = Σ_{i=1}^l αi − (1/2) Σ_{i=1}^l Σ_{j=1}^l αi αj yi yj xi · xj
      touches the data only through dot products. Mapping each input x to a higher-dimensional feature vector Φ(x) gives
      L(w0, b0, Λ) = Σ_{i=1}^l αi − (1/2) Σ_{i=1}^l Σ_{j=1}^l αi αj yi yj Φ(xi) · Φ(xj)
      and the separating surface
      Φ(x) · w0 + b0 = Σ_{i=1}^l αi yi Φ(x) · Φ(xi) + b0 = 0.
      Φ never needs to be evaluated explicitly; only dot products of mapped points are required.
  18. Kernel functions: K(x, y) = Φ(x) · Φ(y). Example:
      Φ((x1, x2)) = (x1², √2 x1x2, x2², √2 x1, √2 x2, 1)
      Φ((x1, x2)) · Φ((y1, y2)) = (x1y1)² + 2x1y1x2y2 + (x2y2)² + 2x1y1 + 2x2y2 + 1
                                = (x1y1 + x2y2 + 1)²
                                = ((x1, x2) · (y1, y2) + 1)².
      The dot product in the 6-dimensional feature space is computed without ever forming Φ. Common kernels:
      • polynomial: (x · y + 1)^d
      • RBF: exp(−||x − y||² / 2σ²)
      • sigmoid: tanh(κ x · y − δ)
      where σ, κ, δ are kernel parameters. A valid kernel must satisfy Mercer's condition.
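A quick numerical check of the identity above (a minimal sketch; the test vectors are arbitrary):

```python
import numpy as np

def phi(v):
    # Explicit feature map for the degree-2 polynomial kernel on R^2
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2, 1.0])

def k_poly(x, y):
    # Kernel form: same value without constructing the 6-dim vectors
    return (np.dot(x, y) + 1.0) ** 2

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(np.dot(phi(x), phi(y)))   # 4.0
print(k_poly(x, y))             # (1*3 + 2*(-1) + 1)^2 = 4.0
```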
  19. Soft margin: when the data cannot be separated exactly, introduce slack variables ξi that permit margin violations:
      yi(w · xi + b) ≥ 1 − ξi, where ξi ≥ 0 (i = 1, …, l),
      and minimize
      (1/2) w · w + C Σ_{i=1}^l ξi,
      where C trades margin width against total slack.
  20. Soft-margin Lagrangian (1): with multipliers Λ = (α1, …, αl) and R = (r1, …, rl),
      L(w, ξ, b, Λ, R) = (1/2) w · w + C Σ_{i=1}^l ξi − Σ_{i=1}^l αi [yi(xi · w + b) − 1 + ξi] − Σ_{i=1}^l ri ξi.
      At the optimum w0, b0, ξi, the derivatives of L vanish (KKT stationarity):
      ∂L(w, ξ, b, Λ, R)/∂w |_{w=w0} = w0 − Σ_{i=1}^l αi yi xi = 0
      ∂L(w, ξ, b, Λ, R)/∂b |_{b=b0} = −Σ_{i=1}^l αi yi = 0
      ∂L(w, ξ, b, Λ, R)/∂ξi |_{ξ=ξi} = C − αi − ri = 0.
  21. Soft-margin Lagrangian (2): substituting back, the dual objective is identical to the hard-margin case:
      L(w, ξ, b, Λ, R) = Σ_{i=1}^l αi − (1/2) Σ_{i=1}^l Σ_{j=1}^l αi αj yi yj xi · xj.
      C and the ξi drop out of the objective; the only change is an upper bound on each αi: from C − αi − ri = 0 and ri ≥ 0 it follows that 0 ≤ αi ≤ C. The soft-margin SVM dual: maximize over Λ
      L = Σ_{i=1}^l αi − (1/2) Σ_{i=1}^l Σ_{j=1}^l αi αj yi yj xi · xj
      subject to Σ_{i=1}^l αi yi = 0, 0 ≤ αi ≤ C.
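To see the role of C concretely, a sketch using scikit-learn's soft-margin SVC (an off-the-shelf solver, not the slides' code; data and C values are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
# Overlapping clouds, so some slack xi_i > 0 is unavoidable
X = np.vstack([rng.normal([1, 1], 1.0, (50, 2)),
               rng.normal([-1, -1], 1.0, (50, 2))])
y = np.array([1] * 50 + [-1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Small C: wide margin, many support vectors (many alpha_i at the bound C).
    # Large C: narrow margin, less slack tolerated.
    print(C, len(clf.support_), 2 / np.linalg.norm(clf.coef_[0]))
```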
  22. Reference: Karush-Kuhn-Tucker (KKT) conditions. To minimize f(x) (x = (x1, x2, …, xn)) subject to gi(x) ≤ 0 (i = 1, 2, …, m), the KKT conditions are:
      ∂f(x)/∂xj + Σ_{i=1}^m λi ∂gi(x)/∂xj = 0,   j = 1, 2, …, n,
      λi gi(x) = 0,  λi ≥ 0,  gi(x) ≤ 0,   i = 1, 2, …, m.
      When f(x) and the gi(x) are convex, any (x, λ) satisfying the KKT conditions gives a global minimum of f(x) under the constraints.
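As a worked one-variable instance (my example, not from the slides): minimize f(x) = x² subject to g(x) = 1 − x ≤ 0. Stationarity requires 2x − λ = 0, so λ = 2x; complementarity λ(1 − x) = 0 with λ ≥ 0 rules out λ = 0 (that would force x = 0, violating g(x) ≤ 0), so the constraint is active: x = 1, λ = 2. Since f and g are convex, this KKT point is the global constrained minimum.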
  23. SMO (Sequential Minimal Optimization): a practical solver for the SVM dual. Optimizing all of Λ = (α1, α2, …, αl) at once is expensive: with 6000 examples the kernel matrix is already 6000 × 6000. SMO instead repeatedly picks a pair (αi, αj) and optimizes just those two variables analytically while the rest stay fixed; two at a time is the smallest working set compatible with the equality constraint on the αi. The objective maximized is the dual
      LD = L(w, ξ, b, Λ, R) = Σ_{i=1}^l αi − (1/2) Σ_{i=1}^l Σ_{j=1}^l αi αj yi yj xi · xj.
  24. Optimizing two variables (1): fix every multiplier except α1, α2 and maximize LD over the pair. Writing old values α1_old, α2_old and new values α1_new, α2_new, define
      Ei ≡ w_old · xi + b_old − yi,
      η ≡ 2K12 − K11 − K22, where Kij = xi · xj.
      Setting the derivative LD′ = 0 along the feasible direction gives the unconstrained optimum
      α2_new = α2_old − y2(E1 − E2)/η.
      The constraint Σ_{i=1}^l αi yi = 0 confines the pair to the line γ ≡ α1 + sα2 = const. Since
      η = 2K12 − K11 − K22 = −|x2 − x1|² ≤ 0,
      LD is concave along that line, so the stationary point is its maximum.
  25. Optimizing two variables (2): α1, α2 must stay on the line γ ≡ α1 + sα2 = const and inside the box 0 ≤ α1, α2 ≤ C. If the unconstrained optimum α2_new falls outside the feasible segment, clip it back in, giving α2_clipped. [Figure: the feasible segment inside the (α1, α2) box, cases (A) and (B).]
  26. Optimizing two variables (3): the feasible interval L ≤ α2 ≤ H follows from s and γ.
      If y1 = y2 (s = 1):   L = max(0, α1_old + α2_old − C),  H = min(C, α1_old + α2_old).
      If y1 ≠ y2 (s = −1):  L = max(0, α2_old − α1_old),      H = min(C, C + α2_old − α1_old).
      Clipping:
      α2_clipped = H          if α2_new ≥ H
                 = α2_new     if L < α2_new < H
                 = L          if α2_new ≤ L.
      The clipped value maximizes LD on the feasible segment.
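These bounds transcribe directly into code; a minimal sketch (function names are mine):

```python
def clip_bounds(alpha1_old, alpha2_old, y1, y2, C):
    """Feasible interval [L, H] for the new alpha2 on the line
    alpha1 + s*alpha2 = const inside the box [0, C]^2."""
    if y1 == y2:                      # s = 1
        L = max(0.0, alpha1_old + alpha2_old - C)
        H = min(C, alpha1_old + alpha2_old)
    else:                             # s = -1
        L = max(0.0, alpha2_old - alpha1_old)
        H = min(C, C + alpha2_old - alpha1_old)
    return L, H

def clip(alpha2_new, L, H):
    # The three-way clipping rule from slide 26
    return max(L, min(H, alpha2_new))
```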
  27. [Figure: the four cases (A)-(D) of the feasible interval L ≤ α2 ≤ H.]
  28. [Figure: clipping in cases (A)-(D); markers distinguish the unconstrained optimum (α1_new, α2_new) from the clipped point (α1_new, α2_clipped).]
  29. Two-variable update, summarized:
      1. Compute η = 2K12 − K11 − K22.
      2. If η < 0:
         (a) α2_new = α2_old + y2(E2 − E1)/η
         (b) clip α2_new to [L, H], giving α2_clipped
         (c) α1_new = α1_old − s(α2_clipped − α2_old).
      3. If η = 0, evaluate LD at the endpoints α2 = L and α2 = H, keep the better one, and compute α1 by 2(c).
      4. After updating α1, α2, choose b_new so that E_new = 0 for a point on the margin:
         w_new = w_old + (α1_new − α1_old) y1 x1 + (α2_clipped − α2_old) y2 x2
         E_new(x, y) = E_old(x, y) + y1(α1_new − α1_old) x1 · x + y2(α2_clipped − α2_old) x2 · x − b_old + b_new
         b_new = b_old − E_old(x, y) − y1(α1_new − α1_old) x1 · x − y2(α2_clipped − α2_old) x2 · x.
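Combining slides 24-29, a sketch of one full two-variable update for a linear kernel (the helper name smo_step is mine; the η = 0 endpoint search of step 3 is omitted for brevity):

```python
import numpy as np

def smo_step(i, j, X, y, alpha, b, C):
    """One analytic SMO update on the pair (alpha_i, alpha_j), following the
    slides' sign convention eta = 2*K12 - K11 - K22 <= 0."""
    w = (alpha * y) @ X                     # w_old = sum_k alpha_k y_k x_k
    E = X @ w + b - y                       # E_k = w_old·x_k + b_old - y_k
    eta = 2 * (X[i] @ X[j]) - X[i] @ X[i] - X[j] @ X[j]
    if eta == 0:                            # endpoint case, omitted in this sketch
        return alpha, b
    s = y[i] * y[j]
    # Unconstrained optimum along the line alpha_i + s*alpha_j = const
    aj_new = alpha[j] + y[j] * (E[j] - E[i]) / eta
    # Box constraints 0 <= alpha <= C (slide 26)
    if y[i] == y[j]:
        L, H = max(0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    else:
        L, H = max(0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    aj_new = min(max(aj_new, L), H)         # clipped alpha_j
    ai_new = alpha[i] - s * (aj_new - alpha[j])
    # Update b so that E becomes 0 at x_i (slide 29, step 4)
    b_new = b - E[i] - y[i] * (ai_new - alpha[i]) * (X[i] @ X[i]) \
              - y[j] * (aj_new - alpha[j]) * (X[j] @ X[i])
    alpha = alpha.copy()
    alpha[i], alpha[j] = ai_new, aj_new
    return alpha, b_new
```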
  30. Choosing which αi to optimize: pick α1 first, then its partner α2.
      • α1: scan for points that violate the KKT conditions; only optimizing a KKT violator makes progress.
      • Prefer non-bound points (0 < αi < C), which are the most likely to need adjustment.
      • α2: choose the partner promising the largest step, approximated by maximizing |E1 − E2|: if E1 is positive take the point with the smallest E2, if E1 is negative take the largest E2.
  31. Training an SVM with SMO, overall flow:
      • repeat until no multiplier violates the KKT conditions:
      • choose α1 from the KKT violators, preferring non-bound points with α ≠ 0,
      • choose α2 to maximize |E2 − E1|,
      • maximize LD analytically over the chosen pair and update w, b, and the Ei.
      A runnable sketch of this loop follows below.
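A sketch of that outer loop, assuming the smo_step function from the previous sketch; the partner α2 is chosen at random here rather than by the |E2 − E1| heuristic, a common simplification:

```python
import numpy as np

def smo_train(X, y, C=1.0, tol=1e-3, max_passes=20, seed=0):
    """Simplified SMO driver: sweep for KKT violators, pair each with a
    random partner, stop after max_passes sweeps without any update."""
    rng = np.random.default_rng(seed)
    l = len(y)
    alpha, b, passes = np.zeros(l), 0.0, 0
    while passes < max_passes:
        changed = 0
        for i in range(l):
            w = (alpha * y) @ X
            E_i = X[i] @ w + b - y[i]        # E_i = w·x_i + b - y_i
            # KKT violation (slide 30): margin error inconsistent with alpha_i
            if (y[i] * E_i < -tol and alpha[i] < C) or \
               (y[i] * E_i > tol and alpha[i] > 0):
                j = int(rng.integers(l - 1))
                j += j >= i                  # any partner j != i
                new_alpha, new_b = smo_step(i, j, X, y, alpha, b, C)
                if not np.allclose(new_alpha, alpha):
                    alpha, b = new_alpha, new_b
                    changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b
```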
  32. Beyond binary classification:
      • Multi-class (3 or more classes): reduce to binary problems, e.g. one classifier per pair of classes A vs B.
      • Regression (regression problem): predict a continuous value, e.g. a score from 0 to 100, or discretize it into bins (0-10, 10-20, …) and classify.
      • Ranking: e.g. given 100 Web pages, order them.
      • One-Class SVM: learn from a single class only.
