Datamining 6th svm: Presentation Transcript

• Review: k-NN. Given training data labeled Yes/No, k-NN classifies each test datum by a majority vote over the labels of its k nearest training points. (slide 3)
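A minimal sketch of that baseline in Python (the function name, the +1/-1 encoding of Yes/No, and k = 3 are my own choices, not from the slides):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points.

    y_train holds the slide's Yes/No labels encoded as +1/-1.
    """
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to every training point
    nearest = y_train[np.argsort(dists)[:k]]     # labels of the k closest ones
    return 1 if nearest.sum() >= 0 else -1       # majority vote (tie -> Yes)
```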
• Problem setting: the training data are $(x_i, y_i)$ $(i = 1, \dots, l,\ x_i \in \mathbb{R}^n,\ y_i \in \{1, -1\})$, where $y_i$ is the class label ($1$ or $-1$) of $x_i$. The data are linearly separable if some $w, b$ satisfy
      $y_i (w \cdot x_i + b) > 0 \quad (i = 1, \dots, l).$ (slide 5)
• Linear classifier: classify $x$ by which side of the hyperplane $w \cdot x + b = 0$ it falls on:
      $d(x) = \begin{cases} 1, & \text{if } w \cdot x + b \ge 0 \\ -1, & \text{otherwise.} \end{cases}$ (slide 6)
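The same rule in Python, as a trivial sketch (w, b, and the sample points are made-up values):

```python
import numpy as np

def d(x, w, b):
    """The slide's decision function: +1 if w.x + b >= 0, else -1."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# Made-up hyperplane x1 + x2 - 1 = 0:
print(d([2.0, 1.0], w=[1.0, 1.0], b=-1.0))  # -> 1
print(d([0.0, 0.0], w=[1.0, 1.0], b=-1.0))  # -> -1
```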
• Fisher's linear discriminant (1): reduce the 2-class, multi-dimensional problem to one dimension by projecting every point onto a direction $w$ (projected value $w \cdot x + b$), and choose the projection that separates the two classes best. (slide 7)
• Fisher's linear discriminant (2): let $m_+$ and $m_-$ be the class means,
      $m_+ = \frac{\sum_{d(x)=1} x}{|\{x \mid d(x) = 1\}|}, \qquad m_- = \frac{\sum_{d(x)=-1} x}{|\{x \mid d(x) = -1\}|}.$
  The separation of the projected class means is $|(m_+ - m_-) \cdot w|$; the within-class scatter of the projections about the hyperplane $w \cdot x + b = 0$ is
      $\sum_{d(x)=1} ((x - m_+) \cdot w)^2 + \sum_{d(x)=-1} ((x - m_-) \cdot w)^2.$ (slide 8)
• Fisher's linear discriminant (3): maximize, under $|w| = 1$, the ratio of between-class separation to within-class scatter,
      $J(w) = \frac{|(m_+ - m_-) \cdot w|^2}{\sum_{d(x)=1} ((x - m_+) \cdot w)^2 + \sum_{d(x)=-1} ((x - m_-) \cdot w)^2}.$
  Since $b$ only shifts the projected values $w \cdot x + b$, it does not affect $J(w)$; so we look for the $w$ at which the derivative of $J(w)$ with respect to $w$ is 0. (slide 9)
• Fisher's linear discriminant (4): $J(w)$ can be rewritten as
      $J(w) = \frac{w^T S_B w}{w^T S_W w},$
      $S_B = (m_+ - m_-)(m_+ - m_-)^T, \qquad S_W = \sum_{d(x)=1} (x - m_+)(x - m_+)^T + \sum_{d(x)=-1} (x - m_-)(x - m_-)^T.$
  Setting $\partial J(w) / \partial w = 0$ and using the quotient rule $\left(\frac{f}{g}\right)' = \frac{f'g - fg'}{g^2}$ gives
      $(w^T S_B w)\, S_W w = (w^T S_W w)\, S_B w.$
  Because $S_B w$ is always parallel to $m_+ - m_-$, the solution is
      $w \propto S_W^{-1} (m_+ - m_-).$ (slide 10)
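Slides 7-10 condense into a few lines of Python; a sketch computing $w \propto S_W^{-1}(m_+ - m_-)$ (the function name and the final normalization are my own):

```python
import numpy as np

def fisher_direction(X_pos, X_neg):
    """Fisher discriminant direction w ~ S_W^{-1} (m+ - m-).

    X_pos, X_neg: (n_samples, n_features) arrays, one per class.
    """
    m_pos, m_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    # Within-class scatter S_W, summed over both classes.
    S_W = (X_pos - m_pos).T @ (X_pos - m_pos) + (X_neg - m_neg).T @ (X_neg - m_neg)
    w = np.linalg.solve(S_W, m_pos - m_neg)  # solve S_W w = (m+ - m-)
    return w / np.linalg.norm(w)             # the scale of w is irrelevant
```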
• SVM (Support Vector Machine): of all the hyperplanes that separate the training data, choose the one that is farthest from the nearest training points, i.e., that maximizes the margin. (slide 11)
• Margin: for a separating hyperplane $(w, b)$, the margin $\rho(w, b)$ is the gap between the two classes measured along the normal $w$,
      $\rho(w, b) = \min_{\{x_i \mid y_i = 1\}} \frac{x_i \cdot w}{|w|} - \max_{\{x_i \mid y_i = -1\}} \frac{x_i \cdot w}{|w|}.$ (slide 12)
• The margin equals $2 / |w_0|$: rescale $w_0, b_0$ so that the training points satisfy $w_0 \cdot x + b_0 \ge 1$ on the positive class and $w_0 \cdot x + b_0 \le -1$ on the negative class, with the closest points on the hyperplanes $w_0 \cdot x + b_0 = 1$ and $w_0 \cdot x + b_0 = -1$ (on either side of $w_0 \cdot x + b_0 = 0$). For such $w_0, b_0$,
      $\rho(w_0, b_0) = \min_{\{x_i \mid y_i = 1\}} \frac{x_i \cdot w_0}{|w_0|} - \max_{\{x_i \mid y_i = -1\}} \frac{x_i \cdot w_0}{|w_0|} = \frac{1 - b_0}{|w_0|} - \frac{-1 - b_0}{|w_0|} = \frac{2}{|w_0|}.$ (slide 13)
• Optimization problem: maximizing the margin $2 / |w_0|$ is the same as minimizing $|w_0|$, hence as minimizing $w_0 \cdot w_0$. So: find the $w_0$ that minimizes $w_0 \cdot w_0$ subject to
      $y_i (w_0 \cdot x_i + b) \ge 1 \quad (i = 1, \dots, l).$ (slide 14)
• Constrained optimization (1): for the constraints
      $y_i (w_0 \cdot x_i + b) \ge 1 \quad (i = 1, \dots, l) \qquad (1)$
  and the objective $w_0 \cdot w_0$, introduce Lagrange multipliers $\Lambda = (\alpha_1, \dots, \alpha_l)$ $(\alpha_i \ge 0)$ and form the Lagrangian
      $L(w, b, \Lambda) = \frac{|w|^2}{2} - \sum_{i=1}^{l} \alpha_i (y_i (x_i \cdot w + b) - 1).$
  $L$ is minimized with respect to $w, b$ and maximized with respect to $\Lambda$. (slide 15)
• Constrained optimization (2): at the minimum $w = w_0$, $b = b_0$, the derivatives of $L(w, b, \Lambda)$ vanish:
      $\left. \frac{\partial L(w, b, \Lambda)}{\partial w} \right|_{w = w_0} = w_0 - \sum_{i=1}^{l} \alpha_i y_i x_i = 0, \qquad \left. \frac{\partial L(w, b, \Lambda)}{\partial b} \right|_{b = b_0} = -\sum_{i=1}^{l} \alpha_i y_i = 0,$
  that is,
      $w_0 = \sum_{i=1}^{l} \alpha_i y_i x_i, \qquad \sum_{i=1}^{l} \alpha_i y_i = 0. \qquad (2)$
  Substituting $w = w_0$, $b = b_0$ back into $L$ (the $b_0$ term drops out because $\sum_i \alpha_i y_i = 0$, and $\frac{1}{2} w_0 \cdot w_0 = \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j\, x_i \cdot x_j$):
      $L(w_0, b_0, \Lambda) = \frac{1}{2} w_0 \cdot w_0 - \sum_{i=1}^{l} \alpha_i [y_i (x_i \cdot w_0 + b_0) - 1] = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j\, x_i \cdot x_j.$
  This eliminates $w$ and $b$, leaving a function of $\Lambda$ alone. (slide 16)
• SVM dual problem: instead of searching over $w, b$, maximize
      $L(w_0, b_0, \Lambda) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j\, x_i \cdot x_j \qquad (3)$
  over $\Lambda$ subject to $\sum_{i=1}^{l} \alpha_i y_i = 0$ and $\alpha_i \ge 0$. This is the SVM training problem. Given the optimal $\Lambda$, $w_0$ is recovered by (2) ($w_0 = \sum_{i=1}^{l} \alpha_i y_i x_i$); by (2), only the $x_i$ with $\alpha_i \ne 0$ contribute to $w$, and these are the support vectors. The KKT complementarity condition
      $\alpha_i [y_i (x_i \cdot w_0 + b_0) - 1] = 0$
  shows that every support vector lies exactly on the margin. (slide 17)
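Because (3) is just a small quadratic program, a toy instance can be handed to a generic solver to make the formulation concrete; a sketch with scipy (real SVM implementations use specialized solvers such as the SMO algorithm of slide 25, and everything here except the formulas is my own choice):

```python
import numpy as np
from scipy.optimize import minimize

def hard_margin_svm(X, y):
    """Maximize (3) s.t. sum(alpha_i y_i) = 0, alpha_i >= 0 (slide 17)."""
    l = len(y)
    Q = (y[:, None] * X) @ (y[:, None] * X).T            # Q_ij = y_i y_j x_i.x_j
    res = minimize(lambda a: 0.5 * a @ Q @ a - a.sum(),  # minimizing -L_D
                   np.zeros(l),
                   bounds=[(0, None)] * l,               # alpha_i >= 0
                   constraints={"type": "eq", "fun": lambda a: a @ y})
    a = res.x
    w = (a * y) @ X                  # w0 = sum_i alpha_i y_i x_i, from (2)
    sv = int(np.argmax(a))           # a support vector: alpha_i > 0
    b = y[sv] - X[sv] @ w            # KKT: y_i (x_i . w0 + b0) = 1 on support vectors
    return w, b, a

# Toy separable data.
X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 2.0], [3.0, 2.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
w, b, alpha = hard_margin_svm(X, y)
```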
• (Figure: two example datasets, (A) and (B).) (slide 18)
• Nonlinear SVM (via a feature map): when a linear separator is not enough, map each input $x$ to a higher-dimensional feature vector $\Phi(x)$ and run the same maximum-margin machinery there. The objective becomes
      $L(w_0, b_0, \Lambda) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j\, \Phi(x_i) \cdot \Phi(x_j),$
  and the decision boundary is
      $\Phi(x) \cdot w_0 + b_0 = \sum_{i=1}^{l} \alpha_i y_i\, \Phi(x) \cdot \Phi(x_i) + b_0 = 0.$
  Both expressions involve $\Phi$ only through inner products $\Phi(\cdot) \cdot \Phi(\cdot)$. (slide 19)
• Kernel functions: define $K(x, y) = \Phi(x) \cdot \Phi(y)$, so the inner products can be computed without ever constructing $\Phi(x)$ explicitly. Example: for the 6-dimensional map
      $\Phi((x_1, x_2)) = (x_1^2, \sqrt{2} x_1 x_2, x_2^2, \sqrt{2} x_1, \sqrt{2} x_2, 1),$
      $\Phi((x_1, x_2)) \cdot \Phi((y_1, y_2)) = (x_1 y_1)^2 + 2 x_1 y_1 x_2 y_2 + (x_2 y_2)^2 + 2 x_1 y_1 + 2 x_2 y_2 + 1 = (x_1 y_1 + x_2 y_2 + 1)^2 = ((x_1, x_2) \cdot (y_1, y_2) + 1)^2.$
  Commonly used kernels: the polynomial kernel $(x \cdot y + 1)^d$, the RBF kernel $\exp(-\|x - y\|^2 / 2\sigma^2)$, and the sigmoid kernel $\tanh(\kappa\, x \cdot y - \delta)$, with parameters $\sigma$, $\kappa$, $\delta$. Whether a function is a valid kernel (i.e., corresponds to some $\Phi$) is settled by Mercer's condition. (slide 20)
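These kernels are one-liners, and the 6-dimensional example can be verified numerically; a sketch (parameter defaults and test points are arbitrary):

```python
import numpy as np

def poly_kernel(x, y, d=2):                      # (x.y + 1)^d
    return (np.dot(x, y) + 1) ** d

def rbf_kernel(x, y, sigma=1.0):                 # exp(-||x - y||^2 / 2 sigma^2)
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2 * sigma**2))

def sigmoid_kernel(x, y, kappa=1.0, delta=0.0):  # tanh(kappa x.y - delta)
    return np.tanh(kappa * np.dot(x, y) - delta)

def phi(x):                                      # the slide's 6-dimensional map
    x1, x2 = x
    r2 = np.sqrt(2)
    return np.array([x1**2, r2 * x1 * x2, x2**2, r2 * x1, r2 * x2, 1.0])

# Check Phi(x).Phi(y) == (x.y + 1)^2 on made-up points.
x, y = (1.0, 2.0), (3.0, -1.0)
assert np.isclose(phi(x) @ phi(y), poly_kernel(x, y, d=2))
```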
• Soft margin: when the data cannot be separated, allow each constraint to be violated by a slack $\xi_i$:
      $y_i (w \cdot x_i + b) \ge 1 - \xi_i \quad \text{where } \xi_i \ge 0 \quad (i = 1, \dots, l),$
  and minimize
      $\frac{1}{2} w \cdot w + C \sum_{i=1}^{l} \xi_i,$
  where the constant $C$ trades margin width against the total violation. (slide 21)
• Soft margin optimization (1): with multipliers $\Lambda = (\alpha_1, \dots, \alpha_l)$ for the margin constraints and $R = (r_1, \dots, r_l)$ for $\xi_i \ge 0$, the Lagrangian $L$ is
      $L(w, \xi, b, \Lambda, R) = \frac{1}{2} w \cdot w + C \sum_{i=1}^{l} \xi_i - \sum_{i=1}^{l} \alpha_i [y_i (x_i \cdot w + b) - 1 + \xi_i] - \sum_{i=1}^{l} r_i \xi_i.$
  At the optimum $w_0, b_0, \xi_i^0$, the derivatives of $L$ with respect to $w$, $b$, $\xi_i$ are 0 (the KKT stationarity conditions):
      $\left. \frac{\partial L(w, \xi, b, \Lambda, R)}{\partial w} \right|_{w = w_0} = w_0 - \sum_{i=1}^{l} \alpha_i y_i x_i = 0,$
      $\left. \frac{\partial L(w, \xi, b, \Lambda, R)}{\partial b} \right|_{b = b_0} = -\sum_{i=1}^{l} \alpha_i y_i = 0,$
      $\left. \frac{\partial L(w, \xi, b, \Lambda, R)}{\partial \xi_i} \right|_{\xi_i = \xi_i^0} = C - \alpha_i - r_i = 0.$ (slide 22)
• Soft margin optimization (2): substituting these back into $L$ makes the $\xi$ terms cancel, giving exactly the same dual objective as the hard-margin SVM:
      $L(w, \xi, b, \Lambda, R) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j\, x_i \cdot x_j.$
  The only difference is that $C$ now bounds the $\alpha_i$: from $C - \alpha_i - r_i = 0$ and $r_i \ge 0$ it follows that $0 \le \alpha_i \le C$. The dual problem is therefore: maximize the objective above over $\Lambda$ (with $w, b$ eliminated) subject to
      $\sum_{i=1}^{l} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C.$ (slide 23)
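Relative to the hard-margin solver sketched after slide 17, the only change is the box constraint, i.e., bounds of (0, C) instead of (0, None). In practice a library handles this; a sketch with scikit-learn's SVC, whose C parameter plays exactly this role (the overlapping dataset is made up):

```python
import numpy as np
from sklearn.svm import SVC

# Made-up data with one positive point sitting among the negatives.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.2],
              [2.5, 2.0], [3.0, 3.0], [0.9, 1.1]])
y = np.array([-1, -1, -1, 1, 1, 1])

# Small C tolerates slack (wider margin); large C punishes violations.
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(C, clf.coef_[0], clf.intercept_[0], clf.support_)
```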
• Review: the Karush-Kuhn-Tucker (KKT) conditions. To minimize $f(x)$ subject to $g_i(x) \le 0$ $(x = (x_1, x_2, \dots, x_n))$, the KKT conditions are
      $\frac{\partial f(x)}{\partial x_j} + \sum_{i=1}^{m} \lambda_i \frac{\partial g_i(x)}{\partial x_j} = 0, \quad j = 1, 2, \dots, n,$
      $\lambda_i g_i(x) = 0, \quad \lambda_i \ge 0, \quad g_i(x) \le 0, \quad i = 1, 2, \dots, m.$
  If $f(x)$ and the $g_i(x)$ are convex, any $(x, \lambda)$ satisfying the KKT conditions gives the minimum of $f(x)$. (slide 24)
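A one-variable illustration (my own example, not from the slides): minimize $f(x) = x^2$ subject to $g(x) = 1 - x \le 0$.

```latex
% KKT conditions for  min f(x) = x^2  s.t.  g(x) = 1 - x <= 0:
f'(x) + \lambda g'(x) = 2x - \lambda = 0, \qquad
\lambda (1 - x) = 0, \qquad \lambda \ge 0, \qquad 1 - x \le 0.
% \lambda = 0 would force x = 0, violating 1 - x <= 0; hence g(x) = 0,
% giving x = 1, \lambda = 2. Both f and g are convex, so x = 1 is the minimum.
```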
• SMO (Sequential Minimal Optimization): the standard algorithm for SVM training. Optimizing all of $\Lambda = (\alpha_1, \alpha_2, \dots, \alpha_l)$ at once is expensive; with 6000 training examples, for instance, the matrix of $x_i \cdot x_j$ values is 6000 × 6000. SMO instead repeatedly picks a pair $(\alpha_i, \alpha_j)$ and optimizes those two analytically; two variables must move together because the constraint $\sum_i \alpha_i y_i = 0$ prevents changing a single $\alpha_i$ alone. The objective $L_D$ is
      $L_D = L(w, \xi, b, \Lambda, R) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j\, x_i \cdot x_j.$ (slide 25)
• Optimizing two multipliers (1): update $\alpha_1, \alpha_2$ from their old values $\alpha_1^{old}, \alpha_2^{old}$ to new values $\alpha_1^{new}, \alpha_2^{new}$ with all other multipliers fixed. Define
      $E_i \equiv w^{old} \cdot x_i + b^{old} - y_i, \qquad \eta \equiv 2 K_{12} - K_{11} - K_{22}, \quad \text{where } K_{ij} = x_i \cdot x_j.$
  Then
      $\alpha_2^{new} = \alpha_2^{old} - \frac{y_2 (E_1^{old} - E_2^{old})}{\eta}.$
  This comes from the constraint $\sum_{i=1}^{l} \alpha_i y_i = 0$, which forces $\gamma \equiv \alpha_1 + s \alpha_2 = \text{const}$ (with $s = y_1 y_2$): along that line $L_D$ is quadratic in $\alpha_2$, and the update solves $L_D' = 0$. Since
      $\eta = 2 K_{12} - K_{11} - K_{22} = -|x_2 - x_1|^2 \le 0,$
  the stationary point is a maximum of $L_D$. (slide 26)
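Filling in the step the slide abbreviates as $L_D' = 0$ (standard SMO algebra): substituting $\alpha_1 = \gamma - s \alpha_2$ into $L_D$ leaves a quadratic in $\alpha_2$ with second derivative $\eta$, and the first-order condition, written around the old point, is

```latex
\frac{\partial L_D}{\partial \alpha_2}
  = \eta \, (\alpha_2 - \alpha_2^{old}) + y_2 \, (E_1^{old} - E_2^{old}) = 0
\quad \Longrightarrow \quad
\alpha_2^{new} = \alpha_2^{old} - \frac{y_2 \, (E_1^{old} - E_2^{old})}{\eta}.
```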
• Optimizing two multipliers (2): $\alpha_1, \alpha_2$ are tied to the line $\gamma \equiv \alpha_1 + s \alpha_2 = \text{const}$, and each must stay in $[0, C]$; so the unconstrained $\alpha_2^{new}$ is clipped to the feasible segment, giving $\alpha_2^{clipped}$. (Figure: the two cases (A) and (B).) (slide 27)
• Optimizing two multipliers (3): the endpoints of the feasible segment follow from $s$ and $\gamma$:
      $y_1 = y_2\ (s = 1): \quad L = \max(0, \alpha_1^{old} + \alpha_2^{old} - C), \quad H = \min(C, \alpha_1^{old} + \alpha_2^{old}),$
      $y_1 \ne y_2\ (s = -1): \quad L = \max(0, \alpha_2^{old} - \alpha_1^{old}), \quad H = \min(C, C + \alpha_2^{old} - \alpha_1^{old}),$
  so that $L \le \alpha_2 \le H$. The clipped value
      $\alpha_2^{clipped} = \begin{cases} H, & \text{if } \alpha_2^{new} \ge H \\ \alpha_2^{new}, & \text{if } L < \alpha_2^{new} < H \\ L, & \text{if } \alpha_2^{new} \le L \end{cases}$
  maximizes $L_D$ over the feasible segment. (slide 28)
• (Figure: the feasible interval $L \le \alpha_2 \le H$ in four cases, (A)-(D).) (slide 29)
• (Figure: clipping $\alpha_2$ in cases (A)-(D); the unconstrained optimum $(\alpha_1^{new}, \alpha_2^{new})$ versus the clipped one $(\alpha_1^{new}, \alpha_2^{clipped})$.) (slide 30)
• Optimizing two multipliers, one full step (a runnable sketch follows this slide):
  1. Compute $\eta = 2 K_{12} - K_{11} - K_{22}$.
  2. If $\eta < 0$, update the $\alpha$'s:
     (a) $\alpha_2^{new} = \alpha_2^{old} + \dfrac{y_2 (E_2^{old} - E_1^{old})}{\eta}$
     (b) clip $\alpha_2^{new}$ to obtain $\alpha_2^{clipped}$
     (c) $\alpha_1^{new} = \alpha_1^{old} - s (\alpha_2^{clipped} - \alpha_2^{old})$
  3. If $\eta = 0$, $L_D$ is linear in $\alpha_2$: evaluate $L_D$ at the endpoints $L$ and $H$, let $\alpha_2$ be whichever is larger, then update $\alpha_1$ as in 2(c).
  4. Having updated $\alpha_{1,2}$, update $b$ so that $E^{new} = 0$:
      $w^{new} = w^{old} + (\alpha_1^{new} - \alpha_1^{old}) y_1 x_1 + (\alpha_2^{clipped} - \alpha_2^{old}) y_2 x_2$
      $E^{new}(x, y) = E^{old}(x, y) + y_1 (\alpha_1^{new} - \alpha_1^{old})\, x_1 \cdot x + y_2 (\alpha_2^{clipped} - \alpha_2^{old})\, x_2 \cdot x - b^{old} + b^{new}$
      $b^{new} = b^{old} - E^{old}(x, y) - y_1 (\alpha_1^{new} - \alpha_1^{old})\, x_1 \cdot x - y_2 (\alpha_2^{clipped} - \alpha_2^{old})\, x_2 \cdot x.$ (slide 31)
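A compact sketch assembling slides 26-32 into a runnable, simplified SMO trainer for the linear kernel. The second multiplier is chosen at random rather than by the $|E_1 - E_2|$ heuristic of slide 32, the $\eta = 0$ endpoint case is skipped, and every name is my own:

```python
import numpy as np

def smo_train(X, y, C=1.0, tol=1e-5, max_passes=20):
    """Simplified SMO for a linear soft-margin SVM (slides 25-32)."""
    l = X.shape[0]
    alpha, b = np.zeros(l), 0.0
    K = X @ X.T                                    # K_ij = x_i . x_j
    rng = np.random.default_rng(0)

    def E(i):                                      # E_i = w.x_i + b - y_i (slide 26)
        return (alpha * y) @ K[:, i] + b - y[i]

    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(l):
            Ei = E(i)
            # Take i only if it violates the KKT conditions (within tol).
            if not ((y[i] * Ei < -tol and alpha[i] < C) or
                    (y[i] * Ei > tol and alpha[i] > 0)):
                continue
            j = int(rng.choice([k for k in range(l) if k != i]))
            Ej = E(j)
            eta = 2 * K[i, j] - K[i, i] - K[j, j]  # eta <= 0 (slide 26)
            if eta >= 0:
                continue                           # skip the eta = 0 case for brevity
            s = y[i] * y[j]
            if s == 1:                             # clipping bounds L, H (slide 28)
                L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
            else:
                L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
            if L >= H:
                continue
            ai_old, aj_old = alpha[i], alpha[j]
            aj = float(np.clip(aj_old + y[j] * (Ej - Ei) / eta, L, H))  # steps 2(a)-(b)
            if abs(aj - aj_old) < 1e-12:
                continue
            ai = ai_old - s * (aj - aj_old)        # step 2(c)
            # Step 4: update b so that E = 0 at x_i after the step.
            b = b - Ei - y[i] * (ai - ai_old) * K[i, i] - y[j] * (aj - aj_old) * K[i, j]
            alpha[i], alpha[j] = ai, aj
            changed += 1
        passes = passes + 1 if changed == 0 else 0

    return (alpha * y) @ X, b, alpha               # w = sum_i alpha_i y_i x_i

# Toy data: two separable clusters.
X = np.array([[0.0, 0.0], [1.0, 0.5], [3.0, 3.0], [4.0, 3.5]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
w, b, alpha = smo_train(X, y, C=10.0)
```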
• Choosing which $\alpha_i$ to optimize: $\alpha_1$ and $\alpha_2$ are selected by heuristics.
  - $\alpha_1$: pick an example that violates the KKT conditions (if every example satisfies them, training is done), scanning the non-bound examples ($0 < \alpha_i < C$) preferentially.
  - $\alpha_2$: pick the example expected to increase $L_D$ the most, approximated by maximizing $|E_1 - E_2|$: if $E_1$ is positive, take the example with the smallest $E_2$; if $E_1$ is negative, the largest. (slide 32)
• SMO-based SVM training, summary: repeat until every example satisfies the KKT conditions: choose $\alpha_1$ from the KKT-violating examples (preferring non-bound ones, $\alpha \ne 0$), choose $\alpha_2$ to maximize $|E_2 - E_1|$, and solve the two-variable subproblem analytically. Each step increases $L_D$, and the KKT conditions certify the maximum at convergence. (slide 33)
• Beyond two-class classification:
  - Multiclass: with 3 or more classes, combine two-class SVMs, e.g., one classifier for each pair of classes (A vs. B).
  - Regression (regression problem): predict a real value, e.g., a score from 0 to 100, instead of classifying into intervals such as 0-10, 10-20, ...
  - Ranking: e.g., ordering 100 Web pages.
  - One-Class SVM: train on examples of a single class only, e.g., for outlier detection.
  (See the library sketch after this list.) (slide 34)
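All three of these variants are available off the shelf in scikit-learn; a sketch (the data are random placeholders):

```python
import numpy as np
from sklearn.svm import SVC, SVR, OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))

# Multiclass: SVC combines pairwise (one-vs-one) two-class SVMs internally,
# matching the slide's "A vs B" construction for 3 classes.
y3 = rng.integers(0, 3, size=60)
multi = SVC(decision_function_shape="ovo").fit(X, y3)

# Regression: SVR predicts a real value (e.g., a 0-100 score) directly,
# rather than classifying into intervals 0-10, 10-20, ...
y_real = 100 * rng.random(60)
reg = SVR().fit(X, y_real)

# One-Class SVM: fit a single class and flag outliers;
# predict() returns +1 for inliers, -1 for outliers.
out = OneClassSVM(nu=0.1).fit(X)
print(out.predict(X[:5]))
```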