Part I. The Fundamentals of ML
Ch.5 Support Vector Machines
In this chapter, we discuss:
• What is an SVM?
- supervised learning
- binary classifier
- linear and nonlinear regressor
- linear and nonlinear learning using the kernel trick
(Figure: support vectors)
Linear SVM Classification
• What is Margin?
• SVM is sensitive to feature scale
→ use Scikit-Learn's StandardScaler
Decision boundary
= Decision line
= Decision hyperplane
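The scaling advice above can be sketched as follows, assuming scikit-learn is installed. The toy data and the choice of `SVC(kernel="linear")` with `C=1.0` are illustrative assumptions, not taken from the slides.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical toy data: the second feature is on a much larger scale than the first.
X = [[1.0, 1000.0], [2.0, 1100.0], [8.0, 5000.0], [9.0, 5200.0]]
y = [0, 0, 1, 1]

# Standardize each feature to zero mean and unit variance, then fit a linear SVM.
svm_clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
svm_clf.fit(X, y)

predictions = svm_clf.predict(X)
print(predictions)
```

Putting the scaler inside the pipeline means the same scaling learned on the training data is applied automatically at prediction time.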
Soft and Hard Margin SVM
• Hard Margin SVM: maximizes the margin while allowing no classification errors
* Hard Margin SVM is vulnerable to outliers
Soft and Hard Margin SVM
• To address this, we can allow some slack; the amount of slack is called the
margin violation. In this case, data points may lie between the support
vectors and the decision line.
Mathematical approach 1
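• Let the decision boundary (= hyperplane) be
$w \cdot x + b = w_1 x_1 + \cdots + w_n x_n + b = 0$.
• An SVM is a binary classifier: a point with $w \cdot x + b > 0$ is classified
as the positive case, and a point with $w \cdot x + b < 0$ as the negative case.
• Define a map $s : X \to \{\pm 1\}$ by $s(x) = \mathrm{sign}(w \cdot x + b)$.
Then, for every correctly classified training point, we have $(w \cdot x_i + b)\, y_i \ge 0$,
∵ $w \cdot x_i + b > 0 \Rightarrow y_i = 1$ and $w \cdot x_i + b < 0 \Rightarrow y_i = -1$.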
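A minimal sketch of the sign classifier and the product check described above. The hyperplane `(w, b)` and the labeled points are hypothetical values chosen for illustration.

```python
def decision_value(w, b, x):
    """Return w . x + b for plain Python sequences."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def s(w, b, x):
    """sign(w . x + b) as +1 / -1 (a point exactly on the boundary is mapped to +1 here)."""
    return 1 if decision_value(w, b, x) >= 0 else -1


# Hypothetical hyperplane and labeled points (x_i, y_i).
w, b = [1.0, -1.0], 0.5
points = [([2.0, 0.0], 1), ([0.0, 3.0], -1), ([1.0, 1.0], 1)]

labels = [s(w, b, x_i) for x_i, _ in points]
# For a correctly classified point, (w . x_i + b) * y_i is non-negative.
products = [decision_value(w, b, x_i) * y_i for x_i, y_i in points]
print(labels, products)
```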
Mathematical approach 2 : Margin Distance
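• Define a map $f(x) = w \cdot x + b$.
• A point $x$ is said to be on the boundary if $f(x) = 0$.
• Suppose that $f(x) = a \ (\neq 0)$.
• For the point $x_p$ on the boundary that is the orthogonal projection of $x$, we have
$x = x_p + r \dfrac{w}{\|w\|}$,
where $r$ is the (signed) distance between $x$ and the decision boundary.
(Figure: the decision boundary, a point $x$, and its projection $x_p$; the segment of length $r$ from $x_p$ to $x$ meets the boundary at 90°.)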
Mathematical approach 2 : Margin Distance
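• Since $f(x_p) = w \cdot x_p + b = 0$, we have
$f(x) = w \cdot x + b = w \cdot \left( x_p + r \dfrac{w}{\|w\|} \right) + b = r \dfrac{w \cdot w}{\|w\|} = r \|w\|$.
Therefore the distance between $x$ and the decision boundary is
$r = \dfrac{f(x)}{\|w\|}$.
(Figure: the decision boundary with $x_p$ and $x$; the perpendicular segment from $x_p$ to $x$ has length $f(x) / \|w\|$.)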
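The distance formula $r = f(x) / \|w\|$ can be checked numerically. This is a small sketch; the hyperplane and the query point are hypothetical values picked so the arithmetic is easy to follow.

```python
import math


def f(w, b, x):
    """f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def norm(w):
    """Euclidean norm ||w||."""
    return math.sqrt(sum(wi * wi for wi in w))


# Hypothetical hyperplane 3*x1 + 4*x2 - 5 = 0 and a query point x.
w, b = [3.0, 4.0], -5.0
x = [3.0, 4.0]

r = f(w, b, x) / norm(w)  # signed distance: f(x) = 20, ||w|| = 5

# Move back from x along the unit normal w / ||w|| to reach the projection x_p.
x_p = [xi - r * wi / norm(w) for xi, wi in zip(x, w)]
f_xp = f(w, b, x_p)  # should be (numerically) zero, since x_p lies on the boundary
print(r, x_p, f_xp)
```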
Maximizing the margin
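• Optimization problem:
$\max_{w,b} \dfrac{a}{\|w\|}$ such that $(w \cdot x_i + b)\, y_i \ge a, \ \forall i$
• Since $a$ was an arbitrary number, we can normalize (rescale $w$ and $b$ so that $a = 1$)
and obtain the following optimization problem.
$\min_{w,b} \|w\|$ such that $(w \cdot x_i + b)\, y_i \ge 1, \ \forall i$
⇒ Quadratic Programming (QP) problem
∵ minimizing $\|w\|$ is equivalent to minimizing $\frac{1}{2}\|w\|^2 = \frac{1}{2} w \cdot w$, which is quadratic in $w$, and the constraints are linear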
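A small sketch of the normalization step: dividing $(w, b)$ by the smallest value $a$ of $(w \cdot x_i + b)\, y_i$ leaves the hyperplane and its geometric margin unchanged, while making the constraint read $\ge 1$. The data and the separating `(w, b)` below are hypothetical.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def norm(w):
    return math.sqrt(dot(w, w))


def functional_margins(w, b, X, y):
    """Values (w . x_i + b) * y_i for every training point."""
    return [(dot(w, x_i) + b) * y_i for x_i, y_i in zip(X, y)]


# Hypothetical separable data and some separating hyperplane (w, b).
X = [[1.0, 1.0], [3.0, 3.0], [-1.0, -1.0]]
y = [1, 1, -1]
w, b = [2.0, 2.0], 0.0

a = min(functional_margins(w, b, X, y))  # smallest functional margin
w_n = [wi / a for wi in w]               # rescaled weights
b_n = b / a                              # rescaled bias

margin_before = a / norm(w)      # objective a / ||w||
margin_after = 1.0 / norm(w_n)   # same margin, now written as 1 / ||w||
print(a, margin_before, margin_after)
```

After rescaling, maximizing $1 / \|w\|$ is the same as minimizing $\|w\|$, which is the form used on the slide.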
Dual problem
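• Constrained optimization:
$\min_x f(x)$ s.t. $g(x) \le 0, \ h(x) = 0$.
• Lagrange method:
Lagrangian (primal) function: $L(x, \alpha, \beta) = f(x) + \alpha g(x) + \beta h(x)$
Lagrange multipliers: $\alpha \ge 0$, $\beta$
Lagrange dual function:
$d(\alpha, \beta) = \inf_{x \in X} L(x, \alpha, \beta) = \min_{x \in X} L(x, \alpha, \beta)$.
Then we have
$\max_{\alpha \ge 0, \beta} L(x, \alpha, \beta) = \begin{cases} f(x) & \text{if } x \text{ is feasible} \\ \infty & \text{otherwise} \end{cases}$
• Substituting our optimization problem
$\min_{w,b} \|w\|$ such that $(w \cdot x_i + b)\, y_i \ge 1, \ \forall i$
gives the following.
$f(w) = \frac{1}{2} w \cdot w$,
$g_i(w, b) = 1 - (w \cdot x_i + b)\, y_i \le 0, \ \forall i$,
$h(w) = 0$
The original objective is $f(w) = \|w\|$, but
$\arg\min \|w\| = \arg\min \|w\|^2$, and the factor $\frac{1}{2}$ does not change the minimizer.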
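Primal & Dual problem
• Primal Problem
$\min_x f(x)$ s.t. $g(x) \le 0, \ h(x) = 0$
which is equivalent to $\min_x \max_{\alpha \ge 0, \beta} L(x, \alpha, \beta)$
• Dual Problem
$\max_{\alpha, \beta} d(\alpha, \beta)$ s.t. $\alpha \ge 0$
which is equivalent to $\max_{\alpha \ge 0, \beta} \min_x L(x, \alpha, \beta)$
• Weak duality theorem
1. $d(\alpha, \beta) \le f(x)$ for every feasible $x$ and every $\alpha \ge 0$, $\beta$
2. $d^* = \max_{\alpha \ge 0, \beta} \min_x L(x, \alpha, \beta) \le \min_x \max_{\alpha \ge 0, \beta} L(x, \alpha, \beta) = p^*$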
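Weak duality can be checked on a one-dimensional toy problem. The problem below (minimize $x^2$ subject to $1 - x \le 0$, with no equality constraint, so $\beta$ does not appear) is an assumption made for illustration; its Lagrangian is $L(x, \alpha) = x^2 + \alpha (1 - x)$, which is minimized over $x$ at $x = \alpha / 2$.

```python
def dual_function(alpha):
    """d(alpha) = min_x [x^2 + alpha * (1 - x)]; the minimizer is x = alpha / 2."""
    x = alpha / 2.0
    return x * x + alpha * (1.0 - x)


# Primal optimum of: min x^2  s.t.  1 - x <= 0  (attained at x = 1).
p_star = 1.0

# Evaluate the dual function on a grid of multipliers alpha >= 0.
alphas = [i * 0.5 for i in range(9)]  # 0.0, 0.5, ..., 4.0
dual_values = [dual_function(a) for a in alphas]
d_star = max(dual_values)

print(dual_values)
print(d_star, p_star)
```

Every dual value stays at or below the primal optimum. For this convex toy problem the best dual value also reaches it, which is the strong-duality case.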
Strong Duality and KKT-conditions
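• The Karush-Kuhn-Tucker (KKT) conditions:
Parallel gradients (stationarity) condition: $\dfrac{\partial L(x^*, \alpha^*, \beta^*)}{\partial x_i} = 0, \ \forall i$
Orthogonality (complementary slackness) condition: $\alpha^* g(x^*) = 0$
Satisfaction of original constraints: $g(x^*) \le 0$
Lagrange multiplier nonnegativity: $\alpha^* \ge 0$
• If the KKT conditions are satisfied (for a convex problem such as ours), strong duality holds. That is,
$d^* = \max_{\alpha \ge 0, \beta} \min_x L(x, \alpha, \beta) = \min_x \max_{\alpha \ge 0, \beta} L(x, \alpha, \beta) = p^*$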
Strong Duality and KKT-conditions
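• For our SVM problem, the primal problem, dual problem, and KKT
conditions are as follows.
• Primal Problem
$\min_{w,b} \max_{\alpha} \left( \frac{1}{2} w \cdot w - \sum_i \alpha_i \left[ (w \cdot x_i + b)\, y_i - 1 \right] \right)$ s.t. $\alpha_i \ge 0 \ \forall i$.
• Dual Problem
$\max_{\alpha} \min_{w,b} \left( \frac{1}{2} w \cdot w - \sum_i \alpha_i \left[ (w \cdot x_i + b)\, y_i - 1 \right] \right)$ s.t. $\alpha_i \ge 0 \ \forall i$.
∴ Our KKT conditions:
$\dfrac{\partial L(w, b, \alpha)}{\partial w} = 0, \quad \dfrac{\partial L(w, b, \alpha)}{\partial b} = 0$
$\alpha_i \ge 0, \ \forall i$
$\alpha_i \left[ (w \cdot x_i + b)\, y_i - 1 \right] = 0, \ \forall i$
※ If $x_i$ is not on the margin boundary, then $(w \cdot x_i + b)\, y_i - 1 \neq 0$, so
the condition $\alpha_i \left[ (w \cdot x_i + b)\, y_i - 1 \right] = 0$ gives $\alpha_i = 0$.
Therefore, only the support vectors matter in an SVM!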
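The complementary slackness condition can be verified by hand on a tiny example. The three points, the multipliers `alpha`, and the bias `b` below are assumed values for a hypothetical hard-margin problem. The third point lies outside the margin, so its multiplier is zero.

```python
# Hypothetical toy problem: two support vectors and one point far from the margin.
X = [[1.0, 1.0], [-1.0, -1.0], [3.0, 3.0]]
y = [1, -1, 1]
alpha = [0.25, 0.25, 0.0]  # assumed Lagrange multipliers
b = 0.0                    # assumed bias

# Stationarity in w: w = sum_i alpha_i * y_i * x_i
w = [sum(a * y_i * x_i[k] for a, y_i, x_i in zip(alpha, y, X)) for k in range(2)]

# Stationarity in b: sum_i alpha_i * y_i = 0
balance = sum(a * y_i for a, y_i in zip(alpha, y))

# Constraint values (w . x_i + b) * y_i - 1 and the products alpha_i * (constraint value).
slacks = [(sum(wk * xk for wk, xk in zip(w, x_i)) + b) * y_i - 1.0
          for x_i, y_i in zip(X, y)]
products = [a * s_i for a, s_i in zip(alpha, slacks)]

print(w, balance, slacks, products)
```

Only the first two points have nonzero multipliers, so they alone determine `w`.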
Strong Duality and KKT-conditions
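• For our function $f(x) = w \cdot x + b$, the Lagrangian (primal) function is
$L(w, b, \alpha) = \frac{1}{2} w \cdot w - \sum_i \alpha_i \left[ (w \cdot x_i + b)\, y_i - 1 \right]$,
so solving the KKT conditions gives
• $\dfrac{\partial L(w, b, \alpha)}{\partial w} = 0 \ \to \ w = \sum_i \alpha_i y_i x_i$
• $\dfrac{\partial L(w, b, \alpha)}{\partial b} = 0 \ \to \ \sum_i \alpha_i y_i = 0$
• Substituting these back, we obtain the following expression.
• $L(w, b, \alpha) = \frac{1}{2} w \cdot w - \sum_i \alpha_i \left[ (w \cdot x_i + b)\, y_i - 1 \right]$
$= \frac{1}{2} w \cdot w - \sum_i \alpha_i y_i \, w \cdot x_i - b \sum_i \alpha_i y_i + \sum_i \alpha_i$
$= \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j - \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j + \sum_i \alpha_i$
$= \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$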
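As a sanity check on the dual expression $\sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$, the sketch below evaluates it on a hypothetical two-point problem and compares it with the primal objective $\frac{1}{2} w \cdot w$. The data and the multipliers are assumed values worked out by hand for this toy case.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


# Hypothetical two-point problem; both points are support vectors.
X = [[2.0, 0.0], [0.0, 0.0]]
y = [1, -1]
alpha = [0.5, 0.5]  # assumed optimal multipliers (they satisfy sum_i alpha_i * y_i = 0)

n = len(X)
dual_objective = sum(alpha) - 0.5 * sum(
    alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
    for i in range(n)
    for j in range(n)
)

# Recover w from the stationarity condition w = sum_i alpha_i * y_i * x_i.
w = [sum(alpha[i] * y[i] * X[i][k] for i in range(n)) for k in range(2)]
primal_objective = 0.5 * dot(w, w)

print(w, dual_objective, primal_objective)
```

The two objective values agree, as expected when strong duality holds.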
Mercer’s Theorem
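• We assume that:
∙ $X = \{x_1, \cdots, x_n\}$: finite input space
∙ $k : X \times X \to \mathbb{R}$: kernel
∙ $K = \left( k(x_i, x_j) \right)_{i,j=1}^{n}$: Gram matrix induced by $k$
- Since $K$ is a symmetric matrix, it has a spectral decomposition $K = V \Lambda V^T$.
Assuming $K$ is also positive semidefinite (all $\lambda_t \ge 0$), the map $\phi : X \to \mathbb{R}^n$ defined by
$\phi(x_i) = \left( \sqrt{\lambda_t}\, v_{ti} \right)_{t=1}^{n} \in \mathbb{R}^n, \ i = 1, \cdots, n$,
where $\lambda_t$ is the $t$-th eigenvalue of $K$ and $v_t = (v_{ti})_{i=1}^{n}$ is the $t$-th
eigenvector of $K$, satisfies
$\phi(x_i) \cdot \phi(x_j) = \sum_{t=1}^{n} \lambda_t v_{ti} v_{tj} = (V \Lambda V^T)_{ij} = K_{ij} = k(x_i, x_j)$.
(Figure: the decomposition of $K$ into $V$ and $V^T$, with the eigenvectors $v_1, v_2, \cdots, v_n$ as the columns of $V$ and the rows of $V^T$.)
Reference: https://cseweb.ucsd.edu/~dasgupta/291-unsup/lec7.pdf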
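The construction of a feature map from the Gram matrix can be reproduced numerically. This is a sketch assuming NumPy is available; the three input points and the RBF kernel with `gamma = 0.5` are arbitrary illustrative choices, and the square root of each eigenvalue is used so that the dot products reproduce the Gram matrix.

```python
import numpy as np

# Hypothetical finite input space with three points.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])


def k(a, b, gamma=0.5):
    """RBF kernel; gamma = 0.5 is an arbitrary illustrative value."""
    return np.exp(-gamma * np.sum((a - b) ** 2))


n = len(X)
K = np.array([[k(X[i], X[j]) for j in range(n)] for i in range(n)])

# Spectral decomposition K = V diag(lam) V^T (eigh is meant for symmetric matrices).
lam, V = np.linalg.eigh(K)
lam = np.clip(lam, 0.0, None)  # guard against tiny negative round-off values

# Row i of Phi is phi(x_i) = (sqrt(lam_t) * v_{ti}) for t = 1..n.
Phi = V * np.sqrt(lam)

# Dot products of the embedded points should reproduce the Gram matrix.
reconstructed = Phi @ Phi.T
print(np.allclose(reconstructed, K))
```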
Kernel Trick
• A function $K : X \times X \to \mathbb{R}$ is a kernel of $\phi$ if
$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$.
Here $\phi$ is a function that embeds $X$ into a higher-dimensional space.
• Polynomial kernel
$K(x_i, x_j) = (x_i \cdot x_j + 1)^n$
• Gaussian kernel
$K(x_i, x_j) = \exp\left( - \dfrac{\|x_i - x_j\|^2}{2\sigma^2} \right)$
• Gaussian radial basis function (RBF):
It is a general-purpose kernel, used when there is no prior knowledge
about the data.
$K(x_i, x_j) = \exp\left( -\gamma \|x_i - x_j\|^2 \right), \ \gamma > 0$
• Laplace RBF kernel
$K(x_i, x_j) = \exp\left( - \dfrac{\|x_i - x_j\|}{\sigma} \right)$
• Hyperbolic tangent kernel
$K(x_i, x_j) = \tanh(\kappa\, x_i \cdot x_j + c), \ \kappa > 0, \ c < 0$.
Reference: https://data-flair.training/blogs/svm-kernel-functions/
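To make the relation $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$ concrete, the sketch below compares the degree-2 polynomial kernel on 2-D inputs with an explicit 6-dimensional feature map. The function names, the test points, and the default `gamma` are illustrative assumptions.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def polynomial_kernel(x, z, degree=2):
    """(x . z + 1)^degree"""
    return (dot(x, z) + 1.0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    """exp(-gamma * ||x - z||^2), with gamma > 0."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


def phi_poly2(x):
    """Explicit feature map of the degree-2 polynomial kernel for 2-D inputs."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [x1 * x1, x2 * x2, r2 * x1 * x2, r2 * x1, r2 * x2, 1.0]


x, z = [1.0, 2.0], [3.0, 4.0]
k_value = polynomial_kernel(x, z)            # (1*3 + 2*4 + 1)^2
phi_value = dot(phi_poly2(x), phi_poly2(z))  # same value via the explicit map
print(k_value, phi_value, rbf_kernel(x, z))
```

The kernel evaluates a dot product in the 6-dimensional space while only ever touching the 2-D inputs.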
Convert dual problem using Kernel Trick
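• On p. 14 we obtained the following dual problem.
$L(w, b, \alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$
• Using the function $\phi$ and the kernel function $K$, the expression above can be
rewritten as follows.
$\max_{\alpha \ge 0} \left( \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, \phi(x_i) \cdot \phi(x_j) \right)$
$= \max_{\alpha \ge 0} \left( \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, K(x_i, x_j) \right)$
where
$w = \sum_i \alpha_i y_i \phi(x_i)$,
$b = y_j - \sum_i \alpha_i y_i \, \phi(x_i) \cdot \phi(x_j) = y_j - \sum_i \alpha_i y_i K(x_i, x_j)$ for any support vector $x_j$.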
Convert dual problem using Kernel Trick
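• Using the $w$ and $b$ from p. 16, we obtain the following, where $x_j$ is any support vector.
$\mathrm{sign}(w \cdot \phi(x) + b)$
$= \mathrm{sign}\left( \sum_i \alpha_i y_i \, \phi(x_i) \cdot \phi(x) + y_j - \sum_i \alpha_i y_i K(x_i, x_j) \right)$
$= \mathrm{sign}\left( \sum_i \alpha_i y_i K(x_i, x) + y_j - \sum_i \alpha_i y_i K(x_i, x_j) \right)$
∴ The advantage of the kernel trick is that, although it relies on a function $\phi$ that maps
the data to a higher dimension, the actual computation is done with the kernel, so the
kernel can be used knowing only that $\phi$ exists.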
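A minimal sketch of the kernelized decision rule $\mathrm{sign}\left( \sum_i \alpha_i y_i K(x_i, x) + b \right)$, with $b$ computed from one support vector. The training points and multipliers are assumed values for a hypothetical toy problem, and a linear kernel is used only so the numbers can be checked by hand; any kernel function with the same signature can be passed instead.

```python
def linear_kernel(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def bias(alpha, y, X, kernel, j):
    """b = y_j - sum_i alpha_i * y_i * K(x_i, x_j), where x_j is a support vector."""
    return y[j] - sum(a * y_i * kernel(x_i, X[j]) for a, y_i, x_i in zip(alpha, y, X))


def predict(x, alpha, y, X, kernel, b):
    """sign(sum_i alpha_i * y_i * K(x_i, x) + b); phi is never computed explicitly."""
    score = sum(a * y_i * kernel(x_i, x) for a, y_i, x_i in zip(alpha, y, X)) + b
    return 1 if score >= 0 else -1


# Hypothetical training set and assumed multipliers (the third point has alpha = 0).
X = [[1.0, 1.0], [-1.0, -1.0], [3.0, 3.0]]
y = [1, -1, 1]
alpha = [0.25, 0.25, 0.0]

b = bias(alpha, y, X, linear_kernel, j=0)  # index 0 is a support vector
pred_pos = predict([2.0, 0.0], alpha, y, X, linear_kernel, b)
pred_neg = predict([-1.0, -3.0], alpha, y, X, linear_kernel, b)
print(b, pred_pos, pred_neg)
```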
The cost function of SVM : Hinge Loss
(Figure: the hinge loss function plotted against $(w \cdot x_i + b)\, y_i$, with the value 1 marked on both axes, shown next to a decision boundary diagram.)
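The slide shows the hinge loss only as a plot. A minimal sketch, assuming the standard definition $\max(0, 1 - t)$ with $t = (w \cdot x_i + b)\, y_i$, is given below; the sample values of $t$ are arbitrary.

```python
def hinge_loss(t):
    """max(0, 1 - t), where t = (w . x_i + b) * y_i."""
    return max(0.0, 1.0 - t)


# Arbitrary sample values of t: misclassified, on the boundary,
# inside the margin, on the margin, and safely outside the margin.
margins = [-1.0, 0.0, 0.5, 1.0, 2.0]
losses = [hinge_loss(t) for t in margins]
print(losses)
```

The loss is zero for points on or outside the margin ($t \ge 1$) and grows linearly as a point moves into the margin or onto the wrong side of the boundary.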
