SVM & Kernel Trick
Kyuyong Shin
Naver Clova A.I Research (CLAIR)
Biz AI
SVM
[Figure: + and − samples with the vectors $\bar{w}$ and $\bar{u}$]
Decision Rule
$\bar{w} \cdot \bar{u} \ge c$
$\bar{w} \cdot \bar{u} + b \ge 0$, then + $(c = -b)$
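The decision rule above can be sketched in a few lines of Python. This is only an illustration: the helper names `dot` and `classify` and the boundary values `w_example`, `b_example` are assumptions chosen for the example, not values from the slides.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def classify(w, b, u):
    """Return '+' when w . u + b >= 0, otherwise '-'."""
    return "+" if dot(w, u) + b >= 0 else "-"


# Hypothetical boundary: the line x1 + x2 = 2 (so c = -b = 2).
w_example = (1.0, 1.0)
b_example = -2.0

print(classify(w_example, b_example, (3.0, 1.0)))  # w . u + b = 2.0
print(classify(w_example, b_example, (0.0, 0.5)))  # w . u + b = -1.5
```

Only the sign of $\bar{w} \cdot \bar{u} + b$ matters for the label, which is why the later slides are free to fix the scale of $\bar{w}$ with the ±1 gutter constraints.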
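How to compute $\bar{w}$?
$\bar{w} \cdot \bar{x}_+ + b \ge 1$
$\bar{w} \cdot \bar{x}_- + b \le -1$
Introduce $y_i$ such that $y_i = +1$ for + samples and $y_i = -1$ for − samples. Both constraints then become
$y_i(\bar{w} \cdot \bar{x}_i + b) \ge 1$
$y_i(\bar{w} \cdot \bar{x}_i + b) - 1 \ge 0$
For $\bar{x}_i$ in the gutter:
$y_i(\bar{w} \cdot \bar{x}_i + b) - 1 = 0$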
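How to compute $\bar{w}$?
[Figure: + and − samples with $\bar{x}_+$, $\bar{x}_-$ and the difference vector $\bar{x}_+ - \bar{x}_-$]
Width $= (\bar{x}_+ - \bar{x}_-) \cdot \frac{\bar{w}}{\|\bar{w}\|} = \frac{2}{\|\bar{w}\|}$
(for gutter samples, $\bar{w} \cdot \bar{x}_+ = 1 - b$ and $-\bar{w} \cdot \bar{x}_- = 1 + b$)
MAX $\frac{2}{\|\bar{w}\|}$ → MIN $\|\bar{w}\|$ → MIN $\frac{1}{2}\|\bar{w}\|^2$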
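As a quick numerical check of the width formula, the sketch below projects $\bar{x}_+ - \bar{x}_-$ onto the unit normal and compares the result with $2/\|\bar{w}\|$. The values of `w`, `b`, `x_plus` and `x_minus` are made-up toy numbers chosen so that both points sit exactly on the gutters.

```python
import math


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def margin_width(w):
    """Width of the street, 2 / ||w||."""
    return 2.0 / math.sqrt(dot(w, w))


# Toy values (assumed): both points lie on the gutters.
w = (0.5, 0.5)
b = -1.0
x_plus = (2.0, 2.0)   # w . x + b = +1
x_minus = (0.0, 0.0)  # w . x + b = -1

# Project the difference vector onto the unit normal w / ||w||.
diff = tuple(p - m for p, m in zip(x_plus, x_minus))
projected = dot(diff, w) / math.sqrt(dot(w, w))

print(projected, margin_width(w))
```

Both numbers agree, and shrinking $\|\bar{w}\|$ is the only way to make them larger, which is the step from MAX to MIN above.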
How to minimize it?
Constraints: $y_i(\bar{w} \cdot \bar{x}_i + b) - 1 \ge 0$
If the constraints are equalities, we call it an equality-constrained problem and solve it with Lagrange multipliers.
The method of Lagrange multipliers is generalized by the Karush–Kuhn–Tucker conditions, which can also take inequality constraints into account.
How to minimize it?
Karush–Kuhn–Tucker conditions
The Karush–Kuhn–Tucker (KKT) conditions are first-derivative tests (sometimes called first-order necessary conditions) for a solution in nonlinear programming to be optimal, provided that some regularity conditions are satisfied (i.e., the KKT conditions derive from the relationship between the primal and the dual when some regularity conditions are satisfied).
Duality
We want to minimize the primal problem by maximizing the dual problem.
When Slater's condition holds, the difference between the primal and dual optimal values (the duality gap) is zero.
If the optimal duality gap is zero, we say that strong duality holds.
How to minimize it?
For a minimum point $x^*$ to satisfy the KKT conditions above, the problem must satisfy some regularity conditions. A common example is Slater's condition.
How to minimize it?
For a problem with strong duality:
$x^*$ and $u^*, v^*$ are primal and dual solutions
↔
$x^*$ and $u^*, v^*$ satisfy the KKT conditions
How to minimize it?
We will find solutions that satisfy the KKT conditions. Because the SVM problem satisfies Slater's condition, these solutions are also solutions of the primal and the dual.
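How to minimize it?
If the primal is a convex problem with a strictly feasible point, it satisfies Slater's condition (for affine inequality constraints, plain feasibility is enough).
Fortunately, the SVM primal is a convex quadratic program with affine constraints, so it satisfies Slater's condition whenever the data are linearly separable, and thus strong duality holds.
Strong duality leads us to the KKT conditions
(if $x^*$ satisfies the KKT conditions, then $x^*$ is a solution of the primal).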
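Let $h_i(x^*) = y_i(\bar{w} \cdot \bar{x}_i + b) - 1$. An important consequence of satisfying the KKT conditions is that $\sum_i \alpha_i^* h_i(x^*) = 0$.
Since every term in this sum has the same sign (here $\alpha_i^* \ge 0$ and $h_i(x^*) \ge 0$, so each term is nonnegative), we conclude that $\alpha_i^* h_i(x^*) = 0$ for all $i$.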
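Lagrange multiplier
Constraints: $y_i(\bar{w} \cdot \bar{x}_i + b) - 1 \ge 0$
$L = \frac{1}{2}\|\bar{w}\|^2 - \sum_i \alpha_i [y_i(\bar{w} \cdot \bar{x}_i + b) - 1]$
$\frac{\partial L}{\partial \bar{w}} = \bar{w} - \sum_i \alpha_i y_i \bar{x}_i = 0 \Rightarrow \bar{w} = \sum_i \alpha_i y_i \bar{x}_i$
$\frac{\partial L}{\partial b} = -\sum_i \alpha_i y_i = 0 \Rightarrow \sum_i \alpha_i y_i = 0$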
Therefore, $\bar{w}$ is a linear sum of several $\bar{x}_i$. We say 'several' because samples with $\alpha_i = 0$ drop out of the sum. In other words, the optimization assigns $\alpha_i = 0$ to the samples ($\bar{x}_i$) that do not affect classification.
Complementary slackness (by the KKT conditions):
$\alpha_i [y_i(\bar{w} \cdot \bar{x}_i + b) - 1] = 0$
If $\alpha_i$ is non-zero, then $\bar{x}_i$ is a support vector (it lies on the gutter).
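The stationarity and complementary-slackness conditions can be checked by hand on a tiny data set. The sketch below assumes three toy samples and a set of multipliers `alpha` that were worked out by hand for them; none of these numbers come from the slides.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


# Toy data (assumed): two gutter points and one point far from the street.
X = [(2.0, 2.0), (0.0, 0.0), (3.0, 3.0)]
y = [+1, -1, +1]
alpha = [0.25, 0.25, 0.0]  # assumed optimal multipliers for this toy set
b = -1.0

# Stationarity: w = sum_i alpha_i y_i x_i
w = tuple(
    sum(a * yi * xi[d] for a, yi, xi in zip(alpha, y, X))
    for d in range(2)
)

# Primal feasibility: h_i = y_i (w . x_i + b) - 1 >= 0
h = [yi * (dot(w, xi) + b) - 1 for xi, yi in zip(X, y)]

# Complementary slackness: alpha_i * h_i = 0 for every i
slack_products = [a * hi for a, hi in zip(alpha, h)]

# Support vectors are the samples with non-zero alpha_i
support = [i for i, a in enumerate(alpha) if a > 0]

print(w, h, slack_products, support)
```

The third sample has $h_i > 0$, so its multiplier must be zero and it contributes nothing to $\bar{w}$; removing it would leave the boundary unchanged.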
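$L = \frac{1}{2} \left(\sum_i \alpha_i y_i \bar{x}_i\right) \cdot \left(\sum_j \alpha_j y_j \bar{x}_j\right) - \left(\sum_i \alpha_i y_i \bar{x}_i\right) \cdot \left(\sum_j \alpha_j y_j \bar{x}_j\right) - \sum_i \alpha_i y_i b + \sum_i \alpha_i$
$\therefore L = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, \bar{x}_i \cdot \bar{x}_j$
The maximization depends only on the dot products $\bar{x}_i \cdot \bar{x}_j$ of pairs of support vectors!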
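To see the dual objective in action, the sketch below evaluates it on a two-sample toy problem and locates the maximum with a coarse grid search. The grid search is only for illustration (practical solvers use quadratic programming or SMO), and the data, the helper names and the grid resolution are all assumptions.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def dual_objective(alpha, X, y):
    """L(alpha) = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j x_i . x_j"""
    n = len(alpha)
    quad = sum(
        alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
        for i in range(n)
        for j in range(n)
    )
    return sum(alpha) - 0.5 * quad


# Toy data (assumed): one + sample and one - sample.
X = [(2.0, 2.0), (0.0, 0.0)]
y = [+1, -1]

# With two samples, sum_i alpha_i y_i = 0 forces alpha_1 = alpha_2 = a,
# so the dual reduces to a one-dimensional problem in a >= 0.
grid = [k / 1000 for k in range(0, 1001)]
best_a = max(grid, key=lambda a: dual_objective([a, a], X, y))

# Recover w from the stationarity condition w = sum_i alpha_i y_i x_i.
w = tuple(best_a * (X[0][d] - X[1][d]) for d in range(2))

print(best_a, w)
```

Note that `dual_objective` touches the samples only through `dot(X[i], X[j])`, which is the property the kernel trick exploits.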
$\sum_i \alpha_i y_i \, \bar{x}_i \cdot \bar{u} + b > 0$, then +
The decision rule depends only on the dot products $\bar{x}_i \cdot \bar{u}$ of the support vectors and the unknown sample. Thanks to this property, we can apply the magical kernel trick.
Kernel Tricks
$K(x_i, u) = \phi(x_i) \cdot \phi(u)$
1) Linear kernel: $K(x_i, u) = (x_i \cdot u + 1)^n$, $n = 1$
2) Non-linear kernel (RBF): $K(x_i, u) = e^{-\gamma \|x_i - u\|^2}$
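The two kernels listed above, and a decision rule with the dot product swapped for a kernel, might look as follows in plain Python. The function names, default parameters and toy multipliers are assumptions for illustration only.

```python
import math


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def poly_kernel(x, u, n=1):
    """K(x, u) = (x . u + 1)^n; n = 1 is the 'linear' case from the slide."""
    return (dot(x, u) + 1) ** n


def rbf_kernel(x, u, gamma=1.0):
    """K(x, u) = exp(-gamma * ||x - u||^2)."""
    sq_dist = sum((xi - ui) ** 2 for xi, ui in zip(x, u))
    return math.exp(-gamma * sq_dist)


def kernel_decision(alphas, ys, xs, b, u, kernel):
    """Decision rule with x_i . u replaced by K(x_i, u)."""
    score = sum(a * yi * kernel(xi, u) for a, yi, xi in zip(alphas, ys, xs)) + b
    return "+" if score > 0 else "-"


# Toy support vectors and multipliers (assumed values).
xs = [(2.0, 2.0), (0.0, 0.0)]
ys = [+1, -1]
alphas = [0.25, 0.25]

print(kernel_decision(alphas, ys, xs, -1.0, (3.0, 3.0), dot))
print(rbf_kernel((0.0, 0.0), (1.0, 0.0)))
```

Passing `dot` as the kernel reproduces the original decision rule, so changing the model from linear to non-linear amounts to changing one argument.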
How is SVM different from Logit?
Logit calculates the 'distance' or 'error' with respect to the decision boundary for every data point, while SVM calculates the distance from the decision boundary using the data divided into 'groups' of 0/1.
Although SVM is mathematically more elegant, it is inevitably very vulnerable to outliers.
A typical SVM ignores points other than the support vectors, and hinge loss is by default less sensitive than logistic loss, so can SVM really be considered more vulnerable to outliers than Logit?
What if an outlier becomes a support vector?
[Figure: noised data]
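The comparison between hinge loss and logistic loss can be made concrete by evaluating both as functions of the margin $y(\bar{w} \cdot \bar{x} + b)$. This is a minimal sketch that assumes labels coded as ±1 (not 0/1) and uses the textbook forms of the two losses; the function names are illustrative.

```python
import math


def hinge_loss(margin):
    """Hinge loss max(0, 1 - m), where m = y * (w . x + b)."""
    return max(0.0, 1.0 - margin)


def logistic_loss(margin):
    """Logistic loss log(1 + exp(-m)) for the same margin m."""
    return math.log(1.0 + math.exp(-margin))


# Margins from badly misclassified (-2) to safely correct (+3).
for m in (-2.0, 0.0, 1.0, 3.0):
    print(m, hinge_loss(m), round(logistic_loss(m), 4))
```

Hinge loss is exactly zero once the margin reaches 1, which is why points beyond the gutter do not influence the SVM, while logistic loss stays positive for every point. For a misclassified outlier, however, both losses grow roughly linearly, so an outlier that ends up as a support vector can still move the boundary.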
[Figure: data before and after the kernel's non-linear transformation]
Kernel Tricks
$K(x_i, u) = \phi(x_i) \cdot \phi(u)$
The Gaussian kernel has the same effect as embedding the data into infinitely many dimensions (by Taylor expansion).
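The Taylor-expansion argument can be written out for scalar inputs. The derivation below is a sketch for the one-dimensional case; the feature symbols $\phi_k$ are introduced here for illustration and do not appear in the slides.

```latex
\begin{aligned}
e^{-\gamma (x - u)^2}
  &= e^{-\gamma x^2}\, e^{-\gamma u^2}\, e^{2 \gamma x u} \\
  &= e^{-\gamma x^2}\, e^{-\gamma u^2} \sum_{k=0}^{\infty} \frac{(2\gamma)^k}{k!}\, x^k u^k \\
  &= \sum_{k=0}^{\infty} \phi_k(x)\, \phi_k(u),
  \qquad
  \phi_k(x) = e^{-\gamma x^2} \sqrt{\frac{(2\gamma)^k}{k!}}\; x^k .
\end{aligned}
```

Each $k$ contributes one coordinate of the feature map, so the embedding has infinitely many coordinates even though the kernel itself is a single exponential.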
Kernel Trick: explicitly expanding the data to a high dimension increases the computational cost, so the kernel method is used to achieve the same effect without actually scaling up the dimension.
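A small check makes this concrete: for two-dimensional inputs, the degree-2 polynomial kernel $(x \cdot u + 1)^2$ equals an ordinary dot product in a six-dimensional feature space. The explicit map `phi` below is the standard one for this kernel; the sample points are arbitrary and the function names are illustrative.

```python
import math


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def phi(x):
    """Explicit feature map for (x . u + 1)^2 with 2-D input (6 features)."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return (1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2)


def poly2_kernel(x, u):
    """Same quantity computed in the original 2-D space."""
    return (dot(x, u) + 1.0) ** 2


x = (1.0, 2.0)
u = (3.0, -1.0)

explicit = dot(phi(x), phi(u))  # works in 6 dimensions
implicit = poly2_kernel(x, u)   # works in 2 dimensions

print(explicit, implicit)
```

The kernel evaluation needs one 2-D dot product regardless of how large the implied feature space is, which is the saving the kernel trick refers to.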
More on Kernels
The mathematical definition of a kernel is very simple: if there is a mapping function $\phi$ into a Hilbert space, the kernel can be defined immediately.
But the problem is $\phi$. How do we find $\phi$ into a Hilbert space, or how do we know that such a $\phi$ exists?
->
By Mercer's theorem, if a kernel K satisfies certain conditions (symmetric, positive semi-definite, etc.), then K can be decomposed into eigenfunctions and eigenvalues. That is, if K satisfies these conditions, $\phi$ always exists, so we can create a kernel even if we do not define $\phi$ explicitly.
More on Kernels
Mercer's theorem indicates that if kernel K is positive semi-definite, it can be decomposed into eigenfunctions $\phi_i$ and eigenvalues $\lambda_i$: $K(x, y) = \sum_{i=1}^{\infty} \lambda_i \phi_i(x) \phi_i(y)$.
*It is an infinite summation formula.
Remember when we proved the Fourier series!
It is just an approximation if the sum is not infinite (important).
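The remark that a finite sum is only an approximation can be illustrated numerically. The sketch below does not compute Mercer eigenfunctions; it substitutes the simpler Taylor-series features of the one-dimensional Gaussian kernel and shows the truncation error shrinking as more terms are kept. The sample points and function names are assumptions.

```python
import math


def rbf_1d(x, u, gamma=1.0):
    """Exact 1-D Gaussian kernel exp(-gamma * (x - u)^2)."""
    return math.exp(-gamma * (x - u) ** 2)


def truncated_rbf_1d(x, u, terms, gamma=1.0):
    """Sum of the first `terms` Taylor features of the Gaussian kernel."""
    prefactor = math.exp(-gamma * x * x) * math.exp(-gamma * u * u)
    series = sum(
        (2.0 * gamma * x * u) ** k / math.factorial(k) for k in range(terms)
    )
    return prefactor * series


x, u = 0.8, -0.5
exact = rbf_1d(x, u)

# Absolute error of the truncated expansion for 1, 3 and 10 terms.
errors = [abs(truncated_rbf_1d(x, u, m) - exact) for m in (1, 3, 10)]
print(errors)
```

With a single term the expansion is far from the kernel value, and the error only becomes negligible as the number of terms grows, mirroring the Fourier-series analogy above.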
RKHS
Informal definition of RKHS: a Hilbert space with a reproducing kernel is an RKHS.
Formal definition of RKHS:
Evaluation functional $L_x = \langle \cdot, k(\cdot, x) \rangle$, with $L_x(f) = \langle f, k(\cdot, x) \rangle = f(x)$
In other words, if applying the evaluation functional $L_x$ to a function $f$ can be reproduced as the inner product of $f$ with the kernel section $k(\cdot, x)$, then the reproducing kernel property is satisfied.
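For functions that are finite combinations of kernel sections, $f = \sum_i c_i k(\cdot, x_i)$, the RKHS inner product reduces to sums of kernel values, so the reproducing property can be checked directly. The sketch below assumes a one-dimensional Gaussian kernel and arbitrary coefficients; the names `inner` and `evaluate` are illustrative.

```python
import math


def k(x, z, gamma=1.0):
    """1-D Gaussian kernel used as the reproducing kernel."""
    return math.exp(-gamma * (x - z) ** 2)


def inner(coef_a, pts_a, coef_b, pts_b):
    """<sum_i a_i k(., x_i), sum_j b_j k(., z_j)> = sum_ij a_i b_j k(x_i, z_j)."""
    return sum(
        a * b * k(xa, zb)
        for a, xa in zip(coef_a, pts_a)
        for b, zb in zip(coef_b, pts_b)
    )


def evaluate(coef, pts, x):
    """Pointwise value f(x) = sum_i c_i k(x_i, x)."""
    return sum(c * k(p, x) for c, p in zip(coef, pts))


# f = 1.5 k(., 0.0) - 0.7 k(., 1.0) + 0.2 k(., 2.5)  (arbitrary choice)
coef = [1.5, -0.7, 0.2]
pts = [0.0, 1.0, 2.5]
x0 = 0.4

lhs = inner(coef, pts, [1.0], [x0])  # <f, k(., x0)>
rhs = evaluate(coef, pts, x0)        # f(x0)
print(lhs, rhs)
```

Taking the inner product with $k(\cdot, x_0)$ returns the value of $f$ at $x_0$, which is exactly what the evaluation functional $L_x$ is required to do.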
RKHS
Riesz Representation Theorem
https://en.wikipedia.org/wiki/Reproducing_kernel_Hilbert_space
(Definition)
RKHS
WHY IS RKHS GOOD?
The evaluation of $f$ at $x$ is expressed in terms of the eigenfunctions $\phi$ of the kernel.
RKHS
Another view on RKHS: the Moore-Aronszajn theorem. The RKHS itself is tied to the existence condition of the kernel function. In other words, if we define a kernel K that satisfies some conditions (symmetric and positive definite), there is a unique reproducing kernel Hilbert space H corresponding to K, and K is the only reproducing kernel of H.
The shape of f is affected by the kernel, because the new function f is also decomposed into eigenfunctions of the kernel.
e.g.) GPR
A new function f with a shape similar to the Gaussian kernel is created.
Thus using the kernel method means fitting our data to the kernel function, so choosing the proper kernel is what we must focus on in the kernel method.
https://www.edwith.org/bayesiandeeplearning/lecture/24684/
https://patternsofideas.wordpress.com/2016/12/12/mercers-theorem-and-svms/
https://bi.snu.ac.kr/Publications/Conferences/Domestic/KCC2005_LeeSK.pdf
https://stats.stackexchange.com/questions/268429/do-gaussian-process-regression-have-the-universal-approximation-property
http://iera.name/a-story-of-basis-and-kernel-part-i-function-basis/
https://youtu.be/7JRwjCpKewQ
RKHS
Representer Theorem
References
• Strongly Based on MIT 6.034 Artificial Intelligence, Fall 2010
Instructor: Patrick Winston
video link: https://youtu.be/_PwhiWxHK8o
• Trevor Hastie et al. The Elements of Statistical Learning (2001)
• Machine Learning Lecture 26 "Gaussian Processes" -Cornell CS4780 SP17 by Kilian Weinberger
-video link: https://www.youtube.com/watch?v=R-NUdqxKjos&t=1000s
• 9.520/6.860S Statistical Learning Theory by Lorenzo Rosasco
http://www.mit.edu/~9.520/fall14/slides/class03/class03_rkhsPart1.pdf
-video link: https://www.youtube.com/watch?v=9-oxo_k69qs
• Bayesian Deep Learning by Sungjoon Choi
-video link: https://www.edwith.org/bayesiandeeplearning/joinLectures/14426
Thanks

Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 

SVM (Support Vector Machine & Kernel)

  • 1. SVM & Kernel Trick. Kyuyong Shin, Naver Clova A.I Research (CLAIR), Biz AI
  • 2. SVM. Decision rule: $\bar{w} \cdot \bar{u} \ge c$. With $c = -b$: if $\bar{w} \cdot \bar{u} + b \ge 0$, then the sample $\bar{u}$ is classified as +.
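  • 3. How to compute $\bar{w}$? For + samples, $\bar{w} \cdot x_+ + b \ge 1$; for - samples, $\bar{w} \cdot x_- + b \le -1$. Introduce $y_i$ such that $y_i = +1$ for + samples and $y_i = -1$ for - samples. Both cases then become $y_i(\bar{w} \cdot \bar{x}_i + b) \ge 1$, i.e. $y_i(\bar{w} \cdot \bar{x}_i + b) - 1 \ge 0$, with $y_i(\bar{w} \cdot \bar{x}_i + b) - 1 = 0$ for $\bar{x}_i$ in the gutter.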
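  • 4. How to compute $\bar{w}$? For gutter samples $x_+$ and $x_-$, the constraints give $\bar{w} \cdot x_+ = 1 - b$ and $-\bar{w} \cdot x_- = 1 + b$, so Width $= (x_+ - x_-) \cdot \frac{\bar{w}}{\|\bar{w}\|} = \frac{(1 - b) + (1 + b)}{\|\bar{w}\|} = \frac{2}{\|\bar{w}\|}$. Therefore: MAX $\frac{2}{\|\bar{w}\|}$ → MIN $\|\bar{w}\|$ → MIN $\frac{1}{2}\|\bar{w}\|^2$.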
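The width formula and the margin constraints can be checked numerically. The following is a minimal sketch, assuming a hypothetical two-point dataset and hand-picked values of `w` and `b` (none of these numbers come from the slides); the helper names `margin_width` and `satisfies_constraints` are illustrative only.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))

def margin_width(w):
    """Width of the street between the two gutters: 2 / ||w||."""
    return 2.0 / math.sqrt(dot(w, w))

def satisfies_constraints(w, b, samples):
    """True if every (x_i, y_i) obeys y_i (w . x_i + b) >= 1."""
    return all(y * (dot(w, x) + b) >= 1.0 for x, y in samples)

# Hypothetical toy data: one + sample and one - sample, both on the gutter.
samples = [((1.0, 1.0), +1), ((-1.0, -1.0), -1)]
w, b = (0.5, 0.5), 0.0

width = margin_width(w)                           # 2 / ||w||
feasible = satisfies_constraints(w, b, samples)   # both constraints are tight
```

Shrinking `w` further (for example to `(0.25, 0.25)`) would widen the street but violate the constraints, which is why the problem is a constrained minimization of $\frac{1}{2}\|\bar{w}\|^2$.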
  • 5. How to minimize it? Constraints: $y_i(\bar{w} \cdot \bar{x}_i + b) - 1 \ge 0$. If the constraints are equalities, the problem is called an equality-constrained problem and is solved with Lagrange multipliers. The method of Lagrange multipliers is generalized by the Karush–Kuhn–Tucker conditions, which can also take inequality constraints into account.
  • 6. How to minimize it? Karush–Kuhn–Tucker conditions (conditions on the primal): the Karush–Kuhn–Tucker (KKT) conditions are first-derivative tests (sometimes called first-order necessary conditions) for a solution in nonlinear programming to be optimal, provided that some regularity conditions are satisfied (i.e., the KKT conditions derive from the relationship between the primal and the dual when some regularity conditions are satisfied).
  • 7. Duality. We want to minimize the primal problem by maximizing the dual problem. When Slater's condition holds, the difference between the primal and dual optimal values (the duality gap) is equal to zero. If the optimal duality gap is zero, we say that strong duality holds.
  • 8. How to minimize it? In order for a minimum point $x^*$ to satisfy the above KKT conditions, the problem should satisfy some regularity conditions; a common example is Slater's condition.
  • 9. How to minimize it? For a problem with strong duality: $x^*$ and $u^*, v^*$ are primal and dual solutions $\iff$ $x^*$ and $u^*, v^*$ satisfy the KKT conditions.
  • 10. How to minimize it? We will find solutions that satisfy the KKT conditions; because the SVM problem satisfies Slater's condition, our solutions will be solutions of both the primal and the dual.
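  • 11. How to minimize it? If the primal is a convex problem with a strictly feasible point (for affine constraints such as the SVM's, any feasible point suffices), then it satisfies Slater's condition. Fortunately, the SVM problem satisfies Slater's condition, and thus strong duality holds. Strong duality leads us to the KKT conditions (if $x^*$ satisfies the KKT conditions, then $x^*$ is a solution of the primal). Let $h_i(x^*) = y_i(\bar{w} \cdot \bar{x}_i + b) - 1$; then an important consequence of satisfying the KKT conditions is that $\sum_i \alpha_i^* h_i(x^*) = 0$. Since $\alpha_i^* \ge 0$ and $h_i(x^*) \ge 0$, every term in this sum has the same sign (nonnegative), so we conclude that $\alpha_i^* h_i(x^*) = 0$ for all $i$.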
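  • 12. Lagrange multipliers. Constraints: $y_i(\bar{w} \cdot \bar{x}_i + b) - 1 \ge 0$. Lagrangian: $L = \frac{1}{2}\|\bar{w}\|^2 - \sum_i \alpha_i [y_i(\bar{w} \cdot \bar{x}_i + b) - 1]$. Setting the derivatives to zero: $\frac{\partial L}{\partial \bar{w}} = \bar{w} - \sum_i \alpha_i y_i \bar{x}_i = 0 \Rightarrow \bar{w} = \sum_i \alpha_i y_i \bar{x}_i$, and $\frac{\partial L}{\partial b} = -\sum_i \alpha_i y_i = 0 \Rightarrow \sum_i \alpha_i y_i = 0$.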
  • 13. Therefore, $\bar{w}$ is a linear sum of several $\bar{x}_i$. We say 'several' because samples with $\alpha_i = 0$ drop out of the sum. This means that $\alpha_i$ is zero for samples $\bar{x}_i$ that do not affect classification. Complementary slackness from the KKT conditions: $\alpha_i [y_i(\bar{w} \cdot \bar{x}_i + b) - 1] = 0$. If $\alpha_i$ is non-zero, then $\bar{x}_i$ is a support vector (it lies on the gutter).
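  • 14. Substituting $\bar{w} = \sum_i \alpha_i y_i \bar{x}_i$ into the Lagrangian: $L = \frac{1}{2}\left(\sum_i \alpha_i y_i \bar{x}_i\right) \cdot \left(\sum_j \alpha_j y_j \bar{x}_j\right) - \left(\sum_i \alpha_i y_i \bar{x}_i\right) \cdot \left(\sum_j \alpha_j y_j \bar{x}_j\right) - b\sum_i \alpha_i y_i + \sum_i \alpha_i$. Since $\sum_i \alpha_i y_i = 0$, $\therefore L = \sum_i \alpha_i - \frac{1}{2}\sum_i \sum_j \alpha_i \alpha_j y_i y_j \, \bar{x}_i \cdot \bar{x}_j$. Maximization depends only on the dot products $\bar{x}_i \cdot \bar{x}_j$ of pairs of support vectors!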
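To make the dual expression concrete, here is a minimal sketch that evaluates $L(\alpha) = \sum_i \alpha_i - \frac{1}{2}\sum_i\sum_j \alpha_i \alpha_j y_i y_j \, \bar{x}_i \cdot \bar{x}_j$ on a hypothetical two-point dataset and recovers $\bar{w} = \sum_i \alpha_i y_i \bar{x}_i$. The data and the multipliers `alphas = [0.25, 0.25]` are assumptions chosen for illustration (for this toy problem they maximize the dual by hand calculation); the function names are not from the slides.

```python
def dot(a, b):
    """Plain dot product of two equal-length tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))

def dual_objective(alphas, xs, ys):
    """sum_i alpha_i - 1/2 sum_i sum_j alpha_i alpha_j y_i y_j (x_i . x_j)."""
    n = len(alphas)
    pairwise = sum(
        alphas[i] * alphas[j] * ys[i] * ys[j] * dot(xs[i], xs[j])
        for i in range(n)
        for j in range(n)
    )
    return sum(alphas) - 0.5 * pairwise

def weights_from_alphas(alphas, xs, ys):
    """w = sum_i alpha_i y_i x_i, from setting dL/dw = 0."""
    dim = len(xs[0])
    return tuple(
        sum(a * y * x[d] for a, y, x in zip(alphas, ys, xs))
        for d in range(dim)
    )

# Hypothetical toy problem: one + sample and one - sample.
xs = [(1.0, 1.0), (-1.0, -1.0)]
ys = [+1, -1]
alphas = [0.25, 0.25]  # assumed multipliers; note sum_i alpha_i y_i = 0

w = weights_from_alphas(alphas, xs, ys)
primal_value = 0.5 * dot(w, w)               # 1/2 ||w||^2
dual_value = dual_objective(alphas, xs, ys)  # matches the primal value here
```

On this toy problem the primal and dual values coincide, which is what a zero duality gap looks like in practice.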
  • 15. Decision rule: if $\sum_i \alpha_i y_i \, \bar{x}_i \cdot u + b > 0$, then +. The decision rule depends only on the dot products $\bar{x}_i \cdot u$ of the support vectors and the unknown sample. Thanks to this property, we can apply the kernel trick. Kernel tricks: $K(x_i, u) = \phi(x_i) \cdot \phi(u)$. 1) Linear kernel: $K(x_i, u) = (x_i \cdot u + 1)^n$, $n = 1$. 2) Non-linear kernel (RBF): $K(x_i, u) = e^{-\gamma \|x_i - u\|^2}$.
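A minimal sketch of the kernelized decision rule $\sum_i \alpha_i y_i K(x_i, u) + b > 0$, using the two kernels listed on the slide. The support vectors, multipliers, bias, `gamma` value, and query point are hypothetical values chosen for illustration. In practice the multipliers would be re-solved for each kernel; reusing them here only shows how the kernel plugs into the rule.

```python
import math

def linear_kernel(x, u, n=1):
    """Polynomial form (x . u + 1)^n; n = 1 gives the linear kernel."""
    return (sum(xi * ui for xi, ui in zip(x, u)) + 1.0) ** n

def rbf_kernel(x, u, gamma=0.5):
    """RBF kernel exp(-gamma * ||x - u||^2); gamma = 0.5 is an arbitrary choice."""
    sq_dist = sum((xi - ui) ** 2 for xi, ui in zip(x, u))
    return math.exp(-gamma * sq_dist)

def decision_score(u, support_vectors, ys, alphas, b, kernel):
    """sum_i alpha_i y_i K(x_i, u) + b; classify as + when the score is > 0."""
    return sum(
        a * y * kernel(x, u)
        for a, y, x in zip(alphas, ys, support_vectors)
    ) + b

# Hypothetical support vectors and multipliers (not from the slides).
support_vectors = [(1.0, 1.0), (-1.0, -1.0)]
ys = [+1, -1]
alphas = [0.25, 0.25]
b = 0.0
u = (2.0, 0.0)  # unknown sample to classify

score_linear = decision_score(u, support_vectors, ys, alphas, b, linear_kernel)
score_rbf = decision_score(u, support_vectors, ys, alphas, b, rbf_kernel)
label = "+" if score_linear > 0 else "-"
```

Because $\sum_i \alpha_i y_i = 0$, the constant `+ 1` inside the linear kernel cancels in the sum, so the linear score equals $\bar{w} \cdot u + b$ with $\bar{w} = \sum_i \alpha_i y_i \bar{x}_i$.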
  • 16. How is SVM different from Logit? Logit calculates the 'distance' or 'error' with respect to the decision boundary for each data point, while SVM calculates the distance from the decision boundary based on the data divided into 'groups' of 0/1. Although SVMs are mathematically more elegant, they are said to be inevitably very vulnerable to outliers. A typical SVM ignores points other than the support vectors, and by default hinge loss is less sensitive than logistic loss, so can SVM really be considered more vulnerable to outliers than Logit? What if an outlier defines a support vector?
  • 21. Kernel tricks: $K(x_i, u) = \phi(x_i) \cdot \phi(u)$. In the case of the Gaussian kernel, the effect is the same as an embedding into infinite dimensions (by Taylor expansion). Kernel trick: explicitly expanding the data to a high dimension increases the computational cost, so the kernel method is used to achieve the same effect without actually scaling up the dimension.
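The identity $K(x, u) = \phi(x) \cdot \phi(u)$ can be verified directly for a kernel whose feature map is finite. The sketch below uses the degree-2 polynomial kernel $(x \cdot u + 1)^2$ on 2-D inputs, which is an illustrative stand-in chosen here (the Gaussian kernel's feature map is infinite-dimensional and cannot be written out); the explicit map `phi` and the sample points are assumptions for the example.

```python
import math

def poly2_kernel(x, u):
    """Implicit computation: (x . u + 1)^2, one 2-D dot product."""
    return (x[0] * u[0] + x[1] * u[1] + 1.0) ** 2

def phi(x):
    """Explicit 6-D feature map whose dot product equals poly2_kernel."""
    s = math.sqrt(2.0)
    return (1.0, s * x[0], s * x[1], x[0] ** 2, s * x[0] * x[1], x[1] ** 2)

x, u = (1.0, 2.0), (3.0, -1.0)

implicit = poly2_kernel(x, u)                          # stays in 2-D
explicit = sum(a * b for a, b in zip(phi(x), phi(u)))  # lifts to 6-D first
```

Both routes give the same number, but the kernel never builds the 6-D vectors. The saving grows with the degree and the input dimension.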
  • 22. More on Kernels. The mathematical definition of a kernel is very simple: if there is a mapping function $\phi$ into a Hilbert space, the kernel can be defined immediately. But the problem is $\phi$. How do we find $\phi$ in the Hilbert space, or how do we know that such a $\phi$ exists? -> By Mercer's theorem, if a kernel K satisfies certain conditions (symmetric, PD, etc.), then K can be decomposed into eigenfunctions and eigenvalues. That is, if K satisfies the above conditions, $\phi$ always exists, so we can create a kernel even if we do not define $\phi$ explicitly.
  • 23. More on Kernels. The above equations indicate that if a kernel K is positive semi-definite, it can be decomposed into eigenfunctions and eigenvalues. *It is an infinite summation formula. Remember when we proved the Fourier series! It is only an approximation if the sum is not infinite (important).
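For reference, the decomposition that Mercer's theorem provides is commonly written as below. This is the standard textbook statement, assuming a continuous, symmetric, positive semi-definite kernel on a compact domain; the symbols $\lambda_j$, $\psi_j$ and $\Phi$ are notation introduced here, not taken from the slides.

```latex
K(x, y) = \sum_{j=1}^{\infty} \lambda_j \, \psi_j(x) \, \psi_j(y),
\qquad \lambda_j \ge 0,
\qquad
\Phi(x) = \left( \sqrt{\lambda_j} \, \psi_j(x) \right)_{j=1}^{\infty}
\;\Rightarrow\;
K(x, y) = \langle \Phi(x), \Phi(y) \rangle .
```

Truncating the sum after finitely many terms gives only an approximation of $K$, in the same way that a truncated Fourier series only approximates a function.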
  • 24. RKHS
  • 25. RKHS. Informal definition of RKHS: a Hilbert space with a reproducing kernel is an RKHS. Formal definition of RKHS: the evaluation functional $L_x = \langle \cdot, k(\cdot, x) \rangle$ satisfies $L_x f = f(x)$. In other words, if applying the evaluation functional $L_x$ to a function $f$ can be reproduced by the inner product of $f$ with $k(\cdot, x)$, then the reproducing kernel property is satisfied.
  • 27. RKHS. Why is RKHS good? The evaluation of $f$ at $x$ is expressed in terms of the eigenfunctions $\phi$ of the kernel.
  • 28. RKHS. Another view on RKHS: Moore-Aronszajn theorem -> The RKHS itself is in line with the kernel function existence condition. In other words, if we define a kernel K that satisfies some conditions, it is the only kernel that defines the specific reproducing kernel Hilbert space H corresponding to K. The shape of f is affected by the kernel, because the new function f is also decomposed into eigenfunctions of the kernel. E.g., in GPR, a new function f similar in shape to the Gaussian kernel is created. Thus using the kernel method means fitting our data to the kernel function, so choosing a proper kernel is what we must focus on in kernel methods.
      • https://www.edwith.org/bayesiandeeplearning/lecture/24684/
      • https://patternsofideas.wordpress.com/2016/12/12/mercers-theorem-and-svms/
      • https://bi.snu.ac.kr/Publications/Conferences/Domestic/KCC2005_LeeSK.pdf
      • https://stats.stackexchange.com/questions/268429/do-gaussian-process-regression-have-the-universal-approximation-property
      • http://iera.name/a-story-of-basis-and-kernel-part-i-function-basis/
      • https://youtu.be/7JRwjCpKewQ
  • 30. References
      • Strongly based on MIT 6.034 Artificial Intelligence, Fall 2010. Instructor: Patrick Winston. Video link: https://youtu.be/_PwhiWxHK8o
      • Trevor Hastie et al. The Elements of Statistical Learning (2001).
      • Kilian Weinberger. Machine Learning Lecture 26 "Gaussian Processes", Cornell CS4780 SP17. Video link: https://www.youtube.com/watch?v=R-NUdqxKjos&t=1000s
      • Lorenzo Rosasco. 9.520/6.860S Statistical Learning Theory. Slides: http://www.mit.edu/~9.520/fall14/slides/class03/class03_rkhsPart1.pdf Video link: https://www.youtube.com/watch?v=9-oxo_k69qs
      • Sungjoon Choi. Bayesian Deep Learning. Video link: https://www.edwith.org/bayesiandeeplearning/joinLectures/14426