SlideShare a Scribd company logo
1 of 30
Digg Data
Support Vector Machine
Ankit Sharma
www.diggdata.in
without tears
Digg Data
Content
SVM and its application
Basic SVM
•Hyperplane
•Understanding of basics
•Optimization
Soft margin SVM
Non-linear decision boundary
SVMs in “loss + penalty” form
Kernel method
•Gaussian kernel
SVM usage beyond classification
Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 2
Digg Data
• In machine learning, support vector machines are supervised
learning models with associated learning algorithms that analyze
data and recognize patterns, used for classification and regression
analysis.
• Properties of SVM :
Duality
Kernels
Margin
Convexity
Sparseness
SVM : Support Vector Machine
Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 3
Digg Data
Time Series
analysis
Classification
Anomaly
detection
Regression
Machine
Vision
Text
categorization
Application of SVM
Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 4
Digg Data
Basic concept of SVM
Find a linear decision surface (“hyperplane”) that can separate classes and has the largest
distance (i.e., largest “gap” or “margin”) between border-line patients (i.e., “support vectors”)
Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 5
Digg Data
Hyperplane as a Decision boundary
• A hyperplane is a linear decision surface that splits the space into two parts;
• It is obvious that a hyperplane is a binary classifier
Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 6
Digg Data
Equation of a hyperplane
An equation of a hyperplane is defined by a
point (P0) and a perpendicular vector to the
plane (𝑤) at that point.
Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 7
Digg Data
• g(x) is a linear function:
x1
x2
w x + b < 0
w x + b > 0
 A hyper-plane in the feature
space
 (Unit-length) normal vector of
the hyper-plane:

w
n
w
n
Understanding the basics
Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 8
Digg Data
x1
x2How to classify these points using
a linear discriminant function in
order to minimize the error rate?
 Infinite number of answers!
 Which one is the best?
Understanding the basics
denotes +1
denotes -1Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 9
Digg Data
• The linear discriminant
function (classifier) with the
maximum margin is the best
“safe zone”
 Margin is defined as the width
that the boundary could be
increased by before hitting a
data point
 Why it is the best?
Robust to outliners and thus
strong generalization ability
Margin
x1
x2
Understanding the basics
denotes +1
denotes -1Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 10
Digg Data
• Given a set of data points:
 With a scale transformation on
both w and b, the above is
equivalent to
x1
x2
{( , )}, 1,2, ,i iy i nx , where
𝑭𝒐𝒓 𝒚𝒊 = +𝟏, 𝑾 𝑿𝒊 + 𝒃> 0
𝑭𝒐𝒓 𝒚𝒊 = −𝟏, 𝑾 𝑿𝒊 + 𝒃 < 𝟎
𝑭𝒐𝒓 𝒚𝒊 = +𝟏, 𝑾 𝑿𝒊 + 𝒃> +1
𝑭𝒐𝒓 𝒚𝒊 = −𝟏, 𝑾 𝑿𝒊 + 𝒃 < -1
Understanding the basics
denotes +1
denotes -1Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 11
Digg Data
• We know that
 The margin width is:
x1
x2
Margin
x+
x+
x-
( )
2
( )
M  
 
  
   
x x n
w
x x
w w
n
Support Vectors
𝑾 𝑿+ + 𝒃 = +𝟏
𝑾 𝑿− + 𝒃 = −𝟏
Understanding the basics
denotes +1
denotes -1Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 12
Digg Data
• Formulation:
x1
x2
Margin
x+
x+
x-
n
such that
2
maximize
w
𝑭𝒐𝒓 𝒚𝒊 = +𝟏, 𝑾 𝑿𝒊 + 𝒃> +1
𝑭𝒐𝒓 𝒚𝒊 = −𝟏, 𝑾 𝑿𝒊 + 𝒃 < -1
Understanding the basics
denotes +1
denotes -1Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 13
Digg Data
• Formulation:
x1
x2
Margin
x+
x+
x-
n
21
minimize
2
w
such that
𝑭𝒐𝒓 𝒚𝒊 = +𝟏, 𝑾 𝑿𝒊 + 𝒃> +1
𝑭𝒐𝒓 𝒚𝒊 = −𝟏, 𝑾 𝑿𝒊 + 𝒃 < -1
Understanding the basics
denotes +1
denotes -1Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 14
Digg Data
• Formulation:
x1
x2
Margin
x+
x+
x-
n
21
minimize
2
w
such that
𝐲𝐢 𝐖 𝐗 + 𝐛 ≥ 𝟏
Understanding the basics
denotes +1
denotes -1Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 15
Digg Data
Basics of optimization: Convex functions
• A function is called convex if the function lies below the straight line
segment connecting two points, for any two points in the interval.
• Property: Any local minimum is a global minimum!
Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 16
Digg Data
Basics of optimization: Quadratic programming
• Quadratic programming (QP) is a special optimization problem: the function to
optimize (“objective”) is quadratic, subject to linear constraints.
• Convex QP problems have convex objective functions.
• These problems can be solved easily and efficiently by greedy algorithms (because
every local minimum is a global minimum).
Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 17
Digg Data
SVM optimization problem: Primal formulation
• This is called “primal formulation of linear SVMs”
• It is a convex quadratic programming (QP) optimization problem with n
variables (wi, i= 1,…,n), where n is the number of features in the dataset.
Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 18
Digg Data
SVM optimization problem: Dual formulation
• The previous problem can be recast in the so-called “dual form” giving rise to
“dual formulation of linear SVMs”.
• Apply the method of Lagrange multipliers.
• We need to minimize this Lagrangian with respect to and simultaneously require
that the derivative with respect to vanishes , all subject to the constraints that
αi > 0
Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 19
Digg Data
SVM optimization problem: Dual formulation
Cond…
It is also a convex quadratic programming problem but with N variables (αi, i= 1,…,N), where N is
the number of samples.
Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 20
Digg Data
SVM optimization problem: Benefits of using
dual formulation
1) No need to access original data, need to access only dot products.
2) Number of free parameters is bounded by the number of support vectors
and not by the number of variables (beneficial for high-dimensional
problems).
Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 21
Digg Data
Non linearly separable data: “Soft-margin” linear SVM
Assign a “slack variable” to each instance ,
ξi > 0 which can be thought of distance from the
separating hyperplane if an instance is misclassified
and 0 otherwise.
Primal formulation:
Dual formulation:
• When C is very large, the soft-margin SVM is equivalent
to hard-margin SVM;
• When C is very small, we admit misclassifications in the
training data at the expense of having w-vector with
small norm;
• C has to be selected for the distribution at hand as it will
be discussed later in this tutorial.
Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 22
Digg Data
SVMs in “loss + penalty” form
• Many statistical learning algorithms (including SVMs) search for a decision function by solving the
following optimization problem:
Minimize (Loss+ λ Penalty)
– Loss measures error of fitting the data
– Penalty penalizes complexity of the learned function
– λ is regularization parameter that balances Loss and Penalty
• Overfitting → Poor generalization
Can also be stated as
Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 23
Digg Data
Nonlinear decision boundary
Non Linear
Decision
Boundary
Kernel
method
Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 24
Digg Data
Kernel method
• Kernel methods involve
– Nonlinear transformation of data to a higher dimensional feature space induced by a Mercer kernel
– Detection of optimal linear solutions in the kernel feature space
• Transformation to a higher dimensional space is expected to be helpful in conversion of nonlinear relations
into linear relations (Cover’s theorem)
– Nonlinearly separable patterns to linearly separable patterns
– Nonlinear regression to linear regression
– Nonlinear separation of clusters to linear separation of clusters
• Pattern analysis methods are implemented in such a way that the kernel feature space representation is
not explicitly required. They involve computation of pair-wise inner-products only.
• The pair-wise inner-products are computed efficiently directly from the original representation of data
using a kernel function (Kernel trick)
Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 25
Digg Data
Kernel trick
Not every function RN×RN -> R can be a valid kernel; it has to satisfy so-called Mercer conditions.
Otherwise, the underlying quadratic program may not be solvable.
Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 26
Digg Data
Popular kernels
Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 27
Digg Data
Gaussian kernel
Consider the Gaussian kernel:
Geometrically, this is a “bump” or “cavity”
centered at the training data point 𝑥j :
The resulting mapping function is a
combination of bumps and cavities.
Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 28
Digg Data
SVM usage beyond classification
Regression analysis
(ε-Support vector
regression)
Anomaly detection
(One-class SVM)
Clustering analysis
(Support Vector
Domain
Description)
Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 29
Digg Data
Thank you
Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 30

More Related Content

What's hot

Decision Tree - C4.5&CART
Decision Tree - C4.5&CARTDecision Tree - C4.5&CART
Decision Tree - C4.5&CARTXueping Peng
 
Neural Networks: Multilayer Perceptron
Neural Networks: Multilayer PerceptronNeural Networks: Multilayer Perceptron
Neural Networks: Multilayer PerceptronMostafa G. M. Mostafa
 
Optimization in Deep Learning
Optimization in Deep LearningOptimization in Deep Learning
Optimization in Deep LearningYan Xu
 
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...Simplilearn
 
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...Preferred Networks
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningMohamed Loey
 
Overview on Optimization algorithms in Deep Learning
Overview on Optimization algorithms in Deep LearningOverview on Optimization algorithms in Deep Learning
Overview on Optimization algorithms in Deep LearningKhang Pham
 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Simplilearn
 
Linear models for classification
Linear models for classificationLinear models for classification
Linear models for classificationSung Yub Kim
 
Naive Bayes Classifier
Naive Bayes ClassifierNaive Bayes Classifier
Naive Bayes ClassifierYiqun Hu
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNNShuai Zhang
 
Recurrent Neural Networks. Part 1: Theory
Recurrent Neural Networks. Part 1: TheoryRecurrent Neural Networks. Part 1: Theory
Recurrent Neural Networks. Part 1: TheoryAndrii Gakhov
 
Support vector machines (svm)
Support vector machines (svm)Support vector machines (svm)
Support vector machines (svm)Sharayu Patil
 
Bayesian Neural Networks
Bayesian Neural NetworksBayesian Neural Networks
Bayesian Neural NetworksNatan Katz
 
Machine Learning lecture2(linear regression)
Machine Learning lecture2(linear regression)Machine Learning lecture2(linear regression)
Machine Learning lecture2(linear regression)cairo university
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierNeha Kulkarni
 

What's hot (20)

Decision Tree - C4.5&CART
Decision Tree - C4.5&CARTDecision Tree - C4.5&CART
Decision Tree - C4.5&CART
 
Neural Networks: Multilayer Perceptron
Neural Networks: Multilayer PerceptronNeural Networks: Multilayer Perceptron
Neural Networks: Multilayer Perceptron
 
Optimization in Deep Learning
Optimization in Deep LearningOptimization in Deep Learning
Optimization in Deep Learning
 
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...
 
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep Learning
 
Overview on Optimization algorithms in Deep Learning
Overview on Optimization algorithms in Deep LearningOverview on Optimization algorithms in Deep Learning
Overview on Optimization algorithms in Deep Learning
 
K means Clustering Algorithm
K means Clustering AlgorithmK means Clustering Algorithm
K means Clustering Algorithm
 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
 
Linear models for classification
Linear models for classificationLinear models for classification
Linear models for classification
 
Naive Bayes Classifier
Naive Bayes ClassifierNaive Bayes Classifier
Naive Bayes Classifier
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNN
 
K Nearest Neighbors
K Nearest NeighborsK Nearest Neighbors
K Nearest Neighbors
 
Recurrent Neural Networks. Part 1: Theory
Recurrent Neural Networks. Part 1: TheoryRecurrent Neural Networks. Part 1: Theory
Recurrent Neural Networks. Part 1: Theory
 
Support vector machines (svm)
Support vector machines (svm)Support vector machines (svm)
Support vector machines (svm)
 
neural networks
 neural networks neural networks
neural networks
 
Bayesian Neural Networks
Bayesian Neural NetworksBayesian Neural Networks
Bayesian Neural Networks
 
Machine Learning lecture2(linear regression)
Machine Learning lecture2(linear regression)Machine Learning lecture2(linear regression)
Machine Learning lecture2(linear regression)
 
04 Multi-layer Feedforward Networks
04 Multi-layer Feedforward Networks04 Multi-layer Feedforward Networks
04 Multi-layer Feedforward Networks
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor Classifier
 

Similar to Support Vector Machine without tears

background.pptx
background.pptxbackground.pptx
background.pptxKabileshCm
 
Structured Forests for Fast Edge Detection [Paper Presentation]
Structured Forests for Fast Edge Detection [Paper Presentation]Structured Forests for Fast Edge Detection [Paper Presentation]
Structured Forests for Fast Edge Detection [Paper Presentation]Mohammad Shaker
 
OM-DS-Fall2022-Session10-Support vector machine.pdf
OM-DS-Fall2022-Session10-Support vector machine.pdfOM-DS-Fall2022-Session10-Support vector machine.pdf
OM-DS-Fall2022-Session10-Support vector machine.pdfssuserb016ab
 
How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?Tuan Yang
 
Support Vector Machines USING MACHINE LEARNING HOW IT WORKS
Support Vector Machines USING MACHINE LEARNING HOW IT WORKSSupport Vector Machines USING MACHINE LEARNING HOW IT WORKS
Support Vector Machines USING MACHINE LEARNING HOW IT WORKSrajalakshmi5921
 
Machine Learning workshop by GDSC Amity University Chhattisgarh
Machine Learning workshop by GDSC Amity University ChhattisgarhMachine Learning workshop by GDSC Amity University Chhattisgarh
Machine Learning workshop by GDSC Amity University ChhattisgarhPoorabpatel
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for BeginnersSanghamitra Deb
 
Machine learning, biomarker accuracy and best practices
Machine learning, biomarker accuracy and best practicesMachine learning, biomarker accuracy and best practices
Machine learning, biomarker accuracy and best practicesPradeep Redddy Raamana
 
Machine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional ManagersMachine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional ManagersAlbert Y. C. Chen
 
Regression vs Deep Neural net vs SVM
Regression vs Deep Neural net vs SVMRegression vs Deep Neural net vs SVM
Regression vs Deep Neural net vs SVMRatul Alahy
 
Throttling Malware Families in 2D
Throttling Malware Families in 2DThrottling Malware Families in 2D
Throttling Malware Families in 2DMohamed Nassar
 
Talk@rmit 09112017
Talk@rmit 09112017Talk@rmit 09112017
Talk@rmit 09112017Shuai Zhang
 
Data Science - Part IX - Support Vector Machine
Data Science - Part IX -  Support Vector MachineData Science - Part IX -  Support Vector Machine
Data Science - Part IX - Support Vector MachineDerek Kane
 

Similar to Support Vector Machine without tears (20)

background.pptx
background.pptxbackground.pptx
background.pptx
 
Support vector machines
Support vector machinesSupport vector machines
Support vector machines
 
Structured Forests for Fast Edge Detection [Paper Presentation]
Structured Forests for Fast Edge Detection [Paper Presentation]Structured Forests for Fast Edge Detection [Paper Presentation]
Structured Forests for Fast Edge Detection [Paper Presentation]
 
lec10svm.ppt
lec10svm.pptlec10svm.ppt
lec10svm.ppt
 
Svm ms
Svm msSvm ms
Svm ms
 
lec10svm.ppt
lec10svm.pptlec10svm.ppt
lec10svm.ppt
 
lec10svm.ppt
lec10svm.pptlec10svm.ppt
lec10svm.ppt
 
OM-DS-Fall2022-Session10-Support vector machine.pdf
OM-DS-Fall2022-Session10-Support vector machine.pdfOM-DS-Fall2022-Session10-Support vector machine.pdf
OM-DS-Fall2022-Session10-Support vector machine.pdf
 
How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?
 
Deeplearning
Deeplearning Deeplearning
Deeplearning
 
Support Vector Machines USING MACHINE LEARNING HOW IT WORKS
Support Vector Machines USING MACHINE LEARNING HOW IT WORKSSupport Vector Machines USING MACHINE LEARNING HOW IT WORKS
Support Vector Machines USING MACHINE LEARNING HOW IT WORKS
 
Machine Learning workshop by GDSC Amity University Chhattisgarh
Machine Learning workshop by GDSC Amity University ChhattisgarhMachine Learning workshop by GDSC Amity University Chhattisgarh
Machine Learning workshop by GDSC Amity University Chhattisgarh
 
Computer Vision for Beginners
Computer Vision for BeginnersComputer Vision for Beginners
Computer Vision for Beginners
 
Machine learning, biomarker accuracy and best practices
Machine learning, biomarker accuracy and best practicesMachine learning, biomarker accuracy and best practices
Machine learning, biomarker accuracy and best practices
 
Machine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional ManagersMachine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional Managers
 
Regression vs Deep Neural net vs SVM
Regression vs Deep Neural net vs SVMRegression vs Deep Neural net vs SVM
Regression vs Deep Neural net vs SVM
 
Throttling Malware Families in 2D
Throttling Malware Families in 2DThrottling Malware Families in 2D
Throttling Malware Families in 2D
 
Seminar nov2017
Seminar nov2017Seminar nov2017
Seminar nov2017
 
Talk@rmit 09112017
Talk@rmit 09112017Talk@rmit 09112017
Talk@rmit 09112017
 
Data Science - Part IX - Support Vector Machine
Data Science - Part IX -  Support Vector MachineData Science - Part IX -  Support Vector Machine
Data Science - Part IX - Support Vector Machine
 

Recently uploaded

Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfsimulationsindia
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfrahulyadav957181
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 

Recently uploaded (20)

Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdfWorld Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
World Economic Forum Metaverse Ecosystem By Utpal Chakraborty.pdf
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdf
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 

Support Vector Machine without tears

  • 1. Digg Data Support Vector Machine Ankit Sharma www.diggdata.in without tears
  • 2. Digg Data Content SVM and its application Basic SVM •Hyperplane •Understanding of basics •Optimization Soft margin SVM Non-linear decision boundary SVMs in “loss + penalty” form Kernel method •Gaussian kernel SVM usage beyond classification Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 2
  • 3. Digg Data • In machine learning, support vector machines are supervised learning models with associated learning algorithms that analyze data and recognize patterns, used for classification and regression analysis. • Properties of SVM : Duality Kernels Margin Convexity Sparseness SVM : Support Vector Machine Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 3
  • 5. Digg Data Basic concept of SVM Find a linear decision surface (“hyperplane”) that can separate classes and has the largest distance (i.e., largest “gap” or “margin”) between border-line patients (i.e., “support vectors”) Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 5
  • 6. Digg Data Hyperplane as a Decision boundary • A hyperplane is a linear decision surface that splits the space into two parts; • It is obvious that a hyperplane is a binary classifier Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 6
  • 7. Digg Data Equation of a hyperplane An equation of a hyperplane is defined by a point (P0) and a perpendicular vector to the plane (𝑤) at that point. Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 7
  • 8. Digg Data • g(x) is a linear function: x1 x2 w x + b < 0 w x + b > 0  A hyper-plane in the feature space  (Unit-length) normal vector of the hyper-plane:  w n w n Understanding the basics Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 8
  • 9. Digg Data x1 x2How to classify these points using a linear discriminant function in order to minimize the error rate?  Infinite number of answers!  Which one is the best? Understanding the basics denotes +1 denotes -1Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 9
  • 10. Digg Data • The linear discriminant function (classifier) with the maximum margin is the best “safe zone”  Margin is defined as the width that the boundary could be increased by before hitting a data point  Why it is the best? Robust to outliners and thus strong generalization ability Margin x1 x2 Understanding the basics denotes +1 denotes -1Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 10
  • 11. Digg Data • Given a set of data points:  With a scale transformation on both w and b, the above is equivalent to x1 x2 {( , )}, 1,2, ,i iy i nx , where 𝑭𝒐𝒓 𝒚𝒊 = +𝟏, 𝑾 𝑿𝒊 + 𝒃> 0 𝑭𝒐𝒓 𝒚𝒊 = −𝟏, 𝑾 𝑿𝒊 + 𝒃 < 𝟎 𝑭𝒐𝒓 𝒚𝒊 = +𝟏, 𝑾 𝑿𝒊 + 𝒃> +1 𝑭𝒐𝒓 𝒚𝒊 = −𝟏, 𝑾 𝑿𝒊 + 𝒃 < -1 Understanding the basics denotes +1 denotes -1Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 11
  • 12. Digg Data • We know that  The margin width is: x1 x2 Margin x+ x+ x- ( ) 2 ( ) M            x x n w x x w w n Support Vectors 𝑾 𝑿+ + 𝒃 = +𝟏 𝑾 𝑿− + 𝒃 = −𝟏 Understanding the basics denotes +1 denotes -1Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 12
  • 13. Digg Data • Formulation: x1 x2 Margin x+ x+ x- n such that 2 maximize w 𝑭𝒐𝒓 𝒚𝒊 = +𝟏, 𝑾 𝑿𝒊 + 𝒃> +1 𝑭𝒐𝒓 𝒚𝒊 = −𝟏, 𝑾 𝑿𝒊 + 𝒃 < -1 Understanding the basics denotes +1 denotes -1Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 13
  • 14. Digg Data • Formulation: x1 x2 Margin x+ x+ x- n 21 minimize 2 w such that 𝑭𝒐𝒓 𝒚𝒊 = +𝟏, 𝑾 𝑿𝒊 + 𝒃> +1 𝑭𝒐𝒓 𝒚𝒊 = −𝟏, 𝑾 𝑿𝒊 + 𝒃 < -1 Understanding the basics denotes +1 denotes -1Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 14
  • 15. Digg Data • Formulation: x1 x2 Margin x+ x+ x- n 21 minimize 2 w such that 𝐲𝐢 𝐖 𝐗 + 𝐛 ≥ 𝟏 Understanding the basics denotes +1 denotes -1Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 15
  • 16. Digg Data Basics of optimization: Convex functions • A function is called convex if the function lies below the straight line segment connecting two points, for any two points in the interval. • Property: Any local minimum is a global minimum! Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 16
  • 17. Digg Data Basics of optimization: Quadratic programming • Quadratic programming (QP) is a special optimization problem: the function to optimize (“objective”) is quadratic, subject to linear constraints. • Convex QP problems have convex objective functions. • These problems can be solved easily and efficiently by greedy algorithms (because every local minimum is a global minimum). Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 17
  • 18. Digg Data SVM optimization problem: Primal formulation • This is called “primal formulation of linear SVMs” • It is a convex quadratic programming (QP) optimization problem with n variables (wi, i= 1,…,n), where n is the number of features in the dataset. Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 18
  • 19. Digg Data SVM optimization problem: Dual formulation • The previous problem can be recast in the so-called “dual form” giving rise to “dual formulation of linear SVMs”. • Apply the method of Lagrange multipliers. • We need to minimize this Lagrangian with respect to and simultaneously require that the derivative with respect to vanishes , all subject to the constraints that αi > 0 Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 19
  • 20. Digg Data SVM optimization problem: Dual formulation Cond… It is also a convex quadratic programming problem but with N variables (αi, i= 1,…,N), where N is the number of samples. Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 20
  • 21. Digg Data SVM optimization problem: Benefits of using dual formulation 1) No need to access original data, need to access only dot products. 2) Number of free parameters is bounded by the number of support vectors and not by the number of variables (beneficial for high-dimensional problems). Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 21
  • 22. Digg Data Non linearly separable data: “Soft-margin” linear SVM Assign a “slack variable” to each instance , ξi > 0 which can be thought of distance from the separating hyperplane if an instance is misclassified and 0 otherwise. Primal formulation: Dual formulation: • When C is very large, the soft-margin SVM is equivalent to hard-margin SVM; • When C is very small, we admit misclassifications in the training data at the expense of having w-vector with small norm; • C has to be selected for the distribution at hand as it will be discussed later in this tutorial. Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 22
  • 23. Digg Data SVMs in “loss + penalty” form • Many statistical learning algorithms (including SVMs) search for a decision function by solving the following optimization problem: Minimize (Loss+ λ Penalty) – Loss measures error of fitting the data – Penalty penalizes complexity of the learned function – λ is regularization parameter that balances Loss and Penalty • Overfitting → Poor generalization Can also be stated as Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 23
  • 24. Digg Data Nonlinear decision boundary Non Linear Decision Boundary Kernel method Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 24
  • 25. Digg Data Kernel method • Kernel methods involve – Nonlinear transformation of data to a higher dimensional feature space induced by a Mercer kernel – Detection of optimal linear solutions in the kernel feature space • Transformation to a higher dimensional space is expected to be helpful in conversion of nonlinear relations into linear relations (Cover’s theorem) – Nonlinearly separable patterns to linearly separable patterns – Nonlinear regression to linear regression – Nonlinear separation of clusters to linear separation of clusters • Pattern analysis methods are implemented in such a way that the kernel feature space representation is not explicitly required. They involve computation of pair-wise inner-products only. • The pair-wise inner-products are computed efficiently directly from the original representation of data using a kernel function (Kernel trick) Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 25
  • 26. Digg Data Kernel trick Not every function RN×RN -> R can be a valid kernel; it has to satisfy so-called Mercer conditions. Otherwise, the underlying quadratic program may not be solvable. Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 26
  • 27. Digg Data Popular kernels Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 27
  • 28. Digg Data Gaussian kernel Consider the Gaussian kernel: Geometrically, this is a “bump” or “cavity” centered at the training data point 𝑥j : The resulting mapping function is a combination of bumps and cavities. Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 28
  • 29. Digg Data SVM usage beyond classification Regression analysis (ε-Support vector regression) Anomaly detection (One-class SVM) Clustering analysis (Support Vector Domain Description) Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 29
  • 30. Digg Data Thank you Thursday, August 7, 2014 WITHOUT TEARS SERIES | www.diggdata.in 30