SlideShare a Scribd company logo
Presented by: Derek Kane
 What is a Support Vector Machine?
 Support Vector Machine Applications
 Linear Classifier Separators
 Classification Margin
 Maximum Margin / Support Vectors
 Soft Margin
 Non Linear Support Vector Machines
 Feature Space / Holographic Projection
 Kernels / Kernel Trick
 Classification
 Practical Application Example – Breast Cancer
 SVMs were originally proposed by Boser, Guyon and Vapnik in 1992 and gained
increasing popularity in late 1990s.
 SVMs are currently among the best performers for a number of classification tasks
ranging from text to genomic data.
 A SVM is primarily used for determining the classification of a dichotomous binary
response variable (0 or 1).
 They are incredibly sensitive to noise in the data. A relatively small number of
mislabeled examples can dramatically decrease the performance.
 SVMs can be applied to complex data types beyond feature vectors (e.g. graphs,
sequences, relational data) by designing kernel functions for such data.
 SVM techniques have been extended to a number of tasks such as regression [Vapnik et
al. ’97], principal component analysis [Schölkopf et al. ’99], etc.
 Tuning SVMs remains a black art: selecting a specific kernel and parameters is usually
done in a try-and-see manner.
Numerous real world applications:
 Hand-written character recognition.
 Image classification (facial recognition).
 Bioinformatics
 Protein classification
 Cancer classification
 Text (and hypertext) categorization.
 We can plot data along a 2-Dimensional feature space and find a separation between the
different type of data. This space can be split using a linear separator called a hyperplane.
 Is this a good split
between the classes?
 Which one of these hyperplanes create a better separation?
 Is this a good split
between the classes?
 Or is this version
better?
 Which one of these hyperplanes creates the optimal separation?
 How do you know?
 Distance from example xi to the
separator is:
 Examples closest to the hyperplane
are support vectors.
 Margin ρ of the separator is the
distance between support vectors.
 A SVM relies on the principal that the
optimal linear separator also
maximizes the margin.
 Maximizing the margin is good
according to intuition and PAC
theory.
 Implies that only support vectors
matter; other training examples are
ignorable.
 Most “important” training points are
support vectors; they define the
hyperplane.
 What if the training set is noisy?
 Solution 1: Utilize a soft margin
calculation.
 Solution 2: Use powerful kernels.
 What if the training set is not linearly
separable?
 Slack variables ξi can be added to
allow misclassification of difficult or
noisy examples, resulting margin
called soft.
 Overfitting can be controlled by soft
margin approach
 Rather than fitting nonlinear curves to the data,
SVM handles this by using a kernel function to
map the data into a different space where a
hyperplane can be used to do the separation.
 This method is prone to over fitting and should be
used sparingly.
 Datasets that are linearly separable with some noise work out great:
 But what are we going to do if the dataset is just too hard?
 How about… mapping data to a higher-dimensional space:
Φ: x → φ(x)
 The original feature space of a non-linear SVM can be mapped to some higher-dimensional
feature space where the training set is separable utilizing a kernel function.
2-Dimensional 3-Dimensional
 Fitting hyperplanes as separators is
mathematically easy. We can draw
from the kernel function.
 By replacing the raw input variables
with a much larger set of features we
get a nice property:
 A planar separator in the high-
dimensional space of feature vectors is
a curved separator in the low
dimensional space of the raw input
variables.
 Ex: A planar separator in a 20-Dimensional
feature space projected back to the original
2-Dimensional space
 The concept of a kernel mapping function is very powerful. It allows SVM models to
perform separations even with very complex boundaries such as shown below.
 If we map the input vectors into a
very high-dimensional feature space,
the task of finding the maximum-
margin separator becomes
computationally intractable.
 The mathematics is all linear, which is
good, but the vectors have a huge
number of components. So taking
the scalar product of two vectors is
very expensive.
 The way to keep things tractable is to
use “the kernel trick”.
Mercer’s theorem:
 Every semi-positive definite
symmetric function is a kernel
 For many mappings from a low-
dimensional space to a high-
dimensional space, there is a simple
operation on two vectors in the low-
dimensional space that can be used to
compute the scalar product of their
two images in the high-dimensional
space.
)(.)(),( baba
xxxxK 
 Letting the
Kernel do the
Work
 doing the scalar
products in the
obvious way.
 All of the computations that we need to do to find the maximum-margin separator can
be expressed in terms of scalar products between pairs of datapoints (in the high-
dimensional feature space).
 These scalar products are the only part of the computation that depends on the
dimensionality of the high-dimensional space.
 So if we had a fast way to do the scalar products we would not have to pay a price for
solving the learning problem in the high-dimensional space.
 The kernel trick is just a magic way of doing scalar products a whole lot faster than is
usually possible.
 It relies on choosing a way of mapping to the high-dimensional feature space that
allows fast scalar products.
 There may be an infinite number of kernels which one can employ. Here are some of
the more common kernels:
* For the neural network kernel, there is one “hidden unit” per support vector, so the
process of fitting the maximum margin hyperplane decides how many hidden units to
use. Also, it may violate Mercer’s condition.
)(tanh),(
),(
)1.(),(
22
2/||||






x.yyx
yx
yxyx
yx
kK
eK
K p
 Polynomial
 Gaussian Radial
Function
 Neural Network*
 The final classification rule is eloquently simple:
 All the cleverness goes into selecting the support vectors that maximize the margin and
computing the weight to use on each support vector.
 We also need to choose a good kernel function and we may need to choose a lambda
for dealing with non-separable cases.
 
SVs
stest
s xxKwbias

0),(
The set of support vectors
 Support Vector Machines work very well in practice.
 The user must choose the kernel function and its parameters, but the rest is automatic.
 The test performance is very good.
 They can be expensive in time and space for big datasets.
 The computation of the maximum-margin hyper-plane depends on the square of the
number of training cases.
 We need to store all the support vectors.
 SVM’s are very good if you have no idea about what structure to impose on the task.
 The kernel trick can also be used to do PCA in a much higher-dimensional space, thus
giving a non-linear version of PCA in the original space.
 The dataset contains information related to
females with tumors of different characteristics
and whether or not the tumor was benign or
malignant.
 The dataset contains 700 individuals, of which,
458 (65.5%) were benign and 241 (34.5%) were
malignant.
 Our goal is to devise:
 A Support Vector Machine to classify case
based upon tumor characteristics as benign or
malignant.
 First, lets run a tuning function to identify the best parameters to use for the SVM
model.
 The process runs a 10-fold cross validation methodology and identified the following:
 Gamma = 0.01
 Cost = 1
 The E1071 package in R uses a linear kernel as its standard algorithm.
 We created a random test sample of the dataset and included only the measurement variables.
This data was not involved in the initial training of the Support Vector Machine but will be used to
validate the results from passing the data through the SVM.
Prediction Accuracy:
97.1%
 Reside in Wayne, Illinois
 Active Semi-Professional Classical Musician
(Bassoon).
 Married my wife on 10/10/10 and been
together for 10 years.
 Pet Yorkshire Terrier / Toy Poodle named
Brunzie.
 Pet Maine Coons’ named Maximus Power and
Nemesis Gul du Cat.
 Enjoy Cooking, Hiking, Cycling, Kayaking, and
Astronomy.
 Self proclaimed Data Nerd and Technology
Lover.
Data Science - Part IX -  Support Vector Machine

More Related Content

What's hot

Bias and variance trade off
Bias and variance trade offBias and variance trade off
Bias and variance trade off
VARUN KUMAR
 
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Edureka!
 
05 Clustering in Data Mining
05 Clustering in Data Mining05 Clustering in Data Mining
05 Clustering in Data Mining
Valerii Klymchuk
 
Computational Learning Theory
Computational Learning TheoryComputational Learning Theory
Computational Learning Theory
butest
 
K nearest neighbor
K nearest neighborK nearest neighbor
K nearest neighbor
Ujjawal
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision trees
Knoldus Inc.
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
nextlib
 
2.3 bayesian classification
2.3 bayesian classification2.3 bayesian classification
2.3 bayesian classification
Krish_ver2
 
Support vector machine
Support vector machineSupport vector machine
Support vector machine
Rishabh Gupta
 
Computational learning theory
Computational learning theoryComputational learning theory
Computational learning theory
swapnac12
 
Clusters techniques
Clusters techniquesClusters techniques
Clusters techniques
rajshreemuthiah
 
Support Vector Machines (SVM)
Support Vector Machines (SVM)Support Vector Machines (SVM)
Support Vector Machines (SVM)
FAO
 
MACHINE LEARNING - GENETIC ALGORITHM
MACHINE LEARNING - GENETIC ALGORITHMMACHINE LEARNING - GENETIC ALGORITHM
MACHINE LEARNING - GENETIC ALGORITHM
Puneet Kulyana
 
Linear Regression
Linear RegressionLinear Regression
Linear Regression
nep_test_account
 
Ensemble learning
Ensemble learningEnsemble learning
Ensemble learning
Haris Jamil
 
Logistic Regression | Logistic Regression In Python | Machine Learning Algori...
Logistic Regression | Logistic Regression In Python | Machine Learning Algori...Logistic Regression | Logistic Regression In Python | Machine Learning Algori...
Logistic Regression | Logistic Regression In Python | Machine Learning Algori...
Simplilearn
 
Naive Bayes
Naive BayesNaive Bayes
Naive Bayes
CloudxLab
 
Support vector machines
Support vector machinesSupport vector machines
Support vector machines
Ujjawal
 
web mining
web miningweb mining
web mining
Arpit Verma
 
Cross-validation Tutorial: What, how and which?
Cross-validation Tutorial: What, how and which?Cross-validation Tutorial: What, how and which?
Cross-validation Tutorial: What, how and which?
Pradeep Redddy Raamana
 

What's hot (20)

Bias and variance trade off
Bias and variance trade offBias and variance trade off
Bias and variance trade off
 
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
 
05 Clustering in Data Mining
05 Clustering in Data Mining05 Clustering in Data Mining
05 Clustering in Data Mining
 
Computational Learning Theory
Computational Learning TheoryComputational Learning Theory
Computational Learning Theory
 
K nearest neighbor
K nearest neighborK nearest neighbor
K nearest neighbor
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision trees
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
 
2.3 bayesian classification
2.3 bayesian classification2.3 bayesian classification
2.3 bayesian classification
 
Support vector machine
Support vector machineSupport vector machine
Support vector machine
 
Computational learning theory
Computational learning theoryComputational learning theory
Computational learning theory
 
Clusters techniques
Clusters techniquesClusters techniques
Clusters techniques
 
Support Vector Machines (SVM)
Support Vector Machines (SVM)Support Vector Machines (SVM)
Support Vector Machines (SVM)
 
MACHINE LEARNING - GENETIC ALGORITHM
MACHINE LEARNING - GENETIC ALGORITHMMACHINE LEARNING - GENETIC ALGORITHM
MACHINE LEARNING - GENETIC ALGORITHM
 
Linear Regression
Linear RegressionLinear Regression
Linear Regression
 
Ensemble learning
Ensemble learningEnsemble learning
Ensemble learning
 
Logistic Regression | Logistic Regression In Python | Machine Learning Algori...
Logistic Regression | Logistic Regression In Python | Machine Learning Algori...Logistic Regression | Logistic Regression In Python | Machine Learning Algori...
Logistic Regression | Logistic Regression In Python | Machine Learning Algori...
 
Naive Bayes
Naive BayesNaive Bayes
Naive Bayes
 
Support vector machines
Support vector machinesSupport vector machines
Support vector machines
 
web mining
web miningweb mining
web mining
 
Cross-validation Tutorial: What, how and which?
Cross-validation Tutorial: What, how and which?Cross-validation Tutorial: What, how and which?
Cross-validation Tutorial: What, how and which?
 

Similar to Data Science - Part IX - Support Vector Machine

Support vector machines
Support vector machinesSupport vector machines
Support vector machines
manaswinimysore
 
Support vector machine-SVM's
Support vector machine-SVM'sSupport vector machine-SVM's
Support vector machine-SVM's
Anudeep Chowdary Kamepalli
 
lec10svm.ppt
lec10svm.pptlec10svm.ppt
lec10svm.ppt
pushkarjoshi42
 
lec10svm.ppt
lec10svm.pptlec10svm.ppt
lec10svm.ppt
kibrualemu812
 
Svm ms
Svm msSvm ms
Svm ms
student
 
Classification-Support Vector Machines.pptx
Classification-Support Vector Machines.pptxClassification-Support Vector Machines.pptx
Classification-Support Vector Machines.pptx
Ciceer Ghimirey
 
classification algorithms in machine learning.pptx
classification algorithms in machine learning.pptxclassification algorithms in machine learning.pptx
classification algorithms in machine learning.pptx
jasontseng19
 
Introduction-to-SVM-Models_presentation.pptx
Introduction-to-SVM-Models_presentation.pptxIntroduction-to-SVM-Models_presentation.pptx
Introduction-to-SVM-Models_presentation.pptx
MAXKEVINSAENZNUUVERO
 
lec10svm.ppt
lec10svm.pptlec10svm.ppt
lec10svm.ppt
TheULTIMATEALLROUNDE
 
Support Vector Machines USING MACHINE LEARNING HOW IT WORKS
Support Vector Machines USING MACHINE LEARNING HOW IT WORKSSupport Vector Machines USING MACHINE LEARNING HOW IT WORKS
Support Vector Machines USING MACHINE LEARNING HOW IT WORKS
rajalakshmi5921
 
ABayesianApproachToLocalizedMultiKernelLearningUsingTheRelevanceVectorMachine...
ABayesianApproachToLocalizedMultiKernelLearningUsingTheRelevanceVectorMachine...ABayesianApproachToLocalizedMultiKernelLearningUsingTheRelevanceVectorMachine...
ABayesianApproachToLocalizedMultiKernelLearningUsingTheRelevanceVectorMachine...
grssieee
 
support vector machine 1.pptx
support vector machine 1.pptxsupport vector machine 1.pptx
support vector machine 1.pptx
surbhidutta4
 
Support Vector Machine Optimal Kernel Selection
Support Vector Machine Optimal Kernel SelectionSupport Vector Machine Optimal Kernel Selection
Support Vector Machine Optimal Kernel Selection
IRJET Journal
 
Module-3_SVM_Kernel_KNN.pptx
Module-3_SVM_Kernel_KNN.pptxModule-3_SVM_Kernel_KNN.pptx
Module-3_SVM_Kernel_KNN.pptx
VaishaliBagewadikar
 
November, 2006 CCKM'06 1
November, 2006 CCKM'06 1 November, 2006 CCKM'06 1
November, 2006 CCKM'06 1
butest
 
Support Vector Machine ppt presentation
Support Vector Machine ppt presentationSupport Vector Machine ppt presentation
Support Vector Machine ppt presentation
AyanaRukasar
 
PNN and inversion-B
PNN and inversion-BPNN and inversion-B
PNN and inversion-B
Stig-Arne Kristoffersen
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
theijes
 
SVM - Functional Verification
SVM - Functional VerificationSVM - Functional Verification
SVM - Functional Verification
Sai Kiran Kadam
 
svm.pptx
svm.pptxsvm.pptx

Similar to Data Science - Part IX - Support Vector Machine (20)

Support vector machines
Support vector machinesSupport vector machines
Support vector machines
 
Support vector machine-SVM's
Support vector machine-SVM'sSupport vector machine-SVM's
Support vector machine-SVM's
 
lec10svm.ppt
lec10svm.pptlec10svm.ppt
lec10svm.ppt
 
lec10svm.ppt
lec10svm.pptlec10svm.ppt
lec10svm.ppt
 
Svm ms
Svm msSvm ms
Svm ms
 
Classification-Support Vector Machines.pptx
Classification-Support Vector Machines.pptxClassification-Support Vector Machines.pptx
Classification-Support Vector Machines.pptx
 
classification algorithms in machine learning.pptx
classification algorithms in machine learning.pptxclassification algorithms in machine learning.pptx
classification algorithms in machine learning.pptx
 
Introduction-to-SVM-Models_presentation.pptx
Introduction-to-SVM-Models_presentation.pptxIntroduction-to-SVM-Models_presentation.pptx
Introduction-to-SVM-Models_presentation.pptx
 
lec10svm.ppt
lec10svm.pptlec10svm.ppt
lec10svm.ppt
 
Support Vector Machines USING MACHINE LEARNING HOW IT WORKS
Support Vector Machines USING MACHINE LEARNING HOW IT WORKSSupport Vector Machines USING MACHINE LEARNING HOW IT WORKS
Support Vector Machines USING MACHINE LEARNING HOW IT WORKS
 
ABayesianApproachToLocalizedMultiKernelLearningUsingTheRelevanceVectorMachine...
ABayesianApproachToLocalizedMultiKernelLearningUsingTheRelevanceVectorMachine...ABayesianApproachToLocalizedMultiKernelLearningUsingTheRelevanceVectorMachine...
ABayesianApproachToLocalizedMultiKernelLearningUsingTheRelevanceVectorMachine...
 
support vector machine 1.pptx
support vector machine 1.pptxsupport vector machine 1.pptx
support vector machine 1.pptx
 
Support Vector Machine Optimal Kernel Selection
Support Vector Machine Optimal Kernel SelectionSupport Vector Machine Optimal Kernel Selection
Support Vector Machine Optimal Kernel Selection
 
Module-3_SVM_Kernel_KNN.pptx
Module-3_SVM_Kernel_KNN.pptxModule-3_SVM_Kernel_KNN.pptx
Module-3_SVM_Kernel_KNN.pptx
 
November, 2006 CCKM'06 1
November, 2006 CCKM'06 1 November, 2006 CCKM'06 1
November, 2006 CCKM'06 1
 
Support Vector Machine ppt presentation
Support Vector Machine ppt presentationSupport Vector Machine ppt presentation
Support Vector Machine ppt presentation
 
PNN and inversion-B
PNN and inversion-BPNN and inversion-B
PNN and inversion-B
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
SVM - Functional Verification
SVM - Functional VerificationSVM - Functional Verification
SVM - Functional Verification
 
svm.pptx
svm.pptxsvm.pptx
svm.pptx
 

More from Derek Kane

Data Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image ProcessingData Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image Processing
Derek Kane
 
Data Science - Part XVI - Fourier Analysis
Data Science - Part XVI - Fourier AnalysisData Science - Part XVI - Fourier Analysis
Data Science - Part XVI - Fourier Analysis
Derek Kane
 
Data Science - Part XV - MARS, Logistic Regression, & Survival Analysis
Data Science -  Part XV - MARS, Logistic Regression, & Survival AnalysisData Science -  Part XV - MARS, Logistic Regression, & Survival Analysis
Data Science - Part XV - MARS, Logistic Regression, & Survival Analysis
Derek Kane
 
Data Science - Part XIV - Genetic Algorithms
Data Science - Part XIV - Genetic AlgorithmsData Science - Part XIV - Genetic Algorithms
Data Science - Part XIV - Genetic Algorithms
Derek Kane
 
Data Science - Part XIII - Hidden Markov Models
Data Science - Part XIII - Hidden Markov ModelsData Science - Part XIII - Hidden Markov Models
Data Science - Part XIII - Hidden Markov Models
Derek Kane
 
Data Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Data Science - Part XII - Ridge Regression, LASSO, and Elastic NetsData Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Data Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Derek Kane
 
Data Science - Part XI - Text Analytics
Data Science - Part XI - Text AnalyticsData Science - Part XI - Text Analytics
Data Science - Part XI - Text Analytics
Derek Kane
 
Data Science - Part X - Time Series Forecasting
Data Science - Part X - Time Series ForecastingData Science - Part X - Time Series Forecasting
Data Science - Part X - Time Series Forecasting
Derek Kane
 
Data Science - Part VIII - Artifical Neural Network
Data Science - Part VIII -  Artifical Neural NetworkData Science - Part VIII -  Artifical Neural Network
Data Science - Part VIII - Artifical Neural Network
Derek Kane
 
Data Science - Part VII - Cluster Analysis
Data Science - Part VII -  Cluster AnalysisData Science - Part VII -  Cluster Analysis
Data Science - Part VII - Cluster Analysis
Derek Kane
 
Data Science - Part VI - Market Basket and Product Recommendation Engines
Data Science - Part VI - Market Basket and Product Recommendation EnginesData Science - Part VI - Market Basket and Product Recommendation Engines
Data Science - Part VI - Market Basket and Product Recommendation Engines
Derek Kane
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests
Derek Kane
 
Data Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVAData Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVA
Derek Kane
 
Data Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model SelectionData Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model Selection
Derek Kane
 
Data Science - Part II - Working with R & R studio
Data Science - Part II -  Working with R & R studioData Science - Part II -  Working with R & R studio
Data Science - Part II - Working with R & R studio
Derek Kane
 
Data Science - Part I - Sustaining Predictive Analytics Capabilities
Data Science - Part I - Sustaining Predictive Analytics CapabilitiesData Science - Part I - Sustaining Predictive Analytics Capabilities
Data Science - Part I - Sustaining Predictive Analytics Capabilities
Derek Kane
 

More from Derek Kane (16)

Data Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image ProcessingData Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image Processing
 
Data Science - Part XVI - Fourier Analysis
Data Science - Part XVI - Fourier AnalysisData Science - Part XVI - Fourier Analysis
Data Science - Part XVI - Fourier Analysis
 
Data Science - Part XV - MARS, Logistic Regression, & Survival Analysis
Data Science -  Part XV - MARS, Logistic Regression, & Survival AnalysisData Science -  Part XV - MARS, Logistic Regression, & Survival Analysis
Data Science - Part XV - MARS, Logistic Regression, & Survival Analysis
 
Data Science - Part XIV - Genetic Algorithms
Data Science - Part XIV - Genetic AlgorithmsData Science - Part XIV - Genetic Algorithms
Data Science - Part XIV - Genetic Algorithms
 
Data Science - Part XIII - Hidden Markov Models
Data Science - Part XIII - Hidden Markov ModelsData Science - Part XIII - Hidden Markov Models
Data Science - Part XIII - Hidden Markov Models
 
Data Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Data Science - Part XII - Ridge Regression, LASSO, and Elastic NetsData Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Data Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
 
Data Science - Part XI - Text Analytics
Data Science - Part XI - Text AnalyticsData Science - Part XI - Text Analytics
Data Science - Part XI - Text Analytics
 
Data Science - Part X - Time Series Forecasting
Data Science - Part X - Time Series ForecastingData Science - Part X - Time Series Forecasting
Data Science - Part X - Time Series Forecasting
 
Data Science - Part VIII - Artifical Neural Network
Data Science - Part VIII -  Artifical Neural NetworkData Science - Part VIII -  Artifical Neural Network
Data Science - Part VIII - Artifical Neural Network
 
Data Science - Part VII - Cluster Analysis
Data Science - Part VII -  Cluster AnalysisData Science - Part VII -  Cluster Analysis
Data Science - Part VII - Cluster Analysis
 
Data Science - Part VI - Market Basket and Product Recommendation Engines
Data Science - Part VI - Market Basket and Product Recommendation EnginesData Science - Part VI - Market Basket and Product Recommendation Engines
Data Science - Part VI - Market Basket and Product Recommendation Engines
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests
 
Data Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVAData Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVA
 
Data Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model SelectionData Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model Selection
 
Data Science - Part II - Working with R & R studio
Data Science - Part II -  Working with R & R studioData Science - Part II -  Working with R & R studio
Data Science - Part II - Working with R & R studio
 
Data Science - Part I - Sustaining Predictive Analytics Capabilities
Data Science - Part I - Sustaining Predictive Analytics CapabilitiesData Science - Part I - Sustaining Predictive Analytics Capabilities
Data Science - Part I - Sustaining Predictive Analytics Capabilities
 

Recently uploaded

Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 

Recently uploaded (20)

Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 

Data Science - Part IX - Support Vector Machine

  • 2.  What is a Support Vector Machine?  Support Vector Machine Applications  Linear Classifier Separators  Classification Margin  Maximum Margin / Support Vectors  Soft Margin  Non Linear Support Vector Machines  Feature Space / Holographic Projection  Kernels / Kernel Trick  Classification  Practical Application Example – Breast Cancer
  • 3.  SVMs were originally proposed by Boser, Guyon and Vapnik in 1992 and gained increasing popularity in late 1990s.  SVMs are currently among the best performers for a number of classification tasks ranging from text to genomic data.  A SVM is primarily used for determining the classification of a dichotomous binary response variable (0 or 1).  They are incredibly sensitive to noise in the data. A relatively small number of mislabeled examples can dramatically decrease the performance.  SVMs can be applied to complex data types beyond feature vectors (e.g. graphs, sequences, relational data) by designing kernel functions for such data.  SVM techniques have been extended to a number of tasks such as regression [Vapnik et al. ’97], principal component analysis [Schölkopf et al. ’99], etc.  Tuning SVMs remains a black art: selecting a specific kernel and parameters is usually done in a try-and-see manner.
  • 4. Numerous real world applications:  Hand-written character recognition.  Image classification (facial recognition).  Bioinformatics  Protein classification  Cancer classification  Text (and hypertext) categorization.
  • 5.  We can plot data along a 2-Dimensional feature space and find a separation between the different type of data. This space can be split using a linear separator called a hyperplane.  Is this a good split between the classes?
  • 6.  Which one of these hyperplanes create a better separation?  Is this a good split between the classes?  Or is this version better?
  • 7.  Which one of these hyperplanes creates the optimal separation?  How do you know?
  • 8.  Distance from example xi to the separator is:  Examples closest to the hyperplane are support vectors.  Margin ρ of the separator is the distance between support vectors.
  • 9.  A SVM relies on the principal that the optimal linear separator also maximizes the margin.  Maximizing the margin is good according to intuition and PAC theory.  Implies that only support vectors matter; other training examples are ignorable.  Most “important” training points are support vectors; they define the hyperplane.
  • 10.  What if the training set is noisy?  Solution 1: Utilize a soft margin calculation.  Solution 2: Use powerful kernels.
  • 11.  What if the training set is not linearly separable?  Slack variables ξi can be added to allow misclassification of difficult or noisy examples, resulting margin called soft.  Overfitting can be controlled by soft margin approach
  • 12.  Rather than fitting nonlinear curves to the data, SVM handles this by using a kernel function to map the data into a different space where a hyperplane can be used to do the separation.  This method is prone to over fitting and should be used sparingly.
  • 13.  Datasets that are linearly separable with some noise work out great:  But what are we going to do if the dataset is just too hard?  How about… mapping data to a higher-dimensional space:
  • 14. Φ: x → φ(x)  The original feature space of a non-linear SVM can be mapped to some higher-dimensional feature space where the training set is separable utilizing a kernel function. 2-Dimensional 3-Dimensional
  • 15.  Fitting hyperplanes as separators is mathematically easy. We can draw from the kernel function.  By replacing the raw input variables with a much larger set of features we get a nice property:  A planar separator in the high- dimensional space of feature vectors is a curved separator in the low dimensional space of the raw input variables.  Ex: A planar separator in a 20-Dimensional feature space projected back to the original 2-Dimensional space
  • 16.  The concept of a kernel mapping function is very powerful. It allows SVM models to perform separations even with very complex boundaries such as shown below.
  • 17.  If we map the input vectors into a very high-dimensional feature space, the task of finding the maximum- margin separator becomes computationally intractable.  The mathematics is all linear, which is good, but the vectors have a huge number of components. So taking the scalar product of two vectors is very expensive.  The way to keep things tractable is to use “the kernel trick”. Mercer’s theorem:  Every semi-positive definite symmetric function is a kernel
  • 18.  For many mappings from a low- dimensional space to a high- dimensional space, there is a simple operation on two vectors in the low- dimensional space that can be used to compute the scalar product of their two images in the high-dimensional space. )(.)(),( baba xxxxK   Letting the Kernel do the Work  doing the scalar products in the obvious way.
  • 19.  All of the computations that we need to do to find the maximum-margin separator can be expressed in terms of scalar products between pairs of datapoints (in the high- dimensional feature space).  These scalar products are the only part of the computation that depends on the dimensionality of the high-dimensional space.  So if we had a fast way to do the scalar products we would not have to pay a price for solving the learning problem in the high-dimensional space.  The kernel trick is just a magic way of doing scalar products a whole lot faster than is usually possible.  It relies on choosing a way of mapping to the high-dimensional feature space that allows fast scalar products.
  • 20.  There may be an infinite number of kernels which one can employ. Here are some of the more common kernels: * For the neural network kernel, there is one “hidden unit” per support vector, so the process of fitting the maximum margin hyperplane decides how many hidden units to use. Also, it may violate Mercer’s condition. )(tanh),( ),( )1.(),( 22 2/||||       x.yyx yx yxyx yx kK eK K p  Polynomial  Gaussian Radial Function  Neural Network*
  • 21.  The final classification rule is eloquently simple:  All the cleverness goes into selecting the support vectors that maximize the margin and computing the weight to use on each support vector.  We also need to choose a good kernel function and we may need to choose a lambda for dealing with non-separable cases.   SVs stest s xxKwbias  0),( The set of support vectors
  • 22.  Support Vector Machines work very well in practice.  The user must choose the kernel function and its parameters, but the rest is automatic.  The test performance is very good.  They can be expensive in time and space for big datasets.  The computation of the maximum-margin hyper-plane depends on the square of the number of training cases.  We need to store all the support vectors.  SVM’s are very good if you have no idea about what structure to impose on the task.  The kernel trick can also be used to do PCA in a much higher-dimensional space, thus giving a non-linear version of PCA in the original space.
  • 23.
  • 24.  The dataset contains information related to females with tumors of different characteristics and whether or not the tumor was benign or malignant.  The dataset contains 700 individuals, of which, 458 (65.5%) were benign and 241 (34.5%) were malignant.  Our goal is to devise:  A Support Vector Machine to classify case based upon tumor characteristics as benign or malignant.
  • 25.
  • 26.  First, lets run a tuning function to identify the best parameters to use for the SVM model.  The process runs a 10-fold cross validation methodology and identified the following:  Gamma = 0.01  Cost = 1  The E1071 package in R uses a linear kernel as its standard algorithm.
  • 27.  We created a random test sample of the dataset and included only the measurement variables. This data was not involved in the initial training of the Support Vector Machine but will be used to validate the results from passing the data through the SVM. Prediction Accuracy: 97.1%
  • 28.  Reside in Wayne, Illinois  Active Semi-Professional Classical Musician (Bassoon).  Married my wife on 10/10/10 and been together for 10 years.  Pet Yorkshire Terrier / Toy Poodle named Brunzie.  Pet Maine Coons’ named Maximus Power and Nemesis Gul du Cat.  Enjoy Cooking, Hiking, Cycling, Kayaking, and Astronomy.  Self proclaimed Data Nerd and Technology Lover.