Kernel Based Methods
Colin Campbell,
Bristol University
In the second lecture we will:

• show that the standard QP SVM is only one optimisation approach: we can use formulations which lead to linear programming, nonlinear programming, or semi-definite programming (SDP), for example;
• show that there are further tasks beyond classification/regression which can be handled by kernel machines, and that there are different types of learning which can be performed using kernel machines;
• comment on different learning algorithms and strategies for model selection (finding kernel parameters);

• consider different kernels, including how to combine kernels into composite kernels (data fusion).
2.1 Different types of optimisation: linear programming and non-linear programming.

2.2 Other types of kernel problems beyond classification/regression: novelty detection and its applications.

2.3 Different types of learning: passive and active learning.

2.4 Training SVMs and other kernel machines.

2.5 Model Selection.

2.6 Different types of kernels and learning with composite kernels.
2.1. A Linear Programming Approach to
Classification
The idea of kernel substitution is not restricted to the SVM framework: a very large range of methods can be kernelised. For example:
• we can kernelise simple neural network learning algorithms such as the perceptron, minover and the adatron (so we can handle non-linearly separable datasets); you can try the kernel adatron this afternoon;

• kernel principal component analysis (kernel PCA) and kernel independent component analysis (kernel ICA);

• we can devise new types of kernelised learning machines (e.g. the Bayes Point Machine).
For example, rather than using quadratic programming, it is possible to derive a kernel classifier using linear programming (LP):
$$\min \; \sum_{i=1}^{m} \alpha_i + C \sum_{i=1}^{m} \xi_i$$
subject to:

$$y_i \left( \sum_{j=1}^{m} \alpha_j K(x_i, x_j) + b \right) \geq 1 - \xi_i$$

where αi ≥ 0 and ξi ≥ 0.

This classifier is, in my experience, slightly slower to train than the standard QP SVM, but it is robust.
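For illustration, here is a minimal sketch of this LP classifier using scipy.optimize.linprog. The variable layout (α, ξ and b stacked into one vector), the RBF kernel and the function names are my own choices for the sketch, not part of the lecture; labels are assumed to be in {−1, +1}.

```python
import numpy as np
from scipy.optimize import linprog
from sklearn.metrics.pairwise import rbf_kernel

def train_lp_classifier(X, y, C=1.0, gamma=1.0):
    """Sketch of the LP kernel classifier:
    min sum(alpha) + C*sum(xi)
    s.t. y_i (sum_j alpha_j K_ij + b) >= 1 - xi_i, alpha >= 0, xi >= 0.
    Decision variables are stacked as z = [alpha (m), xi (m), b]."""
    m = len(y)
    K = rbf_kernel(X, gamma=gamma)
    c = np.concatenate([np.ones(m), C * np.ones(m), [0.0]])
    # margin constraints rewritten as A_ub @ z <= -1
    A_ub = np.hstack([-(y[:, None] * K), -np.eye(m), -y[:, None]])
    b_ub = -np.ones(m)
    bounds = [(0, None)] * (2 * m) + [(None, None)]  # b is free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:m], res.x[-1]  # alpha, b

def lp_predict(X_train, X_new, alpha, b, gamma=1.0):
    # sign of the kernel expansion sum_j alpha_j K(x, x_j) + b
    return np.sign(rbf_kernel(X_new, X_train, gamma=gamma) @ alpha + b)
```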
2.2. Novelty Detection (one-class classifiers).
Tasks other than classification/regression can also be handled. For many real-world problems the task is not to classify but to detect novel or abnormal instances. Examples: condition monitoring or medical diagnosis.
One approach: model the support of a data distribution.
Create a binary-valued function which is positive in those
regions of input space where the normal data
predominantly lies and negative elsewhere.
Various schemes possible: LP (here) or QP.
Objective: find a surface in input space which wraps
around the data clusters: anything outside this surface is
viewed as abnormal.
Feature space: corresponds to a hyperplane which is
pulled onto the mapped datapoints.
This is achieved by minimising:
$$W(\alpha, b) = \sum_{i=1}^{m} \left( \sum_{j=1}^{m} \alpha_j K(x_i, x_j) + b \right)$$
subject to:

$$\sum_{j=1}^{m} \alpha_j K(x_i, x_j) + b \geq 0$$

$$\sum_{i=1}^{m} \alpha_i = 1, \qquad \alpha_i \geq 0$$
The bias b is simply treated as an additional parameter of unrestricted sign.

Slack variables and a soft margin can also be introduced.
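A minimal sketch of the hard-margin one-class LP above, again with scipy.optimize.linprog; the RBF kernel and the function names are my own illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog
from sklearn.metrics.pairwise import rbf_kernel

def train_lp_novelty(X, gamma=1.0):
    """Sketch of the LP novelty detector, variables z = [alpha (m), b]:
    min sum_i (K @ alpha + b)_i
    s.t. K @ alpha + b >= 0, sum(alpha) = 1, alpha >= 0, b free."""
    m = len(X)
    K = rbf_kernel(X, gamma=gamma)
    c = np.concatenate([K.sum(axis=0), [float(m)]])   # sum_i (K@alpha)_i + m*b
    A_ub = np.hstack([-K, -np.ones((m, 1))])          # -(K@alpha + b) <= 0
    b_ub = np.zeros(m)
    A_eq = np.concatenate([np.ones(m), [0.0]])[None, :]  # sum(alpha) = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[:m], res.x[-1]

def is_normal(X_train, X_new, alpha, b, gamma=1.0):
    # points with sum_j alpha_j K(x, x_j) + b < 0 are flagged as novel
    return rbf_kernel(X_new, X_train, gamma=gamma) @ alpha + b >= 0
```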
[Figure 1: Hard margin and RBF kernel.]
[Figure 2: Using a soft margin (with λ = 10.0).]
[Figure 3: A modified RBF kernel.]
This basic model can be elaborated in various ways, and the choice of kernel parameter (model selection) is important, but it has worked well on real-life datasets, e.g. detecting anomalies in blood samples.
2.3. Different types of learning: passive vs active learning.

Passive learning: the learning machine receives examples, learns from these and attempts to generalize to new instances.

Active learning: the learning machine poses queries or questions to the oracle or source of information (the experimentalist). With an efficient query strategy it can learn the problem from far fewer examples.
Comments on active learning. Several strategies are possible:

• Membership queries: the algorithm selects unlabeled examples for the human expert to label (e.g. a handwritten character).

• Creating queries: the algorithm can invent queries and ask for the label or answer. But sometimes the invented examples can be meaningless and impossible to label (e.g. an invented, unrecognisable handwritten character).
Worst case: it is possible to invent artificial examples for which the number of queries equals the sample size (hence no advantage to query learning), but such adverse instances don't commonly occur in practice.

The average case has been analysed using learning theory, and we find:

• Active learning is better than passive learning.

• For active learning, the best choice for the next query is a point lying within the current hyperplane or as close as possible to the current hyperplane.
Support Vector Machines construct their hypothesis using the most informative patterns in the data (the support vectors) → good candidates for active learning.
[Figure: separating hyperplane with its margin band.]
This suggests an algorithm:

• The best points to query are those closest to the current hyperplane (maximally ambiguous).

• Querying a point within the margin band always guarantees a gain ∆W > 0 whatever the label. Querying points outside the band only gives a gain if the current hypothesis predicts the label incorrectly.
Example: first let us consider a toy problem, a simple majority rule on binary-valued strings with components 0 or 1, e.g. (0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1). If there are more 0's than 1's the target is 0; if there are more 1's than 0's the target is 1. A sketch of margin-based querying on this problem follows below.
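A minimal sketch of the query strategy on the majority-rule problem, using scikit-learn's SVC; the pool size, kernel, seed and number of queries are illustrative choices of mine.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# majority rule: target is +1 iff the binary string has more 1's than 0's
X = rng.integers(0, 2, size=(400, 21))
y = np.where(X.sum(axis=1) > X.shape[1] // 2, 1, -1)

labelled = list(rng.choice(len(X), size=5, replace=False))
pool = [i for i in range(len(X)) if i not in labelled]

for _ in range(50):  # assumes both classes appear among the initial labels
    clf = SVC(kernel="linear", C=10.0).fit(X[labelled], y[labelled])
    # query the pool point closest to the current hyperplane (most ambiguous)
    query = pool[int(np.argmin(np.abs(clf.decision_function(X[pool]))))]
    pool.remove(query)
    labelled.append(query)  # the oracle supplies the label y[query]

print("accuracy on the unqueried pool:", clf.score(X[pool], y[pool]))
```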
• How do we know when to stop asking queries?

• Several criteria are possible.

• One simple strategy (which works for noiseless data) is to make a prediction and see if it agrees with what you find.
A real-life application: drug discovery.

Each compound is described by a vector of 139,351 binary shape features; these vectors are typically sparse (on average 1,378 features per Round0 compound and 7,613 per Round1 compound).
Round0: the dataset contained 1,316 chemically diverse examples, only 39 of which were positive. Evaluation is performed by virtual combinatorial chemistry.
[Figure: number of examples found (y-axis) versus fraction of examples selected (x-axis), comparing random selection with closest-point querying; curves show true and false positives for each strategy.]
2.4. Training SVMs and other kernel machines.

So far the methods we have considered have involved linear or quadratic programming.

Linear programming can be implemented using column generation techniques (simplex or interior point methods).

For quadratic programming there are general-purpose methods, e.g. conjugate gradient, and QP packages such as MINOS and LOQO.

Problem: the kernel matrix must be stored in memory.
Alternatives split into three categories:

• kernel components are evaluated and discarded during learning;

• working set methods, in which an evolving subset of the data is used;

• algorithms that explicitly exploit the structure of the problem.
2.4.1. Chunking and Decomposition: update the αi using only a subset or chunk of the data at each stage.

A QP routine optimizes the Lagrangian on an initial arbitrary subset of the data. The support vectors found are retained and all other datapoints (with αi = 0) discarded.

A new working set is then derived from these support vectors together with additional datapoints that maximally violate the optimality conditions (subject to storage constraints). This chunking process is iterated until the margin is maximized.
Of course, this procedure may still fail, since the working set itself can outgrow memory. If so, decomposition methods provide a better approach: these algorithms use only a fixed-size subset of the data, with the αi for the remainder kept fixed.
Decomposition and Sequential Minimal Optimization (SMO).

SMO is the limiting case of decomposition in which only two αi are optimized at each iteration. The smallest set of parameters which can be optimized at each iteration is plainly two if the constraint

$$\sum_{i=1}^{m} \alpha_i y_i = 0$$

is to hold.
It is possible to derive an analytical solution for this two-variable subproblem which can be executed using few numerical operations; a sketch of the update follows below. SMO can handle classification, regression and other tasks such as novelty detection.
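For illustration, a minimal sketch of the analytical two-variable update (following Platt's SMO); working-set selection, the bias update and stopping tests are omitted, and the function name and arguments are mine.

```python
import numpy as np

def smo_pair_update(i, j, alpha, y, K, C):
    """One analytical SMO step: re-optimize alpha_i and alpha_j while
    keeping sum_k alpha_k y_k = 0 and 0 <= alpha <= C (labels in {-1,+1})."""
    f = (alpha * y) @ K                   # decision values (bias omitted)
    E_i, E_j = f[i] - y[i], f[j] - y[j]   # prediction errors
    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]
    if eta <= 0:
        return alpha                      # skip degenerate pairs
    # the equality constraint confines alpha_j to a box [L, H]
    if y[i] == y[j]:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    else:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    new = alpha.copy()
    new[j] = np.clip(alpha[j] + y[j] * (E_i - E_j) / eta, L, H)
    new[i] = alpha[i] + y[i] * y[j] * (alpha[j] - new[j])  # restore equality
    return new
```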
2.4.2. Further algorithms.

We can also approach training directly from an optimization perspective and create new algorithms.

Keerthi et al have proposed a very effective binary classification algorithm based on the dual geometry of finding the two closest points in the convex hulls.
The Lagrangian SVM (LSVM) method of Mangasarian and Musicant reformulates the classification problem as an unconstrained optimization task and then solves it using an algorithm requiring only the solution of systems of linear equations. LSVM can solve linear classification problems with millions of points in minutes.
The interior-point and Semi-Smooth Support Vector
Methods of Ferris and Munson can be used to solve
linear classification problems with up to 60 million data
points in 34 dimensions.
2.5. Model Selection and Kernel Massaging

2.5.1. Model Selection. An important issue is the choice of kernel parameter (the σ in the Gaussian kernel, the degree d of the polynomial kernel, etc).

If this parameter is poorly chosen, the hypothesis modelling the data can be oversimple or too complex, leading to poor generalisation.
The best value for this parameter can be found using cross-validation, of course. However, cross-validation is wasteful of data, and it would be better to have a theoretical handle on the best choice for this parameter.
A number of schemes have been proposed to indicate a
good choice for the kernel parameter without recourse to
validation data.
We will only illustrate the idea with one of these.
Theorem: the leave-one-out error of an L1-norm soft margin SVM is bounded by:

$$\frac{\left| \{ i : 2\alpha_i B^2 + \xi_i \geq 1 \} \right|}{m}$$

where the αi are the solutions of the SVM optimization task and B² is an upper bound on K(xi, xi), with K(xi, xj) ≥ 0.
Thus, for a given value of the kernel parameter, the leave-one-out error is estimated from this quantity. The kernel parameter is then incremented or decremented in the direction needed to lower the LOO bound, as in the sketch below.
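A minimal sketch of this bound-driven selection for the width of a Gaussian kernel, using scikit-learn; recovering the αi from SVC.dual_coef_, the synthetic data and the grid of widths are assumptions of mine (for the RBF kernel K(x, x) = 1, so B² = 1).

```python
import numpy as np
from sklearn.svm import SVC

def loo_bound(X, y, gamma, C=1.0):
    """Estimate |{i : 2*alpha_i*B^2 + xi_i >= 1}| / m for an RBF kernel
    (B^2 = 1); labels y must be in {-1, +1}."""
    clf = SVC(kernel="rbf", gamma=gamma, C=C).fit(X, y)
    alpha = np.zeros(len(y))
    alpha[clf.support_] = np.abs(clf.dual_coef_.ravel())  # dual_coef_ = y_i*alpha_i
    xi = np.maximum(0.0, 1.0 - y * clf.decision_function(X))
    return np.mean(2.0 * alpha + xi >= 1.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

# scan a grid and keep the kernel width that minimises the bound
gammas = np.logspace(-3, 2, 20)
best_gamma = min(gammas, key=lambda g: loo_bound(X, y, g))
```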
2.5.2. Kernel Massaging and Semi-Definite Programming

Example: the existence of missing values is a common problem in many application domains.

We complete the kernel matrix (compensating for missing values) by solving the following semi-definite programming problem:
$$\min_K \; \left\| W \circ (K - K^*) \right\|^2$$

subject to:

$$AK = b, \qquad b \geq 0$$
where K* is the current kernel matrix, A is the matrix of constraint coefficients, and b contains slack variables introduced to turn inequalities into equalities. W is a masking matrix of weights (0 or 1, with Wii > 0).
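As an illustration, a sketch of a kernel-completion problem of this form in cvxpy; here I impose positive semi-definiteness on K directly and match the observed entries through the mask W, which simplifies the constraint set above.

```python
import numpy as np
import cvxpy as cp

def complete_kernel(K_star, W):
    """Fill in a kernel matrix with missing entries.
    K_star: observed kernel values (arbitrary where missing);
    W: 0/1 mask, 1 where K_star is observed (with W_ii = 1)."""
    n = K_star.shape[0]
    K = cp.Variable((n, n), PSD=True)  # completed matrix must be PSD
    objective = cp.Minimize(cp.sum_squares(cp.multiply(W, K - K_star)))
    cp.Problem(objective).solve()
    return K.value
```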
2.6. Different types of kernels and learning with composite kernels

2.6.1. Different types of kernels

Kernels have been derived for a number of data objects; here we will only consider one type: string kernels.

Consider the text strings car, cat, cart, chart. They have certain similarities despite being of unequal length. dug is of equal length to car and cat but differs in all letters.
A measure of the degree of alignment of two text strings or sequences can be found using algorithmic techniques, e.g. edit distances (dynamic programming). Frequent applications include:

• bioinformatics, e.g. the Smith-Waterman and Waterman-Eggert algorithms (strings over a 4-letter alphabet, such as ... ACCGTATGTAAA ..., from genomes);

• text processing (e.g. the WWW), etc.
An appropriate kernel can be defined by a recurrence relation:

$$K(s, \epsilon) = 1$$

$$K(sa, t) = K(s, t) + \sum_{k : t_k = a} K'(s, \, t(1 : k-1))$$

for a comparison of two strings s and t, where a is a single character appended to s, t(1 : k−1) denotes the prefix of t up to position k−1, ε is the empty string, and K′ is an auxiliary recurrence over prefixes.

Each element of the kernel quantifies the sequence similarity of a pair of strings, and the matrix as a whole is positive definite.
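The recurrence above belongs to the subsequence string kernel, whose auxiliary recurrence K′ is not spelled out on the slide. As a self-contained illustration of a string kernel, here instead is the simpler p-spectrum kernel, which counts the contiguous length-p substrings two strings share (my choice of example, not the slide's kernel).

```python
from collections import Counter

def spectrum_kernel(s, t, p=2):
    """p-spectrum string kernel: the inner product of the count vectors of
    all contiguous length-p substrings of s and t (hence positive definite)."""
    cs = Counter(s[i:i + p] for i in range(len(s) - p + 1))
    ct = Counter(t[i:i + p] for i in range(len(t) - p + 1))
    return sum(count * ct[sub] for sub, count in cs.items())

print(spectrum_kernel("cart", "chart"))  # 2: shares "ar" and "rt"
print(spectrum_kernel("car", "dug"))     # 0: no shared bigrams
```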
2.6.2. Data fusion using composite kernels

If we have kernels for different data objects we can combine them to create classifiers capable of handling disparate types of data, using composite kernels:

$$K(x_1, x_2) = \sum_j \beta_j K_j(x_1, x_2)$$

This was first proposed by Gunn and Kandola (2002).
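A minimal sketch of a fixed-weight composite kernel fed to an SVM through a precomputed Gram matrix; the two synthetic "views", the weights and the kernel choices are illustrative assumptions of mine.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel

rng = np.random.default_rng(0)
# two hypothetical views of the same 60 objects, plus labels
X_seq = rng.normal(size=(60, 10))    # e.g. sequence-derived features
X_expr = rng.normal(size=(60, 5))    # e.g. expression measurements
y = rng.choice([-1, 1], size=60)

K1 = linear_kernel(X_seq)
K2 = rbf_kernel(X_expr, gamma=0.1)

beta = np.array([0.7, 0.3])          # fixed here; the schemes discussed later learn them
K = beta[0] * K1 + beta[1] * K2      # a non-negative sum of kernels is a kernel

clf = SVC(kernel="precomputed").fit(K, y)
```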
Example: Gert Lanckriet et al. Ref: JMLR 5 (2004), pp. 27-72.

Task: predict the functional classifications associated with yeast proteins (MIPS Yeast Genome Database).

Five different types of kernels were used:
• amino acid sequences (inner product kernels, a kernel from the Smith-Waterman pairwise sequence comparison algorithm);

• protein-protein interactions (graph kernel, the diffusion kernel of Kondor and Lafferty);

• genetic interactions (graph kernel);

• protein complex data (weak interactions, graph kernel);

• expression data (Gaussian kernel).
They find that using all five kernels is better than using a single kernel. It is a complex story, but over 13 functional classes the fraction correctly classified on unseen hold-out data improves from 0.71 to 0.85 (0.71 being the best single-kernel comparator).
. . . but researchers are inclined to present appealing results in their papers . . .

If one type of data is dominant (e.g. microarray versus graph information, with microarray data much more plentiful) then the learning algorithm can collapse onto one kernel (K = β1K1 + β2K2 with β2 → 0, say).
Despite this problem there are various successful applications (e.g. multimedia web page classification) and a number of proposed schemes:

• semi-definite programming (Lanckriet et al), which leads to a quadratically constrained quadratic programme (QCQP);

• boosting schemes (Crammer et al, 2003; Bennett et al, 2002);

• a kernelised Fisher discriminant approach (Fung et al, 2004);

• a kernel on kernels (hyperkernels, Ong et al, 2003);

• a Bayesian strategy to find the βj coefficients for classification and regression (Girolami and Rogers, preprint, 2005), etc.
2.7. Applications of SVMs

There is a vast number of applications: too many to summarise here ...

Just two applications to show how useful SVMs are ...
Example 1.

• Prediction of relapse/non-relapse for Wilms' tumour using microarray technology.

• The disease affects children and young adults.

• Joint work with Richard Williams et al., ICR, London (Genes, Chromosomes and Cancer, 2004; 41: 65-79).
[Figure: microarray workflow. DNA is fragmented and the ends labelled (e.g. with fluorescent labels); the fragments hybridize with DNA probes on a substrate; laser excitation is used to find how much mRNA is deposited at a probe site, which indicates the expression level of the corresponding gene. The process is usually performed with a control to improve accuracy.]
• A balanced dataset of 29 examples with 17,836 features (probes).

• We used a Support Vector Machine classifier with a linear kernel and a filter method for feature scoring.
• impute missing values;

• log the data;

• use leave-one-out testing.
[Figure: number of test errors (y-axis) versus number of remaining features (x-axis).]
Example 2.

Recognition of ZIP/postal codes: a very important real-life application.

The standard benchmarking dataset is MNIST (60,000 handwritten characters).

At AT&T (Bell Labs) the state of the art was a layered neural network (LeNet5, Yann LeCun et al).
Dennis DeCoste and Bernhard Schölkopf (2002) used an SVM with virtual training vectors as a means of ensuring invariant pattern recognition, improving on the previous best test error by 0.15%.
3. Conclusion.
Kernel methods: a powerful and systematic approach to
classification, regression, novelty detection and many
other machine learning problems.
Now widely used in many application domains including
bioinformatics, machine vision, text analysis, finance, ...