SlideShare a Scribd company logo
1 of 51
Kernel Methods.
Ridge Regression.
Radu Ionescu, Prof. PhD.
raducu.ionescu@gmail.com
Faculty of Mathematics and Computer Science
University of Bucharest
The evolution of learning methods
• 1950s: the introduction of the perceptron
(Rosenblatt, 1957)
• 1980s: the back-propagation algorithm for
training multi-layer perceptron becomes widely
popular (Hinton, 1986)
• 1990s: the introduction of kernel methods
(Cortes, 1995)
The Perceptron
𝑥1
𝑥2
𝑥3
𝑥𝑛
𝑤0 = 𝑏
1
𝑦 = 𝑠𝑖𝑔𝑛(𝑤′ ∙ 𝑥 + 𝑏)
Linear separating hyperplane
𝑤
Where 𝑤1 = −1, 𝑤2 = 1, 𝑏 = 1
𝑥1
𝑤 ∙ 𝑥 + 𝑏 > 0
𝑦 = 1
𝑤 ∙ 𝑥 + 𝑏 = 0
𝑦 = 1
𝑤 ∙ 𝑥 + 𝑏 < 0
𝑦 = −1
-1 1
1
-1
−
𝑏
𝑤1
−
𝑏
𝑤2
𝑥2
XOR (Minsky & Papert, 1969)
• A linear classification method cannot solve
the XOR problem
Solution 1: Neural Networks
Solution 1: Neural Networks
• The decision border is non-linear
Solution 2: Kernel Methods
• Kernel methods are based on two steps:
 1. Embed data in a higher-dimensional Hilbert
space
 2. Search for linear relations in the embedding
space
• The embedding can be performed implicitly, by
specifying the scalar product among data
samples
 Steps 1 and 2 can be comprised in one step!
Embedding the data with an
embedding map
• Non-linear relations from the original space
become linear in the embedding space
Kernel methods
• The learning algorithms are usually
implemented such that the coordinates of the
embedded points are not needed
• Specifying the scalar product between pairs of
points is enough!
• “Kernel trick”: The scalar product is replaced
with any pairwise similarity function, also
termed kernel function
Primal Form
Dual form
Linear regression
• The problem of finding g of the form:
• That best interpolates a training set:
Linear regression
𝑤
𝑥𝑖
𝑦𝑖
𝜉
𝑦 = 𝑔(𝑥) = 𝑤, 𝑥
Linear regression
• The error of the linear function on a particular
training sample is:
• The loss on all training data points is:
Linear regression
• Loss written vectorially:
• What is the optimal value for w?
Loss is convex
w1
w2
ℒ(𝑤)
Linear regression
• The optimal w:
• We get the normal equation:
• We can compute w, if the inverse exists:
Ridge Regression
• If the inverse does not exist, the problem is “ill-
conditioned” and it needs regularization
• The optimization criterion becomes:
• And the optimal solution will be given by:
Ridge Regression
• The optimal solution:
Dual Ridge Regression
• But:
• So:
Dual Ridge Regression
• Where:
• is called the Gram matrix:
Dual Ridge Regression
• In the dual form, the information from the
training samples is given only by the inner
product between pairs of training points:
• The predictive function is given by:
Primal Form
Dual form
Kernel Ridge Regression
• We can now apply the “kernel trick”, replacing
the scalar product with another kernel function
k:
Kernel Ridge Regression
• The dual weights are computed as follows:
• The predictive function becomes:
Kernel Ridge Regression (Python)
# Regularization parameter lambda:
lmb = 10 ** -6
# X_train – training data (one sample per line)
# T_train – training labels
n = X_train.shape[0]
K = np.matmul(X_train, X_train.T)
# Training the method:
alpha = np.matmul(np.linalg.inv(K + lmb * np.eye(n)),
T_train)
# Predicting the training labels:
Y_train = np.matmul(K, alpha)
Y_train = np.sign(Y_train)
acc_train = (T_train == Y_train).mean())
print('Train accuracy: %.4f' % acc_train)
Kernel Ridge Regression (Python)
# X_test – test data (one sample per line)
# T_test – test labels
K_test = np.matmul(X_test, X_train.T)
# Predicting the test labels:
Y_test = np.matmul(K_test, alpha)
Y_test = np.sign(Y_test)
acc_test = (T_test == Y_test).mean()
print('Test accuracy: %.4f' % acc_test)
The Kernel Function
• Definition: A kernel is a function
for which there is a mapping from X to a Hilbert space F
such that for any we have:
• Theorem: The function k is a kernel if and only if k is
symmetric and finitely positive semi-definite
Examples of kernels
• By explicitly providing the embedding map:
Examples of kernels
• The kernel from the previous example:
• The same kernel also corresponds to this map:
The Polynomial Kernel
• For a real positive constant c and a natural
number d:
• The constant c controls the amount of
influence of polynomials of various degrees
The Gaussian (RBF) Kernel
• For and from :
The Intersection Kernel
• For and from :
Other kernel functions
• The Hellinger kernel:
• The PQ kernel [Ionescu & Popescu, PRL15]:
Other kernel functions
• The Hellinger kernel:
• The PQ kernel:
• String kernels measure the similarity of strings, by
counting the number of contiguous subsequences (n-
grams) of characters that two strings have in common
• Text documents can be interpreted as strings
• Advantages:
 We do not have to tokenize the text
 Language independence (can be applied to any
language, only re-training is necessary)
String kernels
String kernels
• Example:
Given s = “pineapple pi” and t = “apple pie” over an
alphabet Σ, and the n-gram length p = 2,
the hash maps S and T contain <key>:<value> pairs of the
type <2-gram>:<number of occurrences> in s and t:
S = {pi:2, in:1, ne:1, ea:1, ap:1, pp:1, pl:1, le:1, e_:1, _p:1},
T = {ap:1, pp:1, pl:1, le:1, e_:1, _p:1, pi:1, ie:1}
• The presence bits string kernel is defined as follows:
𝑘2
0/1
𝑠, 𝑡 =
𝑣∈Σ𝑝
𝑆0/1
𝑣 ∙ 𝑇0/1
(𝑣)
• Example (continued):
S = {pi:2, in:1, ne:1, ea:1, ap:1, pp:1, pl:1, le:1, e_:1, _p:1},
T = {ap:1, pp:1, pl:1, le:1, e_:1, _p:1, pi:1, ie:1}
𝑘2
0/1
𝑠, 𝑡 = 1 ∙ 1 + 1 ∙ 1 + 1 ∙ 1 + 1 ∙ 1 + 1 ∙ 1 + 1 ∙ 1 + 1 ∙ 1
= 1 + 1 + 1 + 1 + 1 + 1 + 1
= 7
Presence bits string kernel
• They obtain state-of-the-art results in various NLP tasks:
 Native Language Identification [Ionescu & Popescu, BEA17]
 Arabic Dialect Identification [Butnaru & Ionescu, VarDial18]
 Romanian Dialect Identification [Butnaru & Ionescu, ACL19]
• Useful for building a compact representation when:
number of samples << number of features
• E.g., in TOEFL11 dataset, the number of unique n-grams
(n={5,6,7,8,9}) is:
4,662,520
• … versus the number of training samples:
11,000
Why kernel methods?
• They generalize better than words
• Examples of native language transfer patterns on
TOEFL11
Why kernel methods?
• {onnal}
“…many academics subjects. Additionnally, people always have a subject…”
“I would not be in control of my personnal schedule during the trip.”
• {evelopp}
“…and who will have the curiosity to developp research on the disease.”
“…be able to do so. Underdevelopped countries are a case in point.”
• {n France}
“…studied law in both England and in France, I have had the chance…”
“Numbers have actually shown that in France the number of new cars…”
• {to conc}
“…without a tour guide. To conclude, there are several advantages…”
“…job they will enjoy. To conclude, I think that the best solution is…”
• {exemple}
“…after using them. Onother exemple is my underwear that I bougth…”
“Science is a great exemple of how successful people want to improve…”
French→English transfer patterns
New kernels based on combinations
• Given two kernels k1 and k2, a real positive constant a,
a function f with real values and a symmetric and
positive semi-definite matrix B, the following functions
are also kernels:
Primal Form
Dual Form
Data normalization
• In primal form:
• In dual form:
• Directly on the kernel matrix:
Data normalization (Python)
% X - data (one sample per row)
% L2 norm in primal form:
norms = np.linalg.norm(X, axis = 1, keepdims = True)
X = X / norms
% L2 norm in dual form:
K = np.matmul(X, X.T)
KNorm = np.sqrt(np.diag(K))
KNorm = KNorm[np.newaxis]
K = K / np.matmul(KNorm.T, KNorm)
Bibliography
Kernel Methods Guide for Classification
Kernel Methods Guide for Classification

More Related Content

Similar to Kernel Methods Guide for Classification

Fortran chapter 2.pdf
Fortran chapter 2.pdfFortran chapter 2.pdf
Fortran chapter 2.pdfJifarRaya
 
JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience
JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience
JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience hirokazutanaka
 
Dynamic Programming
Dynamic ProgrammingDynamic Programming
Dynamic ProgrammingSahil Kumar
 
ARTIFICIAL-NEURAL-NETWORKMACHINELEARNING
ARTIFICIAL-NEURAL-NETWORKMACHINELEARNINGARTIFICIAL-NEURAL-NETWORKMACHINELEARNING
ARTIFICIAL-NEURAL-NETWORKMACHINELEARNINGmohanapriyastp
 
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessAccelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessIgor Sfiligoi
 
High-dimensional polytopes defined by oracles: algorithms, computations and a...
High-dimensional polytopes defined by oracles: algorithms, computations and a...High-dimensional polytopes defined by oracles: algorithms, computations and a...
High-dimensional polytopes defined by oracles: algorithms, computations and a...Vissarion Fisikopoulos
 
On clusteredsteinertree slide-ver 1.1
On clusteredsteinertree slide-ver 1.1On clusteredsteinertree slide-ver 1.1
On clusteredsteinertree slide-ver 1.1VitAnhNguyn94
 
Artificial Neural Network
Artificial Neural NetworkArtificial Neural Network
Artificial Neural Networkssuserab4f3e
 
Optimization (DLAI D4L1 2017 UPC Deep Learning for Artificial Intelligence)
Optimization (DLAI D4L1 2017 UPC Deep Learning for Artificial Intelligence)Optimization (DLAI D4L1 2017 UPC Deep Learning for Artificial Intelligence)
Optimization (DLAI D4L1 2017 UPC Deep Learning for Artificial Intelligence)Universitat Politècnica de Catalunya
 
designanalysisalgorithm_unit-v-part2.pptx
designanalysisalgorithm_unit-v-part2.pptxdesignanalysisalgorithm_unit-v-part2.pptx
designanalysisalgorithm_unit-v-part2.pptxarifimad15
 
lecture01_lecture01_lecture0001_ceva.pdf
lecture01_lecture01_lecture0001_ceva.pdflecture01_lecture01_lecture0001_ceva.pdf
lecture01_lecture01_lecture0001_ceva.pdfAnaNeacsu5
 

Similar to Kernel Methods Guide for Classification (20)

Fortran chapter 2.pdf
Fortran chapter 2.pdfFortran chapter 2.pdf
Fortran chapter 2.pdf
 
JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience
JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience
JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience
 
Dynamic Programming
Dynamic ProgrammingDynamic Programming
Dynamic Programming
 
ARTIFICIAL-NEURAL-NETWORKMACHINELEARNING
ARTIFICIAL-NEURAL-NETWORKMACHINELEARNINGARTIFICIAL-NEURAL-NETWORKMACHINELEARNING
ARTIFICIAL-NEURAL-NETWORKMACHINELEARNING
 
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory AccessAccelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
Accelerating Key Bioinformatics Tasks 100-fold by Improving Memory Access
 
PS
PSPS
PS
 
High-dimensional polytopes defined by oracles: algorithms, computations and a...
High-dimensional polytopes defined by oracles: algorithms, computations and a...High-dimensional polytopes defined by oracles: algorithms, computations and a...
High-dimensional polytopes defined by oracles: algorithms, computations and a...
 
MLE.pdf
MLE.pdfMLE.pdf
MLE.pdf
 
Recursion
RecursionRecursion
Recursion
 
Chapter 1 - Introduction
Chapter 1 - IntroductionChapter 1 - Introduction
Chapter 1 - Introduction
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Q
QQ
Q
 
On clusteredsteinertree slide-ver 1.1
On clusteredsteinertree slide-ver 1.1On clusteredsteinertree slide-ver 1.1
On clusteredsteinertree slide-ver 1.1
 
Unit 3
Unit 3Unit 3
Unit 3
 
Unit 3
Unit 3Unit 3
Unit 3
 
Artificial Neural Network
Artificial Neural NetworkArtificial Neural Network
Artificial Neural Network
 
Optimization (DLAI D4L1 2017 UPC Deep Learning for Artificial Intelligence)
Optimization (DLAI D4L1 2017 UPC Deep Learning for Artificial Intelligence)Optimization (DLAI D4L1 2017 UPC Deep Learning for Artificial Intelligence)
Optimization (DLAI D4L1 2017 UPC Deep Learning for Artificial Intelligence)
 
L16
L16L16
L16
 
designanalysisalgorithm_unit-v-part2.pptx
designanalysisalgorithm_unit-v-part2.pptxdesignanalysisalgorithm_unit-v-part2.pptx
designanalysisalgorithm_unit-v-part2.pptx
 
lecture01_lecture01_lecture0001_ceva.pdf
lecture01_lecture01_lecture0001_ceva.pdflecture01_lecture01_lecture0001_ceva.pdf
lecture01_lecture01_lecture0001_ceva.pdf
 

More from ARVIND SARDAR

Machine Learning Chapter one introduction
Machine Learning Chapter one introductionMachine Learning Chapter one introduction
Machine Learning Chapter one introductionARVIND SARDAR
 
Machine Learning Ch 1.ppt
Machine Learning Ch 1.pptMachine Learning Ch 1.ppt
Machine Learning Ch 1.pptARVIND SARDAR
 
Graph ASS DBATU.pptx
Graph ASS DBATU.pptxGraph ASS DBATU.pptx
Graph ASS DBATU.pptxARVIND SARDAR
 
Unit 1-android-and-its-tools-ass
Unit 1-android-and-its-tools-assUnit 1-android-and-its-tools-ass
Unit 1-android-and-its-tools-assARVIND SARDAR
 
Computer foundation course -Knowing Computers
Computer foundation course -Knowing ComputersComputer foundation course -Knowing Computers
Computer foundation course -Knowing ComputersARVIND SARDAR
 
Unit no 5 transation processing DMS 22319
Unit no 5 transation processing DMS 22319Unit no 5 transation processing DMS 22319
Unit no 5 transation processing DMS 22319ARVIND SARDAR
 
Teaching plan d1 dms 2019 20
Teaching plan  d1 dms 2019  20Teaching plan  d1 dms 2019  20
Teaching plan d1 dms 2019 20ARVIND SARDAR
 
Teaching plan d1 dms 2019 20
Teaching plan  d1 dms 2019  20Teaching plan  d1 dms 2019  20
Teaching plan d1 dms 2019 20ARVIND SARDAR
 
Project activity planning
Project activity planningProject activity planning
Project activity planningARVIND SARDAR
 
D2 practical planning dms 2019 20
D2 practical  planning dms 2019 20D2 practical  planning dms 2019 20
D2 practical planning dms 2019 20ARVIND SARDAR
 
D2 practical planning dms 2019 20
D2 practical  planning dms 2019 20D2 practical  planning dms 2019 20
D2 practical planning dms 2019 20ARVIND SARDAR
 
PL /SQL program UNIT 5 DMS 22319
PL /SQL program UNIT 5 DMS 22319PL /SQL program UNIT 5 DMS 22319
PL /SQL program UNIT 5 DMS 22319ARVIND SARDAR
 
Question bank class test ii sep 2019
Question bank class test ii sep 2019Question bank class test ii sep 2019
Question bank class test ii sep 2019ARVIND SARDAR
 
DMS Question bank class test ii sep 2019
DMS Question bank class test ii sep 2019DMS Question bank class test ii sep 2019
DMS Question bank class test ii sep 2019ARVIND SARDAR
 
Rdbms class test ii sep 2019
Rdbms class test  ii sep 2019Rdbms class test  ii sep 2019
Rdbms class test ii sep 2019ARVIND SARDAR
 
Unit 1 dbm questioN BANK 22139
Unit 1 dbm  questioN BANK 22139Unit 1 dbm  questioN BANK 22139
Unit 1 dbm questioN BANK 22139ARVIND SARDAR
 
CO PO MAPPING CO3I DMS 22319
CO PO MAPPING CO3I DMS 22319CO PO MAPPING CO3I DMS 22319
CO PO MAPPING CO3I DMS 22319ARVIND SARDAR
 
CO PO MAPPING CO3I DMS 22319
CO PO MAPPING CO3I DMS 22319CO PO MAPPING CO3I DMS 22319
CO PO MAPPING CO3I DMS 22319ARVIND SARDAR
 
DMS 22319 Viva questions
DMS 22319 Viva questions DMS 22319 Viva questions
DMS 22319 Viva questions ARVIND SARDAR
 

More from ARVIND SARDAR (20)

Machine Learning Chapter one introduction
Machine Learning Chapter one introductionMachine Learning Chapter one introduction
Machine Learning Chapter one introduction
 
Machine Learning Ch 1.ppt
Machine Learning Ch 1.pptMachine Learning Ch 1.ppt
Machine Learning Ch 1.ppt
 
Graph ASS DBATU.pptx
Graph ASS DBATU.pptxGraph ASS DBATU.pptx
Graph ASS DBATU.pptx
 
graph ASS (1).ppt
graph ASS (1).pptgraph ASS (1).ppt
graph ASS (1).ppt
 
Unit 1-android-and-its-tools-ass
Unit 1-android-and-its-tools-assUnit 1-android-and-its-tools-ass
Unit 1-android-and-its-tools-ass
 
Computer foundation course -Knowing Computers
Computer foundation course -Knowing ComputersComputer foundation course -Knowing Computers
Computer foundation course -Knowing Computers
 
Unit no 5 transation processing DMS 22319
Unit no 5 transation processing DMS 22319Unit no 5 transation processing DMS 22319
Unit no 5 transation processing DMS 22319
 
Teaching plan d1 dms 2019 20
Teaching plan  d1 dms 2019  20Teaching plan  d1 dms 2019  20
Teaching plan d1 dms 2019 20
 
Teaching plan d1 dms 2019 20
Teaching plan  d1 dms 2019  20Teaching plan  d1 dms 2019  20
Teaching plan d1 dms 2019 20
 
Project activity planning
Project activity planningProject activity planning
Project activity planning
 
D2 practical planning dms 2019 20
D2 practical  planning dms 2019 20D2 practical  planning dms 2019 20
D2 practical planning dms 2019 20
 
D2 practical planning dms 2019 20
D2 practical  planning dms 2019 20D2 practical  planning dms 2019 20
D2 practical planning dms 2019 20
 
PL /SQL program UNIT 5 DMS 22319
PL /SQL program UNIT 5 DMS 22319PL /SQL program UNIT 5 DMS 22319
PL /SQL program UNIT 5 DMS 22319
 
Question bank class test ii sep 2019
Question bank class test ii sep 2019Question bank class test ii sep 2019
Question bank class test ii sep 2019
 
DMS Question bank class test ii sep 2019
DMS Question bank class test ii sep 2019DMS Question bank class test ii sep 2019
DMS Question bank class test ii sep 2019
 
Rdbms class test ii sep 2019
Rdbms class test  ii sep 2019Rdbms class test  ii sep 2019
Rdbms class test ii sep 2019
 
Unit 1 dbm questioN BANK 22139
Unit 1 dbm  questioN BANK 22139Unit 1 dbm  questioN BANK 22139
Unit 1 dbm questioN BANK 22139
 
CO PO MAPPING CO3I DMS 22319
CO PO MAPPING CO3I DMS 22319CO PO MAPPING CO3I DMS 22319
CO PO MAPPING CO3I DMS 22319
 
CO PO MAPPING CO3I DMS 22319
CO PO MAPPING CO3I DMS 22319CO PO MAPPING CO3I DMS 22319
CO PO MAPPING CO3I DMS 22319
 
DMS 22319 Viva questions
DMS 22319 Viva questions DMS 22319 Viva questions
DMS 22319 Viva questions
 

Recently uploaded

An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 

Recently uploaded (20)

An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 

Kernel Methods Guide for Classification

  • 1. Kernel Methods. Ridge Regression. Radu Ionescu, Prof. PhD. raducu.ionescu@gmail.com Faculty of Mathematics and Computer Science University of Bucharest
  • 2. The evolution of learning methods • 1950s: the introduction of the perceptron (Rosenblatt, 1957) • 1980s: the back-propagation algorithm for training multi-layer perceptron becomes widely popular (Hinton, 1986) • 1990s: the introduction of kernel methods (Cortes, 1995)
  • 3. The Perceptron 𝑥1 𝑥2 𝑥3 𝑥𝑛 𝑤0 = 𝑏 1 𝑦 = 𝑠𝑖𝑔𝑛(𝑤′ ∙ 𝑥 + 𝑏)
  • 4. Linear separating hyperplane 𝑤 Where 𝑤1 = −1, 𝑤2 = 1, 𝑏 = 1 𝑥1 𝑤 ∙ 𝑥 + 𝑏 > 0 𝑦 = 1 𝑤 ∙ 𝑥 + 𝑏 = 0 𝑦 = 1 𝑤 ∙ 𝑥 + 𝑏 < 0 𝑦 = −1 -1 1 1 -1 − 𝑏 𝑤1 − 𝑏 𝑤2 𝑥2
  • 5. XOR (Minsky & Papert, 1969) • A linear classification method cannot solve the XOR problem
  • 7. Solution 1: Neural Networks • The decision border is non-linear
  • 8. Solution 2: Kernel Methods • Kernel methods are based on two steps:  1. Embed data in a higher-dimensional Hilbert space  2. Search for linear relations in the embedding space • The embedding can be performed implicitly, by specifying the scalar product among data samples  Steps 1 and 2 can be comprised in one step!
  • 9. Embedding the data with an embedding map • Non-linear relations from the original space become linear in the embedding space
  • 10. Kernel methods • The learning algorithms are usually implemented such that the coordinates of the embedded points are not needed • Specifying the scalar product between pairs of points is enough! • “Kernel trick”: The scalar product is replaced with any pairwise similarity function, also termed kernel function
  • 13. Linear regression • The problem of finding g of the form: • That best interpolates a training set:
  • 15. Linear regression • The error of the linear function on a particular training sample is: • The loss on all training data points is:
  • 16. Linear regression • Loss written vectorially: • What is the optimal value for w?
  • 18. Linear regression • The optimal w: • We get the normal equation: • We can compute w, if the inverse exists:
  • 19. Ridge Regression • If the inverse does not exist, the problem is “ill- conditioned” and it needs regularization • The optimization criterion becomes: • And the optimal solution will be given by:
  • 20. Ridge Regression • The optimal solution:
  • 22. Dual Ridge Regression • Where: • is called the Gram matrix:
  • 23. Dual Ridge Regression • In the dual form, the information from the training samples is given only by the inner product between pairs of training points: • The predictive function is given by:
  • 26. Kernel Ridge Regression • We can now apply the “kernel trick”, replacing the scalar product with another kernel function k:
  • 27. Kernel Ridge Regression • The dual weights are computed as follows: • The predictive function becomes:
  • 28. Kernel Ridge Regression (Python) # Regularization parameter lambda: lmb = 10 ** -6 # X_train – training data (one sample per line) # T_train – training labels n = X_train.shape[0] K = np.matmul(X_train, X_train.T) # Training the method: alpha = np.matmul(np.linalg.inv(K + lmb * np.eye(n)), T_train) # Predicting the training labels: Y_train = np.matmul(K, alpha) Y_train = np.sign(Y_train) acc_train = (T_train == Y_train).mean()) print('Train accuracy: %.4f' % acc_train)
  • 29. Kernel Ridge Regression (Python) # X_test – test data (one sample per line) # T_test – test labels K_test = np.matmul(X_test, X_train.T) # Predicting the test labels: Y_test = np.matmul(K_test, alpha) Y_test = np.sign(Y_test) acc_test = (T_test == Y_test).mean() print('Test accuracy: %.4f' % acc_test)
  • 30. The Kernel Function • Definition: A kernel is a function for which there is a mapping from X to a Hilbert space F such that for any we have: • Theorem: The function k is a kernel if and only if k is symmetric and finitely positive semi-definite
  • 31. Examples of kernels • By explicitly providing the embedding map:
  • 32. Examples of kernels • The kernel from the previous example: • The same kernel also corresponds to this map:
  • 33. The Polynomial Kernel • For a real positive constant c and a natural number d: • The constant c controls the amount of influence of polynomials of various degrees
  • 34. The Gaussian (RBF) Kernel • For and from :
  • 35. The Intersection Kernel • For and from :
  • 36. Other kernel functions • The Hellinger kernel: • The PQ kernel [Ionescu & Popescu, PRL15]:
  • 37. Other kernel functions • The Hellinger kernel: • The PQ kernel:
  • 38. • String kernels measure the similarity of strings, by counting the number of contiguous subsequences (n- grams) of characters that two strings have in common • Text documents can be interpreted as strings • Advantages:  We do not have to tokenize the text  Language independence (can be applied to any language, only re-training is necessary) String kernels
  • 39. String kernels • Example: Given s = “pineapple pi” and t = “apple pie” over an alphabet Σ, and the n-gram length p = 2, the hash maps S and T contain <key>:<value> pairs of the type <2-gram>:<number of occurrences> in s and t: S = {pi:2, in:1, ne:1, ea:1, ap:1, pp:1, pl:1, le:1, e_:1, _p:1}, T = {ap:1, pp:1, pl:1, le:1, e_:1, _p:1, pi:1, ie:1}
  • 40. • The presence bits string kernel is defined as follows: 𝑘2 0/1 𝑠, 𝑡 = 𝑣∈Σ𝑝 𝑆0/1 𝑣 ∙ 𝑇0/1 (𝑣) • Example (continued): S = {pi:2, in:1, ne:1, ea:1, ap:1, pp:1, pl:1, le:1, e_:1, _p:1}, T = {ap:1, pp:1, pl:1, le:1, e_:1, _p:1, pi:1, ie:1} 𝑘2 0/1 𝑠, 𝑡 = 1 ∙ 1 + 1 ∙ 1 + 1 ∙ 1 + 1 ∙ 1 + 1 ∙ 1 + 1 ∙ 1 + 1 ∙ 1 = 1 + 1 + 1 + 1 + 1 + 1 + 1 = 7 Presence bits string kernel
  • 41. • They obtain state-of-the-art results in various NLP tasks:  Native Language Identification [Ionescu & Popescu, BEA17]  Arabic Dialect Identification [Butnaru & Ionescu, VarDial18]  Romanian Dialect Identification [Butnaru & Ionescu, ACL19] • Useful for building a compact representation when: number of samples << number of features • E.g., in TOEFL11 dataset, the number of unique n-grams (n={5,6,7,8,9}) is: 4,662,520 • … versus the number of training samples: 11,000 Why kernel methods?
  • 42. • They generalize better than words • Examples of native language transfer patterns on TOEFL11 Why kernel methods?
  • 43. • {onnal} “…many academics subjects. Additionnally, people always have a subject…” “I would not be in control of my personnal schedule during the trip.” • {evelopp} “…and who will have the curiosity to developp research on the disease.” “…be able to do so. Underdevelopped countries are a case in point.” • {n France} “…studied law in both England and in France, I have had the chance…” “Numbers have actually shown that in France the number of new cars…” • {to conc} “…without a tour guide. To conclude, there are several advantages…” “…job they will enjoy. To conclude, I think that the best solution is…” • {exemple} “…after using them. Onother exemple is my underwear that I bougth…” “Science is a great exemple of how successful people want to improve…” French→English transfer patterns
  • 44. New kernels based on combinations • Given two kernels k1 and k2, a real positive constant a, a function f with real values and a symmetric and positive semi-definite matrix B, the following functions are also kernels:
  • 47. Data normalization • In primal form: • In dual form: • Directly on the kernel matrix:
  • 48. Data normalization (Python) % X - data (one sample per row) % L2 norm in primal form: norms = np.linalg.norm(X, axis = 1, keepdims = True) X = X / norms % L2 norm in dual form: K = np.matmul(X, X.T) KNorm = np.sqrt(np.diag(K)) KNorm = KNorm[np.newaxis] K = K / np.matmul(KNorm.T, KNorm)