Space-efficient Feature Maps for String Alignment Kernels
1. Space-efficient Feature Maps for String Alignment Kernels
Yasuo Tabei (RIKEN-AIP)
Joint work with
Yoshihiro Yamanishi (Kyutech)
Rasmus Pagh (IT University of Copenhagen)
ICDM’19@Beijing, Nov. 10th, 2019
2. Kernel methods
• A kernel is an inner product in some feature space H:
K(x, x′) = <φ(x), φ(x′)>
• Intuitively, a kernel measures the similarity of x and x′ (toy example below)
• x and x′ can be vectors, trees, or graphs
• x and x′ are strings in this talk
• Kernels are useful for
- Classification (SVM), regression, feature selection, two-sample problems, etc.
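To make the inner-product view concrete, here is a minimal sketch (not from the slides) using the degree-2 polynomial kernel, whose feature space is explicit: φ(x) collects all pairwise products x_i x_j.

```python
# A toy example: the degree-2 polynomial kernel K(x, x') = (<x, x'>)^2
# equals an inner product in the explicit feature space
# phi(x) = (x_i * x_j for all index pairs (i, j)).
import numpy as np

def phi(x):
    # Explicit feature map: all pairwise products x_i * x_j, flattened.
    return np.outer(x, x).ravel()

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, -1.0, 2.0])

k_direct = np.dot(x, y) ** 2        # kernel evaluated directly
k_feature = np.dot(phi(x), phi(y))  # inner product in feature space H
print(k_direct, k_feature)          # both print 20.25
```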
3. String alignment kernels
• Typical string kernels use substring (k-mer) features
• Alignment kernels use a string alignment (e.g., edit
distance) as the similarity measure
• They have a wide variety of applications in string processing,
e.g., text classification, remote homology detection for
proteins/DNA [BMC Bioinfo.'06], etc.
• Advantage: high prediction accuracy
• Drawback: high computational complexity
Quadratic time in the length of the strings (dynamic programming; see the sketch below)
Quadratic time in the number of training examples
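As a concrete instance of the first bottleneck, here is a minimal sketch of the classic edit-distance dynamic program, which takes O(|s|·|t|) time for strings s and t.

```python
# A minimal sketch of the quadratic-time dynamic program behind alignment
# similarities: classic edit distance between strings s and t.
def edit_distance(s, t):
    n, m = len(s), len(t)
    # dp[i][j] = edit distance between prefixes s[:i] and t[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution/match
    return dp[n][m]

print(edit_distance("ABRACADA", "ABRA"))  # 4
```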
4. Feature maps (FMs) for kernel approximations
[A. Rahimi and B. Recht, NIPS, 2007]
• FMs map a d-dimensional vector x ∈ R^d into a D-dimensional
vector φ(x) ∈ R^D using O(d×D) memory and time
• They approximate the kernel function k(x,y) by the inner
product of compact vectors φ(x)・φ(y) (sketched below)
• The linear model f_l(x) = w・φ(x) has approximately the same
functionality as the nonlinear model f_n(x) = Σ_i α_i k(x, y_i)
• Advantage: can enhance the scalability of kernel methods
[Figure: pipeline from (i) input vectors, mapped to (ii) compact vectors, to (iii) a linear model f_l(x) = w・φ(x) with learned weight vector w]
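A minimal sketch of the original FM construction [Rahimi & Recht, NIPS'07] for the RBF kernel: an O(d×D) random projection W plus random phases b yield φ(x) whose inner products approximate k(x, y). Parameter values here are illustrative.

```python
# Random Fourier features: approximate the RBF kernel
# k(x, y) = exp(-||x - y||_2^2 / (2 * sigma^2)) by phi(x) . phi(y).
import numpy as np

rng = np.random.default_rng(0)
d, D, sigma = 5, 2000, 1.0

# O(d x D) random projection: W ~ N(0, 1/sigma^2), b ~ Uniform[0, 2*pi)
W = rng.normal(0.0, 1.0 / sigma, size=(D, d))
b = rng.uniform(0.0, 2 * np.pi, size=D)

def phi(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, y = rng.normal(size=d), rng.normal(size=d)
exact = np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))
approx = phi(x) @ phi(y)
print(exact, approx)  # the two values agree up to O(1/sqrt(D)) error
```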
5. Existing feature maps (FMs)
• Several FMs with different input formats and kernel
similarities have been proposed
• No previous work has been able to approximate
alignment kernels
Method                     | Kernel       | Input
b-bit MinHash [Li'11]      | Jaccard      | Binary vector
Tensor Sketching [Pham'13] | Polynomial   | Real vector
0-bit CWS [Li'15]          | Min-Max      | Real vector
C-Hash [Mu'12]             | Cosine       | Real vector
Random Feature [Rahimi'07] | RBF          | Real vector
RFM [Kar'12]               | Dot product  | Real vector
PWLSGD [Maji'09]           | Intersection | Real vector
6. Space-efficient feature maps for string alignment kernels
• Basic idea: use two hash functions: (i) edit-sensitive
parsing (ESP) and (ii) feature maps (FMs) for the
Laplacian kernel
• Standard feature maps consume a large amount of memory:
O(d×D) memory for input dimension d and output
dimension D
• We present space-efficient FMs using only O(d) memory
• High classification accuracy is achieved by training a
linear SVM on the compact vectors (a pipeline sketch follows the example below)
Example: (i) ESP maps strings to integer vectors, (ii) FMs map these to compact real vectors, (iii) a linear model F(zi) = w・zi is learned:
S1 = ABRACADA → x1 = (3, 1, 3) → z1 = (1.2, 0.1, 1, 2)
S2 = ABRA     → x2 = (2, 4, 1) → z2 = (2, 1, 1.2, 3.4)
S3 = ABRACA   → x3 = (5, 1, 2) → z3 = (-1.2, 0, 2.2, 3)
S4 = ATGCAGA  → x4 = (1, 0, 0) → z4 = (-3.2, 0, 2.2, 1)
S5 = BARACR   → x5 = (2, 2, 1) → z5 = (2, 2, -1.2, 0)
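The end-to-end pipeline can be sketched as below. The helper esp_vector (plain character counts standing in for full ESP) and the Cauchy-based feature map are simplified stand-ins, and the class labels are invented for illustration; this is not the authors' implementation.

```python
# A hedged sketch of the three-step pipeline: (i) strings -> integer
# vectors, (ii) integer vectors -> compact real vectors, (iii) linear SVM.
import numpy as np
from sklearn.svm import LinearSVC

def esp_vector(s, alphabet="ABCDGRT"):
    # Stand-in for ESP: here just character counts (real ESP also counts
    # the internal node labels of the parse tree).
    return np.array([s.count(c) for c in alphabet], dtype=float)

rng = np.random.default_rng(0)
d, D, beta = 7, 64, 1.0
W = rng.standard_cauchy(size=(D, d)) / beta  # Cauchy rows: Laplacian kernel
b = rng.uniform(0.0, 2 * np.pi, size=D)

def laplacian_fm(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

strings = ["ABRACADA", "ABRA", "ABRACA", "ATGCAGA", "BARACR"]
labels = [1, 1, 1, 0, 0]                      # invented labels, illustration only
Z = np.array([laplacian_fm(esp_vector(s)) for s in strings])  # (ii) FMs
clf = LinearSVC().fit(Z, labels)              # (iii) learn linear model
print(clf.predict(Z))
```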
7. Edit-sensitive parsing (ESP)
[G. Cormode and S. Muthukrishnan, 2007]
• Builds a single parse tree from an input string S
• Builds the parse tree from the bottom (leaves) to the top (root)
• Nodes built from the same pair of symbols receive the same node label
• Can be used for mapping string S into an integer vector x
– Each element of x counts the occurrences of one node label
• Approximates edit distance with moves (EDM) as the L1 distance
between mapped vectors (i.e., EDM(Si,Sj) ≈ ||xi−xj||1)
• Computation time is linear in the length of S
[Figure: parse tree over ABBAABABB; counting the characters A, B and the node labels X1,...,X6 gives x = (4, 5, 2, 1, 1, 1, 1, 1)]
8. ESP for mapping strings into integer vectors
Step 1: Given S and S′, make vectors V(S) and V(S′), each dimension of
which counts one character in S and S′:
S  = ABABABBAB → V(S)  = (4, 5)   (counts of A, B)
S′ = ABBABAB   → V(S′) = (3, 4)
[Figure: level-1/level-2 parse trees for S and S′]
Step 2: Assign each pair or triple of adjacent symbols to the same
non-terminal symbol (node label).
Step 3: Count the occurrences of each node label and update the vectors
V(S) and V(S′):
          A  B  X1 X2 X3 X4
V(S)  = (4, 5, 2, 1, 1, 0)
V(S′) = (5, 4, 2, 0, 0, 1)
Step 4: Replace the strings by the sequences of node labels at level 2:
S  = X1 X2 X3 X1
S′ = X1 X4 X1
Go to Step 2 (a simplified sketch of the pairing rounds follows below).
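To make the pairing rounds concrete, here is a simplified sketch: it greedily pairs adjacent symbols left to right, assigns each distinct pair a fresh node label, and accumulates label counts. Real ESP instead chooses pairs/triples deterministically via alphabet reduction so that similar strings get similar labels; only the counting mechanics are illustrated here.

```python
# Simplified ESP-style bottom-up rounds: pair symbols, relabel, count.
from collections import Counter

def parse_round(seq, labels):
    out = []
    i = 0
    while i < len(seq):
        pair = tuple(seq[i:i + 2])      # a pair (or a final singleton)
        if pair not in labels:
            labels[pair] = "X%d" % (len(labels) + 1)
        out.append(labels[pair])
        i += 2
    return out

labels = {}
seq = list("ABABABBAB")
counts = Counter(seq)                   # Step 1: character counts
while len(seq) > 1:                     # Steps 2-4: pair, relabel, repeat
    seq = parse_round(seq, labels)
    counts.update(seq)
print(counts)  # counts of characters and node labels -> the vector V(S)
```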
9. Feature maps for string alignment kernels
• EDM is approximated by the L1 distance between mapped vectors:
EDM(Si,Sj) ≈ ||xi−xj||1
• The alignment kernel is defined as
K(Si,Sj) = exp(−||xi−xj||1/β) (Laplacian kernel)
• Feature maps (FMs) can approximate the Laplacian kernel as
exp(−||xi−xj||1/β) ≈ <zi, zj> (a sketch follows below)
• FMs are space-inefficient, with O(dD) memory for input
dimension d and output dimension D
The Fastfood approach [ICML'13] can approximate feature
maps for RBF kernels
[Figure: (i) ESP, (ii) FMs, (iii) learn linear model F(zi) = w・zi; e.g.,
S1 = ABRACADA → x1 = (3, 1, 3) → z1 = (1.2, 0.1, 1, 2)
S2 = ABRA     → x2 = (2, 4, 1) → z2 = (2, 1, 1.2, 3.4)
S3 = ABRACA   → x3 = (5, 1, 2) → z3 = (-1.2, 0, 2.2, 3)]
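A minimal sketch of random features for the Laplacian kernel: because the Cauchy density is the Fourier transform of the Laplacian kernel, drawing projection rows from Cauchy(0, 1/β) in the Rahimi-Recht construction makes <zi, zj> approximate exp(−||xi−xj||1/β). Parameter values are illustrative.

```python
# Random features for the Laplacian kernel exp(-||x - y||_1 / beta).
import numpy as np

rng = np.random.default_rng(1)
d, D, beta = 3, 4096, 2.0
W = rng.standard_cauchy(size=(D, d)) / beta  # Cauchy(0, 1/beta) rows
b = rng.uniform(0.0, 2 * np.pi, size=D)

def z(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x1 = np.array([3.0, 1.0, 3.0])  # the ESP vectors from the slide's example
x2 = np.array([2.0, 4.0, 1.0])
exact = np.exp(-np.abs(x1 - x2).sum() / beta)
print(exact, z(x1) @ z(x2))     # inner product approximates the kernel
```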
10. Space-efficient FMs (briefly)
• Basic idea: reduce the random matrix R of size D×d in
standard FMs to a random matrix M of size t×d
• Approximate the entry R[i,j] by evaluating a polynomial:
R[i,j] ≈ M[i,1] + M[i,2]・j + ・・・ + M[i,t]・j^(t−1)
with coefficients drawn from a t-wise independent family
• Theoretical guarantee (concentration bound):
Pr[|z(x)′z(y) − k(x,y)| ≥ ε] ≤ 2/(ε²D)
[Figure: the random matrix R (D×d) of standard FMs needs O(D×d) memory; the random matrix M (t×d) of space-efficient FMs needs only O(t×d) memory]
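A hedged sketch of the memory reduction: store only t coefficients per input dimension and regenerate each entry of the implicit D×d projection matrix on demand, by evaluating a degree-(t−1) polynomial over a prime field (a t-wise independent hash) and turning the resulting pseudo-uniform value into a Cauchy sample via the inverse CDF. The indexing conventions and the uniform-to-Cauchy step are illustrative assumptions, not the authors' exact construction.

```python
# Illustrative only: regenerate R[i, j] on demand from O(t*d) seed memory.
import numpy as np

P = 2_147_483_647                        # Mersenne prime 2^31 - 1
rng = np.random.default_rng(0)
d, D, t, beta = 3, 8, 4, 1.0
M = rng.integers(1, P, size=(t, d))      # t polynomial coefficients per column
b = rng.uniform(0.0, 2 * np.pi, size=D)  # random phases, O(D) extra memory

def R_entry(i, j):
    # Degree-(t-1) polynomial in the row index i (Horner scheme), giving a
    # t-wise independent pseudo-uniform value, mapped to Cauchy(0, 1/beta).
    h = 0
    for c in M[:, j]:
        h = (h * i + int(c)) % P
    u = h / P                                # pseudo-uniform in [0, 1)
    return np.tan(np.pi * (u - 0.5)) / beta  # Cauchy inverse CDF

def z(x):
    # Feature map computed row by row; the D x d matrix never materializes.
    proj = np.array([sum(R_entry(i, j) * x[j] for j in range(d))
                     for i in range(D)])
    return np.sqrt(2.0 / D) * np.cos(proj + b)

print(z(np.array([3.0, 1.0, 3.0])))
```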
11. Experiments
• Five massive real-world string datasets
• Competitors:
- 5 SVMs with string kernels: LAK [Bioinfo'08], GAK
[ICML'11], ESP+Kernel, CGK+Kernel, stk17 [NIPS'17]
- FMs for alignment kernels: D2KE [KDD'19]
- SFMEDM: the proposed method
15. Summary
• Space-efficient feature maps for string alignment kernels
• Use two hash functions
– ESP: maps strings into integer vectors
– Feature maps: maps integer vectors into feature vectors
• Linear SVMs are trained on the feature vectors
– A linear SVM then behaves like a non-linear SVM with an alignment
kernel
• Advantage: highly scalable
• Code and datasets are available:
https://sites.google.com/view/alignmentkernels/home
Editor's Notes
Thank you for your kind introduction.
Today I’m going to talk about feature maps for string alignment kernels.
Our method can solve large-scale machine learning problems on strings.
This is joint work with Yoshihiro Yamanishi from Kyushu Institute of Technology and Rasmus Pagh from the IT University of Copenhagen.
First, I will present a brief introduction of kernel methods.
Kernel methods can approximate a non-linear function or decision boundary well given enough training data and can achieve high prediction accuracy.
To address the scalability issue of kernel methods, feature maps for kernel approximation were proposed by A. Rahimi and B. Recht (NIPS 2007).
FMs map d-dimensional vector x into D-dimensional vector φ(x) ∈ RD
It can approximate kernel function k(x,y) by the inner product of mapped vectors
Thus, a linear model has approximately the same functionality as a nonlinear model.
Advantage is that it can enhance the scalability of kernel methods.
Several FMs with different input formats and kernel similarities have been proposed.
No previous work has been able to approximate string alignment kernels.
That’s why we present large-scale string classification by presenting space-efficient FMs for string alignment kernels.
We use two hash functions: edit-sensitive parsing (ESP) and feature maps (FMs) for a Laplacian kernel.
We also present space-efficient feature maps reducing space-usage of FMs from O(dD) to O(d) for input dimension d.
Can achieve high classification accuracy by training linear SVM with mapped vectors
The first result shows the training time of each method in seconds.
Kernel methods could not finish within 48 hours on the large sports and compound datasets.
Methods using feature maps finished within 48 hours on all datasets.
For example, our SFMEDM finished within 9 hours on the compound dataset with D = 16,000 dimensions.
The next result shows the memory usage in megabytes.
Kernel methods consumed large amounts of memory: 654 GB and 1.3 TB.
On the other hand, the space used by methods with FMs is at least one order of magnitude smaller than that of string kernels.
Our SFMEDM consumed less than 160 GB for the sports and compound datasets.