Presentation Slides at ICDM'19


- 1. Space-efficient Feature Maps for String Alignment Kernels Yasuo Tabei (RIKEN-AIP) Joint work with Yoshihiro Yamanishi (Kyutech) Rasmus Pagh (IT University of Copenhagen) ICDM’19@Beijing, Nov. 10th, 2019
- 2. Kernel methods • A kernel is an inner product in some feature space H: K(x, x′) = <φ(x), φ(x′)> • Intuitively, a kernel is a measure of the similarity of x and x′ • x and x′ can be vectors, trees, or graphs • x and x′ are strings in this talk • Kernels are useful for classification (SVM), regression, feature selection, two-sample problems, etc.
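The defining identity K(x, x′) = <φ(x), φ(x′)> can be checked concretely. The example below is illustrative and not from the talk: it uses the homogeneous degree-2 polynomial kernel k(x, y) = (x·y)², whose explicit feature map is the flattened outer product of x with itself.

```python
import numpy as np

# Illustrative example: polynomial kernel k(x, y) = (x . y)^2 and its
# explicit feature map phi(x) = outer(x, x) flattened into a vector.
def k(x, y):
    return float(np.dot(x, y)) ** 2

def phi(x):
    return np.outer(x, x).ravel()

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])
print(k(x, y))          # 16.0
print(phi(x) @ phi(y))  # 16.0 -- the inner product in feature space
```

The two numbers agree exactly, which is the sense in which a kernel "is" an inner product in a feature space.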
- 3. String alignment kernels • A typical string kernel uses substring (k-mer) features • An alignment kernel uses a string alignment score (e.g., edit distance) as a similarity measure • It has a wide variety of applications in string processing, e.g., text classification and remote homology detection for proteins/DNA [BMC Bioinfo.'06] • Advantage: high prediction accuracy • Drawback: high computational complexity; quadratic time in the length of the strings (dynamic programming) and quadratic time in the number of training data
- 4. Feature maps (FMs) for kernel approximations [A. Rahimi and B. Recht, NIPS, 2007] • FMs map a d-dimensional vector x ∈ R^d into a D-dimensional vector φ(x) ∈ R^D using O(d×D) memory and time • They can approximate the kernel function k(x, y) by the inner product of compact vectors φ(x)·φ(y) • A linear model f_l(x) = w·φ(x) has approximately the same functionality as the nonlinear model f_n(x) = Σ_i α_i k(x, y_i) • Advantage: can enhance the scalability of kernel methods (i) Input vectors (ii) Compact vectors (iii) Linear model: map, learn model weight w, f_l(x) = w·φ(x)
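The Rahimi–Recht construction can be sketched in a few lines. This is a minimal illustration for the RBF kernel k(x, y) = exp(−||x − y||²/2) (the dimensions d and D are arbitrary choices, not values from the talk): sample a projection matrix from the kernel's Fourier transform, which is Gaussian for RBF, and use random cosine features.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D = 5, 4000  # input and feature dimensions (illustrative)

# Random Fourier features: W is sampled from the Fourier transform of the
# RBF kernel (a standard Gaussian), b is a random phase.
W = rng.normal(size=(D, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def phi(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x = rng.normal(size=d)
y = rng.normal(size=d)
exact = np.exp(-np.linalg.norm(x - y) ** 2 / 2.0)
approx = float(phi(x) @ phi(y))
# approx is close to exact, so a linear model trained on phi(.)
# behaves approximately like the nonlinear kernel model
```

Note that W alone already costs O(D×d) memory, which is the space problem the talk addresses later.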
- 5. Existing feature maps (FMs) • Several FMs with different input formats and kernel similarities have been proposed • No previous work has been able to approximate alignment kernels

  | Method | Kernel | Input |
  |---|---|---|
  | b-bit MinHash [Li'11] | Jaccard | Binary vector |
  | Tensor Sketching [Pham'13] | Polynomial | Real vector |
  | 0-bit CWS [Li'15] | Min-Max | Real vector |
  | C-Hash [Mu'12] | Cosine | Real vector |
  | Random Feature [Rahimi'07] | RBF | Real vector |
  | RFM [Kar'12] | Dot product | Real vector |
  | PWLSGD [Maji'09] | Intersection | Real vector |
- 6. Space-efficient feature maps for string alignment kernels • Basic idea: use two hash functions, (i) edit-sensitive parsing (ESP) and (ii) feature maps (FMs) for the Laplacian kernel • Feature maps consume a large amount of memory: O(d×D) for input dimension d and output dimension D • We present space-efficient FMs using O(d) memory • We can achieve high classification accuracy by training a linear SVM on the compact vectors (Example: S1 = ABRACADA, S2 = ABRA, S3 = ABRACA, S4 = ATGCAGA, S5 = BARACR → (i) ESP: x1 = (3, 1, 3), x2 = (2, 4, 1), x3 = (5, 1, 2), x4 = (1, 0, 0), x5 = (2, 2, 1) → (ii) FMs: z1 = (1.2, 0.1, 1, 2), z2 = (2, 1, 1.2, 3.4), z3 = (−1.2, 0, 2.2, 3), z4 = (−3.2, 0, 2.2, 1), z5 = (2, 2, −1.2, 0) → (iii) Learn linear model F(zi) = w·zi)
- 7. Edit-sensitive parsing (ESP) [G. Cormode and S. Muthukrishnan, 2007] • Builds a single parse tree from an input string S • Builds the parse tree from the bottom (leaves) to the top (root) • Nodes with the same label are built from identical symbol pairs • Can be used to map string S into an integer vector x, where each element of x is the number of appearances of a node label • Can approximate edit distance with moves (EDM) as the L1 distance between mapped vectors, i.e., EDM(Si, Sj) ≒ ||xi − xj||1 • Computation time is linear in the length of S (Figure: ESP parse tree of an example string with node labels X1–X6; mapped vector x = (4, 5, 2, 1, 1, 1, 1, 1))
- 8. ESP for mapping strings into integer vectors • Step 1: Given S and S′, make vectors V(S) and V(S′), each dimension of which is the number of occurrences of a character in S and S′: S = ABABABBAB → V(S) = (4, 5); S′ = ABBABAB → V(S′) = (3, 4) • Step 2: Assign each pair or triple of symbols the same non-terminal symbol (Figure: level-1/level-2 parses of S and S′ with node labels X1–X4) • Step 3: Count the number of occurrences of each node label and update the vectors: V(S) = (4, 5, 2, 1, 1, 0), V(S′) = (5, 4, 2, 0, 0, 1) • Step 4: Replace the strings by their sequences of level-2 node labels, S = X1X2X3X1 and S′ = X1X4X1, and go to Step 2
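Step 1 and the resulting L1 comparison can be sketched directly from the running example. This is only the character-counting step; the full ESP pairing and alphabet-reduction machinery of Steps 2–4 is omitted.

```python
from collections import Counter

# Step 1 of ESP: count character occurrences to build the initial vectors.
def char_vector(s, alphabet):
    c = Counter(s)
    return [c[a] for a in alphabet]

S, S2 = "ABABABBAB", "ABBABAB"  # the two strings from the slide
alphabet = ["A", "B"]

vS = char_vector(S, alphabet)
vS2 = char_vector(S2, alphabet)
print(vS)   # [4, 5] -- matches V(S) on the slide
print(vS2)  # [3, 4] -- matches V(S')

# L1 distance between the count vectors, the quantity ESP uses
# (over the full label vocabulary) to approximate EDM.
l1 = sum(abs(u - v) for u, v in zip(vS, vS2))
print(l1)   # 2
```

After the later ESP rounds append node-label counts, the same L1 distance over the extended vectors approximates edit distance with moves.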
- 9. Feature maps for string alignment kernels • EDM is approximated by the L1 distance between mapped vectors: EDM(Si, Sj) ≒ ||xi − xj||1 • The alignment kernel is defined as K(Si, Sj) = exp(−||xi − xj||1/β) (Laplacian kernel) • Feature maps (FMs) can approximate the Laplacian kernel as exp(−||xi − xj||1/β) ≒ <zi, zj> • FMs are space-inefficient, using O(dD) memory for input dimension d and output dimension D; the Fastfood approach [ICML'13] can approximate feature maps for RBF kernels (Example: S1 = ABRACADA, S2 = ABRA, S3 = ABRACA → (i) ESP: x1 = (3, 1, 3), x2 = (2, 4, 1), x3 = (5, 1, 2) → (ii) FMs: z1 = (1.2, 0.1, 1, 2), z2 = (2, 1, 1.2, 3.4), z3 = (−1.2, 0, 2.2, 3) → (iii) Learn linear model F(zi) = w·zi)
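Random features for the Laplacian kernel follow the same recipe as for RBF, except that the Fourier transform of exp(−||x − y||₁/β) is a product of Cauchy densities, so the projection matrix is sampled entrywise from a Cauchy distribution. The sketch below uses the ESP vectors x1 and x2 from the slides; the bandwidth β and dimension D are illustrative choices, and this is the standard (memory-heavy) construction, not yet the space-efficient variant.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, beta = 3, 8000, 2.0  # illustrative feature dimension and bandwidth

# Entrywise Cauchy(0, 1/beta) projections: their characteristic function is
# exp(-|t|/beta), which yields the Laplacian kernel under random features.
W = rng.standard_cauchy(size=(D, d)) / beta
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def z(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

# ESP vectors of S1 = ABRACADA and S2 = ABRA from the running example
x1 = np.array([3.0, 1.0, 3.0])
x2 = np.array([2.0, 4.0, 1.0])

exact = np.exp(-np.abs(x1 - x2).sum() / beta)  # exp(-6/2), about 0.0498
approx = float(z(x1) @ z(x2))
# approx concentrates around exact as D grows
```

Storing W is exactly the O(dD) cost the next slide removes.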
- 10. Space-efficient FMs (briefly) • Basic idea: reduce the random matrix R of size D×d in standard FMs to a random matrix M of size t×d • Approximate the element R[i, j] using a polynomial with coefficients from a t-wise independent family: R[i, j] ≒ M[1, j] + M[2, j]·i + … + M[t, j]·i^(t−1) • Theoretical guarantee (concentration bound): Pr[|z(x)ᵀz(y) − k(x, y)| ≥ ε] ≤ 2/(ε²D) • Random matrix R for standard FMs: O(D×d) memory; random matrix M for space-efficient FMs: O(t×d) memory
- 11. Experiments • 5 massive real-world string datasets • Competitors: 5 SVMs with string kernels (LAK [Bioinfo'08], GAK [ICML'11], ESP+Kernel, CGK+Kernel, stk17 [NIPS'17]) and FMs for alignment kernels (D2KE [KDD'19]) • SFMEDM: the proposed method
- 12. Training time in seconds
- 13. Memory
- 14. Classification accuracy in AUC score
- 15. Summary • Space-efficient feature maps for string alignment kernels • Use two hash functions: ESP maps strings into integer vectors, and feature maps map integer vectors into feature vectors • Linear SVMs are trained on the feature vectors and behave like non-linear SVMs with alignment kernels • Advantage: highly scalable • Code and datasets are available: https://sites.google.com/view/alignmentkernels/home

- Thank you for your kind introduction. Today I'm going to talk about feature maps for string alignment kernels. Our method can solve large-scale machine learning problems on strings. This is joint work with Yoshihiro Yamanishi from Kyushu Institute of Technology and Rasmus Pagh from IT University of Copenhagen.
- First, I will present a brief introduction of kernel methods.
- It can approximate a non-linear function or decision boundary well with enough training data and can achieve high prediction accuracy.
- To solve the scalability issue of kernel methods, feature maps for kernel approximations were proposed by A. Rahimi and B. Recht at NIPS 2007. FMs map a d-dimensional vector x into a D-dimensional vector φ(x) ∈ R^D. They can approximate the kernel function k(x, y) by the inner product of the mapped vectors. Thus, a linear model has approximately the same functionality as a nonlinear model. The advantage is that they can enhance the scalability of kernel methods.
- Several FMs with different input formats and kernel similarities have been proposed. No previous work has been able to approximate string alignment kernels.
- That's why we present large-scale string classification via space-efficient FMs for string alignment kernels. We use two hash functions: edit-sensitive parsing (ESP) and feature maps (FMs) for the Laplacian kernel. We also present space-efficient feature maps that reduce the space usage of FMs from O(dD) to O(d) for input dimension d. We can achieve high classification accuracy by training a linear SVM on the mapped vectors.
- The first result shows the training time of each method in seconds. Kernel methods could not finish within 48 hours on the large sports and compound datasets. Methods using feature maps finished within 48 hours on all the datasets. For example, our SFMEDM finished in 9 hours on the compound dataset with D = 16,000 dimensions.
- The next result shows the memory usage. Kernel methods consumed large amounts of memory: 654 GB and 1.3 TB. On the other hand, the space used by methods based on FMs is at least one order of magnitude smaller than that of string kernels. Our SFMEDM consumed less than 160 GB on the sports and compound datasets.
- The final figure shows the classification accuracy in AUC score.