The document presents approximate tree kernels as a faster alternative to parse tree kernels for machine learning with tree-structured data. Parse tree kernels have quadratic computational complexity that makes them impractical for large trees. Approximate tree kernels speed up computation by selectively ignoring subtrees based on a selection function. Experimental results on synthetic and real-world datasets show approximate tree kernels achieve similar performance to parse tree kernels while reducing runtime by up to three orders of magnitude and memory usage from gigabytes to kilobytes.
1. Background | Parse Tree Kernels | Approximate Tree Kernels | Results | Conclusion
Approximate Tree Kernels
Konrad Rieck, Tammo Krueger, Ulf Brefeld, Klaus-Robert Müller
Presented By
Niharjyoti Sarangi
Indian Institute of Technology Madras
April 21, 2012
Outline of the Presentation
1 Background
  Learning from tree-structured data
  Application Domains
2 Parse Tree Kernels
  Computing PTK
  Computational constraints
3 Approximate Tree Kernels
  Computing ATK
  Validity of ATK
  Types of learning
4 Results
  Performance
  Time
  Memory
5 Conclusion
Tree-structured Data
Trees: carry hierarchical information
Flat feature vectors: fail to capture the underlying dependency structure

Parse Tree
An ordered, rooted tree that represents the syntactic structure of a string according to some formal grammar.
A tree X is called a parse tree of a grammar G = (S, P, s) if X is derived by assembling productions p ∈ P such that every node x ∈ X is labeled with a symbol l(x) ∈ S.
Examples
Figure: Parse trees for natural language text and the HTTP network protocol.
Learning from Trees
Kernel functions for structured data
Convolution of local kernels
Parse tree kernel proposed by Collins and Duffy (2002)

Kernel Functions
k : X × X → R is a symmetric and positive semi-definite function, which implicitly computes an inner product in a reproducing kernel Hilbert space.
Application Domains
Natural Language Processing
Web Spam Detection
Network Intrusion Detection
Information Retrieval from structured documents
...
Computing PTK
A generic technique for defining kernel functions over structured data is the convolution of local kernels defined over sub-structures.

Parse tree kernel
k(X, Z) = Σ_{x∈X} Σ_{z∈Z} c(x, z), where X and Z are two parse trees.

Notation
x_i: i-th child of a node x
|x|: number of children of a node x
|X|: number of nodes in X
χ: set of all possible trees
Illustration
Figure: Shared subtrees in two parse trees.
Counting Function
c(x, z) is known as the counting function, which recursively determines the number of shared subtrees rooted in the tree nodes x and z.

Defining c(x, z)
c(x, z) =
  0                                     if x, z are not derived from the same production
  λ                                     if x, z are leaf nodes
  λ · ∏_{i=1}^{|x|} (1 + c(x_i, z_i))   otherwise

0 ≤ λ ≤ 1 balances the contribution of subtrees, such that small values of λ decay the contribution of deeper nodes in large subtrees.
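The recursion above can be sketched in a few lines of Python. The tree encoding (a node is a `(label, children)` tuple), the production test, and the decay value are illustrative assumptions for this sketch, not the authors' implementation:

```python
LAM = 0.5  # decay factor lambda, 0 <= lambda <= 1 (illustrative value)

def label(x):
    return x[0]

def children(x):
    return x[1]

def same_production(x, z):
    # Two nodes are derived from the same production if their labels and
    # their children's label sequences coincide.
    return (label(x) == label(z) and
            [label(c) for c in children(x)] == [label(c) for c in children(z)])

def c(x, z):
    # Counting function: weighted number of shared subtrees rooted at x and z.
    if not same_production(x, z):
        return 0.0
    if not children(x):  # x and z are leaves with the same label
        return LAM
    prod = LAM
    for xi, zi in zip(children(x), children(z)):
        prod *= 1.0 + c(xi, zi)
    return prod

def all_nodes(x):
    # Collect every node of the tree rooted at x.
    return [x] + [n for ch in children(x) for n in all_nodes(ch)]

def ptk(X, Z):
    # Parse tree kernel: sum c(x, z) over all node pairs -- O(|X| * |Z|).
    return sum(c(x, z) for x in all_nodes(X) for z in all_nodes(Z))
```

For a tiny tree S → A B compared with itself, the double sum visits nine node pairs, but only the three matching pairs contribute.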
Computational Complexity
The complexity is O(n²), where n is the number of nodes in each parse tree.

Experimental data
Computing a parse tree kernel for two HTML documents comprising 10,000 nodes each requires about 1 gigabyte of memory and takes over 100 seconds on a recent computer system.

In practice we need to compare a large number of parse trees. At this cost, PTKs are of little practical use because of the computing resources required.
Attempted Improvements
A feature selection procedure based on statistical tests (Suzuki et al.)
Limiting computation to node pairs with matching grammar symbols (Moschitti)
Computing ATK
Approximation of tree kernels is based on the observation that trees often contain redundant parts that are not only irrelevant for the learning task but also slow down the kernel computation unnecessarily.

Approximate tree kernel
k̂(X, Z) = Σ_{s∈S} w(s) Σ_{x∈X: l(x)=s} Σ_{z∈Z: l(z)=s} c̃(x, z), where X and Z are two parse trees.

Selection function: w : S → {0, 1}
Controls whether subtrees rooted in nodes with the symbol s ∈ S contribute to the convolution (w(s) = 1) or are skipped (w(s) = 0).
Approximate Counting Function
c̃(x, z) is the approximate counting function.

Defining c̃(x, z)
c̃(x, z) =
  0                                      if x, z are not derived from the same production
  0                                      if x or z is not selected (w(l(x)) = 0 or w(l(z)) = 0)
  λ                                      if x, z are leaf nodes
  λ · ∏_{i=1}^{|x|} (1 + c̃(x_i, z_i))   otherwise

The selection function w(s) is decided based on the domain and the data. The exact parse tree kernel is obtained as a special case of the ATK when w(s) = 1 for all symbols s ∈ S.
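Under the same illustrative `(label, children)` tree encoding as before, the approximate counting function and kernel might be sketched as follows; the selection function is a plain dict from symbol to 0/1:

```python
LAM = 0.5  # decay factor lambda (illustrative value)

def label(x):
    return x[0]

def children(x):
    return x[1]

def same_production(x, z):
    return (label(x) == label(z) and
            [label(c) for c in children(x)] == [label(c) for c in children(z)])

def c_tilde(x, z, w):
    # Approximate counting: like c(x, z), but subtrees rooted at
    # deselected symbols (w(s) = 0) are skipped entirely.
    if not same_production(x, z):
        return 0.0
    if w.get(label(x), 0) == 0:  # x (and z, which has the same label) not selected
        return 0.0
    if not children(x):
        return LAM
    prod = LAM
    for xi, zi in zip(children(x), children(z)):
        prod *= 1.0 + c_tilde(xi, zi, w)
    return prod

def all_nodes(x):
    return [x] + [n for ch in children(x) for n in all_nodes(ch)]

def atk(X, Z, w):
    # Outer sums run only over node pairs sharing a selected symbol.
    return sum(c_tilde(x, z, w)
               for x in all_nodes(X) if w.get(label(x), 0) == 1
               for z in all_nodes(Z) if label(z) == label(x))
```

With w(s) = 1 for every symbol, `atk` reproduces the exact kernel, matching the special case noted above.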
ATK Is a Valid Kernel
Proof
Let Φ(X) be the vector of frequencies of all subtrees occurring in X. Then, by definition, K̂_w can always be written as
K̂_w(X, Z) = ⟨P_w Φ(X), P_w Φ(Z)⟩.
For any w, the projection P_w is independent of the actual X and Z, and hence K̂_w is a valid kernel.
ATK Is Faster than PTK
Speed-up factor q_w
q_w = (Σ_{s∈S} #_s(X) #_s(Z)) / (Σ_{s∈S} w(s) #_s(X) #_s(Z))
where #_s(X) denotes the number of nodes x ∈ X labeled with the symbol s.
From this equation, q_w ≥ 1 always holds, and rejecting even a single symbol that occurs in the trees already yields a strict speed-up q_w > 1.
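A quick numeric check of the speed-up factor, with invented per-symbol node counts (the #_s values below are purely illustrative):

```python
# Hypothetical per-symbol node counts #s(X) and #s(Z) for three symbols.
counts_X = {"S": 1, "A": 40, "B": 60}
counts_Z = {"S": 1, "A": 50, "B": 50}
w = {"S": 1, "A": 1, "B": 0}  # selection that rejects symbol B

total = sum(counts_X[s] * counts_Z[s] for s in counts_X)        # all node comparisons
kept = sum(w[s] * counts_X[s] * counts_Z[s] for s in counts_X)  # comparisons after selection
q_w = total / kept  # speed-up factor
```

Rejecting B, which dominates the comparisons in this example, gives q_w of roughly 2.5: the approximate kernel skips more than half of the node comparisons.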
Supervised Setting
Given n labeled parse trees (X_1, y_1), …, (X_n, y_n), where the y_i are the class labels.
An ideal kernel matrix Y is given as follows:
Y_ij = [[y_i = y_j]] − [[y_i ≠ y_j]]

Kernel target alignment
⟨Y, K̂_w⟩_F = Σ_{i,j: y_i=y_j} (K̂_w)_ij − Σ_{i,j: y_i≠y_j} (K̂_w)_ij
Our target now is to maximize this term with respect to w.
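The ideal kernel matrix and the alignment follow directly from the two definitions above; a minimal sketch with plain Python lists (no numerics library assumed):

```python
def ideal_kernel(y):
    # Y_ij = +1 when y_i == y_j, -1 otherwise.
    n = len(y)
    return [[1 if y[i] == y[j] else -1 for j in range(n)] for i in range(n)]

def alignment(Y, K):
    # Frobenius inner product <Y, K>_F: same-class entries of K count
    # positively, different-class entries negatively.
    n = len(Y)
    return sum(Y[i][j] * K[i][j] for i in range(n) for j in range(n))
```

A selection w that inflates within-class kernel values and suppresses between-class values increases this score.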
Supervised Setting (contd.)
Optimization Problem
w* = argmax_{w∈[0,1]^{|S|}} Σ_{i,j=1, i≠j}^{n} Y_ij Σ_{s∈S} w(s) Σ_{x∈X_i: l(x)=s} Σ_{z∈X_j: l(z)=s} c̃(x, z)
subject to
Σ_{s∈S} w(s) ≤ N, N ∈ ℕ
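One simple way to respect the budget constraint Σ_{s∈S} w(s) ≤ N is a greedy heuristic over per-symbol scores. This is only an illustrative stand-in, not the authors' optimization procedure, and the scores below are invented:

```python
def greedy_select(scores, N):
    # Keep the N symbols with the highest scores (w(s) = 1), deselect the rest.
    chosen = set(sorted(scores, key=scores.get, reverse=True)[:N])
    return {s: (1 if s in chosen else 0) for s in scores}

# Hypothetical per-symbol contributions to the alignment objective.
w = greedy_select({"S": 5.0, "NP": 3.2, "VP": 2.9, "DT": 0.1}, N=2)
```

The greedy result is feasible by construction, since exactly min(N, |S|) symbols receive w(s) = 1.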
Unsupervised Setting
Average frequency of node comparisons
f(s) = (1/n²) Σ_{i,j=1}^{n} #_s(X_i) #_s(X_j)

Comparison ratio
ρ = (expected node comparisons) / (actual number of comparisons in the PTK)
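Both quantities follow directly from the per-symbol node counts; a small sketch (the counts and the selection used below are invented for illustration):

```python
def f_s(counts):
    # f(s) for one symbol s: counts[i] = #s(X_i) over n trees,
    # averaged over all n^2 tree pairs.
    n = len(counts)
    return sum(ci * cj for ci in counts for cj in counts) / n ** 2

def selection_ratio(w, f):
    # Fraction of expected node comparisons kept by the selection w;
    # the unsupervised constraint requires this fraction to stay below rho.
    return sum(w[s] * f[s] for s in f) / sum(f.values())
```

For example, a symbol appearing 2 and 4 times in two trees yields f(s) = 9, the square of its mean count.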
Unsupervised Setting (contd.)
Optimization Problem
w* = argmax_{w∈[0,1]^{|S|}} Σ_{i,j=1, i≠j}^{n} Σ_{s∈S} w(s) Σ_{x∈X_i: l(x)=s} Σ_{z∈X_j: l(z)=s} c̃(x, z)
subject to
(Σ_{s∈S} w(s) f(s)) / (Σ_{s∈S} f(s)) ≤ ρ
Synthetic Data
Figure: Classification performance for the supervised synthetic data.
Figure: Detection performance for the unsupervised synthetic data.
Real Data
Figure: Classification performance for the question classification task.
Figure: Detection performance for the intrusion detection task (FTP).
Time
Figure: Training and testing time of SVMs using the exact and the approximate tree kernel.
Time (contd.)
Figure: Run-times for web spam (WS) and intrusion detection (ID).
Memory
Figure: Memory requirements for web spam (WS) and intrusion detection (ID).
Conclusion
Approximate tree kernels give us a fast and efficient way to work with parse trees.
Improvements in terms of run-time and memory requirements: for large trees, the approximation reduces the memory for a single kernel computation from 1 gigabyte to less than 800 kilobytes, accompanied by run-time improvements of up to three orders of magnitude.
Best results were obtained for network intrusion detection.
Questions
Any questions?