                      Approximate Tree Kernels
Konrad Rieck, Tammo Krueger, Ulf Brefeld, Klaus-Robert Müller


                                   Presented By
                               Niharjyoti Sarangi
                      Indian Institute of Technology Madras




                                      April 21, 2012
Outline of the Presentation

1  Background
     Learning from tree-structured data
     Application domains
2  Parse Tree Kernels
     Computing PTK
     Computational constraints
3  Approximate Tree Kernels
     Computing ATK
     Validity of ATK
     Types of learning
4  Results
     Performance
     Time
     Memory
5  Conclusion
Tree-structured Data


Trees carry hierarchical information.
Flat feature vectors fail to capture the underlying dependency structure.

     Parse Tree
     An ordered, rooted tree that represents the syntactic structure
     of a string according to some formal grammar.


A tree X is called a parse tree of a grammar G = (S, P, s), with symbols S, productions P, and start symbol s, if X is derived by assembling productions p ∈ P such that every node x ∈ X is labeled with a symbol l(x) ∈ S.
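To make the definition concrete, here is a minimal sketch of a labeled, ordered parse tree in Python; the Node class and the example fragment are illustrative choices, not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """A node of an ordered, rooted parse tree."""
    label: str                                  # the symbol l(x) from the alphabet S
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

# A tiny parse tree for the noun phrase "a ball":
# NP -> DT NN,  DT -> "a",  NN -> "ball"
tree = Node("NP", [Node("DT", [Node("a")]), Node("NN", [Node("ball")])])
```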
Examples




     Figure: Parse trees for natural language text and the HTTP network
     protocol.
Learning from Trees


Kernel functions for structured data
Convolution of local kernels
Parse tree kernel proposed by Collins and Duffy (2002)




     Kernel Functions
k : X × X → R is a symmetric, positive semi-definite function that implicitly computes an inner product in a reproducing kernel Hilbert space.
Application Domains




Natural Language Processing
Web Spam Detection
Network Intrusion Detection
Information Retrieval from structured documents
...
Computing PTK


     A generic technique for defining kernel functions over
     structured data is the convolution of local kernels defined over
     sub-structures.
Parse tree kernel

$$k(X, Z) = \sum_{x \in X} \sum_{z \in Z} c(x, z),$$

where X and Z are two parse trees.

     Notations
$x_i$: the i-th child of a node x
$|X|$: the number of nodes in X
$\chi$: the set of all possible trees
Illustration




                  Figure: Shared subtrees in two parse trees.
Counting Function

c(x, z) is known as the counting function, which recursively determines the number of shared subtrees rooted in the tree nodes x and z.

Defining c(x, z)

$$c(x, z) = \begin{cases} 0 & \text{if } x, z \text{ are not derived from the same production } p \in P \\ \lambda & \text{if } x, z \text{ are leaf nodes} \\ \lambda \prod_{i=1}^{|x|} c(x_i, z_i) & \text{otherwise} \end{cases}$$

The decay factor $0 \le \lambda \le 1$ balances the contribution of subtrees, such that small values of $\lambda$ decay the contribution of lower nodes in large subtrees.
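As a concrete illustration, the following Python sketch implements the kernel and counting function literally as stated on this slide (Collins and Duffy's original formulation multiplies (1 + c) over children instead). It reuses the hypothetical Node class from the earlier sketch and approximates "derived from the same production" by comparing a node's label together with its children's labels; a real implementation would memoize c over node pairs to reach the quadratic bound.

```python
from math import prod

def nodes(t: Node):
    """Yield every node of the tree rooted at t (preorder)."""
    yield t
    for child in t.children:
        yield from nodes(child)

def same_production(x: Node, z: Node) -> bool:
    """x and z count as derived from the same production p in P when their
    labels and their children's label sequences coincide."""
    return (x.label == z.label and
            [c.label for c in x.children] == [c.label for c in z.children])

def count(x: Node, z: Node, lam: float = 0.5) -> float:
    """Counting function c(x, z): decayed number of shared subtrees rooted at x and z."""
    if not same_production(x, z):
        return 0.0
    if x.is_leaf() and z.is_leaf():
        return lam
    return lam * prod(count(xi, zi, lam) for xi, zi in zip(x.children, z.children))

def ptk(X: Node, Z: Node, lam: float = 0.5) -> float:
    """Parse tree kernel: convolution of c over all node pairs of X and Z."""
    return sum(count(x, z, lam) for x in nodes(X) for z in nodes(Z))
```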
Computational Complexity


The complexity is $O(n^2)$, where n is the number of nodes in each parse tree.

     Experimental data
     The computation of a parse tree kernel for two HTML documents
     comprising 10,000 nodes each, requires about 1 gigabyte of memory
     and takes over 100 seconds on a recent computer system.

We need to compare a large number of parse trees. Given these figures, exact PTKs become impractical at scale because of the computing resources they require.
Attempted Improvements




A feature selection procedure based on statistical tests (Suzuki et al.)
Limiting the computation to node pairs with matching grammar symbols (Moschitti)
Computing ATK


The approximation of tree kernels is based on the observation that trees often contain redundant parts that are not only irrelevant for the learning task but also slow down the kernel computation unnecessarily.

Approximate tree kernel

$$\hat{k}(X, Z) = \sum_{s \in S} w(s) \sum_{\substack{x \in X \\ l(x) = s}} \; \sum_{\substack{z \in Z \\ l(z) = s}} \tilde{c}(x, z),$$

where X and Z are two parse trees.

Selection function: $w : S \to \{0, 1\}$ controls whether subtrees rooted in nodes carrying the symbol $s \in S$ contribute to the convolution ($w(s) = 0$ or $w(s) = 1$).
Approximate Counting Function

$\tilde{c}(x, z)$ is the approximate counting function.

Defining $\tilde{c}(x, z)$

$$\tilde{c}(x, z) = \begin{cases} 0 & \text{if } x, z \text{ are not derived from the same production } p \in P \\ 0 & \text{if } x \text{ or } z \text{ is not selected} \\ \lambda & \text{if } x, z \text{ are leaf nodes} \\ \lambda \prod_{i=1}^{|x|} \tilde{c}(x_i, z_i) & \text{otherwise} \end{cases}$$

The selection function w(s) is chosen based on the domain and the data. The exact parse tree kernel is obtained as a special case of the ATK when w(s) = 1 for all symbols s ∈ S.
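Continuing the sketch from the PTK slide (same hypothetical Node, nodes, and same_production helpers), the approximation only adds the selection test; here w is modeled as a dict mapping each symbol to 0 or 1.

```python
from math import prod

def count_approx(x: Node, z: Node, w: dict, lam: float = 0.5) -> float:
    """Approximate counting function c~(x, z): as c, with an extra case that
    zeroes out node pairs whose symbol is deselected (w(s) = 0)."""
    if not same_production(x, z):
        return 0.0
    if not (w.get(x.label, 0) and w.get(z.label, 0)):
        return 0.0  # x or z not selected
    if x.is_leaf() and z.is_leaf():
        return lam
    return lam * prod(count_approx(xi, zi, w, lam)
                      for xi, zi in zip(x.children, z.children))

def atk(X: Node, Z: Node, w: dict, lam: float = 0.5) -> float:
    """Approximate tree kernel: convolve c~ only over root pairs that share
    a selected symbol s with w(s) = 1."""
    return sum(count_approx(x, z, w, lam)
               for x in nodes(X) for z in nodes(Z)
               if x.label == z.label and w.get(x.label, 0))

# Setting w(s) = 1 for every symbol recovers the exact PTK:
# atk(X, Z, {s: 1 for s in symbols}) == ptk(X, Z)
```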
ATK is a Valid Kernel




Proof
Let $\Phi(X)$ be the vector of frequencies of all subtrees occurring in X. Then, by definition, $\hat{K}_w$ can always be written as

$$\hat{K}_w = \langle P_w \Phi(X), P_w \Phi(Z) \rangle.$$

For any w, the projection $P_w$ is independent of the actual X and Z, and hence $\hat{K}_w$ is a valid kernel.
ATK is Faster than PTK



Speed-up factor $q_w$

$$q_w = \frac{\sum_{s \in S} \#_s(X)\, \#_s(Z)}{\sum_{s \in S} w(s)\, \#_s(X)\, \#_s(Z)},$$

where $\#_s(X)$ denotes the number of occurrences of nodes $x \in X$ labeled with the symbol s.

From this equation, $q_w \geq 1$ always holds for a {0, 1}-valued selection, and rejecting even one symbol that occurs in the trees yields a strict speed-up.
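A quick sketch of how $q_w$ could be measured from symbol histograms, reusing the hypothetical Node/nodes helpers from the earlier sketches; it assumes at least one selected symbol occurs in both trees, otherwise the denominator vanishes.

```python
from collections import Counter

def speedup_factor(X: Node, Z: Node, w: dict) -> float:
    """Speed-up q_w: all label-matched node comparisons over those kept by w."""
    hist_x = Counter(n.label for n in nodes(X))
    hist_z = Counter(n.label for n in nodes(Z))
    total = sum(hist_x[s] * hist_z[s] for s in hist_x)
    kept = sum(w.get(s, 0) * hist_x[s] * hist_z[s] for s in hist_x)
    return total / kept  # >= 1 for any {0, 1}-valued selection with kept > 0
```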
Supervised Setting


Given n labeled parse trees $(X_1, y_1), \ldots, (X_n, y_n)$, where the $y_i$ are the class labels, an ideal kernel Gram matrix Y is given as follows:

$$Y_{ij} = [\![\, y_i = y_j \,]\!] - [\![\, y_i \neq y_j \,]\!]$$

Kernel target alignment

$$\langle Y, \hat{K}_w \rangle_F = \sum_{y_i = y_j} \hat{K}_{ij} - \sum_{y_i \neq y_j} \hat{K}_{ij}$$

Our target now is to maximize this term with respect to w.
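As a sketch, the alignment can be evaluated directly from a precomputed kernel matrix; the NumPy function below is illustrative, not the authors' code.

```python
import numpy as np

def target_alignment(K_hat: np.ndarray, y: np.ndarray) -> float:
    """<Y, K_hat>_F with Y_ij = +1 if y_i == y_j, else -1: the sum of
    within-class kernel entries minus the between-class ones."""
    Y = np.where(y[:, None] == y[None, :], 1.0, -1.0)
    return float(np.sum(Y * K_hat))
```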
Supervised Setting (contd.)



Optimization problem

$$w = \operatorname*{argmax}_{w \in [0,1]^{|S|}} \; \sum_{\substack{i,j=1 \\ i \neq j}}^{n} Y_{ij} \sum_{s \in S} w(s) \sum_{\substack{x \in X_i \\ l(x) = s}} \; \sum_{\substack{z \in X_j \\ l(z) = s}} \tilde{c}(x, z)$$

subject to

$$\sum_{s \in S} w(s) \leq N, \qquad N \in \mathbb{N}$$
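As one illustrative strategy (not the authors' solver), a greedy sketch: precompute each symbol's contribution to the alignment with the exact counting function, which makes the objective linear in w, then keep the N best symbols. It reuses the hypothetical nodes and count helpers from the PTK sketch.

```python
from collections import defaultdict

def greedy_supervised_selection(trees, labels, N, lam=0.5):
    """Score each symbol s by sum_{i != j} Y_ij * (counts of shared subtrees
    rooted in s-labeled node pairs), then select the N highest-scoring symbols."""
    score = defaultdict(float)
    for i, Xi in enumerate(trees):
        for j, Xj in enumerate(trees):
            if i == j:
                continue
            y_ij = 1.0 if labels[i] == labels[j] else -1.0
            for x in nodes(Xi):
                for z in nodes(Xj):
                    if x.label == z.label:
                        score[x.label] += y_ij * count(x, z, lam)
    best = set(sorted(score, key=score.get, reverse=True)[:N])
    return {s: 1 if s in best else 0 for s in score}
```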
Unsupervised Setting



Average frequency of node comparisons

$$f(s) = \frac{1}{n^2} \sum_{i,j=1}^{n} \#_s(X_i)\, \#_s(X_j)$$

Comparison ratio

$$\rho = \frac{\text{expected node comparisons}}{\text{actual number of comparisons in the PTK}}$$
Unsupervised Setting (contd.)



Optimization problem

$$w = \operatorname*{argmax}_{w \in [0,1]^{|S|}} \; \sum_{\substack{i,j=1 \\ i \neq j}}^{n} \sum_{s \in S} w(s) \sum_{\substack{x \in X_i \\ l(x) = s}} \; \sum_{\substack{z \in X_j \\ l(z) = s}} \tilde{c}(x, z)$$

subject to

$$\frac{\sum_{s \in S} w(s) f(s)}{\sum_{s \in S} f(s)} \leq \rho$$
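In the same spirit, a greedy sketch for the unsupervised constraint: estimate f(s) from the symbol histograms of the sample, then admit symbols while the comparison budget allows it. Admitting the cheapest symbols first is an illustrative heuristic, not the paper's formulation; it reuses the hypothetical nodes helper.

```python
from collections import Counter

def greedy_unsupervised_selection(trees, rho):
    """Keep symbols, fewest expected comparisons first, while the budget
    sum_s w(s) f(s) <= rho * sum_s f(s) still holds."""
    n = len(trees)
    hists = [Counter(nd.label for nd in nodes(t)) for t in trees]
    symbols = set().union(*hists)
    f = {s: sum(hi[s] * hj[s] for hi in hists for hj in hists) / n**2
         for s in symbols}
    budget = rho * sum(f.values())
    w, used = {s: 0 for s in symbols}, 0.0
    for s in sorted(symbols, key=f.get):   # fewest comparisons first
        if used + f[s] <= budget:
            w[s] = 1
            used += f[s]
    return w
```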
Synthetic Data




Figure: Classification performance for the supervised synthetic data.
Figure: Detection performance for the unsupervised synthetic data.
Real Data




Figure: Classification performance for the question classification task.
Figure: Detection performance for the intrusion detection task (FTP).
Time




     Figure: Training and testing time of SVMs using the exact and the
     approximate tree kernel.
Time (contd.)




       Figure: Run-times for web spam (WS) and intrusion detection (ID).
Memory




     Figure: Memory requirements for web spam (WS) and intrusion
     detection (ID).
Conclusion



Approximate tree kernels give us a fast and efficient way to work with parse trees.
They improve both run-time and memory requirements: for large trees, the approximation reduces a single kernel computation from about 1 gigabyte to less than 800 kilobytes, accompanied by run-time improvements of up to three orders of magnitude.
The best results were obtained for network intrusion detection.
Questions




Any questions?
