Adaptive Learning and Mining for Data Streams and
                 Frequent Patterns


                                  Albert Bifet

        Laboratory for Relational Algorithmics, Complexity and Learning LARCA
                 Departament de Llenguatges i Sistemes Informàtics
                          Universitat Politècnica de Catalunya




                Ph.D. dissertation, 24 April 2009
         Advisors: Ricard Gavaldà and José L. Balcázar




LARCA
    Future Data Mining
       Structured data
       Find Interesting Patterns
       Predictions
       On-line processing




                                   2 / 59
Mining Evolving Massive Structured Data


                                      The basic problem
                                      Finding interesting structure
                                      in data
                                          Mining massive data
                                          Mining time-varying data
                                          Mining in real time
                                          Mining structured data


  The Disintegration of Persistence
        of Memory 1952-54

           Salvador Dalí


                                                                   3 / 59
Data Streams

  Data Streams
     Sequence is potentially infinite
     High amount of data: sublinear space
     High speed of arrival: sublinear time per example
     Once an element from a data stream has been processed
     it is discarded or archived

  Approximation algorithms
     Small error rate with high probability
     An algorithm (ε, δ)-approximates F if it outputs F̃ for
     which Pr[|F̃ − F| > εF] < δ.




                                                               4 / 59
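  To make the (ε, δ) guarantee concrete, here is a minimal sketch in
  plain Python (an additive-error variant, not from the thesis; it uses
  Hoeffding's inequality to choose the sample size, and the helper
  names are illustrative):

      import math, random

      def hoeffding_sample_size(eps, delta):
          # For values in [0, 1], n >= ln(2/delta) / (2 eps^2) samples
          # guarantee Pr[|estimate - true mean| > eps] < delta.
          return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

      def approximate_mean(stream, eps, delta):
          n = hoeffding_sample_size(eps, delta)
          sample = [next(stream) for _ in range(n)]
          return sum(sample) / n

      random.seed(1)
      # an endless toy stream of Bernoulli(0.3) bits
      stream = (1.0 if random.random() < 0.3 else 0.0 for _ in iter(int, 1))
      print(approximate_mean(stream, eps=0.01, delta=0.05))  # close to 0.3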
Tree Pattern Mining

                             Given a dataset of trees, find the
                             complete set of frequent subtrees
                                 Frequent Tree Pattern (FT):
                                     Include all the trees whose
                                     support is no less than
                                     min_sup
                                 Closed Frequent Tree Pattern
                                 (CT):
    Trees are sanctuaries.           Include no tree which has a
    Whoever knows how                super-tree with the same
       to listen to them,            support
    can learn the truth.
                                 CT ⊆ FT

      Herman Hesse

                                                               5 / 59
Outline


Mining Evolving Data Streams
  1   Framework
  2   ADWIN
  3   Classifiers
  4   MOA
  5   ASHT

Tree Mining
  6   Closure Operator on Trees
  7   Unlabeled Tree Mining Methods
  8   Deterministic Association Rules
  9   Implicit Rules

Mining Evolving Tree Data Streams
 10   Incremental Method
 11   Sliding Window Method
 12   Adaptive Method
 13   Logarithmic Relaxed Support
 14   XML Classification




                                                                  6 / 59
Outline


   1   Introduction

   2   Mining Evolving Data Streams

   3   Tree Mining

   4   Mining Evolving Tree Data Streams

   5   Conclusions




                                           7 / 59
Data Mining Algorithms with Concept Drift

 No Concept Drift                      Concept Drift

          DM Algorithm                          DM Algorithm
  input → Counter1 ... Counter5 → output        input → Static Model → output
                                                             ↑
                                                input → Change Detect.



                                                                8 / 59
Data Mining Algorithms with Concept Drift

 No Concept Drift                      Concept Drift

          DM Algorithm                          DM Algorithm
  input → Counter1 ... Counter5 → output        input → Estimator1 ... Estimator5 → output




                                                             8 / 59
Time Change Detectors and Predictors
(1) General Framework




    Problem
    Given an input sequence x1 , x2 , . . . , xt , . . . we want to output at
    instant t
         a prediction x̂t+1 minimizing the prediction error:

                                     |x̂t+1 − xt+1 |

         an alert if change is detected
    considering that the distribution of the data may change over time.




                                                                                9 / 59
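    The framework leaves the concrete estimator open. A minimal sketch
    of one possible choice (an exponentially weighted moving average;
    an illustration only, not the thesis's estimator):

        class EWMAEstimator:
            """Predicts x_{t+1} as an exponentially weighted moving
            average of the inputs seen so far."""
            def __init__(self, alpha=0.1):
                self.alpha = alpha
                self.estimate = None

            def update(self, x):
                if self.estimate is None:
                    self.estimate = x            # first observation
                else:                            # blend in the new input
                    self.estimate += self.alpha * (x - self.estimate)
                return self.estimate             # prediction for x_{t+1}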
Time Change Detectors and Predictors
(1) General Framework

    The framework is built up in three stages: an Estimator alone, an
    Estimator feeding a Change Detector, and finally both backed by a
    Memory module.

                                     Estimation
                                     ──→
     xt ──→  Estimator  ──→  Change Detect.  ──→  Alarm
                ↑                  ↑↓
                └────── Memory ────┘

                                              10 / 59
Optimal Change Detector and Predictor
(1) General Framework




         High accuracy
         Fast detection of change
         Low false positive and false negative rates
         Low computational cost: minimum space and time needed
         Theoretical guarantees
         No parameters needed
         Estimator with Memory and Change Detector




                                                                 11 / 59
Algorithm ADaptive Sliding WINdow
(2) ADWIN

    ADWIN: ADAPTIVE WINDOWING ALGORITHM
    1 Initialize Window W
    2 for each t > 0
    3      do W ← W ∪ {xt } (i.e., add xt to the head of W )
    4          repeat Drop elements from the tail of W
    5            until |µ̂W0 − µ̂W1 | < εc holds
    6              for every split of W into W = W0 · W1
    7          Output µ̂W

    Example
    W = 101010110111111
    The algorithm checks every split W = W0 · W1 :
    W0 = 1              W1 = 01010110111111
    W0 = 10             W1 = 1010110111111
    W0 = 101            W1 = 010110111111
    ...
    W0 = 101010110      W1 = 111111     |µ̂W0 − µ̂W1 | ≥ εc : CHANGE DET.!
    Drop elements from the tail of W :
    W = 01010110111111

                                                                12 / 59
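    A direct, unoptimized rendering of the pseudocode above in Python
    (it checks every split naively, whereas the actual ADWIN uses
    exponential histograms; the cut threshold below is a simplified
    Hoeffding-style variant, not the exact εc of the thesis):

        import math
        from collections import deque

        def adwin_step(window, x, delta=0.05):
            """Add x to the window, then shrink while some split
            W = W0 . W1 has |mean(W0) - mean(W1)| >= eps_cut."""
            window.append(x)                    # head of W
            shrunk = True
            while shrunk and len(window) > 1:
                shrunk = False
                n, total = len(window), sum(window)
                n0, s0 = 0, 0.0
                for v in list(window)[:-1]:     # every split point
                    n0 += 1
                    s0 += v
                    n1, s1 = n - n0, total - s0
                    m = 1.0 / (1.0 / n0 + 1.0 / n1)  # harmonic mean of sizes
                    eps_cut = math.sqrt(math.log(4.0 * n / delta) / (2.0 * m))
                    if abs(s0 / n0 - s1 / n1) >= eps_cut:
                        window.popleft()        # drop from the tail of W
                        shrunk = True
                        break
            return window

        w = deque()
        for bit in [1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1]:
            adwin_step(w, bit)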
Algorithm ADaptive Sliding WINdow
(2) ADWIN


    Theorem
    At every time step we have:
      1     (False positive rate bound). If µt remains constant within
            W , the probability that ADWIN shrinks the window at this
            step is at most δ .
      2     (False negative rate bound). Suppose that for some
            partition of W in two parts W0 W1 (where W1 contains the
            most recent items) we have |µW0 − µW1 | > 2εc . Then with
            probability 1 − δ ADWIN shrinks W to W1 , or shorter.

    ADWIN tunes itself to the data stream at hand, with no need for
    the user to hardwire or precompute parameters.


                                                                         13 / 59
Algorithm ADaptive Sliding WINdow
(2) ADWIN

    ADWIN, using a Data Stream Sliding Window Model,
            can provide the exact counts of 1’s in O(1) time per point,
            tries O(log W ) cutpoints,
            uses O((1/ε) log W ) memory words,
            and has O(log W ) processing time per example (amortized
            and worst-case).

                             Sliding Window Model

                  1010101   101   11   1   1
    Content:         4       2     2   1   1
    Capacity:        7       3     2   1   1



                                                                          14 / 59
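    A sketch of the bucket structure behind these bounds (exponential
    histograms: bucket capacities double and at most k buckets share a
    capacity; the exact merge policy of the thesis differs in details):

        class ExpHistogram:
            """Counts 1's over a stream using O(k log W) buckets."""
            def __init__(self, k=2):
                self.k = k
                self.buckets = []          # newest first: (capacity, ones)

            def add(self, bit):
                self.buckets.insert(0, (1, int(bit)))
                cap = 1
                while True:                # cascade merges up the sizes
                    idxs = [i for i, (c, _) in enumerate(self.buckets)
                            if c == cap]
                    if len(idxs) <= self.k:
                        break
                    i, j = idxs[-2], idxs[-1]   # merge the two oldest
                    self.buckets[i] = (2 * cap,
                                       self.buckets[i][1] + self.buckets[j][1])
                    del self.buckets[j]
                    cap *= 2

            def ones(self):
                # total 1's seen (no bucket expiry in this sketch)
                return sum(o for _, o in self.buckets)

        h = ExpHistogram()
        for b in [1, 0, 1, 1, 1, 0, 1]:
            h.add(b)
        print(h.ones())   # 5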
K-ADWIN = ADWIN + Kalman Filtering
(2) ADWIN


                                     Estimation
                                     ──→
     xt ──→   Kalman   ──→   ADWIN   ──→   Alarm
                 ↑              ↑↓
                 └── ADWIN Memory ──┘


    R = W 2 /50 and Q = 200/W (theoretically justifiable), where W
    is the length of the window maintained by ADWIN.



                                                                    15 / 59
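    A one-dimensional Kalman filter with exactly this coupling, as a
    sketch (the `window_length` callback standing in for ADWIN's
    current window size is an assumed interface):

        class KAdwinKalman:
            def __init__(self, window_length):
                self.window_length = window_length   # callable -> int
                self.x = 0.0                         # state estimate
                self.p = 1.0                         # estimate covariance

            def update(self, z):
                W = max(1, self.window_length())
                R = W * W / 50.0                     # measurement noise
                Q = 200.0 / W                        # process noise
                self.p += Q                          # predict step
                k = self.p / (self.p + R)            # Kalman gain
                self.x += k * (z - self.x)           # correct step
                self.p *= (1.0 - k)
                return self.x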
Classification
(3) Mining Algorithms




    Definition
    Given nC different classes, a classifier algorithm builds a model
    that predicts, with high accuracy, for every unlabeled instance I
    the class C to which it belongs.

    Classification Mining Algorithms
          Naïve Bayes
          Decision Trees
          Ensemble Methods




                                                                    16 / 59
Hoeffding Tree / CVFDT
(3) Mining Algorithms
    Hoeffding Tree: VFDT

         Pedro Domingos and Geoff Hulten.
         Mining high-speed data streams. 2000
          With high probability, constructs a model identical to the
          one a traditional (greedy) batch method would learn
          With theoretical guarantees on the error rate

                                         Time
                                  Day            Night
                         Contains “Money”          YES
                         Yes            No
                        YES              NO

                                                                        17 / 59
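    The test at the heart of VFDT, sketched in Python: split a leaf as
    soon as the observed gain of the best attribute beats the
    second-best by more than the Hoeffding bound (the numbers in the
    usage line are illustrative):

        import math

        def hoeffding_bound(value_range, delta, n):
            # With probability 1 - delta, the mean of n samples is
            # within this epsilon of the true mean.
            return math.sqrt(value_range ** 2 * math.log(1.0 / delta)
                             / (2.0 * n))

        def should_split(g_best, g_second, value_range, delta, n):
            return (g_best - g_second) > hoeffding_bound(value_range,
                                                         delta, n)

        # e.g. gains 0.30 vs 0.18, range 1.0, after 1000 examples
        print(should_split(0.30, 0.18, 1.0, delta=1e-7, n=1000))  # True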
VFDT / CVFDT
(3) Mining Algorithms


    Concept-adapting Very Fast Decision Trees: CVFDT

         G. Hulten, L. Spencer, and P. Domingos.
         Mining time-changing data streams. 2001
          It keeps its model consistent with a sliding window of
          examples
          Constructs “alternative branches” in preparation for
          changes
          If an alternative branch becomes more accurate, the tree
          switches to it




                                                                       18 / 59
Decision Trees: CVFDT
(3) Mining Algorithms

                                         Time
                                  Day            Night
                         Contains “Money”          YES
                         Yes            No
                        YES              NO

    No theoretical guarantees on the error rate of CVFDT
    CVFDT parameters:
       1   W : the example window size.
       2   T0 : number of examples used to check at each node if the
           splitting attribute is still the best.
       3   T1 : number of examples used to build the alternate tree.
       4   T2 : number of examples used to test the accuracy of the
           alternate tree.
                                                                       19 / 59
Decision Trees: Hoeffding Adaptive Tree
(3) Mining Algorithms


    Hoeffding Adaptive Tree:
           replaces frequency statistics counters by estimators
               no window of examples is needed, since the estimators
               maintain the statistics required
           changes the way alternate subtrees are checked and
           substituted, using a change detector with theoretical
           guarantees

    Summary:
       1   Theoretical guarantees
       2   No Parameters



                                                                             20 / 59
What is MOA?

  {M}assive {O}nline {A}nalysis is a framework for online learning
  from data streams.

      It is closely related to WEKA
      It includes a collection of offline and online algorithms, as
      well as tools for evaluation:
           boosting and bagging
           Hoeffding Trees
      with and without Naïve Bayes classifiers at the leaves.



                                                                       21 / 59
Ensemble Methods
(4) MOA for Evolving Data Streams

         http://www.cs.waikato.ac.nz/~abifet/MOA/




    New ensemble methods:
         ADWIN bagging: When a change is detected, the worst
         classifier is removed and a new classifier is added.
         Adaptive-Size Hoeffding Tree bagging

                                                               22 / 59
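    A sketch of the ADWIN bagging update (prequential order: test, then
    train with a Poisson(1) weight as in online bagging; `Classifier`
    and `Adwin` here are assumed interfaces, not MOA's actual API):

        import math, random

        def poisson1(rng):
            # Knuth's method for Poisson with lambda = 1
            L, k, p = math.exp(-1.0), 0, 1.0
            while True:
                p *= rng.random()
                if p <= L:
                    return k
                k += 1

        def adwin_bagging_step(ensemble, detectors, x, y, rng=random):
            for clf, det in zip(ensemble, detectors):
                det.add(0.0 if clf.predict(x) == y else 1.0)  # monitor error
                for _ in range(poisson1(rng)):                # online bagging
                    clf.train(x, y)
            if any(det.detected_change() for det in detectors):
                worst = max(range(len(ensemble)),
                            key=lambda i: detectors[i].estimate())
                ensemble[worst] = ensemble[worst].fresh_copy()   # replace worst
                detectors[worst] = detectors[worst].fresh_copy()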
Adaptive-Size Hoeffding Tree
(5) ASHT

                   T1       T2          T3            T4




    Ensemble of trees of different size
           smaller trees adapt more quickly to changes,
           larger trees do better during periods with little change
           diversity
                                                                      23 / 59
Adaptive-Size Hoeffding Tree
(5) ASHT




    (Two scatter plots; axes: Kappa (x) vs Error (y).)




    Figure: Kappa-Error diagrams for ASHT bagging (left) and bagging
    (right) on dataset RandomRBF with drift, plotting 90 pairs of
    classifiers.




                                                                                                                                                            24 / 59
Adaptive-Size Hoeffding Tree
(5) ASHT




     Figure: Accuracy and size on dataset LED with three concept drifts.

                                                                           25 / 59
Main contributions (i)
Mining Evolving Data Streams




      1   General Framework for Time Change Detectors and
          Predictors
      2   ADWIN
      3   Mining methods: Naïve Bayes, Decision Trees, Ensemble
          Methods
      4   MOA for Evolving Data Streams
      5   Adaptive-Size Hoeffding Tree




                                                                  26 / 59
Outline


   1   Introduction

   2   Mining Evolving Data Streams

   3   Tree Mining

   4   Mining Evolving Tree Data Streams

   5   Conclusions




                                           27 / 59
Mining Closed Frequent Trees

  Our trees are:                   Our subtrees are:
      Labeled and Unlabeled             Induced
      Ordered and Unordered             Top-down

                    Two different ordered trees
                   but the same unordered tree




                                                       28 / 59
A tale of two trees


 Consider D = {A, B}, where   Frequent subtrees
                                         A   B
      A:




               B:




 and let min_sup = 2.


                                                  29 / 59
A tale of two trees


 Consider D = {A, B}, where   Closed subtrees
                                         A      B
      A:




               B:




 and let min_sup = 2.


                                                    29 / 59
Mining Closed Unordered Subtrees
(6) Unlabeled Closed Frequent Tree Method




    CLOSED_SUBTREES(t, D, min_sup, T )
     1   if not CANONICAL_REPRESENTATIVE(t)
     2       then return T
     3   for every t′ that can be extended from t in one step
     4          do if Support(t′ ) ≥ min_sup
     5                then T ← CLOSED_SUBTREES(t′ , D, min_sup, T )
     6             if Support(t′ ) = Support(t)
     7                then t is not closed
     8   if t is closed
     9       then insert t into T
    10   return T



                                                                    30 / 59
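    The same recursive search, written out for itemsets rather than
    trees so that it stays short and runnable (the tree version changes
    only the one-step extension and the canonicity test; plain Python):

        def closed_itemsets(transactions, min_sup):
            items = sorted({i for t in transactions for i in t})
            closed = {}

            def support(s):
                return sum(1 for t in transactions if s <= t)

            def recurse(s, sup):
                is_closed = True
                base = max(s) if s else None
                for i in items:
                    if i in s:
                        continue
                    sup2 = support(s | {i})
                    if sup2 == sup:
                        is_closed = False  # a superset has equal support
                    if (base is None or i > base) and sup2 >= min_sup:
                        recurse(s | {i}, sup2)  # canonical extensions only
                if is_closed and s:
                    closed[frozenset(s)] = sup

            recurse(frozenset(), len(transactions))
            return closed

        D = [frozenset('abc'), frozenset('ab'), frozenset('ac')]
        print(closed_itemsets(D, min_sup=2))  # {a}:3, {a,b}:2, {a,c}:2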
Example
D = {A, B}            A = (0, 1, 2, 3, 2, 1)     B = (0, 1, 2, 3, 1, 2, 2)

min_sup = 2.


                                        (0, 1, 2, 1)

                                                         (0, 1, 2, 2, 1)
                          (0, 1, 1)

             (0, 1)                     (0, 1, 2, 2)
  (0)
                          (0, 1, 2)                      (0, 1, 2, 3, 1)

                                        (0, 1, 2, 3)




                                                                       31 / 59
Experimental results
(6) Unlabeled Closed Frequent Tree Method




       TreeNat                              CMTreeMiner
            Unlabeled Trees                    Labeled Trees
            Top-Down Subtrees                  Induced Subtrees
            No Occurrences                     Occurrences
                                                                  32 / 59
Closure Operator on Trees
(7) Closure Operator
          D: the finite input dataset of trees
          T : the (infinite) set of all trees

    Definition
    We define the following Galois connection pair:
          For finite A ⊆ D,
               σ (A) is the set of trees of T that are subtrees of all the
               trees in A:

                              σ (A) = {t ∈ T | ∀ t′ ∈ A (t ≼ t′ )}

          For finite B ⊂ T ,
               τD (B) is the set of trees of D that are supertrees of all the
               trees in B:

                             τD (B) = {t ∈ D | ∀ t′ ∈ B (t′ ≼ t)}

    Closure Operator
    The composition ΓD = σ ◦ τD is a closure operator.
                                                                     33 / 59
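    The pair (σ, τD ) and the closure ΓD , written directly in Python;
    here strings under the substring order stand in for trees under the
    subtree order (a toy instance, purely to show the composition):

        def tau(D, B, leq):
            # trees of D that are supertrees of every pattern in B
            return [t for t in D if all(leq(b, t) for b in B)]

        def sigma(T, A, leq):
            # patterns of T that are subtrees of every tree in A
            return [p for p in T if all(leq(p, a) for a in A)]

        def gamma(D, T, B, leq):
            # the closure operator: Gamma_D = sigma . tau_D
            return sigma(T, tau(D, B, leq), leq)

        leq = lambda a, b: a in b            # substring stands in for subtree
        D = ["abc", "abd"]
        T = sorted({s[i:j] for s in D
                    for i in range(len(s)) for j in range(i, len(s) + 1)})
        print(gamma(D, T, ["ab"], leq))      # ['', 'a', 'ab', 'b']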
Galois Lattice of closed set of trees
(7) Closure Operator


                        1    2         3

                        12        13   23

                             123


    Example (tree figures omitted in this rendering): for a set of
    trees B, τD (B) is the set of trees of D that are supertrees of
    every tree of B, and ΓD (B) = σ ◦ τD (B) is the closed set formed
    by their common subtrees.

                                               34 / 59
Mining Implications from Lattices of Closed Trees
(8) Association Rules

    Problem
    Given a dataset D of rooted, unlabeled and unordered trees,
    find a “basis”: a set of rules that are sufficient to infer all the
    rules that hold in the dataset D.


    (Example rules of the form  t1 ∧ t2 → t3  over the dataset D; the
    tree figures are omitted in this rendering.)
                                                                        36 / 59
Mining Implications from Lattices of Closed Trees
  Set of Rules:

        A → ΓD (A).

      antecedents are obtained through a computation akin to a
      hypergraph transversal
      consequents follow from an application of the closure
      operators

                        1    2         3

                        12        13   23

                             123
                                                        37 / 59
Association Rule Computation Example
(8) Association Rules


                        1    2         3

                        12        13   23

                             123

    (The rule computed over this lattice, of the form  · → · , has its
    tree figures omitted in this rendering.)

                                        38 / 59
Model transformation
(8) Association Rules



    Intuition
          One propositional variable vt is assigned to each possible
          subtree t.
          A set of trees A corresponds in a natural way to a model
          mA .
          Let mA be a model: we impose on mA the constraint that if
          mA (vt ) = 1 for a variable vt , then mA (vt′ ) = 1 for all those
          variables vt′ such that t′ is a subtree of the tree
          represented by vt .

                        R0 = {vt → vt′ | t′ ≼ t, t ∈ U , t′ ∈ U }




                                                                         39 / 59
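    As a toy rendering of R0 in Python (the substring order again
    stands in for the subtree relation ≼):

        def r0_rules(U, leq):
            # all implications v_t -> v_t' with t' a proper subpattern of t
            return [(t, t2) for t in U for t2 in U
                    if t2 != t and leq(t2, t)]

        leq = lambda a, b: a in b
        print(r0_rules(["a", "b", "ab"], leq))  # [('ab', 'a'), ('ab', 'b')]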
Implicit Rules Definition
(9) Implicit Rules



                                                      Implicit Rule



     (Example rule  t1 ∧ t2 → t3  over D; tree figures omitted in this
     rendering.)

     Given three trees t1 , t2 , t3 , we say that t1 ∧ t2 → t3 is an implicit
     Horn rule (an implicit rule, for short) if for every tree t it holds that

                           t1 ≼ t ∧ t2 ≼ t ↔ t3 ≼ t.

     t1 and t2 have implicit rules if t1 ∧ t2 → t is an implicit rule for
     some t.
                                                                              40 / 59
Implicit Rules Definition
(9) Implicit Rules




                                 NOT Implicit Rule

    (Rule  t1 ∧ t2 → t3  and a witness tree; figures omitted in this
    rendering.)

    This supertree of the antecedents is NOT a supertree of the
    consequent.


                                                     40 / 59
Implicit Rules Characterization
(9) Implicit Rules
     Theorem
     All trees a, b such that a ≼ b have implicit rules.

     Theorem
     Suppose that b has only one component. Then a and b have
     implicit rules if and only if a has a maximum component which
     is a subtree of the component of b:
           for all i < n :   ai ≼ an ,   an ≼ b1 , and then

             (a1 · · · an−1 an ) ∧ (b1 ) → (a1 · · · an−1 b1 )

     where (x1 · · · xk ) denotes the tree whose components are
     x1 , . . . , xk .

                                                                        41 / 59
Main contributions (ii)
Tree Mining




      6   Closure Operator on Trees
      7   Unlabeled Closed Frequent Tree Mining
      8   A way of extracting high-confidence association rules from
          datasets consisting of unlabeled trees
              antecedents are obtained through a computation akin to a
              hypergraph transversal
              consequents follow from an application of the closure
              operators
      9   Detection of some cases of implicit rules: rules that
          always hold, independently of the dataset




                                                                         42 / 59
Outline


   1   Introduction

   2   Mining Evolving Data Streams

   3   Tree Mining

   4   Mining Evolving Tree Data Streams

   5   Conclusions




                                           43 / 59
Mining Evolving Tree Data Streams
(10,11,12) Incremental, Sliding Window, and Adaptive Tree Mining Methods



    Problem
    Given a data stream D of rooted, unlabeled and unordered
    trees, find frequent closed trees.


                                            We provide three algorithms,
                                            of increasing power
                                                  Incremental
                                                  Sliding Window
                                                  Adaptive

                  D



                                                                           44 / 59
Relaxed Support
(13) Logarithmic Relaxed Support


         Guojie Song, Dongqing Yang, Bin Cui, Baihua Zheng,
         Yunfeng Liu and Kunqing Xie.
         CLAIM: An Efficient Method for Relaxed Frequent Closed
         Itemsets Mining over Stream Data
         Linear Relaxed Interval: the support space of all
         subpatterns can be divided into n = 1/εr intervals, where
         εr is a user-specified relaxation factor, and each interval can
         be denoted by Ii = [li , ui ), where li = (n − i) · εr ≥ 0,
         ui = (n − i + 1) · εr ≤ 1 and i ≤ n.
         Linear Relaxed closed subpattern t: t is linear relaxed
         closed if and only if there exists no proper superpattern t′ of
         t such that their supports belong to the same interval Ii .



                                                                        45 / 59
Relaxed Support
(13) Logarithmic Relaxed Support



    As the number of closed frequent patterns is not linear with
    respect to the support, we introduce a new relaxed support:
         Logarithmic Relaxed Interval: the support space of all
         subpatterns can be divided into n = 1/εr intervals, where
         εr is a user-specified relaxation factor, and each interval can
         be denoted by Ii = [li , ui ), where li = c^i , ui = c^(i+1) − 1
         and i ≤ n.
         Logarithmic Relaxed closed subpattern t: t is logarithmic
         relaxed closed if and only if there exists no proper
         superpattern t′ of t such that their supports belong to the
         same interval Ii .




                                                                          45 / 59
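    Computed with integer arithmetic, a sketch of the logarithmic
    interval index (base c as the assumed relaxation base):

        def log_interval(support, c=2):
            # index i of the interval I_i = [c^i, c^(i+1) - 1]
            i, s = 0, max(support, 1)
            while s >= c:
                s //= c
                i += 1
            return i

        def same_log_interval(sup_a, sup_b, c=2):
            # two supports are indistinguishable under the relaxation
            return log_interval(sup_a, c) == log_interval(sup_b, c)

        print(log_interval(5), log_interval(7))  # 2 2 : both in [4, 7]
        print(same_log_interval(7, 8))           # False: 8 opens [8, 15]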
Algorithms
(10,11,12) Incremental, Sliding Window, and Adaptive Tree Mining Methods

    Algorithms
          Incremental: INCTREENAT
          Sliding Window: WINTREENAT
          Adaptive: ADATREENAT, which uses ADWIN to monitor change

    ADWIN
    An adaptive sliding window whose size is recomputed online
    according to the rate of change observed.

    ADWIN has rigorous guarantees (theorems)
          On the rates of false positives and false negatives
          On the relation between the size of the current window and
          the rate of change

                                                                           46 / 59
Experimental Validation: TN1
(10,11,12) Incremental, Sliding Window, and Adaptive Tree Mining Methods




            (Plot: running time in seconds versus dataset size in
            millions of trees, comparing CMTreeMiner and INCTREENAT.)

            Figure: Experiments on ordered trees with the TN1 dataset




                                                                           47 / 59
Adaptive XML Tree Classification on evolving data
streams
(14) XML Tree Classification

                (Four example trees over labels A, B, C, D, with classes
                CLASS 1, CLASS 2, CLASS 1, CLASS 2; tree figures omitted
                in this rendering.)

                              Figure: A dataset example

                                                                        48 / 59
Adaptive XML Tree Classification on evolving data
streams
(14) XML Tree Classification

                                                        Tree Trans.
                Closed        Freq. not Closed Trees   1 2 3 4
          c1    (tree figures omitted)                 1 0 1 0
          c2    (tree figures omitted)                 1 0 0 1
                                                                       49 / 59
Adaptive XML Tree Classification on evolving data
streams
(14) XML Tree Classification
                                 Frequent Trees
    (First table: the 0/1 occurrence, in each tree transaction 1-4, of
    the frequent subtrees f1 , f2 , f3 , f4 of the closed trees
    c1 , . . . , c4 ; figures omitted in this rendering.)

                          Closed Trees        Maximal Trees
           Id   Tree    c1   c2   c3   c4    c1   c2   c3      Class
            1           1    1    0    1     1    1    0       CLASS 1
            2           0    0    1    1     0    0    1       CLASS 2
            3           1    0    1    1     1    0    1       CLASS 1
            4           0    1    1    1     0    1    1       CLASS 2
                                                                  50 / 59
Adaptive XML Tree Framework on evolving data
streams
(14) XML Tree Classification



    XML Tree Classification Framework Components
          An XML closed frequent tree miner
          A Data stream classifier algorithm, which we will feed with
          tuples to be classified online.




                                                                       51 / 59
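    The tuples fed to the classifier are just binary occurrence vectors
    over the mined patterns; a sketch (with the substring test standing
    in for the subtree test `contains`):

        def tree_to_tuple(tree, patterns, contains):
            # one 0/1 attribute per closed (or maximal) frequent tree
            return tuple(int(contains(p, tree)) for p in patterns)

        contains = lambda p, t: p in t
        patterns = ["ab", "bc", "cd"]
        print(tree_to_tuple("abcd", patterns, contains))  # (1, 1, 1)
        print(tree_to_tuple("abd", patterns, contains))   # (1, 0, 0)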
Adaptive XML Tree Framework on evolving data
streams
(14) XML Tree Classification




                                      Maximal                 Closed
                     # Trees   Att.   Acc.    Mem.     Att.   Acc.     Mem.
      CSLOG12         15483    84     79.64      1.2   228    78.12    2.54
      CSLOG23         15037    88     79.81     1.21   243    78.77    2.75
      CSLOG31         15702    86     79.94     1.25   243    77.60    2.73
      CSLOG123        23111    84     80.02      1.7   228    78.91    4.18
                     Table: BAGGING on unordered trees.




                                                                        52 / 59
Main contributions (iii)
Mining Evolving Tree Data Streams




      10   Incremental Method
      11   Sliding Window Method
      12   Adaptive Method
      13   Logarithmic Relaxed Support
      14   XML Classification




                                         53 / 59
Outline


   1   Introduction

   2   Mining Evolving Data Streams

   3   Tree Mining

   4   Mining Evolving Tree Data Streams

   5   Conclusions




                                           54 / 59
Main contributions


Mining Evolving Data Streams
  1   Framework
  2   ADWIN
  3   Classifiers
  4   MOA
  5   ASHT

Tree Mining
  6   Closure Operator on Trees
  7   Unlabeled Tree Mining Methods
  8   Deterministic Association Rules
  9   Implicit Rules

Mining Evolving Tree Data Streams
 10   Incremental Method
 11   Sliding Window Method
 12   Adaptive Method
 13   Logarithmic Relaxed Support
 14   XML Classification




                                                                  55 / 59
Future Lines (i)



      Adaptive Kalman Filter
          A Kalman filter that computes Q and R adaptively, without
          using the size of the ADWIN window.

      Extend MOA framework
          Support vector machines
          Clustering
          Itemset mining
          Association rules




                                                                      56 / 59
Future Lines (ii)


       Adaptive Deterministic Association Rules
           Deterministic Association Rules computed on evolving data
           streams

       General Implicit Rules Characterization
           Find a characterization of implicit rules with any number of
           components

       Non-Deterministic Association Rules
           Find bases of association rules for trees with confidence
           lower than 100%




                                                                          57 / 59
Future Lines (iii)


       Closed Frequent Graph Mining
           Mining methods to obtain closed frequent graphs.
               Not incremental
               Incremental
               Sliding Window
               Adaptive

       Graph Classification
           Classifiers of graphs using maximal and closed frequent
           subgraphs.




                                                                    58 / 59
Relevant publications
      Albert Bifet and Ricard Gavaldà.
      Kalman filters and adaptive windows for learning in data streams. DS’06
      Albert Bifet and Ricard Gavaldà.
      Learning from time-changing data with adaptive windowing. SDM’07
      Albert Bifet and Ricard Gavaldà.
      Adaptive parameter-free learning from evolving data streams. Tech-Rep R09-9
      A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby, and R. Gavaldà.
      New ensemble methods for evolving data streams. KDD’09
      José L. Balcázar, Albert Bifet, and Antoni Lozano.
      Mining frequent closed unordered trees through natural representations. ICCS’07
      José L. Balcázar, Albert Bifet, and Antoni Lozano.
      Subtree testing and closed tree mining through natural representations. DEXA’07
      José L. Balcázar, Albert Bifet, and Antoni Lozano.
      Mining implications from lattices of closed trees. EGC’2008
      José L. Balcázar, Albert Bifet, and Antoni Lozano.
      Mining Frequent Closed Rooted Trees. MLJ’09
      Albert Bifet and Ricard Gavaldà.
      Mining adaptively frequent closed unlabeled rooted trees in data streams. KDD’08
      Albert Bifet and Ricard Gavaldà.
      Adaptive XML Tree Classification on evolving data streams
                                                                                        59 / 59

 
Pitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themPitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themAlbert Bifet
 
STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.Albert Bifet
 
Efficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive WindowsEfficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive WindowsAlbert Bifet
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real TimeAlbert Bifet
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real TimeAlbert Bifet
 
Mining Frequent Closed Graphs on Evolving Data Streams
Mining Frequent Closed Graphs on Evolving Data StreamsMining Frequent Closed Graphs on Evolving Data Streams
Mining Frequent Closed Graphs on Evolving Data StreamsAlbert Bifet
 
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsPAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsAlbert Bifet
 
Sentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataSentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataAlbert Bifet
 

More from Albert Bifet (20)

Artificial intelligence and data stream mining
Artificial intelligence and data stream miningArtificial intelligence and data stream mining
Artificial intelligence and data stream mining
 
MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016
 
Mining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAMining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOA
 
Efficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream ClassifiersEfficient Online Evaluation of Big Data Stream Classifiers
Efficient Online Evaluation of Big Data Stream Classifiers
 
Apache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkApache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache Flink
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data Science
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Internet of Things Data Science
Internet of Things Data ScienceInternet of Things Data Science
Internet of Things Data Science
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data Management
 
A Short Course in Data Stream Mining
A Short Course in Data Stream MiningA Short Course in Data Stream Mining
A Short Course in Data Stream Mining
 
Real-Time Big Data Stream Analytics
Real-Time Big Data Stream AnalyticsReal-Time Big Data Stream Analytics
Real-Time Big Data Stream Analytics
 
Multi-label Classification with Meta-labels
Multi-label Classification with Meta-labelsMulti-label Classification with Meta-labels
Multi-label Classification with Meta-labels
 
Pitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themPitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid them
 
STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.STRIP: stream learning of influence probabilities.
STRIP: stream learning of influence probabilities.
 
Efficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive WindowsEfficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive Windows
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real Time
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real Time
 
Mining Frequent Closed Graphs on Evolving Data Streams
Mining Frequent Closed Graphs on Evolving Data StreamsMining Frequent Closed Graphs on Evolving Data Streams
Mining Frequent Closed Graphs on Evolving Data Streams
 
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsPAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
 
Sentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataSentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming Data
 

Recently uploaded

What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 

Adaptive Learning and Mining for Data Streams and Frequent Patterns

  • 1. Adaptive Learning and Mining for Data Streams and Frequent Patterns Albert Bifet Laboratory for Relational Algorithmics, Complexity and Learning LARCA Departament de Llenguatges i Sistemes Informàtics Universitat Politècnica de Catalunya Ph.D. dissertation, 24 April 2009 Advisors: Ricard Gavaldà and José L. Balcázar LARCA
  • 2. Future Data Mining Future Data Mining Structured data Find Interesting Patterns Predictions On-line processing 2 / 59
  • 3. Mining Evolving Massive Structured Data The basic problem Finding interesting structure on data Mining massive data Mining time varying data Mining on real time Mining structured data The Disintegration of Persistence of Memory 1952-54 Salvador Dalí 3 / 59
  • 4. Data Streams Data Streams Sequence is potentially infinite High amount of data: sublinear space High speed of arrival: sublinear time per example Once an element from a data stream has been processed it is discarded or archived Approximation algorithms Small error rate with high probability An algorithm (ε, δ)-approximates F if it outputs F̃ for which Pr[|F̃ − F| > εF] < δ. 4 / 59
  • 5. Tree Pattern Mining Given a dataset of trees, find the complete set of frequent subtrees Frequent Tree Pattern (FT): Include all the trees whose support is no less than min_sup Closed Frequent Tree Pattern (CT): Include no tree which has a super-tree with the same support CT ⊆ FT “Trees are sanctuaries. Whoever knows how to listen to them, can learn the truth.” Herman Hesse 5 / 59
  • 6. Outline Mining Evolving Tree Mining Mining Evolving Tree Data Streams 6 Closure Operator Data Streams 1 Framework on Trees 10 Incremental 2 ADWIN 7 Unlabeled Tree Method 3 Classifiers Mining Methods 11 Sliding Window 8 Deterministic Method 4 MOA Association Rules 12 Adaptive Method 5 ASHT 9 Implicit Rules 13 Logarithmic Relaxed Support 14 XML Classification 6 / 59
  • 9. Outline 1 Introduction 2 Mining Evolving Data Streams 3 Tree Mining 4 Mining Evolving Tree Data Streams 5 Conclusions 7 / 59
  • 10. Data Mining Algorithms with Concept Drift [Diagram: without concept drift, the DM algorithm keeps a static model fed by counters; with concept drift, the counters feed a change detector that updates the model.] 8 / 59
  • 11. Data Mining Algorithms with Concept Drift [Diagram: the no-concept-drift algorithm keeps counters (Counter1–Counter5); the concept-drift algorithm replaces them with estimators (Estimator1–Estimator5).] 8 / 59
  • 12. Time Change Detectors and Predictors (1) General Framework Problem Given an input sequence x1 , x2 , . . . , xt , . . . we want to output at instant t a prediction x̂t+1 minimizing the prediction error |x̂t+1 − xt+1 |, and an alert if change is detected, considering distribution changes over time. 9 / 59
  • 13–15. Time Change Detectors and Predictors (1) General Framework [Diagram, built up over three slides: the input xt feeds an Estimator that outputs the estimation; a Change Detector monitors the estimator and raises an alarm; a Memory module supports both.] 10 / 59
  • 16. Optimal Change Detector and Predictor (1) General Framework High accuracy Fast detection of change Low false positive and false negative ratios Low computational cost: minimum space and time needed Theoretical guarantees No parameters needed Estimator with Memory and Change Detector 11 / 59
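The desiderata above suggest a simple decomposition: an estimator with memory, plus a change detector watching the incoming values against the estimator's output. Below is a minimal Python sketch of that decomposition; the class names, the EWMA estimator, and the fixed threshold are illustrative assumptions, not the thesis' (or MOA's) actual interfaces.

```python
class EWMAEstimator:
    """Exponentially weighted estimator: one possible 'estimator with memory'."""
    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.estimate = 0.0

    def update(self, x):
        self.estimate = (1.0 - self.alpha) * self.estimate + self.alpha * x

class ThresholdDetector:
    """Crude change detector: fires when an observation strays far from
    the current estimate (a stand-in for a principled detector)."""
    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def detect(self, x, estimate):
        return abs(x - estimate) > self.threshold

est, det = EWMAEstimator(), ThresholdDetector()
for t, x in enumerate([0.1, 0.1, 0.2, 0.1, 0.9, 1.0, 0.95]):
    prediction = est.estimate            # prediction for x_{t+1}, issued before seeing x
    if det.detect(x, prediction):        # alert if change is detected
        print(f"t={t}: possible change (x={x}, prediction={prediction:.2f})")
    est.update(x)
```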
  • 17–28. Algorithm ADaptive Sliding WINdow (2) ADWIN Example (animation over twelve slides): W = 101010110111111 is split successively into W0 = 1 | W1 = 01010110111111, then W0 = 10, 101, 1010, . . . , 10101011 with the corresponding suffixes as W1 ; at W0 = 101010110, W1 = 111111 the test |µ̂W0 − µ̂W1 | ≥ εc fires (CHANGE DETECTED) and elements are dropped from the tail of W, leaving W = 01010110111111. ADWIN: ADAPTIVE WINDOWING ALGORITHM 1 Initialize Window W 2 for each t > 0 3 do W ← W ∪ {xt } (i.e., add xt to the head of W ) 4 repeat Drop elements from the tail of W 5 until |µ̂W0 − µ̂W1 | < εc holds 6 for every split of W into W = W0 · W1 7 Output µ̂W 12 / 59
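The pseudocode translates almost line for line into a naive implementation. The sketch below keeps the whole window in a list and checks every split after each arrival; the cut threshold is a Hoeffding-style bound whose exact constants are an assumption here (the thesis derives a slightly different εc), and the real ADWIN replaces the list with an exponential histogram to reach the O(log W) costs quoted on the next slide.

```python
import math

class ADWINSketch:
    """Naive ADWIN sketch: stores the whole window as a list. The real
    algorithm compresses it into an exponential histogram, checking only
    O(log W) cut points and using O((1/eps) log W) memory words."""
    def __init__(self, delta=0.1):
        self.delta = delta
        self.window = []

    def _eps_cut(self, n0, n1):
        # Hoeffding-style threshold; the constants are illustrative and
        # differ slightly from the thesis' exact bound.
        m = 1.0 / (1.0 / n0 + 1.0 / n1)
        return math.sqrt((1.0 / (2.0 * m)) * math.log(4.0 * len(self.window) / self.delta))

    def _has_bad_split(self):
        n = len(self.window)
        for i in range(1, n):
            mu0 = sum(self.window[:i]) / i          # mean of the older part W0
            mu1 = sum(self.window[i:]) / (n - i)    # mean of the recent part W1
            if abs(mu0 - mu1) >= self._eps_cut(i, n - i):
                return True
        return False

    def add(self, x):
        self.window.append(x)                       # add x_t to the head of W
        shrunk = False
        while len(self.window) > 1 and self._has_bad_split():
            self.window.pop(0)                      # drop elements from the tail of W
            shrunk = True
        return shrunk                               # True means "change detected"

stream = [1, 0] * 30 + [1] * 60                     # the mean jumps from 0.5 to 1.0
adwin = ADWINSketch(delta=0.1)
for t, x in enumerate(stream):
    if adwin.add(x):
        print(f"t={t}: change detected, window shrunk to {len(adwin.window)} items")
```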
  • 29. Algorithm ADaptive Sliding WINdow (2) ADWIN Theorem At every time step we have: 1 (False positive rate bound). If µt remains constant within W , the probability that ADWIN shrinks the window at this step is at most δ. 2 (False negative rate bound). Suppose that for some partition of W in two parts W0 · W1 (where W1 contains the most recent items) we have |µW0 − µW1 | > 2εc . Then with probability 1 − δ ADWIN shrinks W to W1 , or shorter. ADWIN tunes itself to the data stream at hand, with no need for the user to hardwire or precompute parameters. 13 / 59
  • 30. Algorithm ADaptive Sliding WINdow (2) ADWIN ADWIN using a Data Stream Sliding Window Model can provide the exact counts of 1’s in O(1) time per point: tries O(log W) cutpoints uses O((1/ε) log W) memory words the processing time per example is O(log W) (amortized and worst-case). Sliding Window Model [Bucket example: contents 1010101, 101, 11, 1, 1; content (number of 1’s): 4, 2, 2, 1, 1; capacity: 7, 3, 2, 1, 1.] 14 / 59
  • 31. K-ADWIN = ADWIN + Kalman Filtering (2) ADWIN [Diagram: xt feeds a Kalman filter that outputs the estimation; ADWIN raises the alarm and serves as memory for the filter.] R = W²/50 and Q = 200/W (theoretically justifiable), where W is the length of the window maintained by ADWIN. 15 / 59
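The scalar Kalman update with these settings fits in a few lines. This is an illustrative sketch only: it hard-codes the one-dimensional constant-state model and stands in ADWIN's window length W with a simple counter instead of a real ADWIN instance.

```python
class ScalarKalman:
    """One-dimensional Kalman filter tracking a drifting mean; in K-ADWIN
    the noise parameters are tied to ADWIN's window length W."""
    def __init__(self):
        self.x = 0.0     # state estimate
        self.p = 1.0     # estimate covariance

    def update(self, z, w_len):
        r = (w_len ** 2) / 50.0        # measurement noise R = W^2 / 50
        q = 200.0 / w_len              # process noise Q = 200 / W
        self.p += q                    # predict (state assumed constant)
        k = self.p / (self.p + r)      # Kalman gain
        self.x += k * (z - self.x)     # correct with observation z
        self.p *= 1.0 - k
        return self.x

kf = ScalarKalman()
for t, z in enumerate([0.2, 0.3, 0.25, 0.9, 0.85, 0.95]):
    w_len = t + 1                      # stand-in for ADWIN's current window length
    print(f"t={t}: estimate = {kf.update(z, w_len):.3f}")
```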
  • 32. Classification (3) Mining Algorithms Definition Given nC different classes, a classifier algorithm builds a model that predicts for every unlabeled instance I the class C to which it belongs with accuracy. Classification Mining Algorithms Naïve Bayes Decision Trees Ensemble Methods 16 / 59
  • 33. Hoeffding Tree / CVFDT (3) Mining Algorithms Hoeffding Tree: VFDT Pedro Domingos and Geoff Hulten. Mining high-speed data streams. 2000 With high probability, constructs a model identical to the one a traditional (greedy) method would learn With theoretical guarantees on the error rate [Example tree: split on Time; Night → YES, Day → test Contains “Money” (Yes → YES, No → NO).] 17 / 59
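The split decision behind VFDT rests on the Hoeffding bound: with probability 1 − δ, the true mean of a variable with range R stays within ε = sqrt(R² ln(1/δ) / (2n)) of the average of n observations. A minimal computation (the gain values below are made-up numbers for illustration):

```python
import math

def hoeffding_bound(value_range, delta, n):
    """epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of the mean of n
    observations."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

# Split a leaf when the best attribute's gain beats the runner-up by > epsilon.
g_best, g_second = 0.42, 0.38          # made-up information gains
n, delta = 5000, 1e-6
eps = hoeffding_bound(value_range=1.0, delta=delta, n=n)  # gain range log2(2) = 1 for 2 classes
print(f"epsilon = {eps:.4f}; split now: {g_best - g_second > eps}")
```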
  • 34. VFDT / CVFDT (3) Mining Algorithms Concept-adapting Very Fast Decision Trees: CVFDT G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. 2001 It keeps its model consistent with a sliding window of examples Construct “alternative branches” as preparation for changes If the alternative branch becomes more accurate, the tree branches are switched 18 / 59
  • 35. Decision Trees: CVFDT (3) Mining Algorithms [Example tree: split on Time; Night → YES, Day → test Contains “Money” (Yes → YES, No → NO).] No theoretical guarantees on the error rate of CVFDT CVFDT parameters: 1 W : the example window size. 2 T0 : number of examples used to check at each node if the splitting attribute is still the best. 3 T1 : number of examples used to build the alternate tree. 4 T2 : number of examples used to test the accuracy of the alternate tree. 19 / 59
  • 36. Decision Trees: Hoeffding Adaptive Tree (3) Mining Algorithms Hoeffding Adaptive Tree: replace frequency statistics counters by estimators no window of stored examples is needed, since the required statistics are maintained by the estimators change the way alternate subtrees are substituted, using a change detector with theoretical guarantees Summary: 1 Theoretical guarantees 2 No Parameters 20 / 59
  • 37. What is MOA? {M}assive {O}nline {A}nalysis is a framework for online learning from data streams. It is closely related to WEKA It includes a collection of offline and online methods as well as tools for evaluation: boosting and bagging Hoeffding Trees, with and without Naïve Bayes classifiers at the leaves. 21 / 59
  • 38. Ensemble Methods (4) MOA for Evolving Data Streams http://www.cs.waikato.ac.nz/∼abifet/MOA/ New ensemble methods: ADWIN bagging: When a change is detected, the worst classifier is removed and a new classifier is added. Adaptive-Size Hoeffding Tree bagging 22 / 59
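The ADWIN-bagging idea can be sketched in a few dozen lines: each member trains on Poisson(1)-weighted copies of every example (as in online bagging) and carries a change detector on its own error stream; when any detector fires, the worst member is replaced. Everything below is illustrative: the majority-class base learner, the smoothed error, and the detector wiring are assumptions, with ADWINSketch borrowed from the earlier ADWIN example.

```python
import random

class MajorityLearner:
    """Toy base learner: predicts the most frequent class seen so far."""
    def __init__(self):
        self.counts = {}
    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1
    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else 0

def poisson1():
    """Knuth's method for Poisson(1): the example weight in online bagging."""
    k, p = 0, 1.0
    while p > 0.3678794411714423:      # e^(-1)
        k += 1
        p *= random.random()
    return k - 1

class AdwinBagging:
    """Sketch of ADWIN bagging: on any detected change, the member with the
    highest (smoothed) error is replaced by a fresh learner and detector."""
    def __init__(self, n, make_detector):
        self.make_detector = make_detector
        self.members = [MajorityLearner() for _ in range(n)]
        self.detectors = [make_detector() for _ in range(n)]
        self.err = [0.0] * n

    def learn(self, x, y):
        fired = False
        for i, (clf, det) in enumerate(zip(self.members, self.detectors)):
            err = 0.0 if clf.predict(x) == y else 1.0
            self.err[i] = 0.99 * self.err[i] + 0.01 * err
            fired |= det.add(err)                  # detector sees the error stream
            for _ in range(poisson1()):            # online bagging weight
                clf.learn(x, y)
        if fired:
            worst = max(range(len(self.members)), key=self.err.__getitem__)
            self.members[worst] = MajorityLearner()
            self.detectors[worst] = self.make_detector()
            self.err[worst] = 0.0

# ensemble = AdwinBagging(n=10, make_detector=lambda: ADWINSketch(delta=0.01))
```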
  • 39. Adaptive-Size Hoeffding Tree (5) ASHT [Figure: four trees T1–T4 of increasing size.] Ensemble of trees of different size smaller trees adapt more quickly to changes, larger trees do better during periods with little change diversity 23 / 59
  • 40. Adaptive-Size Hoeffding Tree (5) ASHT [Two scatter plots of error versus kappa.] Figure: Kappa-Error diagrams for ASHT bagging (left) and bagging (right) on dataset RandomRBF with drift, plotting 90 pairs of classifiers. 24 / 59
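Each point in a Kappa-Error diagram is a pair of ensemble members: x is their kappa agreement, y their average error. The kappa computation itself is short; the prediction lists below are made-up:

```python
def kappa(preds_a, preds_b, classes):
    """Kappa agreement between two classifiers' prediction lists: the x
    coordinate of a point in a Kappa-Error diagram (y is the pair's
    average error)."""
    n = len(preds_a)
    theta1 = sum(a == b for a, b in zip(preds_a, preds_b)) / n   # observed agreement
    theta2 = sum((preds_a.count(c) / n) * (preds_b.count(c) / n)
                 for c in classes)                               # chance agreement
    return (theta1 - theta2) / (1.0 - theta2)

print(kappa([0, 1, 1, 0, 1], [0, 1, 0, 0, 1], classes=[0, 1]))   # ~0.615
```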
  • 41. Adaptive-Size Hoeffding Tree (5) ASHT Figure: Accuracy and size on dataset LED with three concept drifts. 25 / 59
  • 42. Main contributions (i) Mining Evolving Data Streams 1 General Framework for Time Change Detectors and Predictors 2 ADWIN 3 Mining methods: Naive Bayes, Decision Trees, Ensemble Methods 4 MOA for Evolving Data Streams 5 Adaptive-Size Hoeffding Tree 26 / 59
  • 43. Outline 1 Introduction 2 Mining Evolving Data Streams 3 Tree Mining 4 Mining Evolving Tree Data Streams 5 Conclusions 27 / 59
  • 44. Mining Closed Frequent Trees Our trees are: labeled and unlabeled, ordered and unordered. Our subtrees are: induced, top-down. [Figure: two different ordered trees that are the same unordered tree.] 28 / 59
  • 45. A tale of two trees Consider D = {A, B} and let min_sup = 2. [Figure: the two trees A and B and their frequent subtrees.] 29 / 59
  • 46. A tale of two trees Consider D = {A, B} and let min_sup = 2. [Figure: the two trees A and B and their closed subtrees.] 29 / 59
  • 47–49. Mining Closed Unordered Subtrees (6) Unlabeled Closed Frequent Tree Method (pseudocode, built up over three slides) CLOSED_SUBTREES(t, D, min_sup, T ) 1 if not CANONICAL_REPRESENTATIVE(t) 2 then return T 3 for every t′ that can be extended from t in one step 4 do if Support(t′ ) ≥ min_sup 5 then T ← CLOSED_SUBTREES(t′ , D, min_sup, T ) 6 do if Support(t′ ) = Support(t) 7 then t is not closed 8 if t is closed 9 then insert t into T 10 return T 30 / 59
  • 50–51. Example D = {A, B} A = (0, 1, 2, 3, 2, 1) B = (0, 1, 2, 3, 1, 2, 2) min_sup = 2. [Lattice of candidate depth sequences: (0), (0, 1), (0, 1, 1), (0, 1, 2), (0, 1, 2, 1), (0, 1, 2, 2), (0, 1, 2, 2, 1), (0, 1, 2, 3), (0, 1, 2, 3, 1).] 31 / 59
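The tuples above are the thesis' natural representation of trees: a preorder walk that records each node's depth. A small encoder (trees written as nested lists, one list per node's children) reproduces the two example sequences:

```python
def depth_sequence(tree, depth=0):
    """Preorder depth encoding of a rooted ordered tree, where a tree is a
    (possibly empty) list of its root's children."""
    seq = [depth]
    for child in tree:
        seq.extend(depth_sequence(child, depth + 1))
    return tuple(seq)

A = [[[[]], []], []]          # root with two children; the first has a chain and a leaf
B = [[[[]]], [[], []]]        # a depth-3 chain, then a child with two leaves
print(depth_sequence(A))      # (0, 1, 2, 3, 2, 1)
print(depth_sequence(B))      # (0, 1, 2, 3, 1, 2, 2)
```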
  • 52. Experimental results (6) Unlabeled Closed Frequent Tree Method TreeNat: unlabeled trees, top-down subtrees, no occurrences. CMTreeMiner: labeled trees, induced subtrees, occurrences. 32 / 59
  • 53. Closure Operator on Trees (7) Closure Operator D: the finite input dataset of trees T : the (infinite) set of all trees Definition We define the following Galois connection pair: For finite A ⊆ D, σ(A) is the set of subtrees of the A trees in T : σ(A) = {t ∈ T | ∀ t′ ∈ A (t ⪯ t′ )} For finite B ⊂ T , τD (B) is the set of supertrees of the B trees in D: τD (B) = {t ∈ D | ∀ t′ ∈ B (t′ ⪯ t)} Closure Operator The composition ΓD = σ ◦ τD is a closure operator. 33 / 59
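In code, the Galois pair is two filters and the closure their composition. The sketch below makes simplifying assumptions: trees are nested lists, ⪯ is the ordered top-down subtree relation (the thesis also treats unordered and induced subtrees), and the infinite universe T is stood in by a finite candidate list U.

```python
def is_topdown_subtree(s, t):
    """True iff s is an ordered top-down subtree of t: the roots coincide
    and s's children embed, in order, into t's children."""
    i = 0
    for child in s:                      # greedily match each child of s
        while i < len(t) and not is_topdown_subtree(child, t[i]):
            i += 1
        if i == len(t):
            return False
        i += 1
    return True

def tau(D, B):
    """Trees in the dataset D that are supertrees of every tree in B."""
    return [t for t in D if all(is_topdown_subtree(b, t) for b in B)]

def sigma(U, A):
    """Patterns in the candidate universe U common to every tree in A."""
    return [u for u in U if all(is_topdown_subtree(u, t) for t in A)]

def gamma(D, U, B):
    """Closure operator Gamma_D = sigma composed with tau_D."""
    return sigma(U, tau(D, B))

D = [[[]], [[], []]]                     # a 2-node tree and a 3-node tree
U = [[], [[]], [[], []]]                 # finite stand-in for all patterns
print(gamma(D, U, B=[[[]]]))             # -> [[], [[]]]: the closure of B
```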
  • 54. Galois Lattice of closed set of trees (7) Closure Operator [Figure: lattice of closed tree sets, nodes labeled 1, 2, 3, 12, 13, 23, 123.] 34 / 59
  • 55–57. Galois Lattice of closed set of trees [Animation over three slides: a set of trees B is fixed; τD (B) collects the trees of D that contain every tree of B; ΓD (B) = σ ◦ τD (B) is then the set of common subtrees, i.e., the closure of B.] 35 / 59
  • 58. Mining Implications from Lattices of Closed Trees (8) Association Rules Problem Given a dataset D of rooted, unlabeled and unordered trees, find a “basis”: a set of rules that are sufficient to infer all the rules that hold in the dataset D. [Figure: example rules of the form t1 ∧ t2 → t3 over D.] 36 / 59
  • 59. Mining Implications from Lattices of Closed Trees Set of Rules: A → ΓD (A). Antecedents are obtained through a computation akin to a hypergraph transversal; consequents follow from an application of the closure operators. [Figure: the closed-tree lattice with nodes 1, 2, 3, 12, 13, 23, 123.] 37 / 59
  • 60. Mining Implications from Lattices of Closed Trees Set of Rules: A → ΓD (A). [Figure: the same lattice annotated with the resulting rules t1 ∧ t2 → t3 .] 37 / 59
  • 61–64. Association Rule Computation Example (8) Association Rules [Animation over four slides: on the lattice with nodes 1, 2, 3, 12, 13, 23, 123, an antecedent is selected and its closure yields the consequent of the rule.] 38 / 59
  • 65. Model transformation (8) Association Rules Intuition One propositional variable vt is assigned to each possible subtree t. A set of trees A corresponds in a natural way to a model mA . Let mA be a model: we impose on mA the constraints that if mA (vt′ ) = 1 for a variable vt′ , then mA (vt ) = 1 for all those variables vt such that t represents a subtree of the tree represented by vt′ . R0 = {vt′ → vt | t ⪯ t′ , t ∈ U , t′ ∈ U } 39 / 59
  • 66–67. Implicit Rules Definition (9) Implicit Rules [Figures: one rule t1 ∧ t2 → t3 over D that is implicit, and one that is not.] Given three trees t1 , t2 , t3 , we say that t1 ∧ t2 → t3 is an implicit Horn rule (an implicit rule, for short) if for every tree t it holds that t1 ⪯ t ∧ t2 ⪯ t ↔ t3 ⪯ t. t1 and t2 have implicit rules if t1 ∧ t2 → t is an implicit rule for some t. 40 / 59
  • 68. Implicit Rules Definition (9) Implicit Rules NOT Implicit Rule [Figure: counterexample.] This supertree of the antecedents is NOT a supertree of the consequents. 40 / 59
  • 69. Implicit Rules Characterization (9) Implicit Rules Theorem All trees a, b such that a ⪯ b have implicit rules. Theorem Suppose that b has only one component. Then they have implicit rules if and only if a has a maximum component which is a subtree of the component of b. [Figure: for all i ≤ n, ai ⪯ an ⪯ b1 ; the trees with components a1 , . . . , an and b1 imply the tree with components a1 , . . . , an−1 , b1 .] 41 / 59
  • 70. Main contributions (ii) Tree Mining 6 Closure Operator on Trees 7 Unlabeled Closed Frequent Tree Mining 8 A way of extracting high-confidence association rules from datasets consisting of unlabeled trees antecedents are obtained through a computation akin to a hypergraph transversal consequents follow from an application of the closure operators 9 Detection of some cases of implicit rules: rules that always hold, independently of the dataset 42 / 59
  • 71. Outline 1 Introduction 2 Mining Evolving Data Streams 3 Tree Mining 4 Mining Evolving Tree Data Streams 5 Conclusions 43 / 59
  • 72. Mining Evolving Tree Data Streams (10,11,12) Incremental, Sliding Window, and Adaptive Tree Mining Methods Problem Given a data stream D of rooted, unlabeled and unordered trees, find frequent closed trees. We provide three algorithms, of increasing power: Incremental, Sliding Window, Adaptive. 44 / 59
  • 73. Relaxed Support (13) Logarithmic Relaxed Support Guojie Song, Dongqing Yang, Bin Cui, Baihua Zheng, Yunfeng Liu and Kunqing Xie. CLAIM: An Efficient Method for Relaxed Frequent Closed Itemsets Mining over Stream Data Linear Relaxed Interval: The support space of all subpatterns can be divided into n = 1/εr intervals, where εr is a user-specified relaxed factor, and each interval can be denoted by Ii = [li , ui ), where li = (n − i) ∗ εr ≥ 0, ui = (n − i + 1) ∗ εr ≤ 1 and i ≤ n. Linear Relaxed closed subpattern t: if and only if there exists no proper superpattern t′ of t such that their supports belong to the same interval Ii . 45 / 59
  • 74. Relaxed Support (13) Logarithmic Relaxed Support As the number of closed frequent patterns is not linear with respect to support, we introduce a new relaxed support: Logarithmic Relaxed Interval: The support space of all subpatterns can be divided into n = 1/εr intervals, where εr is a user-specified relaxed factor, and each interval can be denoted by Ii = [li , ui ), where li = c^i , ui = c^(i+1) − 1 and i ≤ n. Logarithmic Relaxed closed subpattern t: if and only if there exists no proper superpattern t′ of t such that their supports belong to the same interval Ii . 45 / 59
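The practical difference is which supports are forced into the same interval: linear intervals have constant width εr, while logarithmic intervals grow geometrically, so closely spaced large supports collapse together. A tiny illustration (the logarithmic index is computed with integer arithmetic on purpose, to avoid floating-point log edge cases at exact powers of c):

```python
def linear_interval(support_fraction, eps_r):
    """Index of the linear interval of width eps_r holding a relative support."""
    return int(support_fraction / eps_r)

def log_interval(support_count, c):
    """Index i of the logarithmic interval [c^i, c^(i+1) - 1] holding an
    absolute support count."""
    i = 0
    while c ** (i + 1) <= support_count:
        i += 1
    return i

print([linear_interval(s, 0.1) for s in (0.12, 0.18, 0.55)])   # [1, 1, 5]
print([log_interval(s, 2) for s in (10, 16, 31, 32, 1000)])    # [3, 4, 4, 5, 9]
```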
  • 75. Algorithms (10,11,12) Incremental, Sliding Window, and Adaptive Tree Mining Methods Algorithms Incremental: INCTREENAT Sliding Window: WINTREENAT Adaptive: ADATREENAT Uses ADWIN to monitor change ADWIN An adaptive sliding window whose size is recomputed online according to the rate of change observed. ADWIN has rigorous guarantees (theorems) On ratio of false positives and false negatives On the relation of the size of the current window and change rates 46 / 59
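The sliding-window variant is the easiest to sketch: pattern counts are kept incrementally, and deleting the oldest transaction is the exact inverse of adding it. Below, trees are depth-sequence tuples and the pattern enumeration is a deliberately crude assumption (the prefixes of a sequence, which form a subset of its top-down subtrees); WINTREENAT itself maintains closed trees with far more care.

```python
from collections import Counter, deque

def prefixes(seq):
    """All prefixes of a depth sequence: each is a top-down subtree."""
    return {seq[:i] for i in range(1, len(seq) + 1)}

class WindowMinerSketch:
    """Keeps the last `size` trees and their pattern counts; removing the
    oldest transaction exactly undoes its addition."""
    def __init__(self, size, patterns=prefixes):
        self.size, self.patterns = size, patterns
        self.window, self.counts = deque(), Counter()

    def add(self, tree):
        self.window.append(tree)
        self.counts.update(self.patterns(tree))
        if len(self.window) > self.size:
            self.counts.subtract(self.patterns(self.window.popleft()))

    def frequent(self, min_sup):
        return {p for p, c in self.counts.items() if c >= min_sup}

miner = WindowMinerSketch(size=3)
for t in [(0, 1, 1), (0, 1, 2), (0, 1, 2, 2), (0, 1)]:
    miner.add(t)
print(miner.frequent(min_sup=3))   # {(0,), (0, 1)}: in all of the last 3 trees
```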
  • 76. Experimental Validation: TN1 (10,11,12) Incremental, Sliding Window, and Adaptive Tree Mining Methods [Plot: time (sec.) versus dataset size (millions) for CMTreeMiner and INCTREENAT.] Figure: Experiments on ordered trees with TN1 dataset 47 / 59
  • 77. Adaptive XML Tree Classification on evolving data streams (14) XML Tree Classification [Figure: a dataset example; four small trees over labels A, B, C, D, alternately tagged CLASS1 and CLASS2.] 48 / 59
  • 78. Adaptive XML Tree Classification on evolving data streams (14) XML Tree Classification [Table: example trees with their closed (c1 , c2 ) and frequent-but-not-closed subtrees, and the induced 0/1 tuples per transaction.] 49 / 59
  • 79. Adaptive XML Tree Classification on evolving data streams (14) XML Tree Classification [Table: documents encoded as 0/1 tuples over frequent trees (f1 , . . .), closed trees (c1 , . . . , c4 ) and maximal trees, each tuple labeled CLASS1 or CLASS2.] 50 / 59
  • 80. Adaptive XML Tree Framework on evolving data streams (14) XML Tree Classification XML Tree Classification Framework Components An XML closed frequent tree miner A Data stream classifier algorithm, which we will feed with tuples to be classified online. 51 / 59
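Gluing the two components together amounts to re-encoding each incoming document as a 0/1 tuple over the currently mined patterns before handing it to the classifier. A minimal sketch, with a toy prefix-based containment test standing in for a real subtree check (e.g. is_topdown_subtree from the closure example):

```python
def to_feature_vector(patterns, doc_tree, contains):
    """Encode a document tree as a 0/1 tuple over mined tree patterns,
    ready to feed an online classifier."""
    return tuple(1 if contains(p, doc_tree) else 0 for p in patterns)

# Toy containment on depth sequences: pattern p is a prefix of tree t.
prefix_contains = lambda p, t: t[:len(p)] == p

patterns = [(0,), (0, 1), (0, 1, 2)]            # e.g. the mined closed trees
for doc, label in [((0, 1, 1), "CLASS1"), ((0, 1, 2, 2), "CLASS2")]:
    print(to_feature_vector(patterns, doc, prefix_contains), "->", label)
```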
  • 81. Adaptive XML Tree Framework on evolving data streams (14) XML Tree Classification Table: BAGGING on unordered trees.
                #Trees   Maximal: Att. / Acc. / Mem.   Closed: Att. / Acc. / Mem.
    CSLOG12      15483        84 / 79.64 / 1.2             228 / 78.12 / 2.54
    CSLOG23      15037        88 / 79.81 / 1.21            243 / 78.77 / 2.75
    CSLOG31      15702        86 / 79.94 / 1.25            243 / 77.60 / 2.73
    CSLOG123     23111        84 / 80.02 / 1.7             228 / 78.91 / 4.18
  52 / 59
  • 82. Main contributions (iii) Mining Evolving Tree Data Streams 10 Incremental Method 11 Sliding Window Method 12 Adaptive Method 13 Logarithmic Relaxed Support 14 XML Classification 53 / 59
  • 83. Outline 1 Introduction 2 Mining Evolving Data Streams 3 Tree Mining 4 Mining Evolving Tree Data Streams 5 Conclusions 54 / 59
  • 84. Main contributions Mining Evolving Tree Mining Mining Evolving Tree Data Streams 6 Closure Operator Data Streams 1 Framework on Trees 10 Incremental 2 ADWIN 7 Unlabeled Tree Method 3 Classifiers Mining Methods 11 Sliding Window 8 Deterministic Method 4 MOA Association Rules 12 Adaptive Method 5 ASHT 9 Implicit Rules 13 Logarithmic Relaxed Support 14 XML Classification 55 / 59
  • 85. Future Lines (i) Adaptive Kalman Filter An adaptive Kalman filter computing Q and R without using the size of ADWIN’s window. Extend MOA framework Support vector machines Clustering Itemset mining Association rules 56 / 59
  • 86. Future Lines (ii) Adaptive Deterministic Association Rules Deterministic Association Rules computed on evolving data streams General Implicit Rules Characterization Find a characterization of implicit rules with any number of components Not Deterministic Association Rules Find basis of association rules for trees with confidence lower than 100% 57 / 59
  • 87. Future Lines (iii) Closed Frequent Graph Mining Mining methods to obtain closed frequent graphs. Not incremental Incremental Sliding Window Adaptive Graph Classification Classifiers of graphs using maximal and closed frequent subgraphs. 58 / 59
  • 88. Relevant publications Albert Bifet and Ricard Gavaldà. Kalman filters and adaptive windows for learning in data streams. DS’06 Albert Bifet and Ricard Gavaldà. Learning from time-changing data with adaptive windowing. SDM’07 Albert Bifet and Ricard Gavaldà. Adaptive parameter-free learning from evolving data streams. Tech-Rep R09-9 A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby, and R. Gavaldà. New ensemble methods for evolving data streams. KDD’09 José L. Balcázar, Albert Bifet, and Antoni Lozano. Mining frequent closed unordered trees through natural representations. ICCS’07 José L. Balcázar, Albert Bifet, and Antoni Lozano. Subtree testing and closed tree mining through natural representations. DEXA’07 José L. Balcázar, Albert Bifet, and Antoni Lozano. Mining implications from lattices of closed trees. EGC’2008 José L. Balcázar, Albert Bifet, and Antoni Lozano. Mining Frequent Closed Rooted Trees. MLJ’09 Albert Bifet and Ricard Gavaldà. Mining adaptively frequent closed unlabeled rooted trees in data streams. KDD’08 Albert Bifet and Ricard Gavaldà. Adaptive XML Tree Classification on evolving data streams 59 / 59