Introduction    Problem Definition   Bandedness and Bi-Clustering   MMBS Algorithm   Experimental Results   Conclusion




               Mining Maximally Banded Matrices in Binary
                                 Data

                                                Faris Alqadah
                                                Raj Bhatnagar
                                                 Anil Jegga

                                             University of Cincinnati
                                          Cincinnati Children’s Hospital



Outline

       1       Introduction
                  Motivation
       2       Problem Definition
                 Preliminaries
       3       Bandedness and Bi-Clustering
                 Formal Concept Analysis
                 Concept Lattice Paths
       4       MMBS Algorithm
                Three Steps
       5       Experimental Results
                 Synthetic Data
                 Real-World Data
       6       Conclusion



Banded Matrices in Data



               Banded structures in binary matrices have natural interpretations:

                    A   B   C   D   E
                1   1   1   1   0   0
                2   0   1   1   0   0
                3   0   0   1   0   0
                4   0   0   1   1   0
                5   0   0   0   1   1

               Bioinformatics (overlapping roles of genes)
               Paleontology (patterns of species in space)
               Social networks (community structures)



Motivating Example




                         k-means           multi-way           EM      bi-cluster        subspace
               doc1         1                  0                1           0               1
               doc2         0                  1                0           1               0
               doc3         0                  0                0           0               1
               doc4         0                  0                0           1               1
               doc5         0                  0                1           0               1



Motivating Example




                         k-means           EM        subspace           bi-cluster        multi-way
               doc1         1               1           1                    0                0
               doc5         0               1           1                    0                0
               doc3         0               0           1                    0                0
               doc4         0               0           1                    1                0
               doc2         0               0           0                    1                1



Bi-Clustering Problem




               Banded sub-matrices are a form of bi-clusters
               Bi-clustering in binary data focuses on maximal
               rectangles full (or almost full) of 1s



Related Work




               Nestedness and segmented nestedness [6]
               MBS algorithm [2]
               Fix column permutations
               Solve the consecutive ones problem
               Only find a single band



Contributions




          1    Establish correspondence between banded structures and
               bi-clustering in binary data
          2    Introduce the novel MMBS algorithm to uncover multiple,
               possibly overlapping banded sub-matrices
          3    Empirical evaluation verifying advantage of MMBS over
               previous approaches



Basic Notation




               Matrix K with row labels G and column labels M
               Think of K as K = (G, M, I)
               π permutation of G and τ permutation of M
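               $K^{\pi}_{\tau}$: K with rows permuted by π and columns permuted by τ
               $g_{\pi_i}$ and $m_{\tau_j}$: the i-th row and j-th column of $K^{\pi}_{\tau}$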



Fully Banded Matrix




       Definition
       A binary matrix K = (G, M, I) is fully banded if there exists a
       permutation π of G and a permutation τ of M such that (1) for
       every row i in $K^{\pi}_{\tau}$ the 1s occupy consecutive
       column indices $\{m_i, m_i + 1, \ldots, m_i^{\star}\}$ and (2) the
       starting and ending indices of the 1s in successive rows (i and i + 1)
       satisfy $m_i \le m_{i+1}$ and $m_i^{\star} \le m_{i+1}^{\star}$.
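
       The two conditions of the definition can be checked directly once a row
       and column order has been fixed. The sketch below is illustrative only:
       the function name is hypothetical, it does not search for π and τ, and it
       assumes that all-zero rows are simply skipped (the definition does not say
       how to treat them).

```python
def is_fully_banded(rows):
    """Check conditions (1) and (2) for a matrix already in permuted order."""
    prev_start, prev_end = -1, -1
    for row in rows:
        ones = [j for j, value in enumerate(row) if value == 1]
        if not ones:
            continue  # assumption: all-zero rows are ignored
        start, end = ones[0], ones[-1]
        if len(ones) != end - start + 1:
            return False  # condition (1): the 1s are not consecutive
        if start < prev_start or end < prev_end:
            return False  # condition (2): start or end index moved left
        prev_start, prev_end = start, end
    return True


# The 5 x 5 example matrix (rows 1..5, columns A..E).
banded = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
]

# The document-term matrix of the motivating example in its original order.
unordered = [
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1],
]

print(is_fully_banded(banded), is_fully_banded(unordered))
```

       The second matrix fails only because of its ordering; the motivating
       example shows that permuting its rows and columns exposes the band.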



Relaxation of Fully Banded




               Real data has noise
               Subspaces may encompass banded structure
               $e(K^{\pi}_{\tau})$: number of 1s or 0s that must be flipped to achieve
               a banded structure
               Maximal banded sub-matrix: no more rows or columns can
               be added while still preserving bandedness




       Problem Statement
       Given a binary matrix K and a noise threshold ε, find all
       sub-matrices $\hat{K}$ of K that are ε-banded and maximal.



Bi-clustering




               Bi-clusters in binary data defined as Formal Concepts
               For A ⊆ G, A′ = {m ∈ M | gIm for all g ∈ A}
               For B ⊆ M, B′ = {g ∈ G | gIm for all m ∈ B}
               Formal Concept: C = (A, B) such that A′ = B and B′ = A
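
       As a small illustration of the two derivation operators, the sketch below
       applies them to the 5 x 5 example matrix from the introduction. The names
       `CONTEXT`, `prime_rows`, `prime_cols`, and `is_concept` are hypothetical
       and chosen only for this example.

```python
# Row label -> set of column labels holding a 1 (the 5 x 5 example matrix).
CONTEXT = {
    "1": {"A", "B", "C"},
    "2": {"B", "C"},
    "3": {"C"},
    "4": {"C", "D"},
    "5": {"D", "E"},
}
ATTRIBUTES = set("ABCDE")


def prime_rows(rows):
    """A': the columns shared by every row in A."""
    shared = set(ATTRIBUTES)
    for g in rows:
        shared &= CONTEXT[g]
    return shared


def prime_cols(cols):
    """B': the rows that contain every column in B."""
    return {g for g, attrs in CONTEXT.items() if set(cols) <= attrs}


def is_concept(rows, cols):
    """(A, B) is a formal concept when A' = B and B' = A."""
    return prime_rows(rows) == set(cols) and prime_cols(cols) == set(rows)


print(prime_rows({"1", "2"}))        # columns common to rows 1 and 2
print(is_concept({"1", "2"}, {"B", "C"}))
```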



Formal Concepts

                                               m1       m2        m3     m4
                                       g1      0        1         0      1
                                       g2      0        0         1      1
                                       g3      0        0         0      1
                                       g4      1        0         0      0
                                       g5      1        1         1      0
                                       g7      1         1        0      0
                                       g6      0        0         1      0

               Maximal rectangles of 1s
               Maximal bicliques
               Bi-clusters may be ordered by the subset-superset
               relationship and form a complete lattice
               B(G, M, I) denotes the concept or bi-cluster lattice



Splintering Bands




       Trivially a bi-cluster is fully banded
                                                A      B     C    D     E
                                          1     1      1     1    0     0
                                          2     0      1     1    0     0
                                          3     0      0     1    0     0
                                          4     0      0     1    1     0
                                          5     0      0     0    1     1

       Intuitively, any fully banded matrix can be splintered exactly into
       maximal rectangles of 1s or bi-clusters



Ordering Splintered Bands



               Let $K^{\pi}_{\tau}$ be fully banded
               Γ(g) is a mapping from row g to the bi-clusters in which g
               appears
               The union of all Γ(g) can always be ordered
               n-tuple of bi-clusters $(C_1, \ldots, C_n)$ having total orderings
               $\{<_{\pi_1,\tau_1}, \ldots, <_{\pi_n,\tau_n}\}$
               Define the lexicographical order $<_{\pi,\tau}$ on $C_1 \times C_2 \times \cdots \times C_n$
               Considering $C_1, \ldots, C_n$ in order completely specifies the
               permutations π and τ



Bands as Sequences of Concepts




       Proposition
       Given a context K, if permutations π and τ exist such that $K^{\pi}_{\tau}$ is
       fully banded, then there exists a sequence of bi-clusters
       $C_1 = (A_1, B_1), \ldots, C_n = (A_n, B_n)$ s.t.

                             $\pi = A_1,\ A_2 \setminus A_1,\ \ldots,\ A_n \setminus A_{n-1}$
                             $\tau = B_1 \setminus B_2,\ \ldots,\ B_{n-1} \setminus B_n,\ B_n$



An Example
                                                    A        B        C        D        E
                                            1       1        1        1        0        0
                                            2       0        1        1        0        0
                                            3       0        0        1        0        0
                                            4       0        0        1        1        0
                                            5       0        0        0        1        1

                                    g                              Γ(g)
                                    1                  (1, ABC), (12, BC), (1234, C)
                                    2                       (12, BC), (1234, C)
                                    3                            (1234, C)
                                    4                         (4, CD), (45, D)
                                    5                         (5, DE ), (45, D)
                                                                 $F(K^{\pi}_{\tau})$
                                     (1, ABC) < (12, BC) < (1234, C) < (4, CD) < (45, D) < (5, DE)

                             π = 1, 12 \ 1, ..., 5 \ 45
                               = {1, 2, 3, 4, 5}
                             τ = ABC \ BC, ..., D \ DE, DE
                               = {A, B, C, D, E}
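
       The set differences in this example can be traced mechanically. The
       following sketch is an illustration rather than the authors' procedure:
       the function name is hypothetical, and it additionally assumes that a
       label already placed is never placed again and that labels inside one
       difference are emitted in sorted order.

```python
def permutations_from_sequence(concepts):
    """Read pi and tau off an ordered sequence of (extent, intent) pairs."""
    rows, cols = [], []

    # pi = A1, A2 \ A1, ..., An \ A(n-1)
    previous_extent = set()
    for extent, _ in concepts:
        rows += sorted(set(extent) - previous_extent - set(rows))
        previous_extent = set(extent)

    # tau = B1 \ B2, ..., B(n-1) \ Bn, Bn
    for (_, intent), (_, next_intent) in zip(concepts, concepts[1:]):
        cols += sorted(set(intent) - set(next_intent) - set(cols))
    cols += sorted(set(concepts[-1][1]) - set(cols))

    return rows, cols


# Ordered bi-clusters of the example; each string lists the labels of a set.
sequence = [
    ("1", "ABC"), ("12", "BC"), ("1234", "C"),
    ("4", "CD"), ("45", "D"), ("5", "DE"),
]

pi, tau = permutations_from_sequence(sequence)
print(pi, tau)
```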



Paths in the lattice




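               Represent B(G, M, I) as a graph $\mathcal{G} = (V, E)$
               Edge set defined as: $(C_1, C_2) \in E \leftrightarrow C_1 \prec C_2 \lor C_2 \prec C_1$
               Concept lattice order enforces: $A_{i+1} \subseteq A_i$ and $B_i \subseteq B_{i+1}$ if
               $C_i \prec C_{i+1}$
               Dual: $A_i \subseteq A_{i+1}$ and $B_{i+1} \subseteq B_i$ if $C_i \succ C_{i+1}$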
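
       One way to picture this graph is to connect two concepts whenever one
       extent strictly contains the other with no third concept in between. The
       sketch below does this by brute force for the eight concepts of the 5 x 5
       example; it assumes that ≺ denotes this neighbor relation, and the names
       `CONCEPTS` and `cover_edges` are hypothetical.

```python
from itertools import permutations

# Concepts of the 5 x 5 example, written as (extent, intent) label strings.
CONCEPTS = [
    ("12345", ""), ("1234", "C"), ("45", "D"), ("12", "BC"),
    ("4", "CD"), ("5", "DE"), ("1", "ABC"), ("", "ABCDE"),
]


def cover_edges(concepts):
    """Return (smaller extent, larger extent) pairs of neighboring concepts."""
    extents = [frozenset(extent) for extent, _ in concepts]
    edges = set()
    for lower, upper in permutations(extents, 2):
        # Neighbors: strictly nested extents with no extent strictly between.
        if lower < upper and not any(lower < mid < upper for mid in extents):
            edges.add((lower, upper))
    return edges


EDGES = cover_edges(CONCEPTS)
print(len(EDGES))
```

       Since extents and intents determine each other, comparing extents alone
       is enough here. The quadratic scan is fine for a toy context but would
       not scale to large lattices.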



Construct Partial Bands Via Paths


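       [Figure: concept lattice of the example matrix. Nodes (extent; intent), top to bottom:
       (1,2,3,4,5; ∅), (1,2,3,4; C), (4,5; D), (1,2; B,C), (4; C,D), (5; D,E), (1; A,B,C), (∅; A,B,C,D,E)]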



Bound on the error




       Key Fact
       Each individual edge in a path P is guaranteed to produce a
       banded structure
       Proposition
       \[
       e(P_n) \le
       \begin{cases}
       0 & \text{if } n \le 1 \\
       e(P_{n-1}) + \sum_{a \in \hat{A}} |a' \cap B| & \text{if } C_{n+1} \succ C_n \\
       e(P_{n-1}) + \sum_{b \in \hat{B}} |b' \cap A| & \text{if } C_{n+1} \prec C_n
       \end{cases}
       \]



Overview



               Weight edges of the concept lattice with an upper bound on the error
               Bad news: weights change depending on path
               Good news: Error is monotonic along a path, so pruning
               with backtracking works!
               Three steps:
                  1   Compute G
                  2   Search paths of G
                  3   Determine top bands



Compute G




               Many existing algorithms [1, 5, 3, 4, 7]
               Incremental vs. non-incremental
               Assume availability of G



Search Paths




               Potentially exponential number of paths
               Any bi-cluster is a valid starting point...but initiate with
               upper neighbors of null-element
               At each edge add concept to path utilizing previous
               procedure
               Utilize backtracking, mark previously visited edges



Top Bands




               Allow user to specify : minRows, minCols, maxOvlp
               Quality measure: q(P) = |r (P)| ∗ |c(P)| − w ∗ e(P)
               If two bands overlap by more than maxOvlp, select the higher-quality one
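
       The selection step can be sketched as a greedy filter over candidate
       bands sorted by q(P). The slides do not define how overlap is measured,
       so the sketch assumes it is the number of shared cells divided by the
       size of the smaller band. The `Band` class and all function names are
       hypothetical, and the error value of each band is taken as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    rows: frozenset
    cols: frozenset
    error: int  # e(P), assumed to be computed elsewhere


def quality(band, w=1.0):
    """q(P) = |r(P)| * |c(P)| - w * e(P)."""
    return len(band.rows) * len(band.cols) - w * band.error


def overlap(a, b):
    """Assumed measure: shared cells relative to the smaller band."""
    shared = len(a.rows & b.rows) * len(a.cols & b.cols)
    smaller = min(len(a.rows) * len(a.cols), len(b.rows) * len(b.cols))
    return shared / smaller if smaller else 0.0


def top_bands(bands, min_rows, min_cols, max_ovlp, w=1.0):
    """Keep bands in decreasing quality, skipping ones that overlap too much."""
    candidates = [
        b for b in bands
        if len(b.rows) >= min_rows and len(b.cols) >= min_cols
    ]
    candidates.sort(key=lambda b: quality(b, w), reverse=True)
    selected = []
    for band in candidates:
        if all(overlap(band, kept) <= max_ovlp for kept in selected):
            selected.append(band)
    return selected


a = Band(frozenset(range(10)), frozenset(range(10)), error=5)
b = Band(frozenset(range(10)), frozenset(range(8)), error=0)
c = Band(frozenset(range(20, 30)), frozenset(range(20, 30)), error=10)
d = Band(frozenset(range(3)), frozenset(range(3)), error=0)

result = top_bands([a, b, c, d], min_rows=5, min_cols=5, max_ovlp=0.1)
print([quality(band) for band in result])
```

       Here `b` is dropped because it lies entirely inside the higher-quality
       `a`, and `d` is dropped by the minRows and minCols thresholds.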



Analysis and Improvements



               Running time: O(|U| × |E| × max{X, Y})
                      |U|: number of initial concepts
                      X, Y: largest symmetric differences between neighboring
                      concepts
               Speed up by reducing |U|
               Perform simple clustering of U based on the maxOvlp
               parameter
               Good experimental results with this speed-up



Setup




               Single band and segmented bands planted in synthetic
               data
               All experiments:
                      w =1
                      maxOvlp = 0.1
                      minRows = 5
                      minCols = 5
                      ε = 99



Results
       [Figure: Planted Bands. Three binary-matrix plots with axes spanning 500 × 500, 200 × 200, and 300 × 300.]



Results

       Dataset name          Dataset size   p      Num. planted bands   Algorithm    Quality top ranked   Num. bands mined
       SynBand100_001        100 × 100      0.01   1                    MMBS         3590                 6
                                                                        MMBS_Fast    3406                 4
                                                                        MBS_BD       2507                 1
                                                                        MBS_SD       438                  1
       SynBand100_005        100 × 100      0.05   1                    MMBS         2278                 9
                                                                        MMBS_Fast    1503                 8
                                                                        MBS_BD       1050                 1
                                                                        MBS_SD       1201                 1
       SynBand500_001        500 × 500      0.01   1                    MMBS         8918                 7
                                                                        MMBS_Fast    8261                 6
                                                                        MBS_BD       2822                 1
                                                                        MBS_SD       2145                 1
       SynMultiBand100_001   100 × 100      0.01   2                    MMBS         3367                 2
                                                                        MMBS_Fast    3367                 2
                                                                        MBS          4101                 1
                                                                        MBS_SD       4045                 1
       SynMultiBand100_001   100 × 100      0.05   2                    MMBS         4054                 2
                                                                        MMBS_Fast    3933                 2
                                                                        MBS_BD       3910                 1
                                                                        MBS_SD       3736                 1
                                                                                  MMBS            28242                  8
                                                                                MMBS_Fast         21346                  5
               SynMultiBand500_001    500 × 500     0.01           2
                                                                                 MBS_BD           17498                  1
                                                                                 MBS_SD             430                  1
                                                                                  MMBS             3311                 17
                                                                                MMBS_Fast          3220                 14
               SynRandom100_005       100 × 100     0.05        unknown
                                                                                 MBS_BD            2801                  1
                                                                                 MBS_SD            1949                  1
                                                                                  MMBS            18635                 73
                                                                                MMBS_Fast         16163                 64
               SynRandom500_001       500 × 500     0.01        unknown
                                                                                 MBS_BD           16771                  1
                                                                                 MBS_SD            5229                  1
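
The synthetic datasets above pair a number of planted bands with a parameter p. The following is a minimal sketch of how such a matrix might be generated, not the authors' generator. It assumes that p is the probability of flipping each entry independently as noise, which this part of the slides does not state. The function name `planted_band_matrix` and the `band_width` parameter are hypothetical.

```python
import random


def planted_band_matrix(n_rows, n_cols, band_width, p, seed=0):
    """Build a binary matrix with one staircase band, then add flip noise.

    Assumption: p is the per-entry flip probability. The slides only list
    p next to each dataset and do not define it here.
    """
    rng = random.Random(seed)
    matrix = []
    for i in range(n_rows):
        # The band's first column moves right as the row index grows,
        # so the ones form a staircase from top-left to bottom-right.
        start = (i * (n_cols - band_width)) // max(n_rows - 1, 1)
        row = [1 if start <= j < start + band_width else 0
               for j in range(n_cols)]
        # Flip each entry independently with probability p.
        row = [1 - v if rng.random() < p else v for v in row]
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    # A 100 x 100 matrix with p = 0.01, mirroring the size and p of
    # SynBand100_001 (the band width of 20 is an arbitrary choice).
    m = planted_band_matrix(100, 100, band_width=20, p=0.01)
    print(sum(map(sum, m)), "ones in total")
```

With p = 0 the output is a noise-free band, which makes it easy to confirm that a mining algorithm recovers the planted structure before noise is added.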

Dataset                       Size          Sparsity   Algorithm   Quality (top ranked)   Num. bands mined
Genes_Phenotypes              1910 × 3965   0.008      MMBS         6665                  56
                                                       MMBS_Fast    6665                  43
                                                       MBS_BD       5204                   1
                                                       MBS_SD       3578                   1
Genes_Drugs                   1608 × 49     0.042      MMBS         6423                  18
                                                       MMBS_Fast    6423                  13
                                                       MBS_BD       5346                   1
                                                       MBS_SD       3047                   1
NewsGroups_Mideast_Religion   2000 × 890    0.003      MMBS        72906                  42
                                                       MMBS_Fast   61410                  31
                                                       MBS_BD      59781                   1
                                                       MBS_SD      58713                   1
NewsGroups_AllPC              5000 × 2805   0.0001     MMBS        93368                   5
                                                       MMBS_Fast   93368                   5
                                                       MBS_BD      89106                   1
                                                       MBS_SD      74125                   1

[Figure: Genes_Phenotypes. Row labels:
   1  early eyelid opening
   2  eyelids open at birth
   3  abnormal timing of postnatal eyelid opening
   4  abnormal eyelid morphology
   5  abnormal eye morphology
   6  abnormal homeostasis
   7  abnormal ear physiology
   8  abnormal hearing physiology
   9  abnormal brainstem auditory evoked potential
  10  deafness]

[Figure: Genes_Drugs]

[Figure: MideastReligion_SubjectLines]

[Figure: AllPC_SubjectLines]

Performance
[Figure: four log-scale plots of CPU time (seconds) versus epsilon (0 to 100), each comparing MMBS_fast, MMBS, and MBS]

Conclusion


               Explored connection between bi-clustering and banded structures in matrices
               Banded sub-matrices correspond to paths in the bi-cluster lattice
               MMBS algorithm is based on this correspondence and ability to bound error
               Future work: More efficient search methodologies, stronger bounds on error
               Future work: Quantitative measures of bandedness, different types of bands desirable in different applications
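
As a small illustration of the banded structure these points refer to, the sketch below checks whether a binary matrix is banded under its current row and column order. It assumes one common formalization: the ones in every row are consecutive, and the first and last one-columns never decrease from one row to the next. It does not search over permutations and is not the MMBS algorithm. The function name `is_banded` is hypothetical.

```python
def is_banded(matrix):
    """Return True if the matrix is banded in its given row/column order.

    Assumed definition: each row's ones are consecutive, and both the start
    and end columns of the ones are non-decreasing down the rows.
    All-zero rows are skipped, which is a simplifying assumption.
    """
    prev_start, prev_end = -1, -1
    for row in matrix:
        ones = [j for j, v in enumerate(row) if v == 1]
        if not ones:
            continue
        start, end = ones[0], ones[-1]
        # A gap between the first and last one breaks consecutiveness.
        if end - start + 1 != len(ones):
            return False
        # The staircase must not step back to the left.
        if start < prev_start or end < prev_end:
            return False
        prev_start, prev_end = start, end
    return True


if __name__ == "__main__":
    staircase = [
        [1, 1, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 1],
    ]
    print(is_banded(staircase))
```

A check like this only validates a fixed ordering. Finding the permutations that expose a band, and doing so for sub-matrices, is the harder problem the talk addresses.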

               B. Ganter and R. Wille.
               Formal Concept Analysis: Mathematical Foundations.
               Springer-Verlag, Berlin, 1999.

               G. C. Garriga, E. Junttila, and H. Mannila.
               Banded structure in binary matrices.
               In KDD ’08: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 292–300, New York, NY, USA, 2008. ACM.

               H. Bian and R. Bhatnagar.
               An algorithm for lattice-structured subspace clustering.
               In Proceedings of the SIAM International Conference on Data Mining, 2005.

               S. O. Kuznetsov and S. A. Obiedkov.
               Algorithms for the construction of concept lattices and their diagram graphs.
               In PKDD ’01: Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery, pages 289–300, London, UK, 2001. Springer-Verlag.

               C. Lindig.
               Fast concept analysis.
               In 8th International Conference on Conceptual Structures, 2000.

               H. Mannila and E. Terzi.
               Nestedness and segmented nestedness.
               In KDD ’07: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 480–489, New York, NY, USA, 2007. ACM.

               M. J. Zaki and C.-J. Hsiao.
               Efficient algorithms for mining closed itemsets and their lattice structure.
               IEEE Transactions on Knowledge and Data Engineering, 17(4), 2005.

 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 

Recently uploaded (20)

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

Mining Maximally Banded Matrices in Binary Data

  • 1. Introduction Problem Definition Bandedness and Bi-Clustering MMBS Algorithm Experimental Results Conclusion Mining Maximally Banded Matrices in Binary Data Faris Alqadah Raj Bhatnagar Anil Jegga University of Cincinnati Cincinnati Children’s Hospital
  • 2. Outline: 1 Introduction (Motivation); 2 Problem Definition (Preliminaries); 3 Bandedness and Bi-Clustering (Formal Concept Analysis, Concept Lattice Paths); 4 MMBS Algorithm (Three Steps); 5 Experimental Results (Synthetic Data, Real-World Data); 6 Conclusion
  • 4. Banded Matrices in Data: Banded structures in binary matrices have natural interpretations: Bioinformatics (overlapping roles of genes); Paleontology (patterns of species in space); Social Networks (community structures). Example banded matrix:
          A B C D E
        1 1 1 1 0 0
        2 0 1 1 0 0
        3 0 0 1 0 0
        4 0 0 1 1 0
        5 0 0 0 1 1
  • 5. Motivating Example (original row and column order):
        doc    k-means  multi-way  EM  bi-cluster  subspace
        doc1   1        0          1   0           1
        doc2   0        1          0   1           0
        doc3   0        0          0   0           1
        doc4   0        0          0   1           1
        doc5   0        0          1   0           1
  • 6. Motivating Example (rows and columns permuted to expose the band):
        doc    k-means  EM  subspace  bi-cluster  multi-way
        doc1   1        1   1         0           0
        doc5   0        1   1         0           0
        doc3   0        0   1         0           0
        doc4   0        0   1         1           0
        doc2   0        0   0         1           1
  • 7. Bi-Clustering Problem: Banded sub-matrices are a form of bi-clusters. Bi-clustering in binary data focuses on maximal rectangles full (or almost full) of 1s.
  • 8. Related Work: Nestedness and segmented nestedness [6]. MBS algorithm [2]: fix column permutations; solve the consecutive ones problem; only finds a single band.
  • 9. Contributions: 1 Establish the correspondence between banded structures and bi-clustering in binary data. 2 Introduce the novel MMBS algorithm to uncover multiple, possibly overlapping banded sub-matrices. 3 Empirical evaluation verifying the advantage of MMBS over previous approaches.
  • 13. Basic Notation: Matrix $K$ with row labels $G$ and column labels $M$. Think of $K$ as $K = (G, M, I)$. $\pi$ is a permutation of $G$ and $\tau$ a permutation of $M$; the permuted matrix is $K^{\pi}_{\tau}$, with rows $g_{\pi_i}$ and columns $m_{\tau_j}$.
  • 15. Fully Banded Matrix: Definition. A binary matrix $K = (G, M, I)$ is fully banded if there exists a permutation $\pi$ of $G$ and a permutation $\tau$ of $M$ such that (1) for every row $i$ in $K^{\pi}_{\tau}$ the entries with 1s occur in consecutive column indices $\{m_i, m_i + 1, \ldots, m_i^{\star}\}$ and (2) the starting and ending indices for 1s in successive rows ($i$ and $i + 1$) satisfy the conditions $m_i \le m_{i+1}$ and $m_i^{\star} \le m_{i+1}^{\star}$.
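The two conditions of the definition can be checked mechanically once a candidate pair of permutations is given. The sketch below is illustrative only: the names `row_span` and `is_fully_banded` are hypothetical, the data is the motivating document/term matrix from the earlier slides, and treating an all-zero row as breaking the band is an assumption. It tests one given pair of orderings; it does not search for permutations.

```python
from typing import Optional, Sequence, Tuple


def row_span(row: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Return (first, last) index of the 1s if they are consecutive, else None."""
    ones = [j for j, value in enumerate(row) if value == 1]
    if not ones:
        return None  # assumption: an all-zero row is treated as breaking the band
    first, last = ones[0], ones[-1]
    return (first, last) if last - first + 1 == len(ones) else None


def is_fully_banded(matrix, row_order, col_order) -> bool:
    """Check conditions (1) and (2) for one given pair of orderings."""
    permuted = [[matrix[i][j] for j in col_order] for i in row_order]
    spans = [row_span(row) for row in permuted]
    if any(span is None for span in spans):
        return False  # condition (1) fails: 1s are not consecutive in some row
    # Condition (2): start and end indices never decrease from row to row.
    return all(a[0] <= b[0] and a[1] <= b[1] for a, b in zip(spans, spans[1:]))


# Rows doc1..doc5; columns k-means, multi-way, EM, bi-cluster, subspace.
docs = [
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1],
]
row_order = [0, 4, 2, 3, 1]  # doc1, doc5, doc3, doc4, doc2
col_order = [0, 2, 4, 3, 1]  # k-means, EM, subspace, bi-cluster, multi-way

print(is_fully_banded(docs, range(5), range(5)))    # original order
print(is_fully_banded(docs, row_order, col_order))  # permuted order
```

Deciding whether any such pair of permutations exists is the harder problem that the rest of the deck addresses.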
  • 16. Relaxation of Fully Banded: Real data has noise. Subspaces may encompass banded structure. $e(K^{\pi}_{\tau})$: number of 1s or 0s that must be flipped to achieve banded structure. Maximal banded sub-matrix: no more rows or columns can be added while still preserving bandedness.
  • 18. Problem Statement: Given a binary matrix $K$ and noise threshold $\epsilon$, find all sub-matrices $\hat{K}$ of $K$ that are $\epsilon$-banded and maximal.
  • 20. Bi-clustering: Bi-clusters in binary data are defined as Formal Concepts. For $A \subseteq G$, $A' = \{m \in M \mid gIm \text{ for all } g \in A\}$. For $B \subseteq M$, $B' = \{g \in G \mid gIm \text{ for all } m \in B\}$. Formal Concept: $C = (A, B)$ such that $A' = B$ and $B' = A$.
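A minimal sketch of the two derivation operators may help make the definition concrete. The representation (a dict from row label to the set of column labels holding a 1) and the function names are assumptions made for illustration; the data is the 5 × 5 banded example matrix from the introduction.

```python
from typing import Dict, Iterable, Set

Context = Dict[str, Set[str]]  # row label -> column labels that hold a 1


def all_columns(ctx: Context) -> Set[str]:
    """Collect every column label that appears in the context."""
    cols: Set[str] = set()
    for members in ctx.values():
        cols |= members
    return cols


def rows_prime(ctx: Context, rows: Iterable[str]) -> Set[str]:
    """A': the columns shared by every row in A (all columns if A is empty)."""
    shared = all_columns(ctx)
    for g in rows:
        shared &= ctx[g]
    return shared


def cols_prime(ctx: Context, cols: Iterable[str]) -> Set[str]:
    """B': the rows that contain every column in B."""
    wanted = set(cols)
    return {g for g, members in ctx.items() if wanted <= members}


def is_formal_concept(ctx: Context, rows: Iterable[str], cols: Iterable[str]) -> bool:
    """(A, B) is a formal concept exactly when A' = B and B' = A."""
    rows, cols = set(rows), set(cols)
    return rows_prime(ctx, rows) == cols and cols_prime(ctx, cols) == rows


banded: Context = {
    "1": {"A", "B", "C"},
    "2": {"B", "C"},
    "3": {"C"},
    "4": {"C", "D"},
    "5": {"D", "E"},
}

print(is_formal_concept(banded, {"1", "2"}, {"B", "C"}))  # maximal rectangle
print(is_formal_concept(banded, {"1", "2", "3"}, {"C"}))  # row 4 also has C
```

The second call fails because the rectangle can still be extended by row 4, which is what maximality rules out.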
  • 22. Formal Concepts: Maximal rectangles of 1s. Maximal bicliques. Bi-clusters may be ordered by the subset-superset relationship and form a complete lattice. $B(G, M, I)$ denotes the concept or bi-cluster lattice. Example context:
            m1 m2 m3 m4
        g1  0  1  0  1
        g2  0  0  1  1
        g3  0  0  0  1
        g4  1  0  0  0
        g5  1  1  1  0
        g7  1  1  0  0
        g6  0  0  1  0
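For a context this small, every formal concept can be listed by brute force: close each subset of columns and collect the distinct (extent, intent) pairs. The sketch below does this for the example context on this slide. It is exponential in the number of columns and is not one of the lattice-construction algorithms the deck relies on; the function name `all_concepts` is hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent, intent)


def all_concepts(ctx: Dict[str, Set[str]]) -> Set[Concept]:
    """Enumerate formal concepts by closing every subset of columns."""
    cols = sorted(set().union(*ctx.values()))
    found: Set[Concept] = set()
    for size in range(len(cols) + 1):
        for subset in combinations(cols, size):
            wanted = set(subset)
            # Extent: rows that contain every column of the subset.
            extent = frozenset(g for g, members in ctx.items() if wanted <= members)
            # Intent: columns shared by all rows of the extent.
            if extent:
                intent = frozenset(set.intersection(*(ctx[g] for g in extent)))
            else:
                intent = frozenset(cols)  # empty extent: every column qualifies
            found.add((extent, intent))
    return found


example = {
    "g1": {"m2", "m4"},
    "g2": {"m3", "m4"},
    "g3": {"m4"},
    "g4": {"m1"},
    "g5": {"m1", "m2", "m3"},
    "g6": {"m3"},
    "g7": {"m1", "m2"},
}

concepts = all_concepts(example)
for extent, intent in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(extent), sorted(intent))
```

Each printed pair is a maximal rectangle of 1s; the pairs ordered by extent inclusion form the lattice described on the slide.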
  • 24. Splintering Bands: Trivially, a bi-cluster is fully banded. Intuitively, any fully banded matrix can be splintered exactly into maximal rectangles of 1s, or bi-clusters.
          A B C D E
        1 1 1 1 0 0
        2 0 1 1 0 0
        3 0 0 1 0 0
        4 0 0 1 1 0
        5 0 0 0 1 1
  • 27. Ordering Splintered Bands: Let $K^{\pi}_{\tau}$ be fully banded. $\Gamma(g)$ is a mapping from row $g$ to the bi-clusters $g$ appears in. The union of all $\Gamma(g)$ can always be ordered: an n-tuple of bi-clusters $\{C_1, \ldots, C_n\}$ having total ordering $\{<_{\pi_1,\tau_1}, \ldots, <_{\pi_n,\tau_n}\}$. Define the lexicographical order $<_{\pi,\tau}$ on $C_1 \times C_2 \times \cdots \times C_n$. Considering $\{C_1, \ldots, C_n\}$ in order completely specifies the permutations $\pi$ and $\tau$.
  • 31. Bands as Sequences of Concepts: Proposition. Given a context $K$, if permutations $\pi$ and $\tau$ exist such that $K^{\pi}_{\tau}$ is fully banded, then there exists a sequence of bi-clusters $C_1 = (A_1, B_1), \ldots, C_n = (A_n, B_n)$ s.t. $\pi = A_1, A_2 \setminus A_1, \ldots, A_n \setminus A_{n-1}$ and $\tau = B_1 \setminus B_2, \ldots, B_{n-1} \setminus B_n, B_n$.
  • 32. An Example:
          A B C D E
        1 1 1 1 0 0
        2 0 1 1 0 0
        3 0 0 1 0 0
        4 0 0 1 1 0
        5 0 0 0 1 1

        g   Γ(g)
        1   (1, ABC), (12, BC), (1234, C)
        2   (12, BC), (1234, C)
        3   (1234, C)
        4   (4, CD), (45, D)
        5   (5, DE), (45, D)

        $F(K^{\pi}_{\tau})$: (1, ABC) < (12, BC) < (1234, C) < (4, CD) < (45, D) < (5, DE)
        $\pi = 1, 12 \setminus 1, \ldots, 5 \setminus 45 = \{1, 2, 3, 4, 5\}$
        $\tau = ABC \setminus BC, \ldots, D \setminus DE, DE = \{A, B, C, D, E\}$
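The example can be replayed in a few lines: walk the ordered bi-clusters, append to the row order whatever each extent adds over the previous one, and append to the column order whatever each intent drops before the next one. This is a sketch of the proposition's formula only; the function name is hypothetical, and sorting the labels inside each set difference is an arbitrary tie-break chosen here.

```python
from typing import List, Sequence, Tuple


def permutations_from_sequence(
    concepts: Sequence[Tuple[str, str]],
) -> Tuple[List[str], List[str]]:
    """Read the row order (pi) and column order (tau) off an ordered concept sequence.

    Each concept is (extent, intent), written as strings of one-character labels.
    """
    extents = [set(extent) for extent, _ in concepts]
    intents = [set(intent) for _, intent in concepts]

    # pi = A_1, A_2 \ A_1, ..., A_n \ A_{n-1}
    pi = sorted(extents[0])
    for prev, cur in zip(extents, extents[1:]):
        pi.extend(sorted(cur - prev))

    # tau = B_1 \ B_2, ..., B_{n-1} \ B_n, B_n
    tau: List[str] = []
    for cur, nxt in zip(intents, intents[1:]):
        tau.extend(sorted(cur - nxt))
    tau.extend(sorted(intents[-1]))
    return pi, tau


sequence = [
    ("1", "ABC"),
    ("12", "BC"),
    ("1234", "C"),
    ("4", "CD"),
    ("45", "D"),
    ("5", "DE"),
]
pi, tau = permutations_from_sequence(sequence)
print(pi)
print(tau)
```

On this sequence the set differences reproduce the row order 1 to 5 and the column order A to E stated on the slide.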
  • 34. Paths in the lattice: Represent $B(G, M, I)$ as a graph $\mathcal{G} = (V, E)$. The edge set is defined as: $(C_1, C_2) \in E \leftrightarrow C_1 \prec C_2 \lor C_2 \prec C_1$. Concept lattice order enforces: $A_{i+1} \subseteq A_i$ and $B_i \subseteq B_{i+1}$ if $C_i \prec C_{i+1}$. Dual: $A_i \subseteq A_{i+1}$ and $B_{i+1} \subseteq B_i$ if $C_i \succ C_{i+1}$.
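One way to picture the graph is to connect two concepts whenever one is an immediate neighbor of the other. The sketch below assumes that $\prec$ denotes this immediate-neighbor (cover) relation and works purely on extents, since a concept is determined by its extent. The extents are those of the 5 × 5 banded example; `cover_edges` is a hypothetical helper, quadratic-to-cubic in the number of concepts and meant only for illustration.

```python
from typing import FrozenSet, Iterable, List, Tuple

Extent = FrozenSet[str]


def cover_edges(extents: Iterable[Iterable[str]]) -> List[Tuple[Extent, Extent]]:
    """Return (smaller, larger) pairs with no other extent strictly in between."""
    nodes = [frozenset(extent) for extent in extents]
    edges = []
    for low in nodes:
        for high in nodes:
            # low < high is strict set inclusion; reject if some extent sits between.
            if low < high and not any(low < mid < high for mid in nodes):
                edges.append((low, high))
    return edges


# Extents of the eight concepts of the 5 x 5 banded example ("" is the empty extent).
example_extents = ["12345", "1234", "45", "12", "4", "5", "1", ""]
edges = cover_edges(example_extents)
for low, high in edges:
    print(sorted(low), "--", sorted(high))
```

Treating each pair as an undirected edge gives the adjacency structure over which paths are searched.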
  • 36. Construct Partial Bands Via Paths: [Figure: concept lattice of the 5 × 5 example matrix. Nodes, written as (rows; columns): (1,2,3,4,5; none), (1,2,3,4; C), (4,5; D), (1,2; B,C), (4; C,D), (5; D,E), (1; A,B,C), (none; A,B,C,D,E).]
  • 37. Bound on the error: Key Fact. Each individual edge in a path $P$ is guaranteed to produce a banded structure.
  • 38. Bound on the error: Proposition.
        $$e(P_n) \le \begin{cases} 0 & \text{if } n \le 1 \\ e(P_{n-1}) + \sum_{a \in \hat{A}} |a' \cap B| & \text{if } C_{n+1} \succ C_n \\ e(P_{n-1}) + \sum_{b \in \hat{B}} |b' \cap A| & \text{if } C_{n+1} \prec C_n \end{cases}$$
  • 40. Overview: Weight the edges of the concept lattice with an upper bound on the error. Bad news: the weights change depending on the path. Good news: the error is monotonic along a path, so pruning with backtracking works! Three steps: 1 Compute $\mathcal{G}$; 2 Search paths of $\mathcal{G}$; 3 Determine top bands.
  • 42. Compute $\mathcal{G}$: Many existing algorithms [1, 5, 3, 4, 7]. Incremental vs. non-incremental. Assume availability of $\mathcal{G}$.
  • 43. Search Paths: Potentially exponential number of paths. Any bi-cluster is a valid starting point, but initiate with the upper neighbors of the null element. At each edge, add the concept to the path using the previous procedure. Use backtracking and mark previously visited edges.
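The search step can be sketched as a depth-first walk with backtracking that abandons a branch as soon as the accumulated error bound exceeds the noise threshold; this is safe because the error never decreases along a path. Everything named here is an assumption for illustration: `neighbors` is an adjacency map over concepts, `step_cost(path, nxt)` stands in for the per-edge error bound (which may depend on the path so far), and edges are marked per path rather than globally. It is not the authors' implementation.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Set, Tuple

Node = Hashable


def search_paths(
    neighbors: Dict[Node, Sequence[Node]],
    starts: Sequence[Node],
    step_cost: Callable[[List[Node], Node], float],
    eps: float,
) -> List[Tuple[List[Node], float]]:
    """Collect paths that cannot be extended without exceeding eps."""
    results: List[Tuple[List[Node], float]] = []

    def extend(path: List[Node], error: float, used: Set[frozenset]) -> None:
        extended = False
        for nxt in neighbors[path[-1]]:
            edge = frozenset((path[-1], nxt))
            if edge in used:
                continue  # this edge was already walked on the current path
            new_error = error + step_cost(path, nxt)
            if new_error > eps:
                continue  # prune: the error can only grow further along the path
            used.add(edge)
            path.append(nxt)
            extend(path, new_error, used)
            path.pop()  # backtrack
            used.remove(edge)
            extended = True
        if not extended:
            results.append((list(path), error))

    for start in starts:
        extend([start], 0.0, set())
    return results


# Toy adjacency a - b - c with a constant, hypothetical cost of 1 per step.
toy = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
unit_cost = lambda path, nxt: 1.0
print(search_paths(toy, ["a"], unit_cost, eps=1))
print(search_paths(toy, ["a"], unit_cost, eps=5))
```

With the tighter threshold the walk stops one step earlier, which is the pruning effect described on the slide.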
  • 45. Top Bands: Allow the user to specify minRows, minCols, maxOvlp. Quality measure: $q(P) = |r(P)| \cdot |c(P)| - w \cdot e(P)$. If two bands exceed maxOvlp, select the higher-quality one.
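A small sketch of the ranking step follows. The quality function is the one on the slide; the overlap measure is not defined in the deck, so the version here (shared cells divided by the cell count of the smaller band) is purely an assumption, as are the `Band` record and the greedy selection order.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Band:
    rows: FrozenSet[str]
    cols: FrozenSet[str]
    error: float  # e(P): the error bound accumulated along the path


def quality(band: Band, w: float = 1.0) -> float:
    """q(P) = |r(P)| * |c(P)| - w * e(P)."""
    return len(band.rows) * len(band.cols) - w * band.error


def overlap(a: Band, b: Band) -> float:
    """Assumed overlap measure: shared cells over the smaller band's cells."""
    shared = len(a.rows & b.rows) * len(a.cols & b.cols)
    smaller = min(len(a.rows) * len(a.cols), len(b.rows) * len(b.cols))
    return shared / smaller if smaller else 0.0


def top_bands(bands: List[Band], min_rows: int, min_cols: int,
              max_ovlp: float, w: float = 1.0) -> List[Band]:
    """Greedily keep the best bands, dropping any that overlap a better one too much."""
    eligible = [b for b in bands
                if len(b.rows) >= min_rows and len(b.cols) >= min_cols]
    kept: List[Band] = []
    for band in sorted(eligible, key=lambda b: quality(b, w), reverse=True):
        if all(overlap(band, other) <= max_ovlp for other in kept):
            kept.append(band)
    return kept


b1 = Band(frozenset("12345"), frozenset("ABCDE"), error=2.0)
b2 = Band(frozenset("1234"), frozenset("ABCD"), error=0.0)  # lies inside b1
b3 = Band(frozenset("6789"), frozenset("FGHI"), error=1.0)  # disjoint from b1
selected = top_bands([b2, b1, b3], min_rows=2, min_cols=2, max_ovlp=0.1)
print([sorted(b.rows) for b in selected])
```

Processing candidates in descending quality means that whenever two bands overlap beyond maxOvlp, the higher-quality one is the one that survives.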
  • 46. Analysis and Improvements: Running time: $O(|U| \times |E| \times \max\{X, Y\})$. $|U|$: number of initial concepts. $X, Y$: largest symmetric difference between neighboring concepts. Speed up by reducing $|U|$: perform simple clustering of $U$ based on the maxOvlp parameter. Good experimental results with this speed-up.
  • 49. Setup: Single band and segmented bands planted in synthetic data. All experiments: $w = 1$, maxOvlp = 0.1, minRows = 5, minCols = 5, $\epsilon = 99$.
  • 50. Results: [Figure: planted bands in the synthetic data sets; axis tick labels omitted.]
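  • 51. Results (synthetic data):
        Dataset name         Dataset size  p     Num. planted bands  Algorithm  Quality top ranked  Num. bands mined
        SynBand100_001       100 × 100     0.01  1                   MMBS       3590                6
        SynBand100_001       100 × 100     0.01  1                   MMBS_Fast  3406                4
        SynBand100_001       100 × 100     0.01  1                   MBS_BD     2507                1
        SynBand100_001       100 × 100     0.01  1                   MBS_SD     438                 1
        SynBand100_005       100 × 100     0.05  1                   MMBS       2278                9
        SynBand100_005       100 × 100     0.05  1                   MMBS_Fast  1503                8
        SynBand100_005       100 × 100     0.05  1                   MBS_BD     1050                1
        SynBand100_005       100 × 100     0.05  1                   MBS_SD     1201                1
        SynBand500_001       500 × 500     0.01  1                   MMBS       8918                7
        SynBand500_001       500 × 500     0.01  1                   MMBS_Fast  8261                6
        SynBand500_001       500 × 500     0.01  1                   MBS_BD     2822                1
        SynBand500_001       500 × 500     0.01  1                   MBS_SD     2145                1
        SynMultiBand100_001  100 × 100     0.01  2                   MMBS       3367                2
        SynMultiBand100_001  100 × 100     0.01  2                   MMBS_Fast  3367                2
        SynMultiBand100_001  100 × 100     0.01  2                   MBS        4101                1
        SynMultiBand100_001  100 × 100     0.01  2                   MBS_SD     4045                1
        SynMultiBand100_001  100 × 100     0.05  2                   MMBS       4054                2
        SynMultiBand100_001  100 × 100     0.05  2                   MMBS_Fast  3933                2
        SynMultiBand100_001  100 × 100     0.05  2                   MBS_BD     3910                1
        SynMultiBand100_001  100 × 100     0.05  2                   MBS_SD     3736                1
        SynMultiBand500_001  500 × 500     0.01  2                   MMBS       28242               8
        SynMultiBand500_001  500 × 500     0.01  2                   MMBS_Fast  21346               5
        SynMultiBand500_001  500 × 500     0.01  2                   MBS_BD     17498               1
        SynMultiBand500_001  500 × 500     0.01  2                   MBS_SD     430                 1
        SynRandom100_005     100 × 100     0.05  unknown             MMBS       3311                17
        SynRandom100_005     100 × 100     0.05  unknown             MMBS_Fast  3220                14
        SynRandom100_005     100 × 100     0.05  unknown             MBS_BD     2801                1
        SynRandom100_005     100 × 100     0.05  unknown             MBS_SD     1949                1
        SynRandom500_001     500 × 500     0.01  unknown             MMBS       18635               73
        SynRandom500_001     500 × 500     0.01  unknown             MMBS_Fast  16163               64
        SynRandom500_001     500 × 500     0.01  unknown             MBS_BD     16771               1
        SynRandom500_001     500 × 500     0.01  unknown             MBS_SD     5229                1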
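  • 53. Results (real-world data):
        Dataset                      Size         Sparsity  Algorithm  Quality top ranked  Num. bands mined
        Genes_Phenotypes             1910 × 3965  0.008     MMBS       6665                56
        Genes_Phenotypes             1910 × 3965  0.008     MMBS_Fast  6665                43
        Genes_Phenotypes             1910 × 3965  0.008     MBS_BD     5204                1
        Genes_Phenotypes             1910 × 3965  0.008     MBS_SD     3578                1
        Genes_Drugs                  1608 × 49    0.042     MMBS       6423                18
        Genes_Drugs                  1608 × 49    0.042     MMBS_Fast  6423                13
        Genes_Drugs                  1608 × 49    0.042     MBS_BD     5346                1
        Genes_Drugs                  1608 × 49    0.042     MBS_SD     3047                1
        NewsGroups_Mideast_Religion  2000 × 890   0.003     MMBS       72906               42
        NewsGroups_Mideast_Religion  2000 × 890   0.003     MMBS_Fast  61410               31
        NewsGroups_Mideast_Religion  2000 × 890   0.003     MBS_BD     59781               1
        NewsGroups_Mideast_Religion  2000 × 890   0.003     MBS_SD     58713               1
        NewsGroups_AllPC             5000 × 2805  0.0001    MMBS       93368               5
        NewsGroups_AllPC             5000 × 2805  0.0001    MMBS_Fast  93368               5
        NewsGroups_AllPC             5000 × 2805  0.0001    MBS_BD     89106               1
        NewsGroups_AllPC             5000 × 2805  0.0001    MBS_SD     74125               1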
  • 54. [Figure: Genes_Phenotypes band. Row labels: 1 early eyelid opening; 2 eyelids open at birth; 3 abnormal timing of postnatal eyelid opening; 4 abnormal eyelid morphology; 5 abnormal eye morphology; 6 abnormal homeostasis; 7 abnormal ear physiology; 8 abnormal hearing physiology; 9 abnormal brainstem auditory evoked potential; 10 deafness. Axis tick labels omitted.]
  • 55. [Figure: Genes_Drugs band; axis tick labels omitted.]
  • 56. [Figure: MideastReligion_SubjectLines band; axis tick labels omitted.]
  • 57. [Figure: AllPC_SubjectLines band; axis tick labels omitted.]
  • 59. Performance: [Figure: four panels plotting CPU Time (seconds) against epsilon for MMBS_fast, MMBS, and MBS; axis tick labels omitted.]
  • 60. Conclusion
      - Explored the connection between bi-clustering and banded structures in matrices.
      - Banded sub-matrices correspond to paths in the bi-cluster lattice.
      - The MMBS algorithm is based on this correspondence and on the ability to bound the error.
      - Future work: more efficient search methodologies and stronger bounds on the error.
      - Future work: quantitative measures of bandedness, since different types of bands are desirable in different applications.
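As a small illustration of what "banded" means for a binary matrix whose row and column order is already fixed, the sketch below checks the staircase form. It assumes the definition commonly attributed to Garriga et al.: in every row the 1s occupy consecutive columns, and the start and end columns never decrease from one row to the next. The function names `row_span` and `is_banded` are hypothetical, the rejection of all-zero rows is a simplifying choice, and this is not the MMBS algorithm itself, which searches for banded sub-matrices rather than testing a given ordering.

```python
from typing import List, Optional, Tuple


def row_span(row: List[int]) -> Optional[Tuple[int, int]]:
    """Return (first, last) column of the 1s if they are consecutive, else None.

    An all-zero row also yields None (a simplifying assumption of this sketch).
    """
    ones = [j for j, value in enumerate(row) if value]
    if not ones:
        return None
    first, last = ones[0], ones[-1]
    # Consecutive 1s means the span length equals the number of 1s.
    if last - first + 1 != len(ones):
        return None
    return first, last


def is_banded(matrix: List[List[int]]) -> bool:
    """Check the staircase property for the given row and column order."""
    previous: Optional[Tuple[int, int]] = None
    for row in matrix:
        span = row_span(row)
        if span is None:
            return False
        # Start and end columns must be non-decreasing down the rows.
        if previous is not None and (span[0] < previous[0] or span[1] < previous[1]):
            return False
        previous = span
    return True


banded_example = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
gap_example = [
    [1, 0, 1, 0],  # 1s are not consecutive
    [0, 1, 1, 0],
]
step_back_example = [
    [0, 1, 1],
    [1, 1, 0],  # band moves left, breaking the staircase
]

print(is_banded(banded_example))
print(is_banded(gap_example))
print(is_banded(step_back_example))
```

The check only verifies one fixed ordering; deciding whether some permutation of rows and columns produces this form is a separate and harder question.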
  • 62.
      B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer-Verlag, Berlin, 1999.
      G. C. Garriga, E. Junttila, and H. Mannila. Banded structure in binary matrices. In KDD ’08: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 292–300, New York, NY, USA, 2008. ACM.
      H. Bian and R. Bhatnagar. An algorithm for lattice-structured subspace clustering. Proceedings of the SIAM International Conference on Data Mining, 2005.
      S. O. Kuznetsov and S. A. Obiedkov. Algorithms for the construction of concept lattices and their diagram graphs. In PKDD ’01: Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery, pages 289–300, London, UK, 2001. Springer-Verlag.
  • 63.
      C. Lindig. Fast concept analysis. 8th International Conference on Conceptual Structures, 2000.
      H. Mannila and E. Terzi. Nestedness and segmented nestedness. In KDD ’07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 480–489, New York, NY, USA, 2007. ACM.
      M. J. Zaki and C.-J. Hsiao. Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Transactions on Knowledge and Data Engineering, 17(4), 2005.