This document presents an algorithm called MMBS (Mining Maximally Banded Submatrices) for uncovering multiple, possibly overlapping banded submatrices in binary data. It establishes a correspondence between banded structures and bi-clustering, and introduces an approach using formal concept analysis and concept lattice paths. The MMBS algorithm finds banded submatrices in binary data in three steps. Experimental results on synthetic and real-world data demonstrate the advantages of MMBS over previous approaches.
Mining Maximally Banded Matrices in Binary Data
1. Introduction Problem Definition Bandedness and Bi-Clustering MMBS Algorithm Experimental Results Conclusion
Faris Alqadah
Raj Bhatnagar
Anil Jegga
University of Cincinnati
Cincinnati Children’s Hospital
Outline
1 Introduction
Motivation
2 Problem Definition
Preliminaries
3 Bandedness and Bi-Clustering
Formal Concept Analysis
Concept Lattice Paths
4 MMBS Algorithm
Three Steps
5 Experimental Results
Synthetic Data
Real-World Data
6 Conclusion
Banded Matrices in Data
Banded structures in binary matrices have natural interpretations:
Bioinformatics (overlapping roles of genes)
Paleontology (patterns of species in space)
Social Networks (community structures)
  A B C D E
1 1 1 1 0 0
2 0 1 1 0 0
3 0 0 1 0 0
4 0 0 1 1 0
5 0 0 0 1 1
Bi-Clustering Problem
Banded sub-matrices are a form of bi-clusters
Bi-clustering in binary data focuses on maximal
rectangles full (or almost full) of 1s
Related Work
Nestedness and segmented nestedness [6]
MBS algorithm [2]
Fix column permutations
Solve the consecutive ones problem
Only find a single band
Contributions
1 Establish correspondence between banded structures and
bi-clustering in binary data
2 Introduce the novel MMBS algorithm to uncover multiple,
possibly overlapping banded sub-matrices
3 Empirical evaluation verifying advantage of MMBS over
previous approaches
Basic Notation
Matrix K with row labels G and column labels M
Think of K as K = (G, M, I)
π permutation of G and τ permutation of M
$K^{\pi}_{\tau}$: K with rows permuted by π and columns permuted by τ
$g_{\pi_i}$ and $m_{\tau_j}$
Fully Banded Matrix
Definition
A binary matrix K = (G, M, I) is fully banded if there exists a
permutation π of G and a permutation τ of M such that (1) for
every row i in $K^{\pi}_{\tau}$ the entries with 1s occur in consecutive
column indices $\{m_i, m_i + 1, \ldots, m_i^{\star}\}$ and (2) the
starting and ending indices for 1s in successive rows (i and i + 1) satisfy
the conditions $m_i \le m_{i+1}$ and $m_i^{\star} \le m_{i+1}^{\star}$.
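The two conditions of the definition can be checked directly once a row and column order is fixed. The following is a minimal sketch, assuming the matrix is already given in the candidate order (it does not search for π and τ); the function names and the choice to reject all-zero rows are illustrative assumptions, not part of the slides.

```python
from typing import List, Optional, Tuple


def row_span(row: List[int]) -> Optional[Tuple[int, int]]:
    """Return (first, last) column index of the 1s if they are consecutive, else None."""
    ones = [j for j, value in enumerate(row) if value == 1]
    if not ones:
        return None  # assumption: an all-zero row is treated as not banded
    first, last = ones[0], ones[-1]
    if last - first + 1 != len(ones):
        return None  # a 0 sits between two 1s, so condition (1) fails
    return first, last


def is_fully_banded_as_given(matrix: List[List[int]]) -> bool:
    """Check conditions (1) and (2) for the row/column order as given."""
    previous = None
    for row in matrix:
        span = row_span(row)
        if span is None:
            return False
        # Condition (2): start and end indices must not decrease from row to row.
        if previous is not None and (span[0] < previous[0] or span[1] < previous[1]):
            return False
        previous = span
    return True


# The 5 x 5 example matrix from the slides (rows 1..5, columns A..E).
EXAMPLE = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
]

print(is_fully_banded_as_given(EXAMPLE))        # order from the slides
print(is_fully_banded_as_given(EXAMPLE[::-1]))  # same rows, reversed order
```

Reversing the rows breaks condition (2), which shows that bandedness is a property of the matrix together with a specific pair of permutations.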
Relaxation of Fully Banded
Real data has noise
Subspaces may encompass banded structure
$e(K^{\pi}_{\tau})$: number of 1s or 0s that must be flipped to achieve
banded structure
Maximal banded sub-matrix: no more rows or columns can
be added while still preserving bandedness
Problem Statement
Given a binary matrix K and a noise threshold ε, find all
sub-matrices $\hat{K}$ of K that are ε-banded and maximal.
Bi-clustering
Bi-clusters in binary data defined as Formal Concepts
For A ⊆ G, we have A′ = {m ∈ M | gIm for all g ∈ A}
For B ⊆ M, we have B′ = {g ∈ G | gIm for all m ∈ B}
Formal Concept: C = (A, B) such that A′ = B and B ′ = A
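The two derivation operators translate almost literally into set operations. The sketch below assumes the context is stored as a dictionary from row labels to the set of columns holding a 1; the names `extent_prime`, `intent_prime` and `is_formal_concept` are hypothetical, and the data is the 5 x 5 banded example used elsewhere in the slides.

```python
from typing import Dict, Iterable, Set

# Row label -> set of column labels with a 1 (the 5 x 5 banded example).
CTX: Dict[str, Set[str]] = {
    "1": {"A", "B", "C"},
    "2": {"B", "C"},
    "3": {"C"},
    "4": {"C", "D"},
    "5": {"D", "E"},
}
COLS: Set[str] = {"A", "B", "C", "D", "E"}


def extent_prime(rows: Iterable[str], ctx: Dict[str, Set[str]], all_cols: Set[str]) -> Set[str]:
    """A' : the columns shared by every row in A (all columns if A is empty)."""
    shared = set(all_cols)
    for g in rows:
        shared &= ctx[g]
    return shared


def intent_prime(cols: Set[str], ctx: Dict[str, Set[str]]) -> Set[str]:
    """B' : the rows that contain every column in B."""
    return {g for g, row_cols in ctx.items() if cols <= row_cols}


def is_formal_concept(rows: Set[str], cols: Set[str],
                      ctx: Dict[str, Set[str]], all_cols: Set[str]) -> bool:
    """(A, B) is a formal concept when A' = B and B' = A."""
    return extent_prime(rows, ctx, all_cols) == cols and intent_prime(cols, ctx) == rows


print(is_formal_concept({"1", "2"}, {"B", "C"}, CTX, COLS))
print(is_formal_concept({"1", "2", "3"}, {"C"}, CTX, COLS))
```

The second pair is a rectangle of 1s but not a concept: row 4 also contains C, so the rectangle is not maximal.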
Formal Concepts
m1 m2 m3 m4
g1 0 1 0 1
g2 0 0 1 1
g3 0 0 0 1
g4 1 0 0 0
g5 1 1 1 0
g7 1 1 0 0
g6 0 0 1 0
Maximal rectangles of 1s
Maximal bicliques
Bi-clusters may be ordered by the subset-superset
relationship and form a complete lattice
B(G, M, I) denotes the concept or bi-cluster lattice
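For a context as small as the one on this slide, every bi-cluster can be listed by brute force: take each subset of columns, collect the rows that contain it, then close the column set. This is only a sketch for illustration (it is exponential in the number of columns and is not one of the lattice construction algorithms the slides refer to); the helper names are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# The 7 x 4 context shown on the slide: row label -> columns holding a 1.
CONTEXT: Dict[str, Set[str]] = {
    "g1": {"m2", "m4"},
    "g2": {"m3", "m4"},
    "g3": {"m4"},
    "g4": {"m1"},
    "g5": {"m1", "m2", "m3"},
    "g6": {"m3"},
    "g7": {"m1", "m2"},
}
COLUMNS: List[str] = ["m1", "m2", "m3", "m4"]

Concept = Tuple[FrozenSet[str], FrozenSet[str]]


def rows_having(cols: Set[str], ctx: Dict[str, Set[str]]) -> FrozenSet[str]:
    """B' : rows that contain every column in cols."""
    return frozenset(g for g, row_cols in ctx.items() if cols <= row_cols)


def cols_shared(rows: FrozenSet[str], ctx: Dict[str, Set[str]],
                all_cols: List[str]) -> FrozenSet[str]:
    """A' : columns shared by every row in rows."""
    shared = set(all_cols)
    for g in rows:
        shared &= ctx[g]
    return frozenset(shared)


def all_concepts(ctx: Dict[str, Set[str]], all_cols: List[str]) -> Set[Concept]:
    """Enumerate concepts as (B', B'') over every column subset B."""
    found: Set[Concept] = set()
    for size in range(len(all_cols) + 1):
        for subset in combinations(all_cols, size):
            extent = rows_having(set(subset), ctx)
            intent = cols_shared(extent, ctx, all_cols)
            found.add((extent, intent))
    return found


CONCEPTS = all_concepts(CONTEXT, COLUMNS)
for extent, intent in sorted(CONCEPTS, key=lambda c: (len(c[1]), sorted(c[1]))):
    print(sorted(extent), sorted(intent))
```

The result includes the top element (all rows, no columns) and the bottom element (no rows, all columns) alongside the non-trivial maximal rectangles.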
Splintering Bands
Trivially a bi-cluster is fully banded
A B C D E
1 1 1 1 0 0
2 0 1 1 0 0
3 0 0 1 0 0
4 0 0 1 1 0
5 0 0 0 1 1
Intuitively, any fully banded matrix can be splintered exactly into
maximal rectangles of 1s or bi-clusters
Ordering Splintered Bands
Let $K^{\pi}_{\tau}$ be fully banded
$\Gamma(g)$ is a mapping from row $g$ to the bi-clusters in which $g$ appears
The union of all $\Gamma(g)$ can always be ordered
n-tuple of bi-clusters $\{C_1, \ldots, C_n\}$ having total orderings
$\{<_{\pi_1,\tau_1}, \ldots, <_{\pi_n,\tau_n}\}$
Define the lexicographical order $<_{\pi,\tau}$ on $C_1 \times C_2 \times \cdots \times C_n$
Considering $\{C_1, \ldots, C_n\}$ in order completely specifies the
permutations $\pi$ and $\tau$
Bands as Sequences of Concepts
Proposition
Given a context K, if permutations π and τ exist such that $K^{\pi}_{\tau}$ is
fully banded, then there exists a sequence of bi-clusters
$C_1 = (A_1, B_1), \ldots, C_n = (A_n, B_n)$ s.t.
$\pi = A_1,\ A_2 \setminus A_1,\ \ldots,\ A_n \setminus A_{n-1}$
$\tau = B_1 \setminus B_2,\ \ldots,\ B_{n-1} \setminus B_n,\ B_n$
An Example
A B C D E
1 1 1 1 0 0
2 0 1 1 0 0
3 0 0 1 0 0
4 0 0 1 1 0
5 0 0 0 1 1
g Γ(g)
1 (1, ABC), (12, BC), (1234, C)
2 (12, BC), (1234, C)
3 (1234, C)
4 (4, CD), (45, D)
5 (5, DE ), (45, D)
$F(K^{\pi}_{\tau})$:
(1, ABC) < (12, BC) < (1234, C) < (4, CD) < (45, D) < (5, DE)
$\pi = 1,\ 12 \setminus 1,\ \ldots,\ 5 \setminus 45 = \{1, 2, 3, 4, 5\}$
$\tau = ABC \setminus BC,\ \ldots,\ D \setminus DE,\ DE = \{A, B, C, D, E\}$
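The example can be replayed mechanically: walking the ordered bi-clusters and taking successive set differences of extents and intents yields the row and column orders. This is a minimal sketch that reads the differences as set differences between neighbouring bi-clusters; the function name is hypothetical, and the sorting inside each difference is an assumption made only to get a deterministic order.

```python
from typing import List, Tuple

# Ordered bi-clusters from the example, written as (extent, intent) strings.
SEQUENCE: List[Tuple[str, str]] = [
    ("1", "ABC"),
    ("12", "BC"),
    ("1234", "C"),
    ("4", "CD"),
    ("45", "D"),
    ("5", "DE"),
]


def permutations_from_sequence(seq: List[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Build the row order from A_i minus A_(i-1) and the column order from B_i minus B_(i+1)."""
    pi: List[str] = []
    tau: List[str] = []
    previous_extent: set = set()
    for i, (extent, intent) in enumerate(seq):
        # Rows that are new relative to the previous bi-cluster.
        pi.extend(sorted(set(extent) - previous_extent))
        previous_extent = set(extent)
        # Columns that the next bi-cluster no longer contains (all of them for the last one).
        next_intent = set(seq[i + 1][1]) if i + 1 < len(seq) else set()
        tau.extend(sorted(set(intent) - next_intent))
    return pi, tau


PI, TAU = permutations_from_sequence(SEQUENCE)
print(PI)
print(TAU)
```

Several differences are empty (for instance 4 minus 1234 and C minus CD), which is why six bi-clusters produce only five rows and five columns.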
Paths in the lattice
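Represent B(G, M, I) as a graph $\mathcal{G} = (V, E)$
Edge set defined as: $(C_1, C_2) \in E \leftrightarrow C_1 \prec C_2 \lor C_2 \prec C_1$
Concept lattice order enforces: $A_{i+1} \subseteq A_i$ and $B_i \subseteq B_{i+1}$ if
$C_i \prec C_{i+1}$
Dual: $A_i \subseteq A_{i+1}$ and $B_{i+1} \subseteq B_i$ if $C_i \succ C_{i+1}$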
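Because the edge set is symmetric, the graph is simply the undirected diagram of the lattice: two concepts are joined when one is an immediate neighbour of the other. The sketch below builds those edges for the eight concepts of the 5 x 5 example by comparing extents; the helper names are assumptions, and the quadratic-times-linear check is chosen for clarity rather than speed.

```python
from typing import FrozenSet, List, Set, Tuple

Concept = Tuple[FrozenSet[str], FrozenSet[str]]


def concept(extent: str, intent: str) -> Concept:
    """Build a concept from compact strings, e.g. concept("12", "BC")."""
    return (frozenset(extent), frozenset(intent))


# All concepts of the 5 x 5 banded example, including top and bottom.
CONCEPTS: List[Concept] = [
    concept("12345", ""),
    concept("1234", "C"),
    concept("45", "D"),
    concept("12", "BC"),
    concept("4", "CD"),
    concept("5", "DE"),
    concept("1", "ABC"),
    concept("", "ABCDE"),
]


def cover_edges(concepts: List[Concept]) -> Set[FrozenSet[Concept]]:
    """Undirected edges between concepts that are immediate neighbours in the lattice."""
    edges: Set[FrozenSet[Concept]] = set()
    for low in concepts:
        for high in concepts:
            if low[0] < high[0]:  # strict inclusion of extents
                # Skip the pair if some third concept lies strictly between them.
                between = any(low[0] < mid[0] < high[0] for mid in concepts)
                if not between:
                    edges.add(frozenset((low, high)))
    return edges


EDGES = cover_edges(CONCEPTS)
print(len(EDGES))
```

Every consecutive pair in the ordering (1, ABC), (12, BC), (1234, C), (4, CD), (45, D), (5, DE) is joined by one of these edges, so the band from the example is a path in this graph.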
Construct Partial Bands Via Paths
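[Figure: concept lattice of the example matrix. Node labels recovered from the diagram: extents {1,2,3,4,5}, {1,2,3,4}, {4,5}, {1,2}, {4}, {5}, {1}; intents {C}, {D}, {B,C}, {C,D}, {D,E}, {A,B,C}, {A,B,C,D,E}.]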
Bound on the error
Key Fact
Each individual edge in a path P is guaranteed to produce a
banded structure
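Proposition
$$
e(P_n) \le
\begin{cases}
0 & \text{if } n \le 1 \\
e(P_{n-1}) + \sum_{a \in \hat{A}} |a' \cap B| & \text{if } C_{n+1} \succ C_n \\
e(P_{n-1}) + \sum_{b \in \hat{B}} |b' \cap A| & \text{if } C_{n+1} \prec C_n
\end{cases}
$$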
Overview
Weigh edges of concept lattice with upper bound of error
Bad news: weights change depending on path
Good news: Error is monotonic along a path, so pruning
with backtracking works!
Three steps:
1 Compute G
2 Search paths of G
3 Determine top bands
Compute G
Many existing algorithms [1, 5, 3, 4, 7]
Incremental vs. non-incremental
Assume availability of G
Search Paths
Potentially exponential number of paths
Any bi-cluster is a valid starting point, but initiate with
upper neighbors of the null element
At each edge add concept to path utilizing previous
procedure
Utilize backtracking, mark previously visited edges
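The search step can be pictured as a depth-first walk that abandons a branch as soon as the accumulated error bound exceeds the threshold. The following is only a sketch under stated assumptions: `step_cost` is a hypothetical caller-supplied function standing in for the error bound of adding one concept, it must never return a negative value (otherwise pruning would be unsafe), edges are marked per path rather than globally, and only paths that cannot be extended any further are reported.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Path = List[str]


def search_paths(graph: Dict[str, List[str]],
                 starts: List[str],
                 step_cost: Callable[[Path, str], float],
                 eps: float) -> List[Tuple[Path, float]]:
    """Backtracking search for paths whose accumulated error stays within eps."""
    results: List[Tuple[Path, float]] = []

    def extend(path: Path, error: float, used: Set[FrozenSet[str]]) -> None:
        extended = False
        for nxt in graph[path[-1]]:
            edge = frozenset((path[-1], nxt))
            if edge in used:
                continue  # edge already visited on this path
            new_error = error + step_cost(path, nxt)
            if new_error > eps:
                continue  # prune: the error can only grow along the path
            used.add(edge)
            extend(path + [nxt], new_error, used)
            used.remove(edge)  # backtrack
            extended = True
        if not extended:
            results.append((path, error))  # no admissible extension left

    for start in starts:
        extend([start], 0.0, set())
    return results


# Toy graph (a chain a - b - c - d) with a constant cost of 1 per added node.
GRAPH: Dict[str, List[str]] = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}


def unit_cost(path: Path, nxt: str) -> float:
    return 1.0


print(search_paths(GRAPH, ["a"], unit_cost, eps=2))
```

With a threshold of 2 the walk from `a` stops at `c`, because adding `d` would push the accumulated error to 3.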
Top Bands
Allow user to specify: minRows, minCols, maxOvlp
Quality measure: q(P) = |r(P)| · |c(P)| − w · e(P)
If two bands overlap by more than maxOvlp, select the higher-quality one
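The selection step can be sketched as a filter followed by a greedy pass in order of decreasing quality. The slides do not say how overlap is measured, so the `overlap` function below (shared cells divided by the area of the smaller band) is an assumption, as are the `Band` class and the function names.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Band:
    rows: FrozenSet[str]   # r(P)
    cols: FrozenSet[str]   # c(P)
    error: float           # e(P)


def quality(band: Band, w: float = 1.0) -> float:
    """q(P) = |r(P)| * |c(P)| - w * e(P)."""
    return len(band.rows) * len(band.cols) - w * band.error


def overlap(a: Band, b: Band) -> float:
    """Assumed overlap measure: shared cells relative to the smaller band."""
    shared = len(a.rows & b.rows) * len(a.cols & b.cols)
    smaller = min(len(a.rows) * len(a.cols), len(b.rows) * len(b.cols))
    return shared / smaller if smaller else 0.0


def top_bands(bands: List[Band], min_rows: int, min_cols: int,
              max_ovlp: float, w: float = 1.0) -> List[Band]:
    """Keep bands that are large enough, preferring higher quality when two overlap too much."""
    candidates = [b for b in bands if len(b.rows) >= min_rows and len(b.cols) >= min_cols]
    candidates.sort(key=lambda b: quality(b, w), reverse=True)
    kept: List[Band] = []
    for band in candidates:
        if all(overlap(band, other) <= max_ovlp for other in kept):
            kept.append(band)
    return kept


band1 = Band(frozenset("12345"), frozenset("ABCDE"), error=3)   # q = 22
band2 = Band(frozenset("12345"), frozenset("BCDEF"), error=10)  # q = 15, overlaps band1
band3 = Band(frozenset("6789X"), frozenset("ABCDE"), error=0)   # q = 25, disjoint rows
band4 = Band(frozenset("12"), frozenset("ABCDE"), error=0)      # too few rows

selected = top_bands([band1, band2, band3, band4], min_rows=5, min_cols=5, max_ovlp=0.1)
print([quality(b) for b in selected])
```

In this toy run `band2` is dropped because it shares 80% of its cells with the higher-quality `band1`, and `band4` never becomes a candidate.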
Analysis and Improvements
Running time: O(|U| × |E| × max{X, Y})
|U|: number of initial concepts
X, Y: largest symmetric differences between neighboring
concepts
Speed up by reducing |U|
Perform simple clustering of U based on maxOvlp
parameter
Good experimental results with this speed up.
Setup
Single band and segmented bands planted in synthetic
data
All experiments:
w = 1
maxOvlp = 0.1
minRows = 5
minCols = 5
ε = 99
Conclusion
Explored connection between bi-clustering and banded
structures in matrices
Banded sub-matrices correspond to paths in the bi-cluster
lattice
MMBS algorithm is based on this correspondence and
ability to bound error
Future work: More efficient search methodologies,
stronger bounds on error
Future work: Quantitative measures of bandedness,
different types of bands desirable in different applications
[1] B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer-Verlag, Berlin, 1999.
[2] G. C. Garriga, E. Junttila, and H. Mannila. Banded structure in binary matrices. In KDD '08: Proceeding of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 292–300, New York, NY, USA, 2008. ACM.
[3] H. Bian and R. Bhatnagar. An algorithm for lattice-structured subspace clustering. In Proceedings of the SIAM International Conference on Data Mining, 2005.
[4] S. O. Kuznetsov and S. A. Obiedkov. Algorithms for the construction of concept lattices and their diagram graphs. In PKDD '01: Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery, pages 289–300, London, UK, 2001. Springer-Verlag.
[5] C. Lindig. Fast concept analysis. In 8th International Conference on Conceptual Structures, 2000.
[6] H. Mannila and E. Terzi. Nestedness and segmented nestedness. In KDD '07: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 480–489, New York, NY, USA, 2007. ACM.
[7] M. J. Zaki and C.-J. Hsiao. Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Transactions on Knowledge and Data Engineering, 17(4), 2005.